Commit 21bd0a1

feat: add ftester utility for function testing and debugging

- Implemented ftester utility for testing individual functions and AI agent behaviors.
- Updated Dockerfile to build ftester and included it in the final image.
- Enhanced README.md with detailed usage instructions for ftester, including command line and interactive modes.
- Introduced new mock implementations for testing purposes.
- Added necessary dependencies in go.mod and go.sum for ftester functionality.
1 parent 332970c commit 21bd0a1

13 files changed (+3005 / -18 lines)

.vscode/launch.json

Lines changed: 22 additions & 0 deletions
@@ -67,5 +67,27 @@
             "cwd": "${workspaceFolder}",
             "output": "${workspaceFolder}/build/__debug_bin_ctester",
         },
+        {
+            "type": "go",
+            "request": "launch",
+            "name": "Launch Tools Tests",
+            "program": "${workspaceFolder}/backend/cmd/ftester/",
+            "envFile": "${workspaceFolder}/.env",
+            "env": {
+                "LLM_SERVER_CONFIG_PATH": "${workspaceFolder}/examples/configs/openrouter.provider.yml",
+                "DATABASE_URL": "postgres://postgres:postgres@localhost:5432/pentagidb?sslmode=disable",
+                // Langfuse (optional) uncomment to enable
+                "LANGFUSE_BASE_URL": "http://localhost:4000",
+                // Observability (optional) uncomment to enable
+                "OTEL_HOST": "localhost:8148",
+            },
+            "args": [
+                "-flow", "0",
+                // "describe", "-verbose",
+                "search", "-question", "how I can install burp suite on Kali Linux?",
+            ],
+            "cwd": "${workspaceFolder}",
+            "output": "${workspaceFolder}/build/__debug_bin_ftester",
+        },
     ]
 }

Dockerfile

Lines changed: 4 additions & 0 deletions
@@ -64,6 +64,9 @@ RUN go build -trimpath -o /pentagi ./cmd/pentagi
 # Build ctester utility
 RUN go build -trimpath -o /ctester ./cmd/ctester
 
+# Build ftester utility
+RUN go build -trimpath -o /ftester ./cmd/ftester
+
 # STEP 3: Build the final image
 FROM alpine:3.21
 
@@ -91,6 +94,7 @@ RUN mkdir -p \
 
 COPY --from=be-build /pentagi /opt/pentagi/bin/pentagi
 COPY --from=be-build /ctester /opt/pentagi/bin/ctester
+COPY --from=be-build /ftester /opt/pentagi/bin/ftester
 COPY --from=fe-build /frontend/dist /opt/pentagi/fe
 
 # Copy provider configuration files

README.md

Lines changed: 238 additions & 0 deletions
@@ -12,6 +12,7 @@
 - [Advanced Setup](#-advanced-setup)
 - [Development](#-development)
 - [Testing LLM Agents](#-testing-llm-agents)
+- [Function Testing with ftester](#-function-testing-with-ftester)
 - [Building](#%EF%B8%8F-building)
 - [Credits](#-credits)
 - [License](#-license)
@@ -808,6 +809,243 @@ simple_json:

This tool helps ensure your AI agents are using the most effective models for their specific tasks, improving reliability while optimizing costs.

## 🔍 Function Testing with ftester

PentAGI includes a versatile utility called `ftester` for debugging, testing, and developing specific functions and AI agent behaviors. While `ctester` focuses on testing LLM capabilities, `ftester` lets you invoke individual system functions and AI agent components directly, with precise control over the execution context.

### Key Features

- **Direct Function Access**: Test individual functions without running the entire system
- **Mock Mode**: Test functions without a live PentAGI deployment using built-in mocks
- **Interactive Input**: Fill in function arguments interactively for exploratory testing
- **Detailed Output**: Color-coded terminal output with formatted responses and errors
- **Context-Aware Testing**: Debug AI agents within the context of specific flows, tasks, and subtasks
- **Observability Integration**: All function calls are logged to Langfuse and the observability stack

### Usage Modes

#### Command Line Arguments

Run ftester with a function name and its arguments directly from the command line:

```bash
# Basic usage with mock mode
cd backend
go run cmd/ftester/main.go [function_name] -[arg1] [value1] -[arg2] [value2]

# Example: Test terminal command in mock mode
go run cmd/ftester/main.go terminal -command "ls -la" -message "List files"

# Using a real flow context
go run cmd/ftester/main.go -flow 123 terminal -command "whoami" -message "Check user"

# Testing AI agent in specific task/subtask context
go run cmd/ftester/main.go -flow 123 -task 456 -subtask 789 pentester -message "Find vulnerabilities"
```

#### Interactive Mode

Run ftester with a function name but without its arguments to be prompted for each argument interactively:

```bash
# Start interactive mode
go run cmd/ftester/main.go [function_name]

# For example, to fill in the browser function's arguments interactively
go run cmd/ftester/main.go browser
```

<details>
<summary><b>Available Functions</b> (click to expand)</summary>

### Environment Functions
- **terminal**: Execute commands in a container and return the output
- **file**: Perform file operations (read, write, list) in a container

### Search Functions
- **browser**: Access websites and capture screenshots
- **google**: Search the web using Google Custom Search
- **duckduckgo**: Search the web using DuckDuckGo
- **tavily**: Search using the Tavily AI search engine
- **traversaal**: Search using the Traversaal AI search engine
- **perplexity**: Search using Perplexity AI

### Vector Database Functions
- **search_in_memory**: Search for information in the vector database
- **search_guide**: Find guidance documents in the vector database
- **search_answer**: Find answers to questions in the vector database
- **search_code**: Find code examples in the vector database

### AI Agent Functions
- **advice**: Get expert advice from an AI agent
- **coder**: Request code generation or modification
- **maintenance**: Run system maintenance tasks
- **memorist**: Store and organize information in the vector database
- **pentester**: Perform security tests and vulnerability analysis
- **search**: Run complex searches across multiple sources

### Utility Functions
- **describe**: Show information about flows, tasks, and subtasks

</details>

<details>
<summary><b>Debugging Flow Context</b> (click to expand)</summary>

The `describe` function provides detailed information about tasks and subtasks within a flow. This is particularly useful when PentAGI runs into problems or a flow gets stuck.

```bash
# List all flows in the system
go run cmd/ftester/main.go describe

# Show all tasks and subtasks for a specific flow
go run cmd/ftester/main.go -flow 123 describe

# Show detailed information for a specific task
go run cmd/ftester/main.go -flow 123 -task 456 describe

# Show detailed information for a specific subtask
go run cmd/ftester/main.go -flow 123 -task 456 -subtask 789 describe

# Show verbose output with full descriptions and results
go run cmd/ftester/main.go -flow 123 describe -verbose
```

This lets you pinpoint where a flow is stuck and then resume processing by invoking the appropriate agent function directly.

</details>

<details>
<summary><b>Function Help and Discovery</b> (click to expand)</summary>

Each function has a help mode that shows available parameters:

```bash
# Get help for a specific function
go run cmd/ftester/main.go [function_name] -help

# Examples:
go run cmd/ftester/main.go terminal -help
go run cmd/ftester/main.go browser -help
go run cmd/ftester/main.go describe -help
```

You can also run ftester without arguments to see a list of all available functions:

```bash
go run cmd/ftester/main.go
```

</details>

<details>
<summary><b>Output Format</b> (click to expand)</summary>

The `ftester` utility uses color-coded output to make interpretation easier:

- **Blue headers**: Section titles and key names
- **Cyan [INFO]**: General information messages
- **Green [SUCCESS]**: Successful operations
- **Red [ERROR]**: Error messages
- **Yellow [WARNING]**: Warning messages
- **Yellow [MOCK]**: Indicates mock mode operation
- **Magenta values**: Function arguments and results

JSON and Markdown responses are automatically formatted for readability.

</details>
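
A rough sketch of how output like this can be produced in Go: ANSI escape codes color the level tags, and `json.Indent` from `encoding/json` reformats JSON payloads. The color assignments mirror the list above, but the helper names, the sample payload, and the exact escape sequences are assumptions, not what ftester necessarily emits. Markdown formatting is not covered here.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// ANSI escape sequences for the colors named in the list above.
const (
	reset  = "\033[0m"
	red    = "\033[31m"
	green  = "\033[32m"
	yellow = "\033[33m"
	cyan   = "\033[36m"
)

// levelColors assigns a color to each message tag, following the README list.
var levelColors = map[string]string{
	"INFO":    cyan,
	"SUCCESS": green,
	"ERROR":   red,
	"WARNING": yellow,
	"MOCK":    yellow,
}

// tag renders a label such as [INFO] in its color; unknown levels stay plain.
func tag(level string) string {
	color, ok := levelColors[level]
	if !ok {
		return "[" + level + "]"
	}
	return color + "[" + level + "]" + reset
}

// prettyJSON indents a JSON payload for display. Input that is not valid
// JSON (for example Markdown or plain text) is returned unchanged.
func prettyJSON(raw string) string {
	var buf bytes.Buffer
	if err := json.Indent(&buf, []byte(raw), "", "  "); err != nil {
		return raw
	}
	return buf.String()
}

func main() {
	fmt.Println(tag("INFO"), "calling terminal")
	fmt.Println(tag("SUCCESS"))
	// Hypothetical result payload, used only to show the indentation.
	fmt.Println(prettyJSON(`{"exit_code":0,"output":"root"}`))
}
```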

<details>
<summary><b>Advanced Usage Scenarios</b> (click to expand)</summary>

### Debugging Stuck AI Flows

When PentAGI gets stuck in a flow:

1. Pause the flow through the UI
2. Use `describe` to identify the current task and subtask
3. Directly invoke the agent function with the same task/subtask IDs
4. Examine the detailed output to identify the issue
5. Resume the flow or manually intervene as needed

### Testing Environment Variables

Verify that API keys and external services are configured correctly:

```bash
# Test Google search API configuration
go run cmd/ftester/main.go google -query "pentesting tools"

# Test browser access to external websites
go run cmd/ftester/main.go browser -url "https://example.com"
```

### Developing New AI Agent Behaviors

When developing new prompt templates or agent behaviors:

1. Create a test flow in the UI
2. Use ftester to directly invoke the agent with different prompts
3. Observe responses and adjust prompts accordingly
4. Check Langfuse for detailed traces of all function calls

### Verifying Docker Container Setup

Ensure containers are properly configured:

```bash
go run cmd/ftester/main.go -flow 123 terminal -command "env | grep -i proxy" -message "Check proxy settings"
```

</details>

<details>
<summary><b>Docker Container Usage</b> (click to expand)</summary>

If you have PentAGI running in Docker, you can use ftester from within the container:

```bash
# Run ftester inside the running PentAGI container
docker exec -it pentagi /opt/pentagi/bin/ftester [arguments]

# Examples:
docker exec -it pentagi /opt/pentagi/bin/ftester -flow 123 describe
docker exec -it pentagi /opt/pentagi/bin/ftester -flow 123 terminal -command "ps aux" -message "List processes"
```

This is particularly useful for production deployments where you don't have a local development environment.

</details>

<details>
<summary><b>Integration with Observability Tools</b> (click to expand)</summary>

Function calls made through ftester can be followed in three places:

1. **Langfuse**: Captures the entire AI agent interaction chain, including prompts, responses, and function calls
2. **OpenTelemetry**: Records metrics, traces, and logs for system performance analysis
3. **Terminal Output**: Provides immediate feedback on function execution

To access detailed logs:

- Check the Langfuse UI for AI agent traces (typically at `http://localhost:4000`)
- Use Grafana dashboards for system metrics (typically at `http://localhost:3000`)
- Examine terminal output for immediate function results and errors

</details>

### Command-line Options

ftester accepts the following global options, which are given before the function name (as in the examples above):

- `-env <path>` - Path to the environment file (optional, default: `.env`)
- `-provider <type>` - Provider type to use (default: `custom`; options: `openai`, `anthropic`, `custom`)
- `-flow <id>` - Flow ID for testing (`0` selects mock mode; default: `0`)
- `-task <id>` - Task ID for agent context (optional)
- `-subtask <id>` - Subtask ID for agent context (optional)

Function-specific arguments follow the function name in `-name value` format.
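
The split between global options and function-specific arguments fits the behavior of Go's `flag` package, whose `Parse` stops at the first non-flag argument. The sketch below parses the global options listed above and leaves the function name and its `-name value` pairs for a second, per-function flag set. The `Options` type, `parseGlobal`, and the provider check are assumptions; ftester's actual parsing code may differ, for example by listing the available functions instead of failing when no function name is given.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"os"
)

// Options holds the global options plus the requested function call.
type Options struct {
	EnvPath   string
	Provider  string
	FlowID    int64
	TaskID    int64
	SubtaskID int64
	Function  string   // first non-flag argument, e.g. "terminal"
	FuncArgs  []string // remaining "-name value" pairs for that function
}

// Mock reports whether the run uses built-in mocks (-flow 0).
func (o Options) Mock() bool { return o.FlowID == 0 }

// parseGlobal parses the global options. FlagSet.Parse stops at the first
// non-flag argument, so the function name and its own arguments are left
// in fs.Args() for a second, per-function flag set.
func parseGlobal(args []string) (Options, error) {
	var o Options
	fs := flag.NewFlagSet("ftester", flag.ContinueOnError)
	fs.StringVar(&o.EnvPath, "env", ".env", "path to environment file")
	fs.StringVar(&o.Provider, "provider", "custom", "provider type: openai, anthropic or custom")
	fs.Int64Var(&o.FlowID, "flow", 0, "flow ID (0 = mock mode)")
	fs.Int64Var(&o.TaskID, "task", 0, "task ID for agent context")
	fs.Int64Var(&o.SubtaskID, "subtask", 0, "subtask ID for agent context")
	if err := fs.Parse(args); err != nil {
		return o, err
	}
	switch o.Provider {
	case "openai", "anthropic", "custom":
	default:
		return o, fmt.Errorf("unknown provider %q", o.Provider)
	}
	rest := fs.Args()
	if len(rest) == 0 {
		// A real tool might list the available functions here instead.
		return o, errors.New("no function name given")
	}
	o.Function, o.FuncArgs = rest[0], rest[1:]
	return o, nil
}

func main() {
	opts, err := parseGlobal(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	fmt.Printf("function=%s mock=%t args=%q\n", opts.Function, opts.Mock(), opts.FuncArgs)
}
```

For example, the arguments in the VS Code launch configuration of this commit (`-flow 0 search -question ...`) parse to the function `search` in mock mode, with `-question` and its value left for the `search` function's own flags.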

## 🏗️ Building

### Building Docker Image
