Like Postman for APIs, but for AI Agents
Production-grade testing framework for AI agents. Validates Task completion, Trajectory efficiency, and Memory consistency across Azure AI Foundry, OpenAI, LangChain, and custom implementations.
- Task Validation: Goal achievement, constraint satisfaction, output schema compliance
- Trajectory Validation: Action efficiency, loop detection, path optimization
- Memory Validation: Context retention, consistency checking, relevance scoring
- Security Scanning: Built-in SAST, dependency scanning, secret detection, and configuration security checks
- Multi-Platform: Azure AI Foundry, OpenAI, GitHub Models, LangChain, custom agents
- Enterprise-Ready: Security-first design, comprehensive logging, CI/CD integration
- Extensible: Adapter pattern for custom platforms
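To make the trajectory checks above more concrete, here is a minimal sketch of loop detection over a recorded action sequence. It is not the framework's implementation: the helper name `find_repeated_cycle`, its parameters, and the list-of-strings trajectory format are assumptions chosen for illustration.

```python
from typing import Optional, Sequence, Tuple


def find_repeated_cycle(
    actions: Sequence[str], min_repeats: int = 3, max_cycle_len: int = 4
) -> Optional[Tuple[int, int]]:
    """Return (start_index, cycle_length) of the first block of actions that
    repeats back-to-back at least `min_repeats` times, or None if there is none.

    Hypothetical helper for illustration; not part of agent_tester's API.
    """
    n = len(actions)
    for start in range(n):
        for cycle_len in range(1, max_cycle_len + 1):
            span = cycle_len * min_repeats
            if start + span > n:
                break  # longer cycles cannot fit either
            pattern = list(actions[start:start + cycle_len])
            repeats = all(
                list(actions[start + k * cycle_len:start + (k + 1) * cycle_len]) == pattern
                for k in range(1, min_repeats)
            )
            if repeats:
                return start, cycle_len
    return None


# An agent that keeps alternating between two actions before answering.
trajectory = ["search", "read", "search", "read", "search", "read", "answer"]
print(find_repeated_cycle(trajectory))  # (0, 2): a 2-step cycle starting at index 0
```

A real validator would likely compare action arguments as well as action names, since repeating the same tool with different inputs is often legitimate.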
Testing AI agents shouldn't be harder than testing APIs. Agent Tester brings the simplicity of Postman to AI agent testing:
- Simple CLI - Test agents with a single command
- YAML/JSON Configuration - Define tests like Postman collections
- Rich Output - Beautiful, readable test results
- Multi-Platform - Works with Azure, OpenAI, LangChain, and more
- Production-Ready - Enterprise-grade validation and reporting
- Built-in Security - Automated vulnerability scanning and security best practices
# Install from PyPI (coming soon)
pip install agent-tester
# Or install from source
git clone https://github.com/ritikkumarv/agent-tester.git
cd agent-tester
pip install -e .

# 1. Initialize a test configuration
agent-tester init
# 2. Edit agent_tests.yaml to define your tests
# 3. Run tests
agent-tester run -c agent_tests.yaml
# 4. View beautiful HTML report
open test_report.html

Read the complete Quick Start Guide in QUICKSTART.md.

# Install Azure SDK
pip install agent-framework-azure-ai --pre
# Configure
export AZURE_AI_PROJECT_ENDPOINT="https://your-resource.services.ai.azure.com/api/projects/your-project"
export AZURE_AI_MODEL_DEPLOYMENT="your-deployment-name"
# Test
python tests/test_azure_simple.py

# Configure
export OPENAI_API_KEY="your-api-key"
# Test
python examples/example_test_openai_agent.py

# Install Anthropic SDK
pip install anthropic
# Configure
export ANTHROPIC_API_KEY="your-api-key"
# Test
python examples/example_anthropic_agent.py

# Initialize test configuration
agent-tester init
# Run all tests
agent-tester run -c my_tests.yaml
# Run with custom output
agent-tester run -c my_tests.yaml -o custom_report.html
# Quick validation
agent-tester validate task1 --goal "Summarize this document"
# Run security scan
agent-tester security
# Run security scan with custom output
agent-tester security --format json --output security_report.json
# Security Scanning
agent-tester security scan # Run full security scan
agent-tester security scan --type sast # Run only SAST
agent-tester security scan --type dependency # Check dependencies
agent-tester security report # Quick security summary
agent-tester security check-deps # Check for vulnerable deps
# See all commands
agent-tester --help
# View examples
agent-tester examples

# Scan current directory
agent-tester security
# Scan specific path
agent-tester security --path /path/to/project
# Show only critical/high severity issues
agent-tester security --severity high
# Generate JSON report
agent-tester security --format json --output report.json

from agent_tester import (
    TaskDefinition,
    TaskValidator,
    TrajectoryValidator,
    MemoryValidator,
)

# Define your test
task = TaskDefinition(
    task_id="sentiment_analysis",
    goal="Analyze customer sentiment from reviews",
    expected_output_schema={"required": ["sentiment", "confidence"]},
    timeout_seconds=30,
)

# Run your agent (my_agent is a placeholder for your own agent object)
result = my_agent.execute(task)

# Validate results
validator = TaskValidator()
validation = validator.validate(
    result["output"],
    task,
    result["execution_time"],
)
print(f"✅ Passed: {validation.passed}")
print(f"Goal Achieved: {validation.goal_achieved}")
print(f"Constraints Met: {all(validation.constraints_met.values())}")

from agent_tester.security import SecurityValidator
# Run comprehensive security scan
validator = SecurityValidator()
report = validator.validate_repository("./my-project")
# Get summary
summary = report.get_summary()
print(f"Total Issues: {summary['total_issues']}")
print(f"Critical: {summary['critical']}")
print(f"High: {summary['high']}")
# Get critical issues
for issue in report.critical_issues:
    print(f"🔴 {issue.title}")
    print(f"  File: {issue.file_path}:{issue.line_number}")
    print(f"  Fix: {issue.recommendation}")

# Export reports
validator.export_report(report, format="markdown", output_file="security_report.md")
validator.export_report(report, format="json", output_file="security_report.json")

name: My Agent Test Suite
description: Comprehensive tests for my AI agent
tests:
  - task_id: test_1
    goal: "Summarize a technical document"
    constraints:
      - name: word_count
        type: value_in_range
        min_value: 50
        max_value: 200
    expected_output_schema:
      required: ["summary", "key_points"]
    timeout_seconds: 60
  - task_id: test_2
    goal: "Answer customer questions accurately"
    expected_output_schema:
      required: ["answer", "confidence"]
    timeout_seconds: 30
validators:
  task:
    strict_mode: false
  trajectory:
    max_actions: 20
    allow_backtracking: true
  memory:
    min_retention_score: 0.7

Agent Tester includes a comprehensive Cybersecurity & Secure-Code Contributor module that provides:
- Static Application Security Testing (SAST)
  - Detects code injection vulnerabilities (eval, exec, compile)
  - Identifies SQL and command injection patterns
  - Finds hardcoded secrets and credentials
  - Detects insecure deserialization and weak cryptography
- Dependency Vulnerability Scanning
  - Scans requirements.txt and pyproject.toml
  - Detects known CVEs in dependencies
  - Identifies unpinned dependencies
- Configuration Security Analysis
  - Scans .env files for exposed secrets
  - Analyzes Dockerfile for security issues
  - Checks GitHub Actions workflows for injection vulnerabilities
  - Validates YAML configurations
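As an illustration of the code injection check listed under SAST, the sketch below walks a Python syntax tree and reports direct calls to `eval`, `exec`, and `compile`. It uses only the standard-library `ast` module; the function name `find_dangerous_calls` is an assumption, and the bundled scanner may work differently.

```python
import ast
from typing import List, Tuple

# Builtins that execute or compile arbitrary code.
DANGEROUS_CALLS = {"eval", "exec", "compile"}


def find_dangerous_calls(source: str) -> List[Tuple[int, str]]:
    """Return (line_number, function_name) for each direct call to a
    builtin in DANGEROUS_CALLS.

    Hypothetical helper for illustration; not agent_tester's scanner.
    """
    findings = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id in DANGEROUS_CALLS
        ):
            findings.append((node.lineno, node.func.id))
    return sorted(findings)


sample = "x = 1\nresult = eval(user_input)\nprint(result)\n"
print(find_dangerous_calls(sample))  # [(2, 'eval')]
```

Working on the syntax tree rather than on raw text avoids false positives from the word "eval" inside strings or comments, but it misses indirect calls such as `getattr(builtins, "eval")`.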
Integrated with industry standards:
- OWASP Top 10 2021 - Web application security risks
- SANS Top 25 CWE - Most dangerous software weaknesses
- MITRE ATT&CK - Adversary tactics and techniques

Security reports provide:
- 5-tier severity system: Critical, High, Medium, Low, Info
- Detailed remediation guidance with code samples
- Multiple export formats: Markdown, JSON
- CVE/CWE/OWASP references for each issue
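The five severity tiers lend themselves to an ordered enumeration, which makes a threshold filter (for example, reporting only high and critical issues) a one-line comparison. This is a sketch under that assumption; the `Severity` class and `filter_by_severity` function are hypothetical names, not the module's actual types.

```python
from enum import IntEnum
from typing import List, Tuple


class Severity(IntEnum):
    """Five tiers, ordered so that a larger value means a more severe issue."""
    INFO = 0
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


def filter_by_severity(
    issues: List[Tuple[str, Severity]], threshold: Severity
) -> List[Tuple[str, Severity]]:
    """Keep only issues at or above the given severity threshold."""
    return [(title, sev) for title, sev in issues if sev >= threshold]


# Made-up findings for illustration.
issues = [
    ("Unpinned dependency", Severity.LOW),
    ("Use of eval()", Severity.CRITICAL),
    ("Hardcoded API key", Severity.HIGH),
]
for title, sev in filter_by_severity(issues, Severity.HIGH):
    print(f"{sev.name}: {title}")
```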
See SECURITY_MODULE.md for the complete security documentation.

agent-tester/
├── agent_tester/                     # Main package
│   ├── __init__.py                   # Package exports
│   ├── models.py                     # Core data models
│   ├── cli.py                        # Command-line interface
│   ├── suite.py                      # Test orchestration
│   ├── validators/                   # Validation modules
│   │   ├── task_validator.py
│   │   ├── trajectory_validator.py
│   │   └── memory_validator.py
│   ├── adapters/                     # Platform adapters
│   │   ├── azure_adapter.py          # Azure AI Foundry
│   │   └── openai_adapter.py         # OpenAI
│   ├── security/                     # Security module
│   │   ├── sast_scanner.py           # Static code analysis
│   │   ├── dependency_scanner.py     # CVE scanning
│   │   ├── config_scanner.py         # Configuration security
│   │   ├── security_validator.py     # Orchestrator
│   │   ├── security_reporter.py      # Report generation
│   │   └── knowledge_base.py         # OWASP/SANS/MITRE
│   └── security/                     # Security scanning (NEW!)
│       ├── sast_scanner.py           # Static security analysis
│       ├── dependency_scanner.py     # Dependency vulnerabilities
│       ├── secret_scanner.py         # Secret detection
│       ├── config_scanner.py         # Configuration security
│       └── security_orchestrator.py  # Scan coordination
├── examples/                         # Usage examples
│   ├── simple_example.py
│   └── security_scan_example.py
├── tests/                            # Test files
│   └── security/                     # Security module tests
├── .github/workflows/                # CI/CD workflows
│   └── security-scan.yml             # Automated security scanning
├── pyproject.toml                    # Package configuration
├── QUICKSTART.md                     # Getting started guide
├── SECURITY_MODULE.md                # Security documentation
└── README.md                         # This file

Agent Tester includes comprehensive security scanning capabilities:
- SAST (Static Application Security Testing): Detects SQL injection, XSS, command injection, path traversal, eval usage, weak crypto, and more
- Dependency Scanning: Identifies known CVEs in your dependencies
- Secret Detection: Finds exposed API keys, passwords, tokens (AWS, GitHub, OpenAI, etc.)
- Configuration Security: Checks for insecure defaults and misconfigurations
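Secret detection of the kind listed above is commonly implemented as a set of regular expressions run line by line. The sketch below shows the idea with two widely documented token shapes (AWS access key IDs and GitHub personal access tokens); the pattern set, the names, and the `find_secrets` function are illustrative assumptions and not the scanner's real rule list.

```python
import re
from typing import List, Tuple

# Illustrative patterns only; a production scanner would carry many more.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github_personal_token": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
}


def find_secrets(text: str) -> List[Tuple[int, str]]:
    """Return (line_number, pattern_name) for every line matching a pattern.

    Hypothetical helper for illustration; not agent_tester's scanner.
    """
    findings = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                findings.append((line_number, name))
    return findings


# AKIAIOSFODNN7EXAMPLE is the placeholder key used in AWS documentation.
sample = 'region = "us-east-1"\nkey = "AKIAIOSFODNN7EXAMPLE"\n'
print(find_secrets(sample))  # [(2, 'aws_access_key_id')]
```

Pattern matching alone produces false positives on test fixtures and documentation placeholders, so real scanners usually add allowlists or entropy checks.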
# Run full security scan
agent-tester security scan
# Generate HTML report
agent-tester security scan --format html --output security_report
# Check only dependencies
agent-tester security check-deps

The included GitHub Actions workflow automatically:
- Scans every push and PR
- Posts security summaries on PRs
- Runs daily security audits
- Blocks merges with critical vulnerabilities
See Security Documentation for detailed information.
# All tests
pytest test_agent_framework.py -v
# With coverage
pytest test_agent_framework.py --cov=agent_tester --cov-report=html
# Run simple example
python examples/simple_example.py

See DOCUMENTATION.md for:
- Complete API reference
- Platform-specific guides
- Enterprise deployment patterns
- CI/CD integration examples
- Troubleshooting guide
- Competitive analysis
This framework follows security best practices:
- No code execution vulnerabilities
- Environment variable-based configuration
- Pinned dependencies
- Comprehensive input validation
Report security issues per SECURITY.md.
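Pinned dependencies, one of the practices listed above, can be checked mechanically. The following sketch flags requirement lines without an exact `==` pin. It is a simplified assumption of how such a check might look (the `find_unpinned` name is hypothetical), and it does not handle every requirements-file feature, such as URLs containing `#` fragments.

```python
from typing import List


def find_unpinned(requirements_text: str) -> List[str]:
    """Return requirement lines that lack an exact `==` version pin.

    Hypothetical helper for illustration; not agent_tester's scanner.
    """
    unpinned = []
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line or line.startswith("-"):  # skip blanks and options such as -r or -e
            continue
        if "==" not in line:
            unpinned.append(line)
    return unpinned


sample = "requests==2.31.0\npyyaml>=6.0\nrich  # no version given\n"
print(find_unpinned(sample))  # ['pyyaml>=6.0', 'rich']
```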
# GitHub Actions
- run: |
    pip install -r requirements.txt
    pytest tests/ -v --html=report.html
  env:
    OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}

Supported platforms:
- Azure AI Foundry
- OpenAI
- Anthropic (Claude)
- GitHub Models
- LangChain
- Custom Agents
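For custom agents, the adapter pattern means wrapping your agent behind one small interface so the validators never depend on a specific platform. The sketch below is an assumption about what such an adapter could look like: `AgentAdapter`, `CallableAgentAdapter`, and `echo_agent` are hypothetical names, and the returned dictionary simply mirrors the `result["output"]` and `result["execution_time"]` keys used in the Python example earlier in this README.

```python
import time
from abc import ABC, abstractmethod
from typing import Any, Callable, Dict


class AgentAdapter(ABC):
    """Hypothetical common interface that every platform adapter implements."""

    @abstractmethod
    def execute(self, goal: str) -> Dict[str, Any]:
        """Run the agent and return {'output': ..., 'execution_time': seconds}."""


class CallableAgentAdapter(AgentAdapter):
    """Wraps any plain function so it can be tested like a platform agent."""

    def __init__(self, fn: Callable[[str], Dict[str, Any]]) -> None:
        self._fn = fn

    def execute(self, goal: str) -> Dict[str, Any]:
        started = time.perf_counter()
        output = self._fn(goal)
        elapsed = time.perf_counter() - started
        return {"output": output, "execution_time": elapsed}


def echo_agent(goal: str) -> Dict[str, Any]:
    """Stand-in for a real agent: upper-cases the goal."""
    return {"answer": goal.upper(), "confidence": 1.0}


adapter = CallableAgentAdapter(echo_agent)
result = adapter.execute("hello")
print(result["output"])  # {'answer': 'HELLO', 'confidence': 1.0}
```

Keeping timing inside the adapter means every platform reports `execution_time` the same way, which is what a timeout check would need.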
| Feature | This Framework | LangSmith | DeepEval |
|---|---|---|---|
| Task Validation | Yes | Limited | Yes |
| Trajectory Analysis | Yes | No | Limited |
| Memory Testing | Yes | No | No |
| Multi-Platform | Yes | Limited | Limited |
| Self-Hosted | Yes | No | Yes |
| Open Source | Yes | No | Yes |
Key Differentiators:
- The only framework in this comparison that tests all three dimensions: task, trajectory, and memory
- Platform-agnostic architecture
- Production-ready security and logging
- No vendor lock-in
Contributions welcome. See CONTRIBUTING.md.
MIT License. See LICENSE.
- Documentation: DOCUMENTATION.md
- Issues: GitHub Issues
- Security: SECURITY.md