7 changes: 7 additions & 0 deletions .env.example
@@ -17,6 +17,13 @@ LLM_SERVER_KEY=
LLM_SERVER_MODEL=
LLM_SERVER_CONFIG_PATH=

## Embedding
EMBEDDING_URL=
EMBEDDING_KEY=
EMBEDDING_MODEL=
EMBEDDING_PROVIDER=
EMBEDDING_BATCH_SIZE=

## HTTP proxy to use it in isolation environment
PROXY_URL=

16 changes: 16 additions & 0 deletions .vscode/launch.json
@@ -32,6 +32,7 @@
            "runtimeExecutable": "npm",
            "runtimeArgs": ["run", "dev"],
            "env": {
                "VITE_APP_LOG_LEVEL": "DEBUG",
                "VITE_API_URL": "localhost:8080",
                "VITE_USE_HTTPS": "false",
                "VITE_PORT": "8000",
@@ -89,5 +90,20 @@
            "cwd": "${workspaceFolder}",
            "output": "${workspaceFolder}/build/__debug_bin_ftester",
        },
        {
            "type": "go",
            "request": "launch",
            "name": "Launch Embedding Tests",
            "program": "${workspaceFolder}/backend/cmd/etester/",
            "envFile": "${workspaceFolder}/.env",
            "env": {
                "DATABASE_URL": "postgres://postgres:postgres@localhost:5432/pentagidb?sslmode=disable",
            },
            "args": [
                "-cmd", "reindex",
            ],
            "cwd": "${workspaceFolder}",
            "output": "${workspaceFolder}/build/__debug_bin_etester",
        },
    ]
}
4 changes: 4 additions & 0 deletions Dockerfile
@@ -67,6 +67,9 @@ RUN go build -trimpath -o /ctester ./cmd/ctester
# Build ftester utility
RUN go build -trimpath -o /ftester ./cmd/ftester

# Build etester utility
RUN go build -trimpath -o /etester ./cmd/etester

# STEP 3: Build the final image
FROM alpine:3.21

@@ -95,6 +98,7 @@ RUN mkdir -p \
COPY --from=be-build /pentagi /opt/pentagi/bin/pentagi
COPY --from=be-build /ctester /opt/pentagi/bin/ctester
COPY --from=be-build /ftester /opt/pentagi/bin/ftester
COPY --from=be-build /etester /opt/pentagi/bin/etester
COPY --from=fe-build /frontend/dist /opt/pentagi/fe

# Copy provider configuration files
133 changes: 133 additions & 0 deletions README.md
@@ -12,6 +12,7 @@
- [Advanced Setup](#-advanced-setup)
- [Development](#-development)
- [Testing LLM Agents](#-testing-llm-agents)
- [Embedding Configuration and Testing](#-embedding-configuration-and-testing)
- [Function Testing with ftester](#-function-testing-with-ftester)
- [Building](#%EF%B8%8F-building)
- [Credits](#-credits)
@@ -809,6 +810,138 @@ simple_json:

This tool helps ensure your AI agents are using the most effective models for their specific tasks, improving reliability while optimizing costs.

## 🧮 Embedding Configuration and Testing

PentAGI uses vector embeddings for semantic search, knowledge storage, and memory management. Several embedding providers are supported, and you select one through environment variables.

### Supported Embedding Providers

PentAGI supports the following embedding providers:

- **OpenAI** (default): Uses OpenAI's text embedding models
- **Ollama**: Local embedding models served through Ollama
- **Mistral**: Mistral AI's embedding models
- **Jina**: Jina AI's embedding service
- **HuggingFace**: Models from HuggingFace
- **GoogleAI**: Google's embedding models
- **VoyageAI**: VoyageAI's embedding models

<details>
<summary><b>Embedding Provider Configuration</b> (click to expand)</summary>

### Environment Variables

To configure the embedding provider, set the following environment variables in your `.env` file:

```bash
# Primary embedding configuration
EMBEDDING_PROVIDER=openai # Provider type (openai, ollama, mistral, jina, huggingface, googleai, voyageai)
EMBEDDING_MODEL=text-embedding-3-small # Model name to use
EMBEDDING_URL= # Optional custom API endpoint
EMBEDDING_KEY= # API key for the provider (if required)
EMBEDDING_BATCH_SIZE=100 # Number of documents to process in a batch
EMBEDDING_STRIP_NEW_LINES=true # Whether to remove new lines from text before embedding

# Advanced settings
PROXY_URL= # Optional proxy for all API calls
```

### Provider-Specific Limitations

Each provider has specific limitations and supported features:

- **OpenAI**: Supports all configuration options
- **Ollama**: Does not support `EMBEDDING_KEY` as it uses local models
- **Mistral**: Does not support `EMBEDDING_MODEL` or a custom HTTP client
- **Jina**: Does not support a custom HTTP client
- **HuggingFace**: Requires `EMBEDDING_KEY` and supports all other options
- **GoogleAI**: Does not support `EMBEDDING_URL`, requires `EMBEDDING_KEY`
- **VoyageAI**: Supports all configuration options

If `EMBEDDING_URL` and `EMBEDDING_KEY` are not specified, the system will attempt to use the corresponding LLM provider settings (e.g., `OPEN_AI_KEY` when `EMBEDDING_PROVIDER=openai`).
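
The fallback described above can be sketched in Go roughly as follows. This is a minimal illustration, not PentAGI's actual code: `resolveEmbeddingKey` and `firstNonEmpty` are hypothetical helpers, and only the `openai` to `OPEN_AI_KEY` pairing is taken from this README. Treating an unset `EMBEDDING_PROVIDER` as `openai` is an assumption based on OpenAI being listed as the default provider.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// firstNonEmpty returns the first value that is not blank, or "" if all are blank.
func firstNonEmpty(values ...string) string {
	for _, v := range values {
		if strings.TrimSpace(v) != "" {
			return v
		}
	}
	return ""
}

// resolveEmbeddingKey is a hypothetical helper (not part of PentAGI).
// It prefers EMBEDDING_KEY and falls back to the LLM provider key.
func resolveEmbeddingKey(getenv func(string) string) string {
	provider := getenv("EMBEDDING_PROVIDER")
	if provider == "" {
		provider = "openai" // assumption: OpenAI is the documented default
	}

	fallback := ""
	// Only the openai -> OPEN_AI_KEY pairing is documented in this README.
	if strings.EqualFold(provider, "openai") {
		fallback = getenv("OPEN_AI_KEY")
	}
	return firstNonEmpty(getenv("EMBEDDING_KEY"), fallback)
}

func main() {
	// Report only whether a key was resolved; never print the key itself.
	fmt.Println("embedding key set:", resolveEmbeddingKey(os.Getenv) != "")
}
```

Passing the lookup function as a parameter keeps the sketch testable without touching the real process environment.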

### Why Consistent Embedding Providers Matter

It's crucial to use the same embedding provider consistently because:

1. **Vector Compatibility**: Different providers produce vectors with different dimensions and mathematical properties
2. **Semantic Consistency**: Changing providers can break semantic similarity between previously embedded documents
3. **Memory Corruption**: Mixed embeddings can lead to poor search results and broken knowledge base functionality

If you change your embedding provider, either flush the knowledge base or reindex it in full so that every stored vector comes from the same provider (see the `etester` utility below).
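
To make the vector compatibility point concrete, here is a small Go sketch of cosine similarity that refuses to compare vectors of different dimensions. The function name and the tiny 3- and 4-element vectors are illustrative assumptions; real embeddings have hundreds or thousands of dimensions, and PentAGI's own comparison is performed by the vector store rather than by code like this.

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

var errDimMismatch = errors.New("vectors have different dimensions")

// cosineSimilarity returns the cosine of the angle between a and b.
// It fails when the dimensions differ, which is what happens when
// documents and queries are embedded by different models.
func cosineSimilarity(a, b []float64) (float64, error) {
	if len(a) != len(b) {
		return 0, errDimMismatch
	}
	var dot, normA, normB float64
	for i := range a {
		dot += a[i] * b[i]
		normA += a[i] * a[i]
		normB += b[i] * b[i]
	}
	if normA == 0 || normB == 0 {
		return 0, errors.New("zero-magnitude vector")
	}
	return dot / (math.Sqrt(normA) * math.Sqrt(normB)), nil
}

func main() {
	stored := []float64{0.1, 0.2, 0.3}     // illustrative: embedded with the old provider
	query := []float64{0.1, 0.2, 0.3, 0.4} // illustrative: embedded with a new provider

	if _, err := cosineSimilarity(stored, query); err != nil {
		fmt.Println("cannot compare:", err)
	}
}
```

Even when two providers happen to share a dimension, their vector spaces are unrelated, so a numerically valid similarity score would still carry no semantic meaning.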

</details>

### Embedding Tester Utility (etester)

PentAGI includes a specialized `etester` utility for testing, managing, and debugging embedding functionality. This tool is essential for diagnosing and resolving issues related to vector embeddings and knowledge storage.

<details>
<summary><b>Etester Commands</b> (click to expand)</summary>

```bash
# Test embedding provider and database connection
# (run the whole package so that every source file in cmd/etester is compiled)
cd backend
go run ./cmd/etester test -verbose

# Show statistics about the embedding database
go run ./cmd/etester info

# Delete all documents from the embedding database (use with caution!)
go run ./cmd/etester flush

# Recalculate embeddings for all documents (after changing provider)
go run ./cmd/etester reindex

# Search for documents in the embedding database
go run ./cmd/etester search -query "How to install PostgreSQL" -limit 5
```

### Using Docker

If you're running PentAGI in Docker, you can use etester from within the container:

```bash
# Test embedding provider
docker exec -it pentagi /opt/pentagi/bin/etester test

# Show detailed database information
docker exec -it pentagi /opt/pentagi/bin/etester info -verbose
```

### Advanced Search Options

The `search` command supports various filters to narrow down results:

```bash
# Filter by document type
docker exec -it pentagi /opt/pentagi/bin/etester search -query "Security vulnerability" -doc_type guide -threshold 0.8

# Filter by flow ID
docker exec -it pentagi /opt/pentagi/bin/etester search -query "Code examples" -doc_type code -flow_id 42

# All available search options
docker exec -it pentagi /opt/pentagi/bin/etester search -help
```

Available search parameters:
- `-query STRING`: Search query text (required)
- `-doc_type STRING`: Filter by document type (answer, memory, guide, code)
- `-flow_id NUMBER`: Filter by flow ID (positive number)
- `-answer_type STRING`: Filter by answer type (guide, vulnerability, code, tool, other)
- `-guide_type STRING`: Filter by guide type (install, configure, use, pentest, development, other)
- `-limit NUMBER`: Maximum number of results (default: 3)
- `-threshold NUMBER`: Similarity threshold (0.0-1.0, default: 0.7)
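
How `-threshold` and `-limit` interact can be sketched in Go as follows. This is an illustration under assumptions, not the `etester` implementation: `scoredDoc` and `filterResults` are hypothetical names, and whether the real tool treats the threshold as inclusive or applies it inside the database query is not stated here.

```go
package main

import (
	"fmt"
	"sort"
)

// scoredDoc is a hypothetical search hit: a document ID and its similarity score.
type scoredDoc struct {
	ID    string
	Score float64
}

// filterResults keeps hits whose score is at least threshold, orders them
// from most to least similar, and returns at most limit of them.
func filterResults(docs []scoredDoc, threshold float64, limit int) []scoredDoc {
	kept := make([]scoredDoc, 0, len(docs))
	for _, d := range docs {
		if d.Score >= threshold { // assumption: the threshold is inclusive
			kept = append(kept, d)
		}
	}
	sort.SliceStable(kept, func(i, j int) bool { return kept[i].Score > kept[j].Score })
	if limit >= 0 && len(kept) > limit {
		kept = kept[:limit]
	}
	return kept
}

func main() {
	hits := []scoredDoc{{"a", 0.9}, {"b", 0.65}, {"c", 0.75}, {"d", 0.8}, {"e", 0.71}}

	// Mirrors the documented defaults: -threshold 0.7 -limit 3
	for _, h := range filterResults(hits, 0.7, 3) {
		fmt.Printf("%s %.2f\n", h.ID, h.Score)
	}
}
```

Raising the threshold trades recall for precision, while the limit only caps how many of the surviving hits are returned.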

### Common Troubleshooting Scenarios

1. **After changing the embedding provider**: Always run `flush` or `reindex` so that all stored vectors come from the same provider
2. **Poor search results**: Try adjusting the similarity threshold, or check whether embeddings are being generated correctly
3. **Database connection issues**: Verify that PostgreSQL is running with the pgvector extension installed
4. **Missing API keys**: Check the environment variables for your chosen embedding provider

</details>

## 🔍 Function Testing with ftester

PentAGI includes a versatile utility called `ftester` for debugging, testing, and developing specific functions and AI agent behaviors. While `ctester` focuses on testing LLM model capabilities, `ftester` allows you to directly invoke individual system functions and AI agent components with precise control over execution context.
42 changes: 42 additions & 0 deletions backend/cmd/etester/flush.go
@@ -0,0 +1,42 @@
package main

import (
	"fmt"
	"os"

	"pentagi/pkg/terminal"
)

// flush deletes all documents from the embedding store
func (t *Tester) flush() error {
	terminal.Warning("This will delete ALL documents from the embedding store.")
	response, err := terminal.GetYesNoInputContext(t.ctx, "Are you sure you want to continue?", os.Stdin)
	if err != nil {
		return fmt.Errorf("failed to get yes/no input: %w", err)
	}

	if !response {
		terminal.Info("Operation cancelled.")
		return nil
	}

	tx, err := t.conn.Begin(t.ctx)
	if err != nil {
		return fmt.Errorf("failed to start transaction: %w", err)
	}
	defer tx.Rollback(t.ctx)

	result, err := tx.Exec(t.ctx, fmt.Sprintf("DELETE FROM %s", t.embeddingTableName))
	if err != nil {
		return fmt.Errorf("failed to delete documents: %w", err)
	}

	if err := tx.Commit(t.ctx); err != nil {
		return fmt.Errorf("failed to commit transaction: %w", err)
	}

	rowsAffected := result.RowsAffected()
	terminal.Success("\nSuccessfully deleted %d documents from the embedding store.", rowsAffected)

	return nil
}