feat: code base indexing engine with dedicated CodeIndexAgent #721
base: main
Changes from all commits
3a63ab2
7192c83
1b71410
2b397c4
203e2fd
ffd12b8
2790ebf
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,40 @@ | ||
| { | ||
| "repo_notes": [ | ||
| { | ||
| "content": "GAIA is AMD's open-source framework for building AI agents in Python and C++ that run entirely on local hardware. No cloud dependency — all processing stays on-device with AMD NPU and GPU acceleration on Ryzen AI processors. The repository has two frameworks: a Python SDK (src/gaia/) and a C++ SDK (cpp/). Documentation lives in docs/ using .mdx format (Mintlify), published at https://amd-gaia.ai. Python agents inherit from the base Agent class in src/gaia/agents/base/agent.py and register tools via the @tool decorator. LLM inference runs locally via Lemonade Server. Default models: Qwen3-0.6B-GGUF (general), Qwen3.5-35B-A3B-GGUF (agents/code), Qwen3-VL-4B-Instruct-GGUF (vision). CLI entry point: src/gaia/cli.py. Development setup: uv pip install -e '.[dev]'." | ||
| } | ||
| ], | ||
| "pages": [ | ||
| { | ||
| "title": "Architecture Overview", | ||
| "purpose": "High-level architecture of both Python and C++ frameworks: agent system, LLM backends (Lemonade Server), MCP integration, and how components connect" | ||
| }, | ||
| { | ||
| "title": "Python Agent Framework", | ||
| "purpose": "Base Agent class (src/gaia/agents/base/), tool registration with @tool decorator, mixins (MCPAgent, ApiAgent), and how to create new agents", | ||
| "parent": "Architecture Overview" | ||
| }, | ||
| { | ||
| "title": "C++ Agent Framework", | ||
| "purpose": "C++ SDK for building native agents (cpp/), gaia::Agent base class, tool registration, health/wifi/process agent examples", | ||
| "parent": "Architecture Overview" | ||
| }, | ||
| { | ||
| "title": "Agent UI", | ||
| "purpose": "Privacy-first desktop chat with drag-and-drop document Q&A. FastAPI backend (src/gaia/ui/), React/Electron frontend (src/gaia/apps/webui/), launched via gaia --ui" | ||
| }, | ||
| { | ||
| "title": "Core Capabilities", | ||
| "purpose": "Document Q&A with RAG (src/gaia/rag/), speech-to-speech (Whisper ASR + Kokoro TTS in src/gaia/audio/), image generation (Stable Diffusion in src/gaia/agents/sd/), agent routing, and MCP integration" | ||
| }, | ||
| { | ||
| "title": "Code Index", | ||
| "purpose": "Semantic code search over repositories using FAISS + Lemonade embeddings (src/gaia/code_index/), CLI via gaia index, dedicated CodeIndexAgent", | ||
| "parent": "Core Capabilities" | ||
| }, | ||
| { | ||
| "title": "CLI and Configuration", | ||
| "purpose": "CLI entry point (gaia command) with subcommands: chat, talk, llm, api, mcp, index, sd, blender, jira, docker, eval. Also gaia --ui for Agent UI" | ||
| } | ||
| ] | ||
| } | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -429,3 +429,22 @@ EOF | ||
| release-assets/* | ||
| dist/** | ||
| body_path: RELEASE_BODY.md | ||
|
|
||
| refresh-context7: | ||
Collaborator: What is the purpose of this?
Collaborator (Author): To refresh the Context7 submission.
Collaborator: Seems out of scope.
| runs-on: ubuntu-latest | ||
| needs: [github-release] | ||
| steps: | ||
| - name: Refresh Context7 | ||
| run: | | ||
| HTTP_STATUS=$(curl -s -o /dev/null -w "%{http_code}" \ | ||
| -X POST https://context7.com/api/v1/refresh \ | ||
| -H "Authorization: Bearer ${{ secrets.CONTEXT7_API_KEY }}" \ | ||
| -H "Content-Type: application/json" \ | ||
| -d '{"libraryName": "/amd/gaia"}') | ||
| if [ "$HTTP_STATUS" = "200" ] || [ "$HTTP_STATUS" = "202" ]; then | ||
| echo "Context7 refresh triggered (HTTP $HTTP_STATUS)" | ||
| elif [ "$HTTP_STATUS" = "429" ]; then | ||
| echo "::warning::Context7 rate limited — refresh skipped" | ||
| else | ||
| echo "::warning::Context7 refresh returned HTTP $HTTP_STATUS" | ||
| fi | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,25 @@ | ||
| { | ||
| "$schema": "https://context7.com/schema/context7.json", | ||
Collaborator: This change seems unrelated. It's fine to include this feature, but please document it both in the PR and on our docs site.
Collaborator (Author): This was an opportunity to take since @akostadinov mentioned DeepWiki; you and I discussed the possibility of adding DeepWiki and Context7 submission. I don't feel that it's something we should document, as it's a submission to a site that gives users the ability to get help while building agents using GAIA.
Collaborator: How do devs use the feature? Is it automatic when they use Claude or other code assistants?
| "projectTitle": "GAIA", | ||
| "description": "AMD's open-source framework for building AI agents in Python and C++ that run entirely on local hardware, with AMD NPU and GPU acceleration on Ryzen AI processors.", | ||
| "folders": ["docs"], | ||
| "excludeFolders": ["tests", "scripts", "workshop", "docs/spec"], | ||
| "excludeFiles": ["CHANGELOG.md"], | ||
| "rules": [ | ||
| "GAIA has two frameworks: Python (src/gaia/) and C++ (cpp/). Most documentation covers the Python SDK.", | ||
| "Python agents inherit from the base Agent class in src/gaia/agents/base/agent.py", | ||
| "Tools are registered using the @tool decorator from gaia.agents.base.tools", | ||
| "LLM inference runs locally via Lemonade Server on AMD NPU/GPU hardware", | ||
| "Default models: Qwen3-0.6B-GGUF (general), Qwen3.5-35B-A3B-GGUF (agents/code), Qwen3-VL-4B-Instruct-GGUF (vision)", | ||
| "Agent UI is the primary user interface — launch with 'gaia --ui' for privacy-first desktop chat with document Q&A", | ||
| "All new features require tests in tests/ and documentation in docs/ (.mdx format for Mintlify)", | ||
| "Use 'uv pip install -e .[dev]' for development setup", | ||
| "The code index (gaia.code_index) provides semantic search over repositories using local FAISS + Lemonade embeddings" | ||
Collaborator: Perhaps we should use a single source of truth, aka CLAUDE.md.
Collaborator (Author): This is part of what is necessary for the Context7 submission.
Collaborator: There's a lot of redundancy that can get stale fast.
| ], | ||
| "previousVersions": [ | ||
| {"tag": "v0.17.1"}, | ||
| {"tag": "v0.17.0"}, | ||
| {"tag": "v0.16.0"}, | ||
| {"tag": "v0.15.0"} | ||
Collaborator: This will get old very quickly. How do we catch this in CI/CD if it gets stale?
| ] | ||
| } | ||
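
One way to approach the staleness concern raised in the review is a small CI check that compares the tags listed under `previousVersions` with the release tags present in the repository. The sketch below is an assumption and not part of this PR: the names `parse_version` and `newer_unlisted_tags` are hypothetical, and it assumes release tags follow the `vMAJOR.MINOR.PATCH` pattern seen in the file.

```python
import json
import re

# Assumed release tag shape, e.g. "v0.17.1" (taken from the tags in the file).
TAG_PATTERN = re.compile(r"v(\d+)\.(\d+)\.(\d+)")


def parse_version(tag):
    """Turn 'v0.17.1' into (0, 17, 1); return None for non-release tags."""
    match = TAG_PATTERN.fullmatch(tag)
    return tuple(int(part) for part in match.groups()) if match else None


def newer_unlisted_tags(config_text, repo_tags):
    """Return release tags that are newer than every tag in previousVersions."""
    config = json.loads(config_text)
    listed = [parse_version(entry["tag"]) for entry in config.get("previousVersions", [])]
    listed = [version for version in listed if version is not None]
    newest_listed = max(listed, default=(0, 0, 0))

    newer = []
    for tag in repo_tags:
        version = parse_version(tag)
        if version is not None and version > newest_listed:
            newer.append(tag)
    return sorted(newer, key=parse_version)
```

In a workflow step, `repo_tags` could be the output of `git tag --list` split on newlines, and a non-empty result could fail the job or emit a warning. Whether the newest release belongs in `previousVersions` at all depends on how Context7 treats the default branch, so the comparison rule here is only a starting point.
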
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,180 @@ | ||
| --- | ||
| title: Code Index | ||
| description: Semantic search over your codebase, git history, and pull requests using local AMD-accelerated embeddings. | ||
| --- | ||
|
|
||
| The GAIA Code Index enables fast semantic search over large codebases without sending your code to the cloud. It parses source files, generates embeddings via Lemonade Server on AMD NPU/GPU hardware, and stores them in a local FAISS index for sub-second queries. | ||
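
To make the query step concrete: the cache layout later in this file names a FAISS `IndexFlatL2`, which performs an exact, brute-force nearest-neighbour search using squared Euclidean distance. The pure-Python sketch below illustrates that idea only; it is not GAIA's code and does not call FAISS, and `squared_l2` and `flat_l2_search` are hypothetical names.

```python
import heapq


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def flat_l2_search(vectors, query, top_k=5):
    """Compare the query against every stored vector and keep the closest ones.

    Returns a list of (distance, index) pairs, nearest first.
    """
    scored = ((squared_l2(vec, query), idx) for idx, vec in enumerate(vectors))
    return heapq.nsmallest(top_k, scored)
```

A flat index trades speed for exactness: every query scans all stored vectors, which stays fast for tens of thousands of chunks but grows linearly with index size.
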
|
|
||
| ## Overview | ||
|
|
||
| | Feature | Description | | ||
| |---------|-------------| | ||
| | **Languages** | Python (AST), JavaScript, TypeScript, Go, Rust, Java, C, C++ | | ||
| | **Git history** | Optional — index commit messages and file changes | | ||
| | **PR search** | Optional — index closed/merged GitHub PRs via `gh` CLI | | ||
Collaborator: What if
| | **Embeddings** | Local AMD NPU/GPU via Lemonade Server | | ||
| | **Storage** | `~/.gaia/code_index/<repo-hash>/` | | ||
|
|
||
| ## Setup | ||
|
|
||
| Install the required dependency: | ||
|
|
||
| ```bash | ||
| pip install faiss-cpu | ||
| # or, for GPU acceleration: | ||
| pip install faiss-gpu | ||
Collaborator: This should be part of
| ``` | ||
|
|
||
| Lemonade Server must be running to generate embeddings: | ||
|
|
||
| ```bash | ||
| lemonade-server serve | ||
| ``` | ||
|
|
||
| ## CLI Usage | ||
|
|
||
| ### Index a repository | ||
|
|
||
| ```bash | ||
| # Index the current directory | ||
| gaia index | ||
|
|
||
| # Index a specific repository | ||
| gaia index --repo /path/to/repo | ||
|
|
||
| # Include git history (commit messages and changed files) | ||
| gaia index --repo /path/to/repo --git-history | ||
|
|
||
| # Include GitHub pull requests (requires gh CLI and authentication) | ||
| gaia index --repo /path/to/repo --prs | ||
| ``` | ||
|
|
||
| ### Search the index | ||
|
|
||
| ```bash | ||
| # Semantic search across code, commits, and PRs | ||
| gaia index search "how does the agent handle errors" | ||
|
|
||
| # Search only source code | ||
| gaia index search "authentication flow" --scope code | ||
|
|
||
| # Search commit history | ||
| gaia index search "fix memory leak" --scope commit | ||
|
|
||
| # Return more results | ||
| gaia index search "embedding model" --top-k 20 | ||
| ``` | ||
|
|
||
| ### Manage the index | ||
|
|
||
| ```bash | ||
| # Show index status | ||
| gaia index status | ||
|
|
||
| # Clear and rebuild | ||
| gaia index clear | ||
| gaia index | ||
| ``` | ||
|
|
||
| ## Agent Tools | ||
|
|
||
| When the code index is wired into an agent (ChatAgent or CodeAgent), five tools become available: | ||
|
|
||
| | Tool | Description | | ||
| |------|-------------| | ||
| | `index_codebase` | Index a repository (path optional) | | ||
| | `search_code_index` | Semantic search over indexed chunks | | ||
| | `code_index_status` | Show index statistics | | ||
| | `clear_code_index` | Remove the cached index | | ||
| | `search_git_history` | Text search over commit messages via git | | ||
Collaborator: Do we have a solid
|
|
||
| ### Example agent interaction | ||
|
|
||
| ``` | ||
| User: Find all places in the codebase where we handle authentication errors | ||
|
|
||
| Agent: [calls search_code_index with query="authentication error handling"] | ||
|
|
||
| Results: | ||
| - src/gaia/agents/base/agent.py:145 — handle_error() function | ||
| - src/gaia/llm/lemonade_client.py:89 — auth retry logic | ||
| - tests/unit/test_auth.py:23 — test_auth_error_recovery | ||
| ``` | ||
|
|
||
| ## Python SDK | ||
|
|
||
| ```python | ||
| from gaia.code_index.sdk import CodeIndexConfig, CodeIndexSDK | ||
|
|
||
| config = CodeIndexConfig( | ||
| repo_path="/path/to/repo", | ||
| index_git_history=True, | ||
| index_prs=False, | ||
| max_files=5000, | ||
| embedding_model="nomic-embed-text-v2-moe-GGUF", | ||
| ) | ||
|
|
||
| sdk = CodeIndexSDK(config) | ||
|
|
||
| # Index the repository | ||
| result = sdk.index_repository() | ||
| print(f"Indexed {result.files_indexed} files, {result.chunks_created} chunks") | ||
|
|
||
| # Search | ||
| results = sdk.search("how does agent tool registration work", top_k=5) | ||
| for r in results: | ||
| chunk = r.chunk | ||
| print(f"{chunk.file_path}:{chunk.start_line} — {chunk.symbol_name} (score: {r.score:.3f})") | ||
|
|
||
| # Check status | ||
| status = sdk.get_status() | ||
| print(f"Total chunks: {status['total_chunks']}") | ||
| ``` | ||
|
|
||
| ## Configuration | ||
|
|
||
| ```python | ||
| from gaia.code_index.sdk import CodeIndexConfig | ||
|
|
||
| config = CodeIndexConfig( | ||
Collaborator: Is the code index SDK integrated with the code agent?
| repo_path=".", # Repository root path (required) | ||
| max_files=5000, # Max files to index | ||
| max_file_size_mb=1.0, # Skip files larger than this | ||
| chunk_overlap=50, # Token overlap between chunks | ||
| embedding_model="nomic-embed-text-v2-moe-GGUF", # Lemonade model | ||
| cache_dir="~/.gaia/code_index", # Cache location | ||
| index_git_history=True, # Include git commits | ||
| index_prs=False, # Include GitHub PRs | ||
| max_commits=1000, # Max commits to index | ||
| embedding_base_url=None, # Custom Lemonade URL (default: localhost) | ||
| ) | ||
| ``` | ||
|
|
||
| ## Supported Languages | ||
|
|
||
| | Language | Parser | Symbols extracted | | ||
| |----------|--------|-------------------| | ||
| | Python | AST | Functions, classes, methods | | ||
| | JavaScript/TypeScript | Regex | Functions, classes, interfaces, arrow functions | | ||
| | Go | Regex | Functions, structs, interfaces | | ||
| | Rust | Regex | Functions (`fn`), structs, enums, impl blocks | | ||
| | Java | Regex | Classes, methods | | ||
| | C/C++ | Regex | Functions | | ||
| | Other | Block splitter | Paragraph blocks | | ||
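
As an illustration of the AST-based row in the table above, the following sketch uses Python's standard `ast` module to list functions, classes, and methods with their line ranges. It is a minimal example under assumptions, not the parser in `gaia.code_index`; the function name `extract_symbols` and the tuple layout are hypothetical.

```python
import ast


def extract_symbols(source):
    """Return (kind, name, start_line, end_line) tuples for a Python source string."""
    symbols = []

    def visit(node, inside_class):
        for child in ast.iter_child_nodes(node):
            if isinstance(child, ast.ClassDef):
                symbols.append(("class", child.name, child.lineno, child.end_lineno))
                visit(child, True)
            elif isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef)):
                kind = "method" if inside_class else "function"
                symbols.append((kind, child.name, child.lineno, child.end_lineno))
                # Functions nested inside this one are reported as plain functions.
                visit(child, False)
            else:
                visit(child, inside_class)

    visit(ast.parse(source), False)
    return symbols
```

Each tuple gives a natural chunk boundary: the lines from `start_line` to `end_line` could be embedded as one unit and reported in search results by symbol name.
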
|
|
||
| ## Cache Layout | ||
|
|
||
| ``` | ||
| ~/.gaia/code_index/ | ||
| └── <repo-hash>/ | ||
| ├── metadata.json # Chunk metadata, file hashes, model name | ||
| └── index.faiss # FAISS IndexFlatL2 embeddings | ||
| ``` | ||
|
|
||
| The cache is keyed by the repository root path hash. The embedding model name is stored in metadata — a warning is shown if the model changes between runs (requiring a re-index). | ||
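
The keying scheme described above could look roughly like the sketch below. The hash algorithm, the 16-character truncation, the `embedding_model` metadata key, and the function names are all assumptions made for illustration; the PR text only states that the cache is keyed by a hash of the repository root path and that the model name is stored in metadata.

```python
import hashlib
from pathlib import Path


def cache_dir_for(repo_path, cache_root="~/.gaia/code_index"):
    """Map a repository path to a stable per-repository cache directory."""
    # Resolve first so "." and the absolute path produce the same key.
    root = Path(repo_path).expanduser().resolve()
    digest = hashlib.sha256(str(root).encode("utf-8")).hexdigest()[:16]
    return Path(cache_root).expanduser() / digest


def needs_reindex(metadata, embedding_model):
    """True when the cached index was built with a different embedding model."""
    return metadata.get("embedding_model") != embedding_model
```

Vectors produced by different embedding models are not comparable, which is why a model change calls for a full re-index rather than an incremental update.
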
|
|
||
| ## Privacy | ||
|
|
||
| All processing is local. No source code, commit messages, or PR data is sent to external services. Embeddings are generated by your local Lemonade Server instance using AMD NPU/GPU hardware. | ||
|
|
||
| Sensitive files are automatically excluded from indexing: `.env`, `.pem`, `.key`, credential files, and files matching common secret patterns. | ||
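
A filter along the lines described above might be sketched as follows. The pattern list is an assumption: only `.env`, `.pem`, and `.key` come from the text, while the `credentials*` and `*secret*` patterns and the name `is_sensitive` are hypothetical stand-ins for "credential files" and "common secret patterns".

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Only the first four patterns are grounded in the documentation text;
# the rest are illustrative guesses.
SENSITIVE_PATTERNS = (
    ".env",
    ".env.*",
    "*.pem",
    "*.key",
    "credentials*",
    "*secret*",
)


def is_sensitive(path):
    """Return True when the file name matches any sensitive pattern."""
    name = PurePosixPath(path).name.lower()
    return any(fnmatchcase(name, pattern) for pattern in SENSITIVE_PATTERNS)
```

Matching on the lower-cased file name keeps the check independent of platform case rules; a real implementation may also want to inspect directory names or file contents.
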