2 changes: 1 addition & 1 deletion .env.sample
```diff
@@ -1,4 +1,4 @@
 TAVILY_API_KEY=xxxxx
 OPENAI_API_KEY=xxxxx
 VITE_APP_URL=http://localhost:5173
-GROQ_API_KEY=xxxx
+GROQ_API_KEY=xxxxx
```
224 changes: 224 additions & 0 deletions CLAUDE.md
@@ -0,0 +1,224 @@
# CLAUDE.md - Developer Guide for AI Assistants

## Project Overview

This is the **Tavily Chat Web Agent** - a conversational AI agent with real-time web access capabilities. The project demonstrates building a ReAct agent that can intelligently route questions, perform web searches, extract content, and crawl websites using Tavily's API.

### Key Technologies
- **Backend**: Python 3.11, FastAPI, LangGraph, LangChain
- **Frontend**: React 19, TypeScript, Vite, TailwindCSS
- **AI/LLM**: OpenAI (GPT-4.1-nano), Groq (Kimi-K2)
- **Web Tools**: Tavily (Search, Extract, Crawl)
- **Observability**: Weave integration

## Architecture

### Backend Architecture (`backend/`)
The backend implements a ReAct agent pattern with four main components:

1. **Agent (`backend/agent.py`)**
- `WebAgent` class: Orchestrates the LangGraph workflow
- Uses `create_react_agent` from LangGraph prebuilt
- Wraps Tavily tools with summarization capabilities
- Implements `create_output_summarizer()` for processing tool outputs
- Tools: TavilySearch, TavilyExtract, TavilyCrawl

2. **Prompts (`backend/prompts.py`)**
- `REASONING_PROMPT`: For complex queries requiring multi-step reasoning
- `SIMPLE_PROMPT`: For straightforward queries
- Customizable prompt templates for agent behavior

3. **Server (`app.py`)**
- FastAPI application with CORS support
- Streaming responses via `/stream_agent` endpoint
- Conversation management (save, list, get, delete)
- File upload support for document processing
- Uses `MemorySaver` for conversation checkpointing

4. **Utilities**
- `backend/utils.py`: API key validation
- `backend/response_handler.py`: Conversation persistence
- `backend/file_handler.py`: Document upload handling

### Frontend Architecture (`ui/`)
- **React + TypeScript**: Type-safe component architecture
- **Vite**: Fast build tool and dev server
- **TailwindCSS**: Utility-first styling
- **Streaming UI**: Real-time display of agent reasoning steps
- **Markdown Support**: Rich text rendering with `react-markdown`
- **Citations**: Displays web sources with favicons

## Development Setup

### Backend Setup
```bash
# Create and activate virtual environment
python3.11 -m venv venv
source venv/bin/activate # Windows: .\venv\Scripts\activate

# Install dependencies
python3.11 -m pip install -r requirements.txt

# Set up environment variables (see below)

# Run server from project root
python app.py
```

### Frontend Setup
```bash
cd ui
npm install
npm run dev # Starts on http://localhost:5173
```

### Environment Variables

**Root `.env`:**
```bash
TAVILY_API_KEY="your-tavily-api-key"
OPENAI_API_KEY="your-openai-api-key"
GROQ_API_KEY="your-groq-api-key"
VITE_APP_URL=http://localhost:5173
```

**`ui/.env`:**
```bash
VITE_BACKEND_URL=http://localhost:8080
```

## Key Files and Their Purpose

### Backend Files
- `app.py` (312 lines): FastAPI server, routing, streaming logic
- `backend/agent.py` (153 lines): Core agent logic, tool wrapping, summarization
- `backend/prompts.py`: System prompts for agent behavior
- `backend/response_handler.py`: Conversation persistence layer
- `backend/file_handler.py`: Document upload and processing
- `backend/utils.py`: Helper functions, API key validation

### Frontend Files
- `ui/src/`: React components and application logic
- `ui/index.html`: Entry point
- `ui/vite.config.ts`: Vite configuration
- `ui/tailwind.config.js`: TailwindCSS configuration
- `ui/tsconfig.json`: TypeScript configuration

## Common Development Tasks

### Modifying Agent Behavior
1. Edit prompts in `backend/prompts.py`
2. Adjust tool configurations in `backend/agent.py` (lines 87-101)
3. Modify summarization logic in `create_output_summarizer()` (lines 15-62)

### Adding New Tools
1. Import the tool in `backend/agent.py`
2. Initialize in `build_graph()` method
3. Add to tools list when calling `create_react_agent()` (line 147)
4. Optionally wrap with summarization logic
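The wrapping step above can be sketched framework-free. This is a hypothetical illustration of the pattern, not the actual code in `backend/agent.py` (where `create_output_summarizer()` does the real work); the function and stand-in names here are invented for the example.

```python
# Hypothetical sketch: wrap a tool so its raw output is summarized before
# it reaches the agent. Names are illustrative, not the real implementation.
def with_summarized_output(tool_fn, summarize):
    """Return a callable that runs tool_fn, then condenses its output."""
    def wrapped(*args, **kwargs):
        raw = tool_fn(*args, **kwargs)
        return summarize(raw)
    return wrapped


# Stand-in functions so the pattern is runnable on its own:
def fake_search(query):
    return f"long raw results for: {query}"


def fake_summarize(text):
    return text[:20] + "..."


search_tool = with_summarized_output(fake_search, fake_summarize)
```

In the real project, the wrapped callable would be a Tavily tool and the summarizer an LLM chain, but the composition is the same.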

### Changing LLM Models
Modify the model initialization in `app.py`:
```python
nano = ChatOpenAI(model="gpt-4.1-nano", api_key=os.getenv("OPENAI_API_KEY"))
kimik2 = ChatGroq(model="moonshotai/kimi-k2-instruct", api_key=os.getenv("GROQ_API_KEY"))
```

### Customizing UI
- Components are in `ui/src/`
- Styling uses TailwindCSS utility classes
- Markdown rendering configured in relevant components

### Running Tests
```bash
# Backend (if tests are added)
pytest

# Frontend
cd ui
npm run lint # Check linting
npm run format # Format code
```

## API Endpoints

### POST `/stream_agent`
Streams agent execution responses with reasoning steps.

**Request Body:**
```json
{
  "messages": [{"role": "user", "content": "query"}],
  "thread_id": "unique-thread-id"
}
```

**Response:** Server-Sent Events (SSE) stream with agent steps
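Since `app.py` returns a `StreamingResponse` that emits one JSON object per line, a client can decode events as they arrive. The following is a minimal sketch of such a parser; the `"type"`/`"content"` field names are assumptions for illustration, not the documented event schema.

```python
import json


# Hypothetical client-side helper: decode newline-delimited JSON events,
# skipping blank keep-alive lines. Field names below are assumed.
def parse_stream_lines(lines):
    """Parse each non-empty line as a JSON event."""
    events = []
    for line in lines:
        line = line.strip()
        if line:
            events.append(json.loads(line))
    return events


# Example input resembling two streamed agent steps:
chunks = [
    '{"type": "step", "content": "searching"}',
    "",
    '{"type": "answer", "content": "done"}',
]
events = parse_stream_lines(chunks)
```

In a real client the lines would come from iterating over the HTTP response body rather than a list.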

### Conversation Management
- `POST /save_conversation_turn`: Save conversation history
- `GET /list_conversations`: List all conversations
- `GET /get_conversation/{id}`: Get specific conversation
- `DELETE /delete_conversation/{id}`: Delete conversation

### File Upload
- `POST /upload_file`: Upload documents for processing
**Comment on lines +159 to +165 — Copilot AI, Nov 28, 2025:**

The API endpoint documentation is outdated and doesn't match the actual implementation:

- Listed endpoints like `/save_conversation_turn`, `/list_conversations`, `/get_conversation/{id}`, `/delete_conversation/{id}` don't match the actual endpoints
- Actual endpoints are: `GET /conversations`, `GET /conversations/(unknown)`, `DELETE /conversations/(unknown)`
- The file upload endpoint is listed as `/upload_file` but is actually `/upload`

Update the documentation to reflect the correct endpoints.

Suggested change:

```diff
-- `POST /save_conversation_turn`: Save conversation history
-- `GET /list_conversations`: List all conversations
-- `GET /get_conversation/{id}`: Get specific conversation
-- `DELETE /delete_conversation/{id}`: Delete conversation
-### File Upload
-- `POST /upload_file`: Upload documents for processing
+- `GET /conversations`: List all conversations
+- `GET /conversations/(unknown)`: Get a specific conversation
+- `DELETE /conversations/(unknown)`: Delete a conversation
+### File Upload
+- `POST /upload`: Upload documents for processing
```
## Important Notes for AI Assistants

### Code Style
- Backend: Python type hints, async/await patterns
- Frontend: TypeScript strict mode, functional components
- Use ESLint for frontend (`npm run lint:fix`)
- Use Prettier for formatting (`npm run format`)

### Testing Considerations
- Test with both OpenAI and Groq models
- Validate Tavily API responses
- Test streaming behavior
- Check CORS settings for frontend-backend communication

### Common Gotchas
1. **API Keys**: Must be set in `.env` files before running
2. **CORS**: `VITE_APP_URL` must match frontend dev server
3. **Python Version**: Requires Python 3.11
4. **Port Conflicts**: Backend runs on 8080, frontend on 5173
5. **Streaming**: Response handler removes callback managers to avoid Pydantic issues (lines 107-131 in `agent.py`)
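The pattern behind gotcha 5 can be sketched generically. This is a hypothetical illustration, not the actual code at lines 107-131 of `agent.py`; the key names below are assumed for the example.

```python
import json


# Hypothetical sketch: before serializing a streamed message, drop entries
# (such as callback managers) that json/Pydantic cannot handle.
def strip_unserializable(payload: dict, banned=("callbacks", "callback_manager")) -> dict:
    """Return a copy of payload without known non-serializable keys."""
    return {k: v for k, v in payload.items() if k not in banned}


msg = {"content": "hello", "callback_manager": object()}
safe = strip_unserializable(msg)
serialized = json.dumps(safe)
```

Without the stripping step, `json.dumps(msg)` would raise a `TypeError` on the `object()` value.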

### When Making Changes
1. **Read before modifying**: Always read the full file before making edits
2. **Preserve patterns**: Follow existing code patterns and structure
3. **Test both sides**: Changes to API contracts affect both backend and frontend
4. **Update prompts carefully**: Prompt changes significantly affect agent behavior
5. **Check dependencies**: Keep `requirements.txt` and `package.json` in sync with code

## Extending the Project

### Adding New Data Sources
1. Integrate new tools in `backend/agent.py`
2. Add API keys to `.env`
3. Update prompts to instruct agent when to use new tools

### Customizing Agent Architecture
- Modify `create_react_agent` call or implement custom LangGraph
- Adjust state management and checkpointing
- Add custom nodes and edges to the graph
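The custom-graph idea above can be illustrated without any framework. This is a deliberately simplified, hypothetical sketch: nodes are functions over a shared state dict and edges pick the next node. A real implementation would use LangGraph's `StateGraph` with its checkpointing instead.

```python
# Hypothetical, framework-free sketch of a node/edge graph over state.
def run_graph(state, nodes, edges, start):
    """Run node functions in sequence, following edges until None."""
    current = start
    while current is not None:
        state = nodes[current](state)
        current = edges.get(current)
    return state


# Illustrative two-node graph: a router followed by an answer node.
nodes = {
    "route": lambda s: {**s, "needs_search": "latest" in s["question"]},
    "answer": lambda s: {**s, "answer": "done"},
}
edges = {"route": "answer", "answer": None}
final = run_graph({"question": "latest news"}, nodes, edges, "route")
```

Conditional edges (choosing the next node from the state) are the main thing LangGraph adds beyond this linear sketch.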

### Enhancing UI
- Add new React components in `ui/src/`
- Extend Markdown rendering capabilities
- Implement additional visualization for tool outputs

## Resources

- [Tavily Documentation](https://docs.tavily.com)
- [LangGraph Documentation](https://langchain-ai.github.io/langgraph/)
- [LangChain Documentation](https://python.langchain.com/)
- [FastAPI Documentation](https://fastapi.tiangolo.com/)
- [React Documentation](https://react.dev/)

## Contact

For questions or custom implementations:
- Dean Sacoransky: deansa@tavily.com
- Michael Griff: michaelgriff@tavily.com
65 changes: 64 additions & 1 deletion app.py
@@ -16,9 +16,12 @@

```diff
 import requests
 import uvicorn
 from dotenv import load_dotenv
-from fastapi import Depends, FastAPI, HTTPException, Request
+from fastapi import Depends, FastAPI, HTTPException, Request, UploadFile, File
 from fastapi.middleware.cors import CORSMiddleware
 from fastapi.responses import StreamingResponse
+from typing import List
+from backend.file_handler import save_uploaded_file
+from backend.response_handler import save_conversation_turn, get_turn_number, list_conversations, get_conversation_content, delete_conversation
 from langchain.schema import HumanMessage
 from langgraph.graph.state import CompiledStateGraph as CompiledGraph
 from pydantic import BaseModel
```

@@ -70,10 +73,56 @@ class AgentRequest(BaseModel):

```python
    agent_type: str


# Store uploaded file contents in memory (per session you could use thread_id)
uploaded_file_contents: dict = {}
```
**Copilot AI, Nov 28, 2025:**

The global `uploaded_file_contents` dictionary is not thread-safe and is shared across all requests. This creates multiple issues:

1. Race conditions when multiple users upload files simultaneously
2. A memory leak, as files are never cleaned up
3. A privacy concern, as all users share the same file storage

Consider using a session-based storage mechanism or associating files with `thread_id`:

```python
uploaded_file_contents: dict[str, dict] = {}  # thread_id -> {filename: content}

# In upload endpoint, require thread_id
@app.post("/upload")
async def upload_files(thread_id: str, files: List[UploadFile] = File(...)):
    if thread_id not in uploaded_file_contents:
        uploaded_file_contents[thread_id] = {}
    ...
```


```python
@app.get("/")
async def ping():
    return {"message": "Alive"}


@app.get("/conversations")
async def get_conversations():
    """Get list of all saved conversations."""
    return {"conversations": list_conversations()}


@app.get("/conversations/(unknown)")
async def get_conversation(filename: str):
    """Get content of a specific conversation."""
    content = get_conversation_content(filename)
    if not content:
        raise HTTPException(status_code=404, detail="Conversation not found")
    return {"content": content}
```
**Comment on lines +91 to +97 — Copilot AI, Nov 28, 2025:**

Path traversal vulnerability: the `filename` parameter from the URL is used directly without validation. A malicious user could access files outside the responses directory using paths like `../../etc/passwd`. Consider validating that the filename doesn't contain path traversal sequences:

```python
from pathlib import Path

@app.get("/conversations/(unknown)")
async def get_conversation(filename: str):
    # Prevent path traversal
    if ".." in filename or "/" in filename or "\\" in filename:
        raise HTTPException(status_code=400, detail="Invalid filename")
    content = get_conversation_content(filename)
    ...
```


```python
@app.delete("/conversations/(unknown)")
async def remove_conversation(filename: str):
    """Delete a conversation."""
    success = delete_conversation(filename)
    if not success:
        raise HTTPException(status_code=404, detail="Conversation not found")
    return {"message": "Deleted"}
```
**Comment on lines +100 to +106 — Copilot AI, Nov 28, 2025:**

Path traversal vulnerability: the `filename` parameter from the URL is used directly without validation in the delete endpoint. This allows potential deletion of arbitrary files outside the responses directory. Apply the same validation as suggested for the GET endpoint to prevent path traversal attacks.


```python
@app.post("/upload")
async def upload_files(files: List[UploadFile] = File(...)):
    """Upload and process files."""
    results = []
    for file in files:
        try:
            result = await save_uploaded_file(file)
            # Store content for later use in chat
            uploaded_file_contents[result["filename"]] = result["content"]
            results.append(result)
        except ValueError as e:
            raise HTTPException(status_code=400, detail=str(e))
        except Exception as e:
            raise HTTPException(status_code=500, detail=f"Error processing {file.filename}: {str(e)}")
    return {"uploaded": results}
```
**Comment on lines +109 to +123 — Copilot AI, Nov 28, 2025:**

Missing file size limits for uploads. Without size restrictions, this endpoint is vulnerable to denial-of-service attacks through large file uploads that could exhaust disk space or memory. Consider adding:

```python
@app.post("/upload")
async def upload_files(files: List[UploadFile] = File(...)):
    MAX_FILE_SIZE = 10 * 1024 * 1024  # 10MB
    results = []
    for file in files:
        # Check file size
        content = await file.read()
        if len(content) > MAX_FILE_SIZE:
            raise HTTPException(status_code=413, detail=f"File {file.filename} exceeds size limit")
        await file.seek(0)  # Reset file pointer
        ...
```


```python
@app.post("/stream_agent")
async def stream_agent(
    body: AgentRequest,
```

@@ -255,6 +304,20 @@ async def event_generator():

```python
                )
                + "\n"
            )

            # Save the conversation turn to file
            try:
                turn_number = get_turn_number(body.thread_id)
                uploaded_files_list = list(uploaded_file_contents.keys()) if uploaded_file_contents else None
                await save_conversation_turn(
                    thread_id=body.thread_id,
                    question=body.input,
                    answer=final_answer,
                    turn_number=turn_number,
                    uploaded_files=uploaded_files_list
                )
            except Exception as e:
                print(f"Error saving conversation: {e}")

    return StreamingResponse(event_generator(), media_type="application/json")
```

**Comment on lines +311 to +317 — Copilot AI, Nov 28, 2025:**

The conversation is saved with the global `uploaded_file_contents` dictionary keys, which includes files from all users/sessions. This will incorrectly attribute files to conversations that don't belong to them. The file list should be associated with the specific `thread_id` or passed through the request. Consider tracking which files belong to which conversation/thread.

63 changes: 63 additions & 0 deletions backend/file_handler.py
@@ -0,0 +1,63 @@
```python
import os
```

**Copilot AI, Nov 28, 2025:** The `os` import is unused. Consider removing it:

```diff
-import os
```

```python
from pathlib import Path
from typing import Optional
```

**Copilot AI, Nov 28, 2025:** The `Optional` import is unused. Consider removing it:

```diff
-from typing import Optional
```

```python
import aiofiles
from fastapi import UploadFile
import pypdf
import docx
```

```python
UPLOAD_DIR = Path("uploads")
UPLOAD_DIR.mkdir(exist_ok=True)

ALLOWED_EXTENSIONS = {".pdf", ".txt", ".md", ".docx", ".csv", ".html"}


async def save_uploaded_file(file: UploadFile) -> dict:
    """Save uploaded file and extract text content."""
    ext = Path(file.filename).suffix.lower()

    if ext not in ALLOWED_EXTENSIONS:
        raise ValueError(f"File type {ext} not supported. Allowed: {', '.join(ALLOWED_EXTENSIONS)}")

    file_path = UPLOAD_DIR / file.filename
```
**Copilot AI, Nov 28, 2025:**

Potential path traversal vulnerability: `file.filename` is used directly without sanitization. A malicious user could provide a filename like `../../etc/passwd` to write files outside the upload directory. Consider sanitizing the filename to prevent directory traversal attacks:

```python
from pathlib import Path

# Sanitize filename to prevent path traversal
safe_filename = Path(file.filename).name
file_path = UPLOAD_DIR / safe_filename
```

```python
    async with aiofiles.open(file_path, "wb") as f:
        content = await file.read()
        await f.write(content)

    text_content = await extract_text(file_path, ext)

    return {
        "filename": file.filename,
        "path": str(file_path),
        "content": text_content,
        "size": len(content)
    }


async def extract_text(file_path: Path, ext: str) -> str:
    """Extract text from various file formats."""

    if ext in [".txt", ".md", ".html"]:
        async with aiofiles.open(file_path, "r", encoding="utf-8") as f:
            return await f.read()

    elif ext == ".pdf":
        text = ""
        with open(file_path, "rb") as f:
            reader = pypdf.PdfReader(f)
            for page in reader.pages:
                page_text = page.extract_text()
                if page_text:
                    text += page_text + "\n"
        return text

    elif ext == ".docx":
        doc = docx.Document(file_path)
        return "\n".join([para.text for para in doc.paragraphs])

    elif ext == ".csv":
        async with aiofiles.open(file_path, "r", encoding="utf-8") as f:
            return await f.read()

    return ""
```