example_integrations/tavily/README.md (+109 lines)

# Web Research Agent with Tavily

A voice agent that searches the web and extracts page content using the Tavily API, then synthesizes results into conversational responses. Uses Tavily's `fast` search depth for low-latency voice interactions and `extract` for deep-diving into specific pages.

## Setup

### Prerequisites

- [OpenAI API key](https://platform.openai.com/api-keys)
- [Tavily API key](https://app.tavily.com/home)

### Environment Variables

Create a `.env` file:

```bash
OPENAI_API_KEY=your-openai-key
TAVILY_API_KEY=your-tavily-key
```

### Installation

```bash
uv sync
```

## Running

```bash
python main.py
```

Then connect:

```bash
cartesia chat 8000
```

## How It Works

Everything is in `main.py`:

1. **`web_search`** - A `@loopback_tool` that calls Tavily Search and returns formatted results to the LLM
2. **`web_extract`** - A `@loopback_tool` that extracts the full content of a webpage by URL, useful for deep-diving into a promising search result
3. **`get_agent`** - Creates an `LlmAgent` with both tools and a voice-optimized system prompt
4. **`VoiceAgentApp`** - Handles the voice connection

Both tools use Tavily's `AsyncTavilyClient`, which provides native async support and automatically reads `TAVILY_API_KEY` from the environment.
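The search response is a plain dict, so the formatting step `web_search` performs can be sketched without touching the network. This is a minimal sketch: the response shape (`results` entries with `title`, `content`, `url`, `score`) follows `main.py`, while the sample data is illustrative.

```python
# Minimal sketch of how web_search formats a Tavily search response for the
# LLM. The response shape mirrors main.py; the sample data is illustrative.
def format_results(query: str, response: dict) -> str:
    results = response.get("results", [])
    if not results:
        return "No relevant information found."
    parts = [f"Search Results for: '{query}'\n"]
    for i, result in enumerate(results):
        score = result.get("score", 0)
        parts.append(f"\n--- Source {i + 1}: {result['title']} (relevance: {score:.2f}) ---\n")
        if result.get("content"):
            parts.append(f"{result['content']}\n")
        parts.append(f"URL: {result['url']}\n")
    return "".join(parts)


sample = {
    "results": [
        {"title": "Example", "content": "A snippet.", "url": "https://example.com", "score": 0.91}
    ]
}
formatted = format_results("test query", sample)
```

Keeping the formatting in a pure function like this also makes it easy to unit-test the tool output without calling the API.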

## Configuration

### Tavily Search Parameters

The `web_search` tool calls `AsyncTavilyClient.search()` with these arguments (`time_range` is exposed as a tool parameter the LLM can set; it defaults to `"month"`):

```python
response = await client.search(
    query=query,
    time_range=time_range,
    search_depth="fast",
    max_results=5,
)
```

#### Search Depth

| Depth | Latency | Content Type | Cost | Best For |
|-------|---------|--------------|------|----------|
| `ultra-fast` | Lowest | NLP summary per URL | 1 credit | Voice agents, real-time chat |
| `fast` | Low | Reranked chunks per URL | 1 credit | Chunk-based results with low latency |
| `basic` | Medium | NLP summary per URL | 1 credit | General-purpose search |
| `advanced` | Higher | Reranked chunks per URL | 2 credits | Precision-critical queries |

#### Additional Parameters

The `web_search` tool already exposes `time_range` (`"day"`, `"week"`, `"month"`, or `"year"`) to the LLM. You can extend it with other Tavily features like:

- **`topic`** - `"general"`, `"news"`, or `"finance"` to focus results
- **`include_domains`** / **`exclude_domains`** - restrict or block specific sources
- **`include_answer`** - `"basic"` or `"advanced"` to get an LLM-generated answer alongside results
- **`country`** - boost results from a specific country (available for the `"general"` topic)

See the [Tavily Search API docs](https://docs.tavily.com/documentation/api-reference/endpoint/search) and the [Python SDK reference](https://docs.tavily.com/sdk/python/reference) for the full parameter list.
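As a hedged sketch of such an extension, the parameters above could be passed straight through to `client.search()` as extra keyword arguments. The query and domain values below are illustrative, not part of this example:

```python
# Hypothetical extension of the web_search call; values are illustrative.
# The keyword names come from the Tavily Search API parameters listed above.
news_search_kwargs = {
    "query": "latest semiconductor export rules",  # illustrative query
    "search_depth": "fast",
    "max_results": 5,
    "topic": "news",                               # focus on news sources
    "time_range": "week",                          # recency filter
    "include_domains": ["reuters.com", "apnews.com"],
    "include_answer": "basic",                     # LLM-generated answer too
}

# Inside the tool you would then call:
# response = await client.search(**news_search_kwargs)
```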

### Tavily Extract Parameters

The `web_extract` tool calls `AsyncTavilyClient.extract()` with minimal defaults:

```python
response = await client.extract(urls=[url])
```

Extracted content is truncated to 3000 characters to keep LLM context manageable. You can adjust this in `main.py` or add parameters like:

- **`extract_depth`** - `"basic"` (default) or `"advanced"` for tables and embedded content
- **`format`** - `"markdown"` (default) or `"text"` for plain text

See the [Tavily Extract API docs](https://docs.tavily.com/documentation/api-reference/endpoint/extract) for more options.
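A sketch of the truncation step, plus the optional extract arguments above. The 3000-character limit matches `main.py`; everything else is illustrative:

```python
# Mirrors the truncation in main.py's web_extract; max_chars is adjustable.
def truncate_content(raw_content: str, max_chars: int = 3000) -> str:
    if len(raw_content) > max_chars:
        return raw_content[:max_chars] + "\n\n[Content truncated]"
    return raw_content


# Optional extract arguments per the Tavily Extract API docs; you would pass
# them as: await client.extract(urls=[url], extract_depth="advanced", format="text")
extract_kwargs = {"extract_depth": "advanced", "format": "text"}

short = truncate_content("x" * 10)
long_text = truncate_content("x" * 5000)
```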

### LLM Configuration

```python
LlmConfig(
system_prompt=SYSTEM_PROMPT,
introduction=INTRODUCTION,
max_tokens=600,
temperature=0.7,
)
```
example_integrations/tavily/main.py (+162 lines)

"""Web Research Agent with Tavily and Cartesia Line SDK."""

from datetime import datetime
import os
from typing import Annotated

from dotenv import load_dotenv
from loguru import logger
from tavily import AsyncTavilyClient

from line.llm_agent import LlmAgent, LlmConfig, ToolEnv, end_call, loopback_tool
from line.voice_agent_app import AgentEnv, CallRequest, VoiceAgentApp

# Load OPENAI_API_KEY / TAVILY_API_KEY from the .env file described in the README.
load_dotenv()

today = datetime.now().strftime("%Y-%m-%d")

SYSTEM_PROMPT = f"""Today is {today}. You are a sharp, fast research assistant on a live voice call.

You have two web tools powered by Tavily:

1. web_search — Find relevant pages across the web. Use for questions about current events, \
facts, prices, people, or anything that needs fresh data. Start here for most questions.

2. web_extract — Pull full content from a specific URL. Use when a search snippet is too \
thin to answer confidently, or when the user mentions a specific link they want you to read.

Your workflow: search first, scan the snippets. If you can answer from snippets alone, do it \
immediately. If a result looks right but you need more detail, extract that page and then answer. \
Don't extract unless you need to.

When answering:
- Lead with the answer, not the preamble. No "Great question" or "Let me look that up."
- Keep it to two or three sentences unless the user asks you to go deeper.
- Name your source naturally when it matters. "According to Reuters" beats rattling off URLs.
- If results conflict or seem stale, say so. Don't fake confidence.
- If you genuinely can't find it, say that and suggest how the user could refine.

Use end_call when the user wraps up.

CRITICAL: This is a voice call. Speak in plain, natural sentences only. No markdown, no bullet \
points, no numbered lists, no asterisks, no dashes, no special characters of any kind."""

INTRODUCTION = (
"Hey! I'm your research assistant, powered by Tavily and Cartesia. "
"Ask me anything and I'll dig it up live. What do you want to know?"
)

MAX_OUTPUT_TOKENS = 600
TEMPERATURE = 0.7


@loopback_tool
async def web_search(
ctx: ToolEnv,
query: Annotated[
str,
"The search query. Be specific and include key terms.",
],
time_range: Annotated[
str,
"The time range to search for. Use 'day', 'week', 'month', or 'year'.",
] = "month",
) -> str:
"""Search the web for current information.
Use when you need up-to-date facts, news, or any information that requires factual accuracy."""
logger.info(f"Performing Tavily web search: '{query}'")

api_key = os.environ.get("TAVILY_API_KEY")
if not api_key:
return "Web search failed: TAVILY_API_KEY not set."

try:
client = AsyncTavilyClient(api_key=api_key, client_source="cartesia-line-agent")
response = await client.search(
query=query,
time_range=time_range,
search_depth="fast",
max_results=5,
)

results = response.get("results", [])
if not results:
return "No relevant information found."

# Format results for LLM
content_parts = [f"Search Results for: '{query}'\n"]
for i, result in enumerate(results):
score = result.get("score", 0)
content_parts.append(f"\n--- Source {i + 1}: {result['title']} (relevance: {score:.2f}) ---\n")
if result.get("content"):
content_parts.append(f"{result['content']}\n")
content_parts.append(f"URL: {result['url']}\n")

response_time = response.get("response_time", 0)
logger.info(f"Search completed: {len(results)} sources found in {response_time:.2f}s")
return "".join(content_parts)

except Exception as e:
logger.error(f"Tavily search failed: {e}")
return f"Web search failed: {e}"


@loopback_tool
async def web_extract(
ctx: ToolEnv,
url: Annotated[
str,
"The URL to extract content from.",
],
) -> str:
"""Extract the full content of a webpage given its URL.
Use when you need detailed information from a specific page found via web_search."""
logger.info(f"Extracting content from: '{url}'")

api_key = os.environ.get("TAVILY_API_KEY")
if not api_key:
return "Content extraction failed: TAVILY_API_KEY not set."

try:
client = AsyncTavilyClient(api_key=api_key, client_source="cartesia-line-agent")
response = await client.extract(urls=[url])

results = response.get("results", [])
if not results:
failed = response.get("failed_results", [])
if failed:
return f"Extraction failed for {url}: {failed[0].get('error', 'unknown error')}"
return "No content could be extracted from that URL."

extracted = results[0]
raw_content = extracted.get("raw_content", "")
if not raw_content:
return "The page was reached but no readable content was found."

max_chars = 3000
if len(raw_content) > max_chars:
raw_content = raw_content[:max_chars] + "\n\n[Content truncated]"

logger.info(f"Extraction completed: {len(raw_content)} characters from {url}")
return f"Extracted content from {url}:\n\n{raw_content}"

except Exception as e:
logger.error(f"Tavily extract failed: {e}")
return f"Content extraction failed: {e}"


async def get_agent(env: AgentEnv, call_request: CallRequest):
return LlmAgent(
model="openai/gpt-4o-mini",
api_key=os.getenv("OPENAI_API_KEY"),
tools=[web_search, web_extract, end_call],
config=LlmConfig(
system_prompt=SYSTEM_PROMPT,
introduction=INTRODUCTION,
max_tokens=MAX_OUTPUT_TOKENS,
temperature=TEMPERATURE,
),
)


app = VoiceAgentApp(get_agent=get_agent)

if __name__ == "__main__":
app.run()
example_integrations/tavily/pyproject.toml (+14 lines)

[project]
name = "tavily-web-search"
version = "0.1.0"
description = "A web research voice agent using Tavily API and Cartesia Line SDK"
requires-python = ">=3.10"
dependencies = [
"cartesia-line>=0.2.7",
"tavily-python>=0.7.23",
"loguru>=0.7.3",
"python-dotenv>=1.2.2",
]

[project.scripts]
tavily-search = "main:app.run"