Tavily integration example #197
Merged: akavi merged 6 commits into cartesia-ai:main from lakshyaag-tavily:feat/tavily-sdk-example on Apr 20, 2026 (+285 −0).
Commits:

- `a506ef5` Add initial implementation of Tavily web research voice agent (lakshyaag-tavily)
- `ab8c736` fixes (lakshyaag-tavily)
- `06eed57` Merge branch 'main' into feat/tavily-sdk-example (lakshyaag-tavily)
- `2e87093` Merge branch 'main' into feat/tavily-sdk-example (lakshyaag-tavily)
- `a8a76fa` Merge branch 'main' into feat/tavily-sdk-example (lakshyaag-tavily)
- `f69ff78` Refactor Tavily integration to use a dedicated TavilyTools class for … (lakshyaag-tavily)
# Web Research Agent with Tavily

A voice agent that searches the web and extracts page content using the Tavily API, then synthesizes results into conversational responses. It uses Tavily's `fast` search depth for low-latency voice interactions and `extract` for deep-diving into specific pages.

## Setup

### Prerequisites

- [OpenAI API key](https://platform.openai.com/api-keys)
- [Tavily API key](https://app.tavily.com/home)

### Environment Variables

Create a `.env` file:

```bash
OPENAI_API_KEY=your-openai-key
TAVILY_API_KEY=your-tavily-key
```

### Installation

```bash
uv sync
```

## Running

```bash
python main.py
```

Then connect:

```bash
cartesia chat 8000
```

## How It Works

Everything is in `main.py`:

1. **`web_search`** - A `@loopback_tool` that calls Tavily Search and returns formatted results to the LLM
2. **`web_extract`** - A `@loopback_tool` that extracts the full content of a webpage by URL, useful for deep-diving into a promising search result
3. **`get_agent`** - Creates an `LlmAgent` with both tools and a voice-optimized system prompt
4. **`VoiceAgentApp`** - Handles the voice connection

Both tools use Tavily's `AsyncTavilyClient`, which provides native async support and automatically reads `TAVILY_API_KEY` from the environment.
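The formatting `web_search` applies before handing results to the LLM can be sketched as a small pure function (the function name here is ours; the input follows Tavily's documented `results` shape, a list of dicts with `title`, `content`, `url`, and `score`):

```python
# Sketch of the result formatting done inside web_search, assuming Tavily's
# response shape: a list of dicts with "title", "content", "url", "score".
def format_results(query: str, results: list[dict]) -> str:
    parts = [f"Search Results for: '{query}'\n"]
    for i, result in enumerate(results):
        score = result.get("score", 0)
        parts.append(f"\n--- Source {i + 1}: {result['title']} (relevance: {score:.2f}) ---\n")
        if result.get("content"):
            parts.append(f"{result['content']}\n")
        parts.append(f"URL: {result['url']}\n")
    return "".join(parts)
```

Plain delimited text like this tends to be easier for the model to cite from than raw JSON.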
## Configuration

### Tavily Search Parameters

The `web_search` tool calls `AsyncTavilyClient.search()` with these defaults:

```python
response = await client.search(
    query=query,
    search_depth="fast",
    max_results=5,
)
```

#### Search Depth

| Depth | Latency | Content Type | Cost | Best For |
|-------|---------|--------------|------|----------|
| `ultra-fast` | Lowest | NLP summary per URL | 1 credit | Voice agents, real-time chat |
| `fast` | Low | Reranked chunks per URL | 1 credit | Chunk-based results with low latency |
| `basic` | Medium | NLP summary per URL | 1 credit | General-purpose search |
| `advanced` | Higher | Reranked chunks per URL | 2 credits | Precision-critical queries |

#### Additional Parameters

You can extend the `web_search` tool with Tavily features such as:

- **`topic`** - `"general"`, `"news"`, or `"finance"` to focus results
- **`time_range`** - `"day"`, `"week"`, `"month"`, or `"year"` for recency filtering
- **`include_domains`** / **`exclude_domains`** - restrict or block specific sources
- **`include_answer`** - `"basic"` or `"advanced"` to get an LLM-generated answer alongside results
- **`country`** - boost results from a specific country (available for the `"general"` topic)

See the [Tavily Search API docs](https://docs.tavily.com/documentation/api-reference/endpoint/search) and the [Python SDK reference](https://docs.tavily.com/sdk/python/reference) for the full parameter list.
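As an illustration, extending the defaults means adding keyword arguments to the search call. The query and parameter values below are hypothetical, not part of this example's code:

```python
# Hypothetical extension of the web_search defaults with extra Tavily
# Search parameters; the query and chosen values are illustrative only.
search_kwargs = {
    "query": "semiconductor export policy updates",
    "search_depth": "fast",
    "max_results": 5,
    "topic": "news",            # focus on news sources
    "time_range": "week",       # only results from the past week
    "include_answer": "basic",  # also return an LLM-generated answer
}
# Inside the tool this would become:
# response = await client.search(**search_kwargs)
```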
### Tavily Extract Parameters

The `web_extract` tool calls `AsyncTavilyClient.extract()` with minimal defaults:

```python
response = await client.extract(urls=[url])
```

Extracted content is truncated to 3000 characters to keep the LLM context manageable. You can adjust this in `main.py` or add parameters like:

- **`extract_depth`** - `"basic"` (default) or `"advanced"` for tables and embedded content
- **`format`** - `"markdown"` (default) or `"text"` for plain text

See the [Tavily Extract API docs](https://docs.tavily.com/documentation/api-reference/endpoint/extract) for more options.
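The truncation step amounts to a small helper, sketched here (the 3000-character cap mirrors `main.py`; the function name is ours):

```python
# Sketch of the truncation applied to extracted page content before it is
# handed to the LLM; the 3000-character default mirrors main.py.
def truncate_content(raw_content: str, max_chars: int = 3000) -> str:
    if len(raw_content) > max_chars:
        return raw_content[:max_chars] + "\n\n[Content truncated]"
    return raw_content
```

The explicit `[Content truncated]` marker lets the model tell an intentionally cut-off page apart from one that simply ends.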
### LLM Configuration

```python
LlmConfig(
    system_prompt=SYSTEM_PROMPT,
    introduction=INTRODUCTION,
    max_tokens=600,
    temperature=0.7,
)
```
Contents of `main.py` (reconstructed from the diff; a `load_dotenv()` call is added so the `.env` file described in the README is actually read, since `python-dotenv` is already a declared dependency):

```python
"""Web Research Agent with Tavily and Cartesia Line SDK."""

from datetime import datetime
import os
from typing import Annotated

from dotenv import load_dotenv
from loguru import logger
from tavily import AsyncTavilyClient

from line.llm_agent import LlmAgent, LlmConfig, ToolEnv, end_call, loopback_tool
from line.voice_agent_app import AgentEnv, CallRequest, VoiceAgentApp

# Load OPENAI_API_KEY / TAVILY_API_KEY from the .env file described in the README.
load_dotenv()

today = datetime.now().strftime("%Y-%m-%d")

SYSTEM_PROMPT = f"""Today is {today}. You are a sharp, fast research assistant on a live voice call.

You have two web tools powered by Tavily:

1. web_search — Find relevant pages across the web. Use for questions about current events, \
facts, prices, people, or anything that needs fresh data. Start here for most questions.

2. web_extract — Pull full content from a specific URL. Use when a search snippet is too \
thin to answer confidently, or when the user mentions a specific link they want you to read.

Your workflow: search first, scan the snippets. If you can answer from snippets alone, do it \
immediately. If a result looks right but you need more detail, extract that page and then answer. \
Don't extract unless you need to.

When answering:
- Lead with the answer, not the preamble. No "Great question" or "Let me look that up."
- Keep it to two or three sentences unless the user asks you to go deeper.
- Name your source naturally when it matters. "According to Reuters" beats rattling off URLs.
- If results conflict or seem stale, say so. Don't fake confidence.
- If you genuinely can't find it, say that and suggest how the user could refine.

Use end_call when the user wraps up.

CRITICAL: This is a voice call. Speak in plain, natural sentences only. No markdown, no bullet \
points, no numbered lists, no asterisks, no dashes, no special characters of any kind."""

INTRODUCTION = (
    "Hey! I'm your research assistant, powered by Tavily and Cartesia. "
    "Ask me anything and I'll dig it up live. What do you want to know?"
)

MAX_OUTPUT_TOKENS = 600
TEMPERATURE = 0.7


@loopback_tool
async def web_search(
    ctx: ToolEnv,
    query: Annotated[
        str,
        "The search query. Be specific and include key terms.",
    ],
    time_range: Annotated[
        str,
        "The time range to search for. Use 'day', 'week', 'month', or 'year'.",
    ] = "month",
) -> str:
    """Search the web for current information.

    Use when you need up-to-date facts, news, or any information that requires factual accuracy."""
    logger.info(f"Performing Tavily web search: '{query}'")

    api_key = os.environ.get("TAVILY_API_KEY")
    if not api_key:
        return "Web search failed: TAVILY_API_KEY not set."

    try:
        client = AsyncTavilyClient(api_key=api_key, client_source="cartesia-line-agent")
        response = await client.search(
            query=query,
            time_range=time_range,
            search_depth="fast",
            max_results=5,
        )

        results = response.get("results", [])
        if not results:
            return "No relevant information found."

        # Format results for the LLM
        content_parts = [f"Search Results for: '{query}'\n"]
        for i, result in enumerate(results):
            score = result.get("score", 0)
            content_parts.append(f"\n--- Source {i + 1}: {result['title']} (relevance: {score:.2f}) ---\n")
            if result.get("content"):
                content_parts.append(f"{result['content']}\n")
            content_parts.append(f"URL: {result['url']}\n")

        response_time = response.get("response_time", 0)
        logger.info(f"Search completed: {len(results)} sources found in {response_time:.2f}s")
        return "".join(content_parts)

    except Exception as e:
        logger.error(f"Tavily search failed: {e}")
        return f"Web search failed: {e}"


@loopback_tool
async def web_extract(
    ctx: ToolEnv,
    url: Annotated[
        str,
        "The URL to extract content from.",
    ],
) -> str:
    """Extract the full content of a webpage given its URL.

    Use when you need detailed information from a specific page found via web_search."""
    logger.info(f"Extracting content from: '{url}'")

    api_key = os.environ.get("TAVILY_API_KEY")
    if not api_key:
        return "Content extraction failed: TAVILY_API_KEY not set."

    try:
        client = AsyncTavilyClient(api_key=api_key, client_source="cartesia-line-agent")
        response = await client.extract(urls=[url])

        results = response.get("results", [])
        if not results:
            failed = response.get("failed_results", [])
            if failed:
                return f"Extraction failed for {url}: {failed[0].get('error', 'unknown error')}"
            return "No content could be extracted from that URL."

        extracted = results[0]
        raw_content = extracted.get("raw_content", "")
        if not raw_content:
            return "The page was reached but no readable content was found."

        max_chars = 3000
        if len(raw_content) > max_chars:
            raw_content = raw_content[:max_chars] + "\n\n[Content truncated]"

        logger.info(f"Extraction completed: {len(raw_content)} characters from {url}")
        return f"Extracted content from {url}:\n\n{raw_content}"

    except Exception as e:
        logger.error(f"Tavily extract failed: {e}")
        return f"Content extraction failed: {e}"


async def get_agent(env: AgentEnv, call_request: CallRequest):
    return LlmAgent(
        model="openai/gpt-4o-mini",
        api_key=os.getenv("OPENAI_API_KEY"),
        tools=[web_search, web_extract, end_call],
        config=LlmConfig(
            system_prompt=SYSTEM_PROMPT,
            introduction=INTRODUCTION,
            max_tokens=MAX_OUTPUT_TOKENS,
            temperature=TEMPERATURE,
        ),
    )


app = VoiceAgentApp(get_agent=get_agent)

if __name__ == "__main__":
    app.run()
```
Contents of `pyproject.toml`:

```toml
[project]
name = "tavily-web-search"
version = "0.1.0"
description = "A web research voice agent using Tavily API and Cartesia Line SDK"
requires-python = ">=3.10"
dependencies = [
    "cartesia-line>=0.2.7",
    "tavily-python>=0.7.23",
    "loguru>=0.7.3",
    "python-dotenv>=1.2.2",
]

[project.scripts]
tavily-search = "main:app.run"
```