luixaviles/ariontalk

ArionTalk - Voice AI Agent for Any Website

TypeScript · Node.js · Gemini Live · Chrome 139+ · License: MIT

A voice AI agent that understands your webpage — reads your content, sees your images, and highlights what it's talking about. Powered by Gemini Live.

What It Does

ArionTalk adds a voice assistant to any website with a single HTML tag. Visitors speak naturally, and the AI responds with voice while scrolling to and highlighting the exact content being discussed. It works with two engines: Gemini Live (cloud, multimodal, 12 languages) and a Local engine (offline, on-device, privacy-first).

Key Features

  • Page Understanding — Automatically extracts text, images, and structure from any webpage. The AI knows what's on the page before you even ask.
  • Interactive Highlights — As the AI discusses content, it scrolls to and highlights the exact section or image — powered by Gemini function calling.
  • Natural Voice with Barge-in — Talk naturally, interrupt anytime. The AI stops, listens, and adapts — just like a real conversation.
  • Offline Mode — The local engine runs entirely on-device via Gemini Nano. No server, no API keys, no internet required.
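
As an illustration of how barge-in can work, the sketch below shows a minimal energy-based detector: it computes the RMS energy of each microphone frame and reports an interruption once several consecutive frames exceed a threshold. The class name, threshold, and frame count are assumptions chosen for illustration, not ArionTalk's actual implementation or defaults.

```typescript
// Hypothetical energy-based barge-in detector (illustrative values only).

/** Root-mean-square energy of one audio frame with samples in [-1, 1]. */
export function rms(frame: Float32Array): number {
  if (frame.length === 0) return 0;
  const sumSquares = frame.reduce((acc, sample) => acc + sample * sample, 0);
  return Math.sqrt(sumSquares / frame.length);
}

export class EnergyBargeInDetector {
  private loudFrames = 0;
  private readonly threshold: number;
  private readonly requiredFrames: number;

  // Both defaults are assumptions; real values depend on the microphone and room.
  constructor(threshold = 0.05, requiredFrames = 3) {
    this.threshold = threshold;
    this.requiredFrames = requiredFrames;
  }

  /** Feed one mic frame; returns true once loud input has persisted long enough. */
  process(frame: Float32Array): boolean {
    // Any quiet frame resets the streak, so short clicks do not trigger a barge-in.
    this.loudFrames = rms(frame) >= this.threshold ? this.loudFrames + 1 : 0;
    return this.loudFrames >= this.requiredFrames;
  }

  reset(): void {
    this.loudFrames = 0;
  }
}
```

Requiring several consecutive loud frames trades a little latency for fewer false interruptions. A model-based detector such as the Silero VAD plugin replaces the energy check with a speech classifier.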

Packages

| Package | Description |
| --- | --- |
| @ariontalk/core | Headless voice engine: services, types, and session logic with no UI dependency |
| @ariontalk/widget | Drop-in Web Component that wraps @ariontalk/core with a ready-made UI |
| @ariontalk/engine-gemini | Cloud engine add-on using Gemini Live API for real-time voice conversations |
| @ariontalk/token-server | Lightweight Hono server that issues ephemeral Gemini API tokens |
| @ariontalk/plugin-silero-vad | Silero VAD plugin for AI-powered barge-in detection |

Quick Start

Gemini Live Engine (recommended)

The fastest path is the ArionTalk cloud service. Register your site at ariontalk.com to get a site key — no server to run.

<ariontalk-widget
  site-key="YOUR_SITE_KEY"
  interactive-highlights
></ariontalk-widget>
<script
  type="module"
  src="https://cdn.jsdelivr.net/npm/@ariontalk/widget@latest/dist/ariontalk.js"
  async
></script>

When site-key is set, the engine attribute defaults to "gemini" and the widget connects to the ArionTalk cloud service automatically.
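
The defaulting rule can be sketched as a small pure function. This is an illustration only: the function name, the attribute object shape, and the choice to let an explicit engine value override the site-key default are assumptions, not the widget's confirmed behavior.

```typescript
// Hypothetical sketch of engine resolution from widget attributes.
type EngineName = "local" | "gemini";

interface WidgetAttributes {
  siteKey?: string;
  engine?: string;
}

export function resolveEngine(attrs: WidgetAttributes): EngineName {
  // Assumption: an explicit, valid engine attribute takes precedence.
  if (attrs.engine === "local" || attrs.engine === "gemini") {
    return attrs.engine;
  }
  // A non-empty site key implies the cloud engine.
  if (attrs.siteKey !== undefined && attrs.siteKey.trim() !== "") {
    return "gemini";
  }
  // Otherwise fall back to the documented default.
  return "local";
}
```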

Self-hosted alternative. If you'd rather run your own token server, use the token-server attribute and start the included server:

<ariontalk-widget
  engine="gemini"
  token-server="http://localhost:3001/api/token"
  interactive-highlights
  settings
></ariontalk-widget>

cd packages/token-server
cp .env.example .env
# Edit .env and add your GEMINI_API_KEY (get one at https://aistudio.google.com/apikey)
pnpm dev

The token server runs on http://localhost:3001 and issues ephemeral tokens so your API key is never exposed to the browser.
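
Because ephemeral tokens expire, a client typically caches the current token and requests a new one shortly before expiry. The sketch below shows one way to do that. The token shape (token, expiresAtMs), the 30-second safety margin, and the TokenCache name are all assumptions for illustration; the actual response format of the token server is not documented here.

```typescript
// Hypothetical client-side cache for ephemeral tokens.
interface EphemeralToken {
  token: string;
  expiresAtMs: number; // assumed: expiry as epoch milliseconds
}

type TokenFetcher = () => Promise<EphemeralToken>;

/** True if the token exists and will stay valid for longer than the safety margin. */
export function isFresh(
  token: EphemeralToken | null,
  nowMs: number,
  skewMs = 30000,
): boolean {
  return token !== null && token.expiresAtMs - nowMs > skewMs;
}

export class TokenCache {
  private current: EphemeralToken | null = null;
  private readonly fetchToken: TokenFetcher;
  private readonly now: () => number;

  // The fetcher and clock are injected so the cache can be tested without a network.
  constructor(fetchToken: TokenFetcher, now: () => number = () => Date.now()) {
    this.fetchToken = fetchToken;
    this.now = now;
  }

  /** Returns a cached token, fetching a new one only when the old one is near expiry. */
  async get(): Promise<string> {
    let token = this.current;
    if (token === null || !isFresh(token, this.now())) {
      token = await this.fetchToken();
      this.current = token;
    }
    return token.token;
  }
}
```

In a browser, the injected fetcher would call the URL given in the token-server attribute and map the response into this shape.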

Local Engine (offline)

For a fully offline experience with no server required:

<ariontalk-widget></ariontalk-widget>
<script
  type="module"
  src="https://cdn.jsdelivr.net/npm/@ariontalk/widget@latest/dist/ariontalk.js"
  async
></script>

Requires Chrome 139+ with the Prompt API origin trial enabled.

Configuration

| Attribute | Type | Default | Description |
| --- | --- | --- | --- |
| site-key | string | "" | Site key for the ArionTalk cloud service. When set, engine defaults to "gemini" and service-url is defaulted automatically. |
| service-url | string | auto | Base URL for the ArionTalk cloud service. Defaulted automatically when site-key is present. |
| engine | string | "local" | Engine type: "local" (on-device) or "gemini" (cloud) |
| token-server | string | "" | URL of a self-hosted token server. Kept for back-compat; prefer site-key for the cloud service. |
| lang | string | "auto" | BCP-47 language code, or "auto" to detect from <html lang>. |
| label | string | "Voice Chat" | Text shown on the floating action button. |
| variant | string | "default" | FAB size variant: "default" or "compact". |
| icon | string | "mic" | FAB icon: "mic" or "wave". |
| interactive-highlights | boolean | false | Enable real-time content highlighting during Gemini conversations |
| gemini-voice | string | "" | Gemini voice name (Kore, Puck, Charon, Aoede, Fenrir, Leda, Orus, Zephyr) |
| gemini-model | string | "" | Gemini model identifier |
| position | string | "bottom-right" | Widget position ("bottom-right" or "bottom-left") |
| theme | string | "light" | Color theme ("light" or "dark") |
| settings | boolean | false | Show settings gear icon for pre-session configuration |
| force | boolean | false | Skip browser support check and always show the widget |
| log-level | string | "disabled" | Console logging: "disabled", "error", "warning", "info", "debug" |
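
To illustrate the lang="auto" behavior, here is a minimal sketch of how a language code might be resolved from the widget attribute and the page's <html lang> value. The function name and the "en-US" fallback are assumptions; the source does not say what the widget does when neither value is present.

```typescript
// Hypothetical resolution of the effective language code.
export function resolveLang(
  langAttr: string | null,
  htmlLang: string | null,
  fallback = "en-US", // assumed fallback, not a documented default
): string {
  const requested = (langAttr ?? "auto").trim();
  // An explicit BCP-47 code on the widget wins.
  if (requested !== "" && requested.toLowerCase() !== "auto") {
    return requested;
  }
  // "auto" (or a missing attribute) defers to the page's <html lang>.
  const pageLang = (htmlLang ?? "").trim();
  return pageLang !== "" ? pageLang : fallback;
}
```

In a browser, the two inputs would come from the element's lang attribute and document.documentElement.lang.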

Architecture

ariontalk/
├── packages/
│   ├── core/                        # @ariontalk/core
│   │   └── src/
│   │       ├── engine/
│   │       │   └── voice-engine.ts          # Local voice session orchestrator
│   │       ├── services/
│   │       │   ├── page-extractor.ts        # Page content extraction (text + images)
│   │       │   ├── page-indexer.ts          # Annotated page index for interactive highlights
│   │       │   ├── speech-recognition.ts    # WebSpeech Recognition wrapper
│   │       │   ├── speech-synthesis.ts      # WebSpeech Synthesis wrapper
│   │       │   ├── ai-session.ts            # Prompt API (Gemini Nano) wrapper
│   │       │   └── barge-in-detector.ts     # Energy-based interruption detection
│   │       └── utils/
│   │           ├── browser-support.ts       # Feature detection
│   │           └── timer.ts                 # Session timer
│   ├── engine-gemini/               # @ariontalk/engine-gemini
│   │   └── src/
│   │       ├── gemini-engine.ts             # Gemini Live WebSocket engine
│   │       ├── audio/
│   │       │   ├── audio-capture.ts         # Mic capture at 16kHz PCM
│   │       │   └── audio-playback.ts        # Web Audio playback with worklet
│   │       ├── highlights/
│   │       │   └── highlight-manager.ts     # Scroll + highlight via function calling
│   │       └── session/
│   │           └── token-manager.ts         # Ephemeral token lifecycle
│   ├── token-server/                # @ariontalk/token-server
│   │   └── src/
│   │       ├── app.ts                       # Hono app (exported for tests)
│   │       ├── index.ts                     # Node server entry point
│   │       └── prompts/
│   │           └── voice-assistant.md       # System instruction template
│   ├── widget/                      # @ariontalk/widget
│   │   └── src/
│   │       ├── components/
│   │       │   ├── widget-root.ts           # Root component: FAB + session + minimize
│   │       │   ├── widget-fab.ts            # Floating action button (label/variant/icon)
│   │       │   ├── widget-session.ts        # Expanded session panel
│   │       │   ├── widget-minimized.ts      # Collapsed status-aware indicator
│   │       │   └── widget-voice-settings.ts # Settings UI
│   │       └── controllers/
│   │           └── voice-session.controller.ts  # Engine lifecycle management
│   └── plugin-silero-vad/           # @ariontalk/plugin-silero-vad
│       └── src/
│           └── silero-vad-detector.ts       # AI-powered voice activity detection
├── demo/                            # Demo pages with widget examples
├── docs/                            # Documentation content (MDX)
├── docs-site/                       # Astro + Starlight shell for docs/
├── pnpm-workspace.yaml
└── package.json
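
The tree lists audio-capture.ts as capturing the microphone at 16kHz PCM. Web Audio delivers samples as 32-bit floats in [-1, 1], so a capture pipeline of this kind usually includes a float-to-integer conversion step. The sketch below shows that step under the assumption of 16-bit signed output. The function name is hypothetical, and resampling to 16 kHz is not shown.

```typescript
// Hypothetical conversion from Web Audio float samples to 16-bit signed PCM.
export function floatTo16BitPcm(samples: Float32Array): Int16Array {
  const pcm = new Int16Array(samples.length);
  samples.forEach((sample, i) => {
    // Clamp first so out-of-range input cannot wrap around when stored as int16.
    const clamped = Math.max(-1, Math.min(1, sample));
    // Negative and positive ranges are asymmetric in int16 (-32768..32767).
    pcm[i] = clamped < 0
      ? Math.round(clamped * 0x8000)
      : Math.round(clamped * 0x7fff);
  });
  return pcm;
}
```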

Development

Prerequisites

  • Node.js 20+
  • pnpm 9+
  • Gemini API key (for Gemini engine — get one at Google AI Studio)

Setup

# Clone and install
git clone https://github.com/luixaviles/ariontalk.git
cd ariontalk
pnpm install
pnpm build

# Configure environment
cp packages/token-server/.env.example packages/token-server/.env
# Edit packages/token-server/.env and add your GEMINI_API_KEY

Run with Gemini Live

# Terminal 1: Start the token server
pnpm token-server
# Runs on http://localhost:3001

# Terminal 2: Start the demo
pnpm demo
# Opens at http://localhost:5173

Other Commands

# Run the docs site locally
pnpm docs

# Build all packages
pnpm build

# Run tests
pnpm test

Deployment

Token Server on Google Cloud Run

One-time GCP setup:

gcloud auth login
GCP_PROJECT_ID=your-project ./scripts/setup-gcp.sh

This enables required APIs, creates an Artifact Registry repository, and stores your GEMINI_API_KEY in Secret Manager.

Save your project ID so the deploy script picks it up automatically:

cp .env.example .env
# Edit .env and set GCP_PROJECT_ID

Deploy:

pnpm deploy-token-server

This builds a Docker image, pushes it to Artifact Registry, and deploys it to Cloud Run. The GEMINI_API_KEY is read from Secret Manager at runtime and is never passed in plain text.

You can also pass the project ID inline: GCP_PROJECT_ID=your-project pnpm deploy-token-server.

Tech Stack

  • Gemini Live API — Real-time multimodal voice streaming with function calling
  • Lit — Web Components library
  • TypeScript — Type safety across all packages
  • Hono — Lightweight HTTP server for token endpoint
  • Google Cloud Run — Serverless deployment for token server
  • Web Audio API — Low-latency audio capture and playback
  • Chrome Built-in APIs — WebSpeech, Prompt API (Gemini Nano) for local engine

Browser Support

| Engine | Browser | Notes |
| --- | --- | --- |
| Gemini Live | Any modern browser | Requires WebSocket + microphone access |
| Local | Chrome 139+ | Requires Prompt API origin trial |
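
A feature check for the local engine might look like the sketch below. It is an assumption about what such a check could cover, not a copy of browser-support.ts: it looks for the Prompt API's LanguageModel global (the name Chrome currently exposes), a speech recognition constructor, and speech synthesis. The scope object is passed in so the function can be tested outside a browser.

```typescript
// Hypothetical feature detection for the on-device engine.
interface GlobalLike {
  LanguageModel?: unknown;
  SpeechRecognition?: unknown;
  webkitSpeechRecognition?: unknown;
  speechSynthesis?: unknown;
}

export interface LocalEngineSupport {
  promptApi: boolean;
  speechRecognition: boolean;
  speechSynthesis: boolean;
  supported: boolean;
}

export function detectLocalEngineSupport(scope: GlobalLike): LocalEngineSupport {
  const promptApi = scope.LanguageModel !== undefined;
  // Chrome has historically exposed recognition under a webkit prefix.
  const speechRecognition =
    scope.SpeechRecognition !== undefined ||
    scope.webkitSpeechRecognition !== undefined;
  const speechSynthesis = scope.speechSynthesis !== undefined;
  return {
    promptApi,
    speechRecognition,
    speechSynthesis,
    supported: promptApi && speechRecognition && speechSynthesis,
  };
}
```

In a page, the scope argument would be window. The presence of LanguageModel only shows that the API is exposed; whether the model is downloaded and ready is a separate check.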

License

MIT
