ArionTalk - Voice AI Agent for Any Website

TypeScript Node.js Gemini Live Chrome 139+ License: MIT

A voice AI agent that understands your webpage — reads your content, sees your images, and highlights what it's talking about. Powered by Gemini Live.

What It Does

ArionTalk adds a voice assistant to any website with a single HTML tag. Visitors speak naturally, and the AI responds with voice while scrolling to and highlighting the exact content being discussed. It works with two engines: Gemini Live (cloud, multimodal, 12 languages) and a Local engine (offline, on-device, privacy-first).

Key Features

  • Page Understanding — Automatically extracts text, images, and structure from any webpage. The AI knows what's on the page before you even ask.
  • Interactive Highlights — As the AI discusses content, it scrolls to and highlights the exact section or image — powered by Gemini function calling.
  • Natural Voice with Barge-in — Talk naturally, interrupt anytime. The AI stops, listens, and adapts — just like a real conversation.
  • Offline Mode — The local engine runs entirely on-device via Gemini Nano. No server, no API keys, no internet required.
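The Interactive Highlights feature above is described as being driven by Gemini function calling. As a rough sketch of how that could work, the block below declares a tool in the schema style Gemini uses for function declarations and dispatches an incoming call to the page. The tool name `highlight_section`, its `sectionId` parameter, and the `HighlightTarget` interface are hypothetical names chosen for illustration; they are not taken from the ArionTalk source.

```typescript
// Hypothetical tool declaration in the schema style of Gemini function calling.
// The name and parameters are illustrative assumptions, not ArionTalk's actual tool.
const highlightSectionTool = {
  name: "highlight_section",
  description: "Scroll to a page section and highlight it while it is being discussed.",
  parameters: {
    type: "OBJECT",
    properties: {
      sectionId: { type: "STRING", description: "Identifier of the section in the page index." },
    },
    required: ["sectionId"],
  },
};

// Shape of a function call as it reaches the client (assumed for this sketch).
interface HighlightToolCall {
  name: string;
  args: Record<string, unknown>;
}

// Abstraction over the page so the dispatcher can be exercised without a DOM.
interface HighlightTarget {
  scrollTo(sectionId: string): boolean; // returns false when the section is unknown
  highlight(sectionId: string): void;
}

interface HighlightToolResult {
  ok: boolean;
  error?: string;
}

// Validates the call, scrolls to the section, then highlights it.
function dispatchHighlightCall(call: HighlightToolCall, target: HighlightTarget): HighlightToolResult {
  if (call.name !== highlightSectionTool.name) {
    return { ok: false, error: `unknown tool: ${call.name}` };
  }
  const sectionId = call.args["sectionId"];
  if (typeof sectionId !== "string" || sectionId.length === 0) {
    return { ok: false, error: "sectionId must be a non-empty string" };
  }
  if (!target.scrollTo(sectionId)) {
    return { ok: false, error: `section not found: ${sectionId}` };
  }
  target.highlight(sectionId);
  return { ok: true };
}
```

Returning a structured result rather than throwing lets the caller send the outcome back to the model as the function response, so the model can recover when it names a section that does not exist.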

Packages

| Package | Description |
| --- | --- |
| @ariontalk/core | Headless voice engine: services, types, and session logic with no UI dependency |
| @ariontalk/widget | Drop-in Web Component that wraps @ariontalk/core with a ready-made UI |
| @ariontalk/engine-gemini | Cloud engine add-on using Gemini Live API for real-time voice conversations |
| @ariontalk/token-server | Lightweight Hono server that issues ephemeral Gemini API tokens |
| @ariontalk/plugin-silero-vad | Silero VAD plugin for AI-powered barge-in detection |

Quick Start

Gemini Live Engine (recommended)

The fastest path is the ArionTalk cloud service. Register your site at ariontalk.com to get a site key — no server to run.

<ariontalk-widget
  site-key="YOUR_SITE_KEY"
  interactive-highlights
></ariontalk-widget>
<script
  type="module"
  src="https://cdn.jsdelivr.net/npm/@ariontalk/widget@latest/dist/ariontalk.js"
  async
></script>

When site-key is set, the engine defaults to "gemini" and the widget points at the cloud service automatically.

Self-hosted alternative. If you'd rather run your own token server, use the token-server attribute and start the included server:

<ariontalk-widget
  engine="gemini"
  token-server="http://localhost:3001/api/token"
  interactive-highlights
  settings
></ariontalk-widget>

cd packages/token-server
cp .env.example .env
# Edit .env and add your GEMINI_API_KEY (get one at https://aistudio.google.com/apikey)
pnpm dev

The token server runs on http://localhost:3001 and issues ephemeral tokens so your API key is never exposed to the browser.
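Because the tokens are ephemeral, a client has to fetch a new one before the current one expires. The sketch below shows one way to cache a token and refresh it shortly before expiry. It is not the code in token-manager.ts: the `EphemeralToken` shape, the 30-second refresh margin, and the injected `fetchToken` function are all assumptions made for illustration, and the actual token-server response format may differ.

```typescript
// Assumed token shape for this sketch; the real token-server response may differ.
interface EphemeralToken {
  value: string;
  expiresAtMs: number; // absolute expiry time in epoch milliseconds
}

// Injected so the HTTP details (URL, method, response parsing) stay outside the cache.
type TokenFetcher = () => Promise<EphemeralToken>;

class EphemeralTokenCache {
  private current: EphemeralToken | null = null;
  private inFlight: Promise<EphemeralToken> | null = null;
  private readonly fetchToken: TokenFetcher;
  private readonly refreshMarginMs: number;
  private readonly now: () => number;

  constructor(fetchToken: TokenFetcher, refreshMarginMs: number = 30_000, now: () => number = () => Date.now()) {
    this.fetchToken = fetchToken;
    this.refreshMarginMs = refreshMarginMs;
    this.now = now;
  }

  // Returns the cached token while it has more than refreshMarginMs left, otherwise fetches a new one.
  async getToken(): Promise<string> {
    const cached = this.current;
    if (cached !== null && cached.expiresAtMs - this.now() > this.refreshMarginMs) {
      return cached.value;
    }
    // Concurrent callers share a single request instead of each hitting the server.
    let pending = this.inFlight;
    if (pending === null) {
      pending = this.fetchToken().finally(() => {
        this.inFlight = null;
      });
      this.inFlight = pending;
    }
    const fresh = await pending;
    this.current = fresh;
    return fresh.value;
  }
}
```

Injecting the clock and the fetcher keeps the refresh logic testable without a running server.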

Local Engine (offline)

For a fully offline experience with no server required:

<ariontalk-widget></ariontalk-widget>
<script
  type="module"
  src="https://cdn.jsdelivr.net/npm/@ariontalk/widget@latest/dist/ariontalk.js"
  async
></script>

Requires Chrome 139+ with the Prompt API origin trial enabled.
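The local engine depends on browser features that are not available everywhere, so a page may want to check for them before showing the widget. The following is a minimal sketch of such a check, not the logic in browser-support.ts. It assumes the Prompt API is exposed as a global named `LanguageModel` and that speech recognition may be prefixed as `webkitSpeechRecognition`; the function and interface names are hypothetical.

```typescript
interface LocalEngineSupport {
  speechRecognition: boolean;
  speechSynthesis: boolean;
  promptApi: boolean;
  supported: boolean; // true only when all three capabilities are present
}

// `scope` stands in for the browser global object so the check can also run outside a browser.
function detectLocalEngineSupport(scope: Record<string, unknown>): LocalEngineSupport {
  // Chrome exposes speech recognition under a vendor prefix, so both names are checked.
  const speechRecognition = "SpeechRecognition" in scope || "webkitSpeechRecognition" in scope;
  const speechSynthesis = "speechSynthesis" in scope;
  // Assumption: the Prompt API surfaces as a global `LanguageModel` object.
  const promptApi = "LanguageModel" in scope;
  return {
    speechRecognition,
    speechSynthesis,
    promptApi,
    supported: speechRecognition && speechSynthesis && promptApi,
  };
}
```

In a browser this would be called as `detectLocalEngineSupport(globalThis as unknown as Record<string, unknown>)`. Presence of the global only indicates that the API is exposed; whether the on-device model is downloaded and ready is a separate question this sketch does not answer.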

Configuration

| Attribute | Type | Default | Description |
| --- | --- | --- | --- |
| site-key | string | "" | Site key for the ArionTalk cloud service. When set, engine defaults to "gemini" and service-url is defaulted automatically. |
| service-url | string | auto | Base URL for the ArionTalk cloud service. Defaulted automatically when site-key is present. |
| engine | string | "local" | Engine type: "local" (on-device) or "gemini" (cloud) |
| token-server | string | "" | URL of a self-hosted token server. Kept for back-compat; prefer site-key for the cloud service. |
| lang | string | "auto" | BCP-47 language code, or "auto" to detect from <html lang>. |
| label | string | "Voice Chat" | Text shown on the floating action button. |
| variant | string | "default" | FAB size variant: "default" or "compact". |
| icon | string | "mic" | FAB icon: "mic" or "wave". |
| interactive-highlights | boolean | false | Enable real-time content highlighting during Gemini conversations |
| gemini-voice | string | "" | Gemini voice name (Kore, Puck, Charon, Aoede, Fenrir, Leda, Orus, Zephyr) |
| gemini-model | string | "" | Gemini model identifier |
| position | string | "bottom-right" | Widget position ("bottom-right" or "bottom-left") |
| theme | string | "light" | Color theme ("light" or "dark") |
| settings | boolean | false | Show settings gear icon for pre-session configuration |
| force | boolean | false | Skip browser support check and always show the widget |
| log-level | string | "disabled" | Console logging: "disabled", "error", "warning", "info", "debug" |
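The defaulting rules for engine and lang interact with site-key and with the page's <html lang> value, so it may help to see them written out. The sketch below is one reading of the table, not the widget's actual code: it assumes an explicit, valid engine attribute takes precedence over the site-key default, that an unrecognized engine value is ignored, and that "en" is used when no language can be detected. The function name and the `fallbackLang` parameter are hypothetical.

```typescript
type EngineKind = "local" | "gemini";

interface WidgetAttributeValues {
  siteKey?: string;
  engine?: string;
  lang?: string;
}

interface ResolvedWidgetConfig {
  engine: EngineKind;
  lang: string;
}

function isEngineKind(value: string | undefined): value is EngineKind {
  return value === "local" || value === "gemini";
}

// `htmlLang` is the value of the page's <html lang> attribute, or null when absent.
function resolveWidgetConfig(
  attrs: WidgetAttributeValues,
  htmlLang: string | null,
  fallbackLang: string = "en", // assumption: the real fallback is not documented
): ResolvedWidgetConfig {
  const hasSiteKey = (attrs.siteKey ?? "").trim() !== "";
  // Assumed precedence: explicit valid engine, then "gemini" when a site key is set, then "local".
  const engine: EngineKind = isEngineKind(attrs.engine) ? attrs.engine : hasSiteKey ? "gemini" : "local";

  const requested = (attrs.lang ?? "auto").trim();
  const detected = (htmlLang ?? "").trim();
  // "auto" (or an empty value) defers to <html lang>, then to the fallback.
  const lang = requested !== "" && requested !== "auto" ? requested : detected !== "" ? detected : fallbackLang;

  return { engine, lang };
}
```

For example, under these assumptions a widget with only site-key set on a page declaring lang="es-PE" resolves to the "gemini" engine with language "es-PE".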

Architecture

ariontalk/
├── packages/
│   ├── core/                        # @ariontalk/core
│   │   └── src/
│   │       ├── engine/
│   │       │   └── voice-engine.ts          # Local voice session orchestrator
│   │       ├── services/
│   │       │   ├── page-extractor.ts        # Page content extraction (text + images)
│   │       │   ├── page-indexer.ts          # Annotated page index for interactive highlights
│   │       │   ├── speech-recognition.ts    # WebSpeech Recognition wrapper
│   │       │   ├── speech-synthesis.ts      # WebSpeech Synthesis wrapper
│   │       │   ├── ai-session.ts            # Prompt API (Gemini Nano) wrapper
│   │       │   └── barge-in-detector.ts     # Energy-based interruption detection
│   │       └── utils/
│   │           ├── browser-support.ts       # Feature detection
│   │           └── timer.ts                 # Session timer
│   ├── engine-gemini/               # @ariontalk/engine-gemini
│   │   └── src/
│   │       ├── gemini-engine.ts             # Gemini Live WebSocket engine
│   │       ├── audio/
│   │       │   ├── audio-capture.ts         # Mic capture at 16kHz PCM
│   │       │   └── audio-playback.ts        # Web Audio playback with worklet
│   │       ├── highlights/
│   │       │   └── highlight-manager.ts     # Scroll + highlight via function calling
│   │       └── session/
│   │           └── token-manager.ts         # Ephemeral token lifecycle
│   ├── token-server/                # @ariontalk/token-server
│   │   └── src/
│   │       ├── app.ts                       # Hono app (exported for tests)
│   │       ├── index.ts                     # Node server entry point
│   │       └── prompts/
│   │           └── voice-assistant.md       # System instruction template
│   ├── widget/                      # @ariontalk/widget
│   │   └── src/
│   │       ├── components/
│   │       │   ├── widget-root.ts           # Root component: FAB + session + minimize
│   │       │   ├── widget-fab.ts            # Floating action button (label/variant/icon)
│   │       │   ├── widget-session.ts        # Expanded session panel
│   │       │   ├── widget-minimized.ts      # Collapsed status-aware indicator
│   │       │   └── widget-voice-settings.ts # Settings UI
│   │       └── controllers/
│   │           └── voice-session.controller.ts  # Engine lifecycle management
│   └── plugin-silero-vad/           # @ariontalk/plugin-silero-vad
│       └── src/
│           └── silero-vad-detector.ts       # AI-powered voice activity detection
├── demo/                            # Demo pages with widget examples
├── docs/                            # Documentation content (MDX)
├── docs-site/                       # Astro + Starlight shell for docs/
├── pnpm-workspace.yaml
└── package.json
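The tree describes barge-in-detector.ts as energy-based interruption detection. As an illustration of that general technique, and not of the file's actual contents, the sketch below computes the root-mean-square energy of each audio frame and reports an interruption once the energy stays above a threshold for several consecutive frames. The class name, the 0.05 threshold, and the three-frame requirement are assumptions chosen for the example.

```typescript
// Root-mean-square energy of one audio frame with samples in [-1, 1].
function frameRms(frame: Float32Array): number {
  if (frame.length === 0) return 0;
  let sumSquares = 0;
  for (let i = 0; i < frame.length; i++) {
    const sample = frame[i] ?? 0;
    sumSquares += sample * sample;
  }
  return Math.sqrt(sumSquares / frame.length);
}

class EnergyBargeInDetector {
  private loudFrameCount = 0;
  private readonly rmsThreshold: number;
  private readonly requiredFrames: number;

  // Defaults are illustrative; real values would need tuning per microphone and environment.
  constructor(rmsThreshold: number = 0.05, requiredFrames: number = 3) {
    this.rmsThreshold = rmsThreshold;
    this.requiredFrames = requiredFrames;
  }

  // Feed one frame; returns true once energy has stayed above the threshold
  // for `requiredFrames` consecutive frames.
  process(frame: Float32Array): boolean {
    if (frameRms(frame) >= this.rmsThreshold) {
      this.loudFrameCount += 1;
    } else {
      this.loudFrameCount = 0; // a quiet frame breaks the streak
    }
    return this.loudFrameCount >= this.requiredFrames;
  }

  reset(): void {
    this.loudFrameCount = 0;
  }
}
```

Requiring consecutive loud frames is a simple guard against clicks and other short noises triggering a false interruption. A model-based detector such as the Silero VAD plugin listed above is the alternative when plain energy thresholds prove too noisy.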

Development

Prerequisites

  • Node.js 20+
  • pnpm 9+
  • Gemini API key (for Gemini engine — get one at Google AI Studio)

Setup

# Clone and install
git clone https://github.com/luixaviles/ariontalk.git
cd ariontalk
pnpm install
pnpm build

# Configure environment
cp packages/token-server/.env.example packages/token-server/.env
# Edit packages/token-server/.env and add your GEMINI_API_KEY

Run with Gemini Live

# Terminal 1: Start the token server
pnpm token-server
# Runs on http://localhost:3001

# Terminal 2: Start the demo
pnpm demo
# Opens at http://localhost:5173

Other Commands

# Run the docs site locally
pnpm docs

# Build all packages
pnpm build

# Run tests
pnpm test

Deployment

Token Server on Google Cloud Run

One-time GCP setup:

gcloud auth login
GCP_PROJECT_ID=your-project ./scripts/setup-gcp.sh

This enables required APIs, creates an Artifact Registry repository, and stores your GEMINI_API_KEY in Secret Manager.

Save your project ID so the deploy script picks it up automatically:

cp .env.example .env
# Edit .env and set GCP_PROJECT_ID

Deploy:

pnpm deploy-token-server

This command builds a Docker image, pushes it to Artifact Registry, and deploys it to Cloud Run. The GEMINI_API_KEY is pulled from Secret Manager at runtime and is never passed in plain text.

You can also pass the project ID inline: GCP_PROJECT_ID=your-project pnpm deploy-token-server.

Tech Stack

  • Gemini Live API — Real-time multimodal voice streaming with function calling
  • Lit — Web Components library
  • TypeScript — Type safety across all packages
  • Hono — Lightweight HTTP server for token endpoint
  • Google Cloud Run — Serverless deployment for token server
  • Web Audio API — Low-latency audio capture and playback
  • Chrome Built-in APIs — WebSpeech, Prompt API (Gemini Nano) for local engine
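The Web Audio API delivers microphone samples as 32-bit floats in the range [-1, 1], while the Gemini engine's capture module is described as producing 16kHz PCM. Assuming that means 16-bit signed PCM, the conversion step could look like the sketch below. The function name is hypothetical, resampling to 16 kHz is not covered, and this is not taken from audio-capture.ts.

```typescript
// Converts Web Audio float samples in [-1, 1] to 16-bit signed PCM.
// Out-of-range input is clamped rather than allowed to wrap around.
function floatToPcm16(samples: Float32Array): Int16Array {
  const pcm = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    const clamped = Math.max(-1, Math.min(1, samples[i] ?? 0));
    // Negative values scale to -32768, positive values to 32767, covering the full int16 range.
    pcm[i] = clamped < 0 ? Math.round(clamped * 0x8000) : Math.round(clamped * 0x7fff);
  }
  return pcm;
}
```

Clamping before scaling matters because an unclamped value slightly above 1.0 would overflow and wrap to a large negative sample, which is audible as a click.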

Browser Support

| Engine | Browser | Notes |
| --- | --- | --- |
| Gemini Live | Any modern browser | Requires WebSocket + microphone access |
| Local | Chrome 139+ | Requires Prompt API origin trial |

License

MIT