Commit 269cae6

feat(providers): wire missing media handlers + fix video routing + add CartesiaTTS
Closes the gaps reported in NEUROLINK_SDK_GAPS.md from a downstream
consumer (Director). After v9.65.0, several shipped capabilities could
not be invoked through nl.generate(...) because:

- package.json exports map exposed only 7 subpaths; internal modules
  were blocked by Node ESM resolution.
- Music + Avatar handlers were registered only on ProviderRegistry's
  lazy path, so consumers importing MusicProcessor / AvatarProcessor
  directly saw an empty registry.
- FishAudioTTS shipped in dist/voice/providers/ but was never
  registered with TTSProcessor.
- Top-level package re-exports surfaced only GoogleTTSHandler; every
  other handler class was unreachable from import * as NL.
- handleVideoGeneration in baseProvider ignored output.video.provider
  and always called generateVideoWithVertex, so Kling/Runway/Replicate
  routing was dead.
- The Cartesia adapter in dist/ was a streaming WebSocket class, not a
  TTSHandler, so tts.provider:"cartesia" failed.

Changes
-------

* package.json: add 11 subpaths (./voice, ./music, ./avatar,
  ./image-gen, ./hitl, ./evaluation, ./workflow, ./rag, ./files,
  ./processors, ./processors/*, ./adapters/*).
* Module-level auto-registration in voice/music/avatar barrels; every
  shipped handler whose API key is in process.env registers itself at
  import time. Idempotent with ProviderRegistry's existing block.
* New src/lib/voice/providers/CartesiaTTS.ts: synchronous TTSHandler
  over Cartesia's /tts/bytes endpoint. CartesiaStream remains for the
  realtime voice server.
* providerRegistry.ts: add fish-audio and cartesia registration blocks
  alongside the other TTS handlers for architectural consistency.
* src/lib/index.ts: re-export every shipped
  TTS/STT/Realtime/Music/Avatar/Video handler + ImageGenService +
  HITLManager + STTProcessor + RealtimeProcessor + registerDefault*
  helpers.
* src/lib/types/{tts,music,avatar,multimodal}.ts: add TTSProviderName,
  MusicProviderName, AvatarProviderName, VideoProviderName unions with
  (string & {}) escape hatch.
* src/lib/core/baseProvider.ts: handleVideoGeneration now dispatches
  through VideoProcessor.generate(provider, ...) honoring
  output.video.provider; propagates real provider in logs +
  result.provider; rejects unknown providers with
  VIDEO_ERROR_CODES.PROVIDER_NOT_SUPPORTED instead of silently falling
  back to Vertex.

Tests
-----

* test/continuous-test-suite-tts.ts: registration assertions for
  fish-audio + cartesia plus live e2e tests (real Cartesia API call
  verified, 30 KB MP3 output).
* test/continuous-test-suite-media-gen.ts: video registration check,
  unknown-provider rejection (regression guard for the
  silent-vertex-fallback bug), per-provider routing tests for
  Kling/Runway/Replicate. Each PASSes either via a successful
  generation or via the handler's own typed error (which proves the
  dispatcher reached the right handler).

Docs
----

* New docs/getting-started/providers/cartesia.md provider guide.
* docs/features/tts.md: env-setup, Supported Providers table, CLI
  --tts-provider enum, and per-provider examples updated with Fish
  Audio + Cartesia.
* docs/features/video-generation.md: four-provider compatibility
  table, per-provider routing examples, and an Unknown Provider
  Behavior subsection documenting the new PROVIDER_NOT_SUPPORTED guard.
* docs/getting-started/provider-setup.md: new Fish Audio + Cartesia
  TTS configuration sections after Azure Speech.
* docs/getting-started/providers/index.md: Cartesia card under TTS
  plus provider feature-matrix row.
* README.md: TTS provider count 4 -> 6; tts.provider union widened.
* CLAUDE.md: project-overview line widened to include Fish Audio,
  Cartesia, and the media-gen handler set.

Verification
------------

- pnpm exec tsc --noEmit --strict: pass
- pnpm run lint: pass (0 errors, 15 pre-existing warnings)
- pnpm run build: pass; publint "All good!"
- pnpm run test:tts: 18/18 PASS (Fish Audio + Cartesia live e2e)
- pnpm run test:music: 12/12 PASS
- pnpm run test:avatar: 10/10 PASS
- pnpm run test:media: 23/23 PASS (Kling/Runway/Replicate routing)
- pnpm run test:bugfixes: 48/48 PASS
- pnpm run test:credentials: 15/15 PASS
- pnpm run test:unit: 7 sub-suites all PASS
- pnpm run test:ci: exit 0 (main + client 13/13 + hitl 4/4)
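The video-routing fix described in the commit message (dispatch on the requested provider, reject unknown names rather than fall back to Vertex) can be pictured with a small sketch. This is not the code from baseProvider.ts: `SketchVideoProcessor`, `VideoDispatchError`, and the request/result shapes are hypothetical stand-ins, and only the provider ids and the `PROVIDER_NOT_SUPPORTED` code name are taken from the message. The `register` method also illustrates one way the "idempotent" import-time registration could behave.

```typescript
// Illustrative sketch only: every type and class here is hypothetical.
// Only the provider ids and the PROVIDER_NOT_SUPPORTED code name come from
// the commit message.
type VideoRequest = { prompt: string };
type VideoResult = { provider: string; url: string };
type VideoHandler = (request: VideoRequest) => Promise<VideoResult>;

const PROVIDER_NOT_SUPPORTED = "PROVIDER_NOT_SUPPORTED";

class VideoDispatchError extends Error {
  readonly code: string;

  constructor(code: string, message: string) {
    super(message);
    this.name = "VideoDispatchError";
    this.code = code;
  }
}

class SketchVideoProcessor {
  private readonly handlers = new Map<string, VideoHandler>();

  // Idempotent: a second registration under the same name is ignored, so an
  // import-time hook and a lazy registry path can both run safely.
  register(name: string, handler: VideoHandler): void {
    if (!this.handlers.has(name)) {
      this.handlers.set(name, handler);
    }
  }

  // Look up the handler or fail loudly; there is no default provider.
  resolve(provider: string): VideoHandler {
    const handler = this.handlers.get(provider);
    if (handler === undefined) {
      throw new VideoDispatchError(
        PROVIDER_NOT_SUPPORTED,
        `Video provider "${provider}" is not registered`,
      );
    }
    return handler;
  }

  // Dispatch to the handler chosen by the caller-supplied provider name.
  async generate(provider: string, request: VideoRequest): Promise<VideoResult> {
    return this.resolve(provider)(request);
  }
}
```

Under these assumptions, a request naming an unregistered provider surfaces a typed error with `code === "PROVIDER_NOT_SUPPORTED"` instead of quietly producing output from a different backend, which is the regression the new media-gen tests are said to guard against.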
1 parent 74d73cc commit 269cae6

52 files changed

Lines changed: 2120 additions & 103 deletions

.gitignore

Lines changed: 3 additions & 0 deletions
@@ -176,3 +176,6 @@ docs-site/static/search-index.json
 
 # Local working notes for the AI SDK removal migration (not for upstream)
 memory-bank/native-runtime/
+
+# Consumer-supplied gap/spec docs we work from locally — not part of the repo
+NEUROLINK_SDK_GAPS.md

CLAUDE.md

Lines changed: 1 addition & 1 deletion
@@ -16,7 +16,7 @@ Guidance for Claude Code when working in this repository.
 
 ## Project Overview
 
-NeuroLink is a unified AI development platform shipping as both a **TypeScript SDK** and **CLI**. It wraps 21+ AI providers (OpenAI, Anthropic, Google AI Studio, Vertex, AWS Bedrock, Azure, Mistral, LiteLLM, SageMaker, Hugging Face, Ollama, OpenAI-compatible, DeepSeek, NVIDIA NIM, LM Studio, llama.cpp, OpenRouter, ElevenLabs, Deepgram, Azure Speech, and more) behind a single consistent API, with full MCP support, multimodal file processing, voice (TTS/STT/realtime), RAG pipelines, observability, and a workflow engine.
+NeuroLink is a unified AI development platform shipping as both a **TypeScript SDK** and **CLI**. It wraps 21+ AI providers (OpenAI, Anthropic, Google AI Studio, Vertex, AWS Bedrock, Azure, Mistral, LiteLLM, SageMaker, Hugging Face, Ollama, OpenAI-compatible, DeepSeek, NVIDIA NIM, LM Studio, llama.cpp, OpenRouter, ElevenLabs, Deepgram, Azure Speech, Fish Audio, Cartesia, and more) behind a single consistent API, with full MCP support, multimodal file processing, voice (TTS/STT/realtime), media generation (image / video / music / avatar with Kling / Runway / Replicate / Beatoven / Lyria / D-ID / HeyGen handlers), RAG pipelines, observability, and a workflow engine.
 
 ---
 

README.md

Lines changed: 12 additions & 12 deletions
Original file line numberDiff line numberDiff line change
@@ -43,7 +43,7 @@ Extracted from production systems at Juspay and battle-tested at enterprise scal
4343
| Feature | Version | Description | Guide |
4444
| -------------------------------------------- | ------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------- |
4545
| **Avatar / Music Modalities + 12 Providers** | next | New `output: { mode: "avatar" \| "music" }` dispatch with handlers for D-ID, HeyGen, Replicate-MuseTalk (avatar) and Beatoven, ElevenLabs Music, Lyria, Replicate-MusicGen (music). Plus Fish Audio TTS, Kling/Runway/Replicate video, xAI/Groq/Cohere/Together/Fireworks/Perplexity/Cloudflare LLMs, Voyage/Jina embeddings, Stability/Ideogram/Recraft/Replicate image-gen. | [Provider Integration](docs/provider-integration/) |
46-
| **Multi-Provider Voice (TTS/STT)** | v9.62.0 | 4 TTS providers (OpenAI TTS, ElevenLabs, Google TTS, Azure TTS) + 4 STT providers (Whisper, Deepgram, Azure STT, Google STT) + 2 realtime APIs (OpenAI Realtime, Gemini Live). | [TTS Guide](docs/features/tts.md) \| [STT Guide](docs/features/audio-input.md) \| [Realtime Guide](docs/features/real-time-services.md) |
46+
| **Multi-Provider Voice (TTS/STT)** | v9.62.0 | 6 TTS providers (OpenAI TTS, ElevenLabs, Google TTS, Azure TTS, Fish Audio, Cartesia) + 4 STT providers (Whisper, Deepgram, Azure STT, Google STT) + 2 realtime APIs (OpenAI Realtime, Gemini Live). | [TTS Guide](docs/features/tts.md) \| [STT Guide](docs/features/audio-input.md) \| [Realtime Guide](docs/features/real-time-services.md) |
4747
| **4 New Providers** | v9.60.0 | DeepSeek (V3/R1), NVIDIA NIM (400+ catalog), LM Studio (local), llama.cpp (GGUF local). | [Provider Setup](docs/getting-started/provider-setup.md) |
4848
| **ModelAccessDeniedError** | v9.59.0 | Typed `ModelAccessDeniedError` + `sdk.checkCredentials()` API for proactive credential validation before first call. | [Error Reference](docs/reference/troubleshooting.md) |
4949
| **Provider Fallback Policy** | v9.58.0 | `providerFallback` callback + `modelChain` config for centralized multi-provider fallback logic. | [Advanced Guide](docs/advanced/index.md) |
@@ -71,7 +71,7 @@ const result = await neurolink.generate({
     voice: "en-US-Neural2-C",
     format: "mp3",
     output: "./output.mp3", // optional: save to disk
-    provider: "elevenlabs", // optional override: openai-tts | elevenlabs | google-ai | vertex | azure-tts
+    provider: "elevenlabs", // optional override: openai-tts | elevenlabs | google-ai | vertex | azure-tts | fish-audio | cartesia
   },
 });
 // result.audio: { buffer: Buffer, format: "mp3", ... }
@@ -955,16 +955,16 @@ Full command and API breakdown lives in [`docs/cli/commands.md`](docs/cli/comman
 
 ## Platform Capabilities at a Glance
 
-| Capability | Highlights |
-| ------------------------ | ------------------------------------------------------------------------------------------------------------------------ |
-| **Provider unification** | 21+ providers with automatic fallback, cost-aware routing, `providerFallback` policy, `modelChain` config. |
-| **Multimodal pipeline** | Stream images + CSV data + PDF documents across providers with local/remote assets. Auto-detection for mixed file types. |
-| **Voice pipeline** | TTS (4 providers) + STT (4 providers) + realtime voice APIs (OpenAI Realtime, Gemini Live). |
-| **Quality & governance** | Auto-evaluation engine (14 scorers), guardrails middleware, HITL workflows, audit logging. |
-| **Memory & context** | Per-user condensed memory (S3/Redis/SQLite), Redis session export, 4-stage context compaction. |
-| **CLI tooling** | Loop sessions, setup wizard, config validation, Redis auto-detect, JSON output, TTS/STT flags. |
-| **Enterprise ops** | Claude proxy, OTLP observability, OpenObserve dashboard, regional routing, credential management. |
-| **Tool ecosystem** | MCP auto discovery, HTTP/stdio/SSE/WebSocket transports, LiteLLM hub access, SageMaker custom deployment, web search. |
+| Capability | Highlights |
+| ------------------------ | ---------------------------------------------------------------------------------------------------------------------------------------------------- |
+| **Provider unification** | 21+ providers with automatic fallback, cost-aware routing, `providerFallback` policy, `modelChain` config. |
+| **Multimodal pipeline** | Stream images + CSV data + PDF documents across providers with local/remote assets. Auto-detection for mixed file types. |
+| **Voice pipeline** | TTS (6 providers: Google, OpenAI, ElevenLabs, Azure, Fish Audio, Cartesia) + STT (4 providers) + realtime voice APIs (OpenAI Realtime, Gemini Live). |
+| **Quality & governance** | Auto-evaluation engine (14 scorers), guardrails middleware, HITL workflows, audit logging. |
+| **Memory & context** | Per-user condensed memory (S3/Redis/SQLite), Redis session export, 4-stage context compaction. |
+| **CLI tooling** | Loop sessions, setup wizard, config validation, Redis auto-detect, JSON output, TTS/STT flags. |
+| **Enterprise ops** | Claude proxy, OTLP observability, OpenObserve dashboard, regional routing, credential management. |
+| **Tool ecosystem** | MCP auto discovery, HTTP/stdio/SSE/WebSocket transports, LiteLLM hub access, SageMaker custom deployment, web search. |
 
 ## Documentation Map
 
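The README hunks above widen the documented `tts.provider` override, and the commit message mentions provider-name unions with a `(string & {})` escape hatch. A minimal sketch of that TypeScript idiom follows. The member list is copied from the README comment and may not match the real union in src/lib/types/tts.ts; `KnownTTSProvider`, `KNOWN_TTS_PROVIDERS`, and `isKnownTTSProvider` are hypothetical names used only for illustration.

```typescript
// Assumption: members copied from the README comment in this commit; the
// actual union in src/lib/types/tts.ts may differ.
type KnownTTSProvider =
  | "openai-tts"
  | "elevenlabs"
  | "google-ai"
  | "vertex"
  | "azure-tts"
  | "fish-audio"
  | "cartesia";

// A plain `KnownTTSProvider | string` would collapse to `string` and lose
// editor autocomplete. Intersecting with `{}` keeps the literals visible
// while still accepting any other string (for example a custom handler id).
type TTSProviderName = KnownTTSProvider | (string & {});

// Hypothetical runtime mirror of the literal members, used by the guard below.
const KNOWN_TTS_PROVIDERS: readonly string[] = [
  "openai-tts",
  "elevenlabs",
  "google-ai",
  "vertex",
  "azure-tts",
  "fish-audio",
  "cartesia",
];

// Narrow an arbitrary provider name to one of the known literals.
function isKnownTTSProvider(name: TTSProviderName): name is KnownTTSProvider {
  return KNOWN_TTS_PROVIDERS.indexOf(name) !== -1;
}

const builtIn: TTSProviderName = "cartesia"; // suggested by autocomplete
const custom: TTSProviderName = "my-custom-tts"; // still type-checks
```

The trade-off of this pattern is that typos in provider names no longer fail at compile time, so a runtime check such as the guard above (or a registry lookup that rejects unknown names) remains necessary.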

docs-site/scripts/sync-docs.ts

Lines changed: 1 addition & 0 deletions
@@ -1364,6 +1364,7 @@ const LINK_MAPPINGS: Record<string, string> = {
   "elevenlabs-music": "/getting-started/providers/elevenlabs-music",
   kling: "/getting-started/providers/kling",
   runway: "/getting-started/providers/runway",
+  cartesia: "/getting-started/providers/cartesia",
 
   // ── New provider-integration guides + their cross-refs ────────────────
   "00-architecture": "/provider-integration/00-architecture",