13 changes: 13 additions & 0 deletions CustomSettings.py
@@ -8,6 +8,11 @@
assistantSpeechOn = True
offlineTTS = False # Set to true if you want to use an offline TTS instead of ElevenLabs or Google TTS
elevenLabs = True # For a highly realistic English TTS
sixtyDB = True # 60db.ai English TTS (used together with ElevenLabs)

# English TTS provider priority: "elevenlabs" or "60db".
# The chosen provider is used first; if it fails, the other is tried automatically, then offline TTS.
ttsProvider = "elevenlabs"
googleSTT = True # If this is False, a pretty bad STT in english will be used

# Google Cloud Speech To Text and Text To Speech
@@ -74,6 +79,14 @@
#VOICE_ID = "TxGEqnHWrfWFTfGW9XjX" # Josh (Snarky voice)


# 60db.ai (English TTS) - get voice IDs from "GET https://api.60db.ai/myvoices"
SIXTYDB_VOICE_ID = "{VOICE_ID_HERE}" # Leave as-is to use the 60db system default voice
SIXTYDB_STABILITY = 30 # 0-100, lower = less stable & funnier (mirrors ElevenLabs STABILITY x100)
SIXTYDB_SIMILARITY = 90 # 0-100, source voice matching (mirrors ElevenLabs SIMILARITY_BOOST x100)
SIXTYDB_SPEED = 1.0 # 0.5-2.0 speech speed multiplier
SIXTYDB_ENHANCE = True # Audio quality improvement



# Print out more outputs, for debugging (Developer Mode)
devMode = False
30 changes: 29 additions & 1 deletion README.md
@@ -1,5 +1,5 @@
# SmartVoiceAssistant
A smart AI voice assistant with multi-language support and long-term memory. Currently best for Swedish and English. Compatible with Windows and Raspberry Pi. The assistant can use various functions and tools to answer question (Google, Wolfram Alpha, etc.). Based on OpenAI's GPT-models, Google STT and TTS, and ElevenLabs TTS.
A smart AI voice assistant with multi-language support and long-term memory. Currently best for Swedish and English. Compatible with Windows and Raspberry Pi. The assistant can use various functions and tools to answer questions (Google, Wolfram Alpha, etc.). Based on OpenAI's GPT-models, Google STT and TTS, ElevenLabs TTS, and 60db.ai TTS.

<img src="https://github.com/ottobjorkland/SmartVoiceAssistant/assets/81506445/015800af-8b39-46a6-be58-94af17b0179a" height="400">

@@ -13,6 +13,7 @@ If you do not want to gather all of this information or do not have time, simply
- [OpenAI API Key](https://platform.openai.com/account/api-keys): If you have an OpenAI account, you can find your API Key in the user settings.
- [Porcupine Access Key (Wake-word recognizer)](https://console.picovoice.ai/): Sign up for Picovoice Console to get your Access Key.
- [ElevenLabs API Key (Text-To-Speech)](https://elevenlabs.io/): Create an account, click the profile icon in the top-right corner, and get the API Key from "Profile Settings".
- [60db.ai API Key (Text-To-Speech)](https://60db.ai/): Create an account and get your API Key from the dashboard. Used as an alternative/fallback English TTS alongside ElevenLabs (see the "English Text-To-Speech providers" section below).
- [Wolfram Alpha App ID](https://developer.wolframalpha.com/): Sign up for a developer account, create an app under "My Apps" > "Get an AppID", and get the AppID
- Google
- [Custom Search API Key (developerKey)](https://developers.google.com/custom-search/v1/overview): Create a new project and get the API Key by clicking "Get a Key".
@@ -26,6 +27,33 @@ If you do not want to gather all of this information or do not have time, simply
1. A JSON key file will be downloaded to your device
1. Move the JSON file to the repository folder that contains all the other Python and JSON files

## English Text-To-Speech providers (ElevenLabs + 60db.ai)
For English, the assistant supports two high-quality TTS providers: **ElevenLabs** and **60db.ai**. You pick one as the primary; if it fails (e.g. a network error or wrong API Key), the other is tried automatically, and if both fail it falls back to the offline TTS. Swedish always uses Google Cloud TTS.
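
The fallback order described above can be sketched in isolation. This is a minimal illustration under stated assumptions, not the code in `VoiceAssistant_5.3.py`: `order_providers` and `speak_with_fallback` are hypothetical helper names, and the generator callables stand in for the real ElevenLabs and 60db requests.

```python
from typing import Callable, Dict, List, Tuple

PROVIDER_NAMES = ["elevenlabs", "60db"]  # the two English providers named in this section


def order_providers(primary: str, enabled: Dict[str, bool]) -> List[str]:
    """Return the enabled provider names with the primary one first."""
    if primary not in PROVIDER_NAMES:
        primary = "elevenlabs"  # assumption: unknown values fall back to ElevenLabs
    ordered = [primary] + [name for name in PROVIDER_NAMES if name != primary]
    return [name for name in ordered if enabled.get(name, False)]


def speak_with_fallback(
    text: str,
    providers: List[Tuple[str, Callable[[str], None]]],
    offline: Callable[[str], None],
) -> str:
    """Try each (name, generate) pair in order and return the name that worked.

    If every provider raises, the offline callable is used and "offline" is returned.
    """
    for name, generate in providers:
        try:
            generate(text)
            return name
        except Exception as error:  # any failure moves on to the next provider
            print(f"{name} failed: {error}")
    offline(text)
    return "offline"
```

With `ttsProvider = "60db"` and both flags enabled, `order_providers("60db", {"elevenlabs": True, "60db": True})` puts 60db first and ElevenLabs second; disabling a flag drops that provider from the list.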

### Configure it in `CustomSettings.py`
```python
elevenLabs = True # Enable ElevenLabs
sixtyDB = True # Enable 60db.ai
ttsProvider = "elevenlabs" # Which one to use first: "elevenlabs" or "60db" (the other is the automatic fallback)

# 60db.ai voice & settings (English) - get voice IDs from "GET https://api.60db.ai/myvoices"
SIXTYDB_VOICE_ID = "{VOICE_ID_HERE}" # Leave as-is to use the 60db system default voice
SIXTYDB_STABILITY = 30 # 0-100, lower = less stable & funnier
SIXTYDB_SIMILARITY = 90 # 0-100, source voice matching
SIXTYDB_SPEED = 1.0 # 0.5-2.0 speech speed multiplier
SIXTYDB_ENHANCE = True # Audio quality improvement
```
Add your 60db API Key in `apiKeys.py`:
```python
SIXTYDB_API_KEY = "{KEY_HERE}"
```
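
The documented ranges can be checked before a request is built. The sketch below is only an illustration: `clamp`, `is_placeholder`, and `build_60db_settings` are hypothetical helpers that are not part of the project, the ranges are the ones given in the comments above, and the field names follow the request body used in this change. Whether the 60db API itself clamps or rejects out-of-range values is not covered here.

```python
def clamp(value, low, high):
    """Limit value to the closed range [low, high]."""
    return max(low, min(high, value))


def is_placeholder(value):
    """True for empty strings and untouched templates such as "{VOICE_ID_HERE}"."""
    return not value or "{" in value


def build_60db_settings(voice_id, stability, similarity, speed, enhance):
    """Build the settings part of a request body, keeping values in range."""
    settings = {
        "stability": clamp(stability, 0, 100),    # documented range: 0-100
        "similarity": clamp(similarity, 0, 100),  # documented range: 0-100
        "speed": clamp(speed, 0.5, 2.0),          # documented range: 0.5-2.0
        "enhance": bool(enhance),
    }
    if not is_placeholder(voice_id):
        # Only send a voice ID when the placeholder has been replaced;
        # otherwise the 60db system default voice is used.
        settings["voice_id"] = voice_id
    return settings
```

Called with the defaults from `CustomSettings.py`, this leaves `voice_id` out of the result, which matches the "leave as-is to use the default voice" behavior.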

### Notes
- To make 60db the primary voice, set `ttsProvider = "60db"`.
- To disable a provider entirely, set its flag (`elevenLabs` / `sixtyDB`) to `False`.
- Finding your 60db voice IDs: send a `GET` request to `https://api.60db.ai/myvoices` with the header `Authorization: Bearer YOUR_API_KEY`; copy a `voice_id` from the response into `SIXTYDB_VOICE_ID`.
- No extra packages are required for 60db: it uses `requests`, which the project already depends on, and Python's standard-library `base64` module.
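
The voice lookup from the notes above could be scripted roughly as follows. This is a sketch under assumptions: the response layout of `/myvoices` is not documented here, so `collect_voice_ids` (a hypothetical helper) simply walks whatever JSON comes back and gathers every `voice_id` string it finds. The standard-library `urllib` is used only to keep the sketch free of dependencies; the project itself uses `requests`.

```python
import json
import urllib.request

MYVOICES_URL = "https://api.60db.ai/myvoices"  # endpoint named in the notes above


def build_request(api_key):
    """Build the GET request with the Bearer authorization header."""
    return urllib.request.Request(
        MYVOICES_URL,
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def collect_voice_ids(payload):
    """Recursively gather every string stored under a "voice_id" key."""
    found = []
    if isinstance(payload, dict):
        for key, value in payload.items():
            if key == "voice_id" and isinstance(value, str):
                found.append(value)
            else:
                found.extend(collect_voice_ids(value))
    elif isinstance(payload, list):
        for item in payload:
            found.extend(collect_voice_ids(item))
    return found


def list_voice_ids(api_key, timeout=10.0):
    """Fetch the voices for this account and return their IDs (needs network access)."""
    with urllib.request.urlopen(build_request(api_key), timeout=timeout) as response:
        return collect_voice_ids(json.load(response))
```

Calling `list_voice_ids(SIXTYDB_API_KEY)` should then yield candidates for `SIXTYDB_VOICE_ID`, provided the key is valid and the response contains `voice_id` fields.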

## Raspberry Pi
### Set-Up
**Run these commands to install packages on a Raspberry Pi (terminal):**
194 changes: 130 additions & 64 deletions VoiceAssistant_5.3.py
@@ -1,4 +1,4 @@
import os, time, math, re, json, pyttsx3, requests, pvporcupine, struct, pyaudio, datetime, tiktoken
import os, time, math, re, json, base64, pyttsx3, requests, pvporcupine, struct, pyaudio, datetime, tiktoken
import speech_recognition as sr
from colorama import init, Fore, Back, Style
from urllib.request import urlopen
@@ -42,7 +42,9 @@
from CustomSettings import wakeUpWords, STABILITY, SIMILARITY_BOOST, VOICE_ID, animationFPS, googleSTT, swedish, english, maxToolsPerPrompt, openAIdelay
from CustomSettings import googleTTS_name, googleTTS_gender, swedishStartPrompt, wakeWordOn, wakeSpeaker, speakerSleepTime, RaspberryPi, devMode, overrideMemPrompt
from CustomSettings import sweOverrideMemPrompt, GPT4, elevenLabs, wolframAlpha, googleSearch
from CustomSettings import sixtyDB, ttsProvider, SIXTYDB_VOICE_ID, SIXTYDB_STABILITY, SIXTYDB_SIMILARITY, SIXTYDB_SPEED, SIXTYDB_ENHANCE
from apiKeys import openai_api_key, porcupineAccessKey, XI_API_KEY, googleCustomSearchAPI, googleSearchEngineID, GOOGLE_JSON_CREDENTIALS, wolframAlphaAppID
from apiKeys import SIXTYDB_API_KEY
import openai
openai.api_key = openai_api_key

@@ -719,79 +721,143 @@ def on_end_reached(event):
pygame.init()
pygame.mixer.init()

def elevenLabsTTS(text):
    # Generate English speech with ElevenLabs and write it to textToSpeechFilePath
    headers = {
        "Accept": "audio/mpeg",
        "xi-api-key": XI_API_KEY,
        "Content-Type": "application/json"
    }
    data = {
        "text": text,
        "voice_settings": {
            "stability": STABILITY,
            "similarity_boost": SIMILARITY_BOOST
        }
    }
    if vlcLib == True: # For Raspberry Pi
        response = requests.post(f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}", json=data, headers=headers)
        response.raise_for_status()
        with open(textToSpeechFilePath, "wb") as f:
            f.write(response.content)
    else: # Streamed (for Windows/pygame and as the default)
        response = requests.post(f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}/stream", json=data, headers=headers, stream=True)
        response.raise_for_status()
        with open(textToSpeechFilePath, 'wb') as f:
            for chunk in response.iter_content(chunk_size=1024):
                if chunk:
                    f.write(chunk)

def sixtyDBtts(text):
    # Generate English speech with 60db.ai and write it to textToSpeechFilePath.
    # 60db returns JSON with the audio as a base64 string, so we decode it before writing.
    headers = {
        "Authorization": f"Bearer {SIXTYDB_API_KEY}",
        "Content-Type": "application/json"
    }
    data = {
        "text": text,
        "stability": SIXTYDB_STABILITY, # 0-100
        "similarity": SIXTYDB_SIMILARITY, # 0-100
        "speed": SIXTYDB_SPEED, # 0.5-2.0
        "enhance": SIXTYDB_ENHANCE,
        "output_format": "mp3" # keep mp3 so it plays from textToSpeech.mp3 like every other provider
    }
    if SIXTYDB_VOICE_ID and "{" not in SIXTYDB_VOICE_ID: # Otherwise use the 60db system default voice
        data["voice_id"] = SIXTYDB_VOICE_ID
    response = requests.post("https://api.60db.ai/tts-synthesize", json=data, headers=headers)
    response.raise_for_status()
    payload = response.json()
    if not payload.get("success") or not payload.get("audio_base64"):
        raise RuntimeError(payload.get("message", "60db returned no audio"))
    with open(textToSpeechFilePath, "wb") as f:
        f.write(base64.b64decode(payload["audio_base64"]))

def englishTTSproviders():
    # Ordered list of (name, generatorFunc) for English TTS.
    # The provider chosen in ttsProvider comes first, the other is the automatic fallback.
    allProviders = {
        "ElevenLabs": (elevenLabs, elevenLabsTTS),
        "60db": (sixtyDB, sixtyDBtts),
    }
    primary = "60db" if str(ttsProvider).lower() in ("60db", "sixtydb") else "ElevenLabs"
    secondary = "ElevenLabs" if primary == "60db" else "60db"
    order = []
    for name in (primary, secondary):
        enabled, generate = allProviders[name]
        if enabled:
            order.append((name, generate))
    return order

def disableTTSprovider(name):
    # Disable a provider for the rest of the session (e.g. after a wrong API key) so we stop retrying it
    global elevenLabs, sixtyDB
    if name == "ElevenLabs": elevenLabs = False
    elif name == "60db": sixtyDB = False
    print(Style.BRIGHT+Fore.RED+f"WARNING: Disabling {name} for this session. Please provide a valid API Key in apiKeys.py to use {name}.")

def isAuthError(e):
    # True if the exception is an HTTP 401/403 (likely a wrong/missing API key)
    response = getattr(e, "response", None)
    return response is not None and getattr(response, "status_code", None) in (401, 403)

def textToSpeech(text, language):

global elevenLabs

print("Generating text-to-speech...")

if language == "sv": # Swedish
client = texttospeech.TextToSpeechClient()
input_text = texttospeech.SynthesisInput(text=text)
if language == "sv": # Swedish (Google Cloud TTS)
try:
client = texttospeech.TextToSpeechClient()
input_text = texttospeech.SynthesisInput(text=text)

if googleTTS_gender == "MALE": ssml_gender=texttospeech.SsmlVoiceGender.MALE
else: ssml_gender=texttospeech.SsmlVoiceGender.FEMALE
if googleTTS_gender == "MALE": ssml_gender=texttospeech.SsmlVoiceGender.MALE
else: ssml_gender=texttospeech.SsmlVoiceGender.FEMALE

voice = texttospeech.VoiceSelectionParams(
language_code="sv-SE",
name=googleTTS_name,
ssml_gender=ssml_gender,
)
audio_config = texttospeech.AudioConfig(
audio_encoding=texttospeech.AudioEncoding.MP3
)
response = client.synthesize_speech(
request={"input": input_text, "voice": voice, "audio_config": audio_config}
)
with open(textToSpeechFilePath, "wb") as out:
out.write(response.audio_content)

else: # English
if elevenLabs:
headers = {
"Accept": "audio/mpeg",
"xi-api-key": XI_API_KEY,
"Content-Type": "application/json"
}
data = {
"text": text,
"voice_settings": {
"stability": STABILITY,
"similarity_boost": SIMILARITY_BOOST
}
}
try:
if vlcLib == True: # For Raspberry Pi
response = requests.post(f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}", json=data, headers=headers)
with open(textToSpeechFilePath, "wb") as f:
f.write(response.content)
elif pygameLib == True: # For Windows
response = requests.post(f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}/stream", json=data, headers=headers, stream=True)
with open(textToSpeechFilePath, 'wb') as f:
for chunk in response.iter_content(chunk_size=1024):
if chunk:
f.write(chunk)
except Exception as e: # There was an error
print(Style.BRIGHT+Fore.RED+f"ElevenLabs Error:\n{e}")
print("Using offline text-to-speech instead...")
offlineTextToSpeech(text)
return None
else: # elevenLabs == False
offlineTextToSpeech(text)
return None

try:
playAudio(language)
except Exception as e:
print(Style.BRIGHT+Fore.RED+f"Error trying to play audio:\n{e}")
if elevenLabs == True: # If ElevenLabs is used, this error likely occured because of wrong API Key
print(Style.BRIGHT+Fore.RED+"WARNING: This error likely occured because the ElvenLabs API Key is wrong. Please provide an API Key in apiKeys.py to use ElevenLabs.")
voice = texttospeech.VoiceSelectionParams(
language_code="sv-SE",
name=googleTTS_name,
ssml_gender=ssml_gender,
)
audio_config = texttospeech.AudioConfig(
audio_encoding=texttospeech.AudioEncoding.MP3
)
response = client.synthesize_speech(
request={"input": input_text, "voice": voice, "audio_config": audio_config}
)
with open(textToSpeechFilePath, "wb") as out:
out.write(response.audio_content)
playAudio(language)
except Exception as e:
print(Style.BRIGHT+Fore.RED+f"Google TTS Error:\n{e}")
print("Using offline text-to-speech instead...")
offlineTextToSpeech(text)
elevenLabs = False
return None
return textToSpeechFilePath

return textToSpeechFilePath
    # English: try the selected provider first, then the other, then offline TTS
    for name, generate in englishTTSproviders():
        try:
            generate(text) # Download/decode the audio to textToSpeechFilePath
        except Exception as e: # The request to the provider failed
            if isAuthError(e):
                print(Style.BRIGHT+Fore.RED+f"{name} Error (likely a wrong/missing API Key in apiKeys.py):\n{e}")
                disableTTSprovider(name)
            else:
                print(Style.BRIGHT+Fore.RED+f"{name} Error:\n{e}")
            print("Trying the next text-to-speech option...")
            continue
        try:
            playAudio(language)
            return textToSpeechFilePath
        except Exception as e: # The audio could not be played (corrupt audio or wrong API Key)
            print(Style.BRIGHT+Fore.RED+f"Error trying to play {name} audio (the API Key may be wrong):\n{e}")
            disableTTSprovider(name)
            print("Trying the next text-to-speech option...")
            continue

    print("Using offline text-to-speech instead...")
    offlineTextToSpeech(text)
    return None

def offlineTextToSpeech(text):
tts.say(text)
Binary file added __pycache__/CustomSettings.cpython-311.pyc
Binary file not shown.
Binary file added __pycache__/VoiceAssistant_5.3.cpython-311.pyc
Binary file not shown.
Binary file added __pycache__/apiKeys.cpython-311.pyc
Binary file not shown.
3 changes: 3 additions & 0 deletions apiKeys.py
@@ -7,6 +7,9 @@
# ElevenLabs
XI_API_KEY = "{KEY_HERE}"

# 60db.ai (Text-To-Speech)
SIXTYDB_API_KEY = "{KEY_HERE}"

# Google Search Engine
googleCustomSearchAPI = "{KEY_HERE}" # Google Custom Search API Key (developerKey)
googleSearchEngineID = "{ENGINE_ID_HERE}" # Google Search Engine ID (cx)