157 changes: 39 additions & 118 deletions README.md
@@ -1,31 +1,13 @@
# A Hackers AI Voice Assistant

### I am not maintaining this repo anymore. If you want to take over, please shoot me a message.

Build your own voice AI. This repo accompanies my [YouTube video series](https://www.youtube.com/playlist?list=PL5rWfvZIL-NpFXM9nFr15RmEEh4F4ePZW) on building an AI voice assistant with PyTorch.

## Looking for contributors!
Looking for contributors to help build out the assistant. There is still a lot of work to do. This is a good opportunity to learn machine learning and how to engineer an entire ML system from the ground up. If you're interested, join the [Discord Server](https://discord.gg/9wSTT4F)

TODO:
- [x] wake word model and engine
- [ ] pre-trained wake word model for fine-tuning on your own wake word
- [x] speech recognition model, pretrained model, and engine
- [ ] natural language understanding model, pretrained model, and engine
- [ ] speech synthesis model, pretrained model, and engine
- [ ] skills framework
- [ ] core A.I. voice assistant logic to integrate wake word, speech recognition, natural language understanding, speech synthesis, and the skills framework

## Running on native machine
### dependencies
* python3
* portaudio (required for recording with pyaudio)
* [ctcdecode](https://github.com/parlance/ctcdecode) - for speech recognition

If you're on macOS, you can install `portaudio` with Homebrew: `brew install portaudio`

**NOTICE: If you are using Windows, some things may not work (for example, torchaudio). I suggest trying this on Linux or macOS, or using WSL2 on Windows.**

### using virtualenv (recommended)
1. `virtualenv voiceassistant.venv`
2. `source voiceassistant.venv/bin/activate`
@@ -41,115 +23,54 @@ If you are running with just the cpu
If you are running on a CUDA-enabled machine
`docker build -f Dockerfile -t voiceassistant .`

## Wake word
[Youtube Video For WakeWord](https://www.youtube.com/watch?v=ob0p7G2QoHA&list=PL5rWfvZIL-NpFXM9nFr15RmEEh4F4ePZW)

### scripts
For more details, check these files for script arguments and descriptions

`wakeword/neuralnet/train.py` is used to train the model

`wakeword/neuralnet/optimize_graph.py` is used to create a production ready graph that can be used in `engine.py`

`wakeword/engine.py` is used to demo the wakeword model

`wakeword/scripts/collect_wakeword_audio.py` - used to collect wakeword and environment data

`wakeword/scripts/split_audio_into_chunks.py` - used to split audio into n-second chunks

`wakeword/scripts/split_commonvoice.py` - if you download the Common Voice dataset, use this script to split it into n-second chunks

`wakeword/scripts/create_wakeword_jsons.py` - used to create the wakeword JSONs for training

### Steps to train and demo your wakeword model

For more details, check these files for script arguments and descriptions

1. collect data
1. environment and wakeword data can be collected using `python collect_wakeword_audio.py`
```
cd VoiceAssistant/wakeword/scripts
mkdir data
cd data
mkdir 0 1 wakewords
python collect_wakeword_audio.py --sample_rate 8000 --seconds 2 --interactive --interactive_save_path ./data/wakewords
```
2. to avoid the imbalanced dataset problem, we can duplicate the wakeword clips with
```
python replicate_audios.py --wakewords_dir data/wakewords/ --copy_destination data/1/ --copy_number 100
```
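The duplication step above can be sketched in a few lines. This is a minimal stand-in for what `replicate_audios.py` does (the exact naming scheme and arguments of the real script are assumptions):

```python
import shutil
from pathlib import Path

def replicate_audios(wakewords_dir, copy_destination, copy_number):
    """Copy every wakeword clip into the destination directory
    copy_number times each, to balance the dataset."""
    dest = Path(copy_destination)
    dest.mkdir(parents=True, exist_ok=True)
    for clip in sorted(Path(wakewords_dir).glob("*.wav")):
        for i in range(copy_number):
            # suffix each copy with an index to avoid filename collisions
            shutil.copy(clip, dest / f"{clip.stem}_{i}{clip.suffix}")

# usage sketch:
# replicate_audios("data/wakewords/", "data/1/", 100)
```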
3. be sure to collect other speech data, like Common Voice. split the data into n-second chunks with `split_audio_into_chunks.py`.
4. put the data into two separate directories named `0` and `1`: `0` for non-wakeword clips, `1` for wakeword clips. use `create_wakeword_jsons.py` to create the train and test JSONs
5. create a train and test json in this format...
```
// make sure each sample is on a separate line
{"key": "/path/to/audio/sample.wav", "label": 0}
{"key": "/path/to/audio/sample.wav", "label": 1}
```
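Building that manifest can be sketched as follows. This is a simplified version of what `create_wakeword_jsons.py` presumably does (the function name, split ratio, and directory layout here are assumptions, not the script's actual interface):

```python
import json
import random
from pathlib import Path

def create_wakeword_jsons(data_dir, save_dir, test_split=0.1):
    """Walk data_dir/0 and data_dir/1 and write train.json / test.json,
    one {"key": ..., "label": ...} object per line."""
    samples = []
    for label in (0, 1):
        for wav in sorted(Path(data_dir, str(label)).glob("*.wav")):
            samples.append({"key": str(wav), "label": label})
    random.shuffle(samples)  # mix labels before splitting
    split = int(len(samples) * (1 - test_split))
    for name, subset in (("train.json", samples[:split]),
                         ("test.json", samples[split:])):
        with open(Path(save_dir, name), "w") as f:
            for s in subset:
                f.write(json.dumps(s) + "\n")
```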

2. train model
1. use `train.py` to train model
2. after model training, use `optimize_graph.py` to create an optimized PyTorch model

3. test
1. test using the `engine.py` script


## Speech Recognition
[YouTube Video for Speech Recognition](https://www.youtube.com/watch?v=YereI6Gn3bM&list=PL5rWfvZIL-NpFXM9nFr15RmEEh4F4ePZW&index=2)

### scripts
For more details, check these files for script arguments and descriptions

`speechrecognition/scripts/mimic_create_jsons.py` is used to create the train.json and test.json files with Mimic Recording Studio
# Voice Assistant Project

`speechrecognition/scripts/commonvoice_create_jsons.py` is used to convert mp3 into wav and create the train.json and test.json files with the Common Voice dataset
This project contains code and resources for implementing a voice assistant.

`speechrecognition/neuralnet/train.py` is used to train the model
## Directory Structure

`speechrecognition/neuralnet/optimize_graph.py` is used to create a production ready graph that can be used in `engine.py`
### fun\arnold_audio
- Contains audio files and resources related to Arnold's voice.

`speechrecognition/engine.py` is used to demo the speech recognizer model
### VoiceAssistant\nlu
- **images**: Folder containing images related to the neural network or the dataset.
- **neuralnet**: Contains files and scripts implementing the neural network.
- **nlu_dataset**: Dataset used for training and testing the NLU models.
- **scripts**:
  - `__init__.py`: Initialization file for Python packages.
  - `bash.sh`: Bash script for miscellaneous tasks, e.g. environment setup and task execution.
  - `engine.py`: Main script handling the core NLU functionality.
  - `meta_data.bin`: Binary file containing metadata.
  - `readme.md`: Documentation with details about the NLU part of the project.
  - `requirements.txt`: List of dependencies required to run the NLU part.

`speechrecognition/demo/demo.py` is used to demo the speech recognizer model with a Web GUI
### speechrecognition
- **demo**:
  - Contains demo files and templates illustrating how the speech recognition works.
  - `demo.html`: HTML template for the demo interface.
  - `demo.py`: Python script for the demo functionality.

- **templates**:
  - HTML templates used in different parts of the speech recognition module.

### Steps for pretraining or finetuning speech recognition model
- **neuralnet**:
  - Files related to the neural network implementation in the speech recognition module, including:
    * dataset.py: Handles loading and preprocessing of the dataset,
    * model.py: Defines the neural network model architecture,
    * optimize_graph.py: Script for optimizing the computation graph,
    * scorer.py: Evaluates model performance,
    * train.py: Training and data validation script.

The pretrained model can be found at this [Google Drive](https://drive.google.com/drive/folders/14ljfpvisK1tz8fvFYETbdWqR3lOmJ_2Y?usp=sharing)
## Getting Started

1. Collect your own data - the pretrained model was trained on Common Voice. To make this model work for you, you can collect an hour or so of your own voice using the [Mimic Recording Studio](https://github.com/MycroftAI/mimic-recording-studio). They have prompts you can read from.
1. collect data using Mimic Recording Studio, or your own dataset.
2. be sure to chop up your audio into chunks of 5 - 16 seconds max.
3. create a train and test json in this format...
```
// make sure each sample is on a separate line
{"key": "/path/to/audio/speech.wav", "text": "this is your text"}
{"key": "/path/to/audio/speech.wav", "text": "another text example"}
```
use `mimic_create_jsons.py` to create train and test JSONs with the data from Mimic Recording Studio.

```
python mimic_create_jsons.py --file_folder_directory /dir/to/the/folder/with/the/studio/data --save_json_path /path/where/you/want/them/saved
```
Follow these steps to set up and run the project:

(The Mimic Recording Studio files are usually stored in ~/mimic-recording-studio-master/backend/audio_files/[random_string].)

use `commonvoice_create_jsons.py` to convert from mp3 to wav and create train and test JSONs with the data from Common Voice by Mozilla

```
python commonvoice_create_jsons.py --file_path /path/to/commonvoice/file/.tsv --save_json_path /path/where/you/want/them/saved
```

if you don't want to convert, use `--not-convert`
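Step 2 above (chopping audio into 5 - 16 second chunks) can be sketched with the standard-library `wave` module. This is a simplified stand-in for the repo's `split_audio_into_chunks.py`, not its actual implementation:

```python
import wave
from pathlib import Path

def split_wav(path, out_dir, seconds=10):
    """Split a wav file into consecutive chunks of `seconds` seconds.
    Returns the number of chunks written."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    with wave.open(str(path), "rb") as w:
        params = w.getparams()
        frames_per_chunk = w.getframerate() * seconds
        i = 0
        while True:
            frames = w.readframes(frames_per_chunk)
            if not frames:
                break
            out = Path(out_dir, f"{Path(path).stem}_{i}.wav")
            with wave.open(str(out), "wb") as o:
                # reuse sample rate/width/channels; wave fixes the
                # frame count in the header when the file is closed
                o.setparams(params)
                o.writeframes(frames)
            i += 1
    return i
```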

2. Train model
1. use `train.py` to fine-tune. check out the [train.py](https://github.com/LearnedVector/A-Hackers-AI-Voice-Assistant/blob/master/VoiceAssistant/speechrecognition/neuralnet/train.py#L115) argparse for other arguments
```
python train.py --train_file /path/to/train/json --valid_file /path/to/valid/json --load_model_from /path/to/pretrain/speechrecognition.ckpt
```
2. To train from scratch, omit the `--load_model_from` argument in train.py
3. after model training, use `optimize_graph.py` to create a frozen, optimized PyTorch model. The pretrained optimized torch model can be found in the Google Drive link as `speechrecognition.zip`
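The "frozen optimized" export in step 3 is, roughly, a TorchScript trace-and-save. A minimal sketch, assuming an arbitrary `nn.Module` and example input shape; this is not the repo's exact `optimize_graph.py` logic:

```python
import torch
import torch.nn as nn

def freeze_model(model: nn.Module, example_input: torch.Tensor, out_path: str):
    """Trace the model into TorchScript and save a self-contained file
    that engine-style code can load without the original Python class."""
    model.eval()
    with torch.no_grad():
        traced = torch.jit.trace(model, example_input)
    traced.save(out_path)
    return traced

# usage sketch (SpeechRecognition and the input shape are assumptions):
# freeze_model(SpeechRecognition(**hparams), torch.rand(1, 81, 300), "optimized.pt")
# loaded = torch.jit.load("optimized.pt")
```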
1. Clone this repository locally,
2. Go to the VoiceAssistant\nlu directory,
3. Install the dependencies with `pip install -r requirements.txt`,
4. Run bash.sh to perform the initial setup tasks,
5. Launch engine.py to start the NLU engine,

To test speech recognition, go to the speechrecognition\demo directory and run demo.py after setting up the environment as described in the readme.md inside the nlu folder.

3. test
1. test using the `engine.py` script

## Raspberry Pi
documentation to get this running on a Raspberry Pi is in progress...
2 changes: 1 addition & 1 deletion VoiceAssistant/nlu/neuralnet/model.py
@@ -12,7 +12,7 @@ def __init__(self,num_entity, num_intent, num_scenario):
config.BASE_MODEL
)
self.drop_1 = nn.Dropout(0.3)
self.drop_2 = nn.Dropout(0.3)
self.drop_2 = nn.Dropout(0.2)
self.drop_3 = nn.Dropout(0.3)

self.out_entity = nn.Linear(768,self.num_entity)
2 changes: 1 addition & 1 deletion VoiceAssistant/speechrecognition/demo/templates/demo.html
@@ -2,7 +2,7 @@
<html>

<head>
<title>Speech Recognition Demo</title>
<title>Speech Recognition App</title>
<meta charset="utf-8" />
<meta http-equiv="Content-type" content="text/html; charset=utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1" />
2 changes: 1 addition & 1 deletion VoiceAssistant/speechrecognition/neuralnet/model.py
@@ -24,7 +24,7 @@ class SpeechRecognition(nn.Module):
hyper_parameters = {
"num_classes": 29,
"n_feats": 81,
"dropout": 0.1,
"dropout": 0.2,
"hidden_size": 1024,
"num_layers": 1
}
2 changes: 1 addition & 1 deletion VoiceAssistant/speechrecognition/neuralnet/train.py
@@ -29,7 +29,7 @@ def configure_optimizers(self):
self.optimizer = optim.AdamW(self.model.parameters(), self.args.learning_rate)
self.scheduler = optim.lr_scheduler.ReduceLROnPlateau(
self.optimizer, mode='min',
factor=0.50, patience=6)
factor=0.50, patience=7)
return [self.optimizer], [self.scheduler]

def step(self, batch):