157 changes: 39 additions & 118 deletions README.md
@@ -1,31 +1,13 @@
# A Hackers AI Voice Assistant

### I am not maintaining this repo anymore. If you want to take over, please shoot me a message.

Build your own voice AI. This repo accompanies my [YouTube video series](https://www.youtube.com/playlist?list=PL5rWfvZIL-NpFXM9nFr15RmEEh4F4ePZW) on building an AI voice assistant with PyTorch.

## Looking for contributors!
Looking for contributors to help build out the assistant. There is still a lot of work to do. This is a good opportunity to learn machine learning and how to engineer an entire ML system from the ground up. If you're interested, join the [Discord Server](https://discord.gg/9wSTT4F)

TODO:
- [x] wake word model and engine
- [ ] pre-trained wake word model for fine-tuning on your own wake word
- [x] speech recognition model, pretrained model, and engine
- [ ] natural language understanding model, pretrained model, and engine
- [ ] speech synthesis model, pretrained model, and engine
- [ ] skills framework
- [ ] core A.I. voice assistant logic to integrate wake word, speech recognition, natural language understanding, speech synthesis, and the skills framework

## Running on native machine
### dependencies
* python3
* portaudio (required for recording with pyaudio)
* [ctcdecode](https://github.com/parlance/ctcdecode) - for speech recognition

If you're on macOS, you can install `portaudio` with Homebrew: `brew install portaudio`

**NOTICE: If you are using Windows, some things may not work (for example, torchaudio). I suggest trying this on Linux or macOS, or using WSL2 on Windows.**

### using virtualenv (recommended)
1. `virtualenv voiceassistant.venv`
2. `source voiceassistant.venv/bin/activate`
@@ -41,115 +23,54 @@ If you are running with just the cpu
If you are running on a CUDA-enabled machine
`docker build -f Dockerfile -t voiceassistant .`

## Wake word
[Youtube Video For WakeWord](https://www.youtube.com/watch?v=ob0p7G2QoHA&list=PL5rWfvZIL-NpFXM9nFr15RmEEh4F4ePZW)

### scripts
For more details, check these files for script arguments and descriptions

`wakeword/neuralnet/train.py` is used to train the model

`wakeword/neuralnet/optimize_graph.py` is used to create a production ready graph that can be used in `engine.py`

`wakeword/engine.py` is used to demo the wakeword model

`wakeword/scripts/collect_wakeword_audio.py` - used to collect wakeword and environment data

`wakeword/scripts/split_audio_into_chunks.py` - used to split audio into n-second chunks

`wakeword/scripts/split_commonvoice.py` - if you download the Common Voice dataset, use this script to split it into n-second chunks

`wakeword/scripts/create_wakeword_jsons.py` - used to create the wakeword JSONs for training

### Steps to train and demo your wakeword model

For more details, check these files for script arguments and descriptions

1. collect data
1. environment and wakeword data can be collected using `python collect_wakeword_audio.py`
```
cd VoiceAssistant/wakeword/scripts
mkdir data
cd data
mkdir 0 1 wakewords
python collect_wakeword_audio.py --sample_rate 8000 --seconds 2 --interactive --interactive_save_path ./data/wakewords
```
2. to avoid the imbalanced dataset problem, we can duplicate the wakeword clips with
```
python replicate_audios.py --wakewords_dir data/wakewords/ --copy_destination data/1/ --copy_number 100
```
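The duplication step above can be sketched in a few lines. This is a minimal stand-in for what `replicate_audios.py` does (the exact naming scheme and arguments of the real script are assumptions):

```python
import shutil
from pathlib import Path

def replicate_audios(wakewords_dir, copy_destination, copy_number):
    """Copy every wakeword clip into the destination directory
    copy_number times each, to balance the dataset."""
    dest = Path(copy_destination)
    dest.mkdir(parents=True, exist_ok=True)
    for clip in sorted(Path(wakewords_dir).glob("*.wav")):
        for i in range(copy_number):
            # suffix each copy with an index to avoid filename collisions
            shutil.copy(clip, dest / f"{clip.stem}_{i}{clip.suffix}")

# usage sketch:
# replicate_audios("data/wakewords/", "data/1/", 100)
```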
3. be sure to collect other speech data, like Common Voice. split the data into n-second chunks with `split_audio_into_chunks.py`.
4. put the data into two separate directories named `0` and `1`: `0` for non-wakeword clips, `1` for wakeword clips. use `create_wakeword_jsons.py` to create the train and test JSONs
5. create a train and test json in this format...
```
// make sure each sample is on a separate line
{"key": "/path/to/audio/sample.wav", "label": 0}
{"key": "/path/to/audio/sample.wav", "label": 1}
```
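Building that manifest can be sketched as follows. This is a simplified version of what `create_wakeword_jsons.py` presumably does (the function name, split ratio, and directory layout here are assumptions, not the script's actual interface):

```python
import json
import random
from pathlib import Path

def create_wakeword_jsons(data_dir, save_dir, test_split=0.1):
    """Walk data_dir/0 and data_dir/1 and write train.json / test.json,
    one {"key": ..., "label": ...} object per line."""
    samples = []
    for label in (0, 1):
        for wav in sorted(Path(data_dir, str(label)).glob("*.wav")):
            samples.append({"key": str(wav), "label": label})
    random.shuffle(samples)  # mix labels before splitting
    split = int(len(samples) * (1 - test_split))
    for name, subset in (("train.json", samples[:split]),
                         ("test.json", samples[split:])):
        with open(Path(save_dir, name), "w") as f:
            for s in subset:
                f.write(json.dumps(s) + "\n")
```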

2. train model
1. use `train.py` to train model
2. after model training, use `optimize_graph.py` to create an optimized PyTorch model

3. test
1. test using the `engine.py` script


## Speech Recognition
[YouTube Video for Speech Recognition](https://www.youtube.com/watch?v=YereI6Gn3bM&list=PL5rWfvZIL-NpFXM9nFr15RmEEh4F4ePZW&index=2)

### scripts
For more details, check these files for script arguments and descriptions

`speechrecognition/scripts/mimic_create_jsons.py` is used to create the train.json and test.json files with Mimic Recording Studio
# Voice Assistant Project

`speechrecognition/scripts/commonvoice_create_jsons.py` is used to convert mp3 into wav and create the train.json and test.json files with the Common Voice dataset
This project contains code and resources for implementing a voice assistant.

`speechrecognition/neuralnet/train.py` is used to train the model
## Directory Structure

`speechrecognition/neuralnet/optimize_graph.py` is used to create a production ready graph that can be used in `engine.py`
### fun\arnold_audio
- Contains audio files and resources related to Arnold's voice.

`speechrecognition/engine.py` is used to demo the speech recognizer model
### VoiceAssistant\nlu
- **images**: Folder containing images related to the neural network or the dataset.
- **neuralnet**: Contains files and scripts implementing the neural network.
- **nlu_dataset**: Dataset used for training and testing the NLU models.
- **scripts**:
  - `__init__.py`: Initialization file for Python packages.
  - `bash.sh`: Bash script for miscellaneous tasks, e.g. environment setup and task execution.
  - `engine.py`: Main script handling the core NLU functionality.
  - `meta_data.bin`: Binary file containing metadata.
  - `readme.md`: Documentation with details about the NLU part of the project.
  - `requirements.txt`: List of dependencies required to run the NLU part.

`speechrecognition/demo/demo.py` is used to demo the speech recognizer model with a Web GUI
### speechrecognition
- **demo**:
  - Contains demo files and templates illustrating how the speech recognition works.
  - `demo.html`: HTML template for the demo interface.
  - `demo.py`: Python script for the demo functionality.

- **templates**:
  - HTML templates used in different parts of the speech recognition module.

### Steps for pretraining or finetuning speech recognition model
- **neuralnet**:
  - Files related to the neural network implementation in the speech recognition module, including:
    * dataset.py: Handles loading and preprocessing of the dataset,
    * model.py: Defines the neural network model architecture,
    * optimize_graph.py: Script for optimizing the computation graph,
    * scorer.py: Evaluates model performance,
    * train.py: Training and data validation script.

The pretrained model can be found at this [Google Drive](https://drive.google.com/drive/folders/14ljfpvisK1tz8fvFYETbdWqR3lOmJ_2Y?usp=sharing)
## Getting Started

1. Collect your own data - the pretrained model was trained on Common Voice. To make this model work for you, you can collect an hour or so of your own voice using the [Mimic Recording Studio](https://github.com/MycroftAI/mimic-recording-studio). They have prompts you can read from.
1. collect data using Mimic Recording Studio, or your own dataset.
2. be sure to chop up your audio into chunks of 5 - 16 seconds max.
3. create a train and test json in this format...
```
// make sure each sample is on a separate line
{"key": "/path/to/audio/speech.wav", "text": "this is your text"}
{"key": "/path/to/audio/speech.wav", "text": "another text example"}
```
use `mimic_create_jsons.py` to create train and test JSONs with the data from Mimic Recording Studio.

```
python mimic_create_jsons.py --file_folder_directory /dir/to/the/folder/with/the/studio/data --save_json_path /path/where/you/want/them/saved
```
Follow these steps to set up and run the project:

(The Mimic Recording Studio files are usually stored in ~/mimic-recording-studio-master/backend/audio_files/[random_string].)

use `commonvoice_create_jsons.py` to convert from mp3 to wav and create train and test JSONs with the data from Common Voice by Mozilla

```
python commonvoice_create_jsons.py --file_path /path/to/commonvoice/file/.tsv --save_json_path /path/where/you/want/them/saved
```

if you don't want to convert, use `--not-convert`
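Step 2 above (chopping audio into 5 - 16 second chunks) can be sketched with the standard-library `wave` module. This is a simplified stand-in for the repo's `split_audio_into_chunks.py`, not its actual implementation:

```python
import wave
from pathlib import Path

def split_wav(path, out_dir, seconds=10):
    """Split a wav file into consecutive chunks of `seconds` seconds.
    Returns the number of chunks written."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    with wave.open(str(path), "rb") as w:
        params = w.getparams()
        frames_per_chunk = w.getframerate() * seconds
        i = 0
        while True:
            frames = w.readframes(frames_per_chunk)
            if not frames:
                break
            out = Path(out_dir, f"{Path(path).stem}_{i}.wav")
            with wave.open(str(out), "wb") as o:
                # reuse sample rate/width/channels; wave fixes the
                # frame count in the header when the file is closed
                o.setparams(params)
                o.writeframes(frames)
            i += 1
    return i
```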

2. Train model
1. use `train.py` to fine-tune. check out the [train.py](https://github.com/LearnedVector/A-Hackers-AI-Voice-Assistant/blob/master/VoiceAssistant/speechrecognition/neuralnet/train.py#L115) argparse for other arguments
```
python train.py --train_file /path/to/train/json --valid_file /path/to/valid/json --load_model_from /path/to/pretrain/speechrecognition.ckpt
```
2. To train from scratch, omit the `--load_model_from` argument in train.py
3. after model training, use `optimize_graph.py` to create a frozen, optimized PyTorch model. The pretrained optimized torch model can be found in the Google Drive link as `speechrecognition.zip`
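The "frozen optimized" export in step 3 is, roughly, a TorchScript trace-and-save. A minimal sketch, assuming an arbitrary `nn.Module` and example input shape; this is not the repo's exact `optimize_graph.py` logic:

```python
import torch
import torch.nn as nn

def freeze_model(model: nn.Module, example_input: torch.Tensor, out_path: str):
    """Trace the model into TorchScript and save a self-contained file
    that engine-style code can load without the original Python class."""
    model.eval()
    with torch.no_grad():
        traced = torch.jit.trace(model, example_input)
    traced.save(out_path)
    return traced

# usage sketch (SpeechRecognition and the input shape are assumptions):
# freeze_model(SpeechRecognition(**hparams), torch.rand(1, 81, 300), "optimized.pt")
# loaded = torch.jit.load("optimized.pt")
```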
1. Clone this repository locally,
2. Go to the VoiceAssistant\nlu directory,
3. Install the dependencies with `pip install -r requirements.txt`,
4. Run bash.sh to perform the initial setup tasks,
5. Launch engine.py to start the NLU engine,

To test speech recognition, go to the speechrecognition\demo directory and run demo.py after setting up the environment as described in the readme.md inside the nlu folder.

3. test
1. test using the `engine.py` script

## Raspberry Pi
documentation to get this running on a Raspberry Pi is in progress...
2 changes: 1 addition & 1 deletion VoiceAssistant/nlu/neuralnet/model.py
@@ -12,7 +12,7 @@ def __init__(self,num_entity, num_intent, num_scenario):
config.BASE_MODEL
)
self.drop_1 = nn.Dropout(0.3)
self.drop_2 = nn.Dropout(0.3)
self.drop_2 = nn.Dropout(0.2)
self.drop_3 = nn.Dropout(0.3)

self.out_entity = nn.Linear(768,self.num_entity)
2 changes: 1 addition & 1 deletion VoiceAssistant/speechrecognition/demo/templates/demo.html
@@ -2,7 +2,7 @@
<html>

<head>
<title>Speech Recognition Demo</title>
<title>Speech Recognition App</title>
<meta charset="utf-8" />
<meta http-equiv="Content-type" content="text/html; charset=utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1" />
2 changes: 1 addition & 1 deletion VoiceAssistant/speechrecognition/neuralnet/model.py
@@ -24,7 +24,7 @@ class SpeechRecognition(nn.Module):
hyper_parameters = {
"num_classes": 29,
"n_feats": 81,
"dropout": 0.1,
"dropout": 0.2,
"hidden_size": 1024,
"num_layers": 1
}
2 changes: 1 addition & 1 deletion VoiceAssistant/speechrecognition/neuralnet/train.py
@@ -29,7 +29,7 @@ def configure_optimizers(self):
self.optimizer = optim.AdamW(self.model.parameters(), self.args.learning_rate)
self.scheduler = optim.lr_scheduler.ReduceLROnPlateau(
self.optimizer, mode='min',
factor=0.50, patience=6)
factor=0.50, patience=7)
return [self.optimizer], [self.scheduler]

def step(self, batch):