Speaker diarization final #12

Open: Martaesplo wants to merge 20 commits into `main` from `speaker-diarization-final`

Changes from 18 commits

Commits:
- 22deb26 diarization to run locally (Martaesplo)
- abd2290 Update README.md (Martaesplo)
- 53ed5dd notebook (Martaesplo)
- d48111c visualization fix (Martaesplo)
- 46fdaeb Update README.md (Martaesplo)
- d14020f fix imports notebook (Martaesplo)
- a14b520 Merge branch 'speaker-diarization-final' of github.com:beeldengeluid/… (Martaesplo)
- 0f55cb3 fix plot notebook (Martaesplo)
- 61a057a notebook fix (Martaesplo)
- 29e3578 Applied black formatting
- e07618d flake8 and black style fixed (Martaesplo)
- 8a79122 file content extended explanation (Martaesplo)
- ffcd722 Update README.md (Martaesplo)
- adc2f6c cudnn error solved notebook (Martaesplo)
- eb8613e fixed cudnn error (Martaesplo)
- 9b4f4c0 dependancies (Martaesplo)
- 85c7d3d Fixed merge conflict
- 9892007 Fixed merge conflict
- 9df0bc6 review changes (Martaesplo)
- a28c0bb Update README.md (Martaesplo)

`@@ -0,0 +1,15 @@`
```
/data
/misc
/model
/config
/tests
.venv
.flake8
.git
.github
.mypy_cache
.pytest_cache
.coverage
__pycache__
s3-creds.env
.vscode
```

`@@ -0,0 +1,56 @@`
```ini
# use .flake8 until we can move this config to pyproject.toml (not possible yet (27/02/2024) according to issue below)
# https://github.com/PyCQA/flake8/issues/234

[flake8]
select =
    # B: bugbear warnings
    B,

    # B950: bugbear max-linelength warning
    # as suggested in the black docs
    # https://github.com/psf/black/blob/d038a24ca200da9dacc1dcb05090c9e5b45b7869/docs/the_black_code_style/current_style.md#line-length
    B950,

    # C: currently only C901, mccabe code complexity
    C,

    # E: pycodestyle errors
    E,

    # F: flake8 codes for pyflakes
    F,

    # W: pycodestyle warnings
    W,

extend-ignore =
    # E203: pycodestyle's "whitespace before ',', ';' or ':'" error
    # ignored as suggested in the black docs
    # https://github.com/psf/black/blob/d038a24ca200da9dacc1dcb05090c9e5b45b7869/docs/the_black_code_style/current_style.md#slices
    E203,

    # E501: pycodestyle's "line too long (82 > 79) characters" error
    # ignored in favor of B950 as suggested in the black docs
    # https://github.com/psf/black/blob/d038a24ca200da9dacc1dcb05090c9e5b45b7869/docs/the_black_code_style/current_style.md#line-length
    E501,

    # W503 line break before binary operator
    W503,

# set max-line-length to be black compatible, as suggested in the black docs
# https://github.com/psf/black/blob/d038a24ca200da9dacc1dcb05090c9e5b45b7869/docs/the_black_code_style/current_style.md#line-length
max-line-length = 88

# set max cyclomatic complexity for mccabe plugin
max-complexity = 10

# show total number of errors, set exit code to 1 if tot is not empty
count = True

# show the source generating each error or warning
show-source = True

# count errors and warnings
statistics = True
exclude =
    .venv
```

`@@ -0,0 +1,8 @@`
```
__pycache__/
*.wav
*.mp4
*_audio_segments
*_video_segments
*.txt
temp_outputs
speaker_transcript.txt
```

Review comment: Please remove anything you have not worked on from the PR/branch.

`@@ -0,0 +1,27 @@`
```dockerfile
FROM docker.io/python:3.10

# Create dirs for:
# - Injecting config.yml: /root/.DANE
# - Mount point for input & output files: /mnt/dane-fs
# - Storing the source code: /src
# - Storing the input file to be used while testing: /src/data
RUN mkdir /root/.DANE /mnt/dane-fs /src /data

WORKDIR /src

ENV POETRY_NO_INTERACTION=1 \
    POETRY_VIRTUALENVS_IN_PROJECT=1 \
    POETRY_VIRTUALENVS_CREATE=1 \
    POETRY_CACHE_DIR=/tmp/poetry_cache

RUN pip install poetry==1.8.2

COPY pyproject.toml poetry.lock ./
RUN poetry install --without dev --no-root && rm -rf $POETRY_CACHE_DIR

# Write provenance info about software versions to file
RUN echo "dane-example-worker;https://github.com/beeldengeluid/dane-example-worker/commit/$(git rev-parse HEAD)" >> /software_provenance.txt

COPY . /src

ENTRYPOINT ["./docker-entrypoint.sh"]
```

`@@ -1 +1,46 @@`

# dane-speaker-diarisation-worker
## File description
The worker files are copied from the example worker and are not modified. The main additions are described below.

### **helpers.py**
- `vocal_extraction`: performs vocal extraction with htdemucs. It takes the path to the input audio file as input.
- `text_speaker_map`: reads the predicted speaker label for each speech segment and the timestamps predicted by the transcription, and matches them, generating a readable file that lists each text segment with its speaker label. It takes the path to the input audio file and the collection of Whisper results, of which only the word timestamps are used.
- `transcribe`: performs speech-to-text with faster_whisper. This function will no longer be needed once the ASR worker is completed. It takes the path to the input audio file, the language of the input audio (if known), the model version, the compute type and the device.
- `cleanup`: removes all temporary results produced by vocal extraction, transcription and diarization, but not the final output generated by `text_speaker_map`. It takes the path to the directory holding the temporary results, `temp_outputs`.
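
To illustrate the matching step of `text_speaker_map`, here is a minimal sketch. It assumes the diarization output is available as RTTM lines (the format in which NeMo writes its predicted speaker segments) and that the word timestamps are `(start, end, text)` values; `Segment`, `Word`, `parse_rttm`, `assign_speakers` and `group_by_speaker` are hypothetical names. Assigning each word to the speaker segment it overlaps most is one reasonable strategy, not necessarily the one used in `helpers.py`.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float  # seconds
    end: float  # seconds
    speaker: str


@dataclass
class Word:
    start: float  # seconds
    end: float  # seconds
    text: str


def parse_rttm(lines):
    """Parse RTTM 'SPEAKER' lines into time-sorted Segments.

    RTTM columns: type, file id, channel, onset, duration, <NA>, <NA>, speaker, ...
    """
    segments = []
    for line in lines:
        fields = line.split()
        if not fields or fields[0] != "SPEAKER":
            continue  # skip blank lines and other record types
        onset, duration = float(fields[3]), float(fields[4])
        segments.append(Segment(onset, onset + duration, fields[7]))
    return sorted(segments, key=lambda seg: seg.start)


def overlap(word, seg):
    """Length in seconds of the intersection of a word and a segment."""
    return max(0.0, min(word.end, seg.end) - max(word.start, seg.start))


def distance(t, seg):
    """Distance from time t to a segment (0 if t lies inside it)."""
    if seg.start <= t <= seg.end:
        return 0.0
    return min(abs(t - seg.start), abs(t - seg.end))


def assign_speakers(words, segments):
    """Pair each word with the speaker whose segment overlaps it most.

    Words in gaps between segments fall back to the nearest segment.
    Raises ValueError if `segments` is empty.
    """
    labelled = []
    for word in words:
        best = max(segments, key=lambda seg: overlap(word, seg))
        if overlap(word, best) == 0.0:
            midpoint = (word.start + word.end) / 2
            best = min(segments, key=lambda seg: distance(midpoint, seg))
        labelled.append((best.speaker, word.text.strip()))
    return labelled


def group_by_speaker(labelled):
    """Merge consecutive words of the same speaker into 'speaker: text' lines."""
    turns = []
    for speaker, text in labelled:
        if turns and turns[-1][0] == speaker:
            turns[-1][1].append(text)
        else:
            turns.append((speaker, [text]))
    return [f"{speaker}: {' '.join(texts)}" for speaker, texts in turns]


if __name__ == "__main__":
    rttm = [
        "SPEAKER audio 1 0.00 2.00 <NA> <NA> speaker_0 <NA> <NA>",
        "SPEAKER audio 1 2.00 1.50 <NA> <NA> speaker_1 <NA> <NA>",
    ]
    words = [Word(0.1, 0.5, " Hello"), Word(0.6, 1.0, " there."), Word(2.1, 2.4, " Hi!")]
    for line in group_by_speaker(assign_speakers(words, parse_rttm(rttm))):
        print(line)
    # prints:
    # speaker_0: Hello there.
    # speaker_1: Hi!
```

Word texts are stripped because Whisper word strings typically carry a leading space. Words that fall into gaps between speaker segments are given to the nearest segment, which keeps every word labelled at the cost of occasionally attaching a word to the wrong neighbour.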
### **diarize.py**
- `config_setup`: configures the MSDD module. It specifies the domain type (telephonic, meeting or general) and the parameter settings for the speaker embedding models, clustering and VAD. It takes the output directory, `temp_outputs`, as input.
- `diarize`: performs speaker diarization with the MSDD module. First, the audio is converted to a single channel for NeMo compatibility; then diarization is run with the configuration defined by `config_setup`. It takes the path to the directory for temporary results and the vocal target, which is the vocals extracted with htdemucs rather than the raw input audio file.
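
The single-channel conversion mentioned for `diarize` can be illustrated with the standard library alone. This is a sketch, not the method used in `diarize.py` (pydub is among the listed dependencies and may be what the worker uses): the hypothetical `to_mono_wav` averages the channels of each frame and only handles plain 16-bit PCM WAV files.

```python
import array
import sys
import wave


def to_mono_wav(src_path, dst_path):
    """Downmix a 16-bit PCM WAV file to mono by averaging its channels."""
    with wave.open(src_path, "rb") as src:
        n_channels = src.getnchannels()
        sample_width = src.getsampwidth()
        frame_rate = src.getframerate()
        raw = src.readframes(src.getnframes())

    if sample_width != 2:
        raise ValueError("this sketch only handles 16-bit PCM audio")

    samples = array.array("h")  # signed 16-bit integers
    samples.frombytes(raw)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian

    # Samples are interleaved: frame k holds channels samples[k*n : (k+1)*n].
    mono = array.array(
        "h",
        (
            int(sum(samples[i : i + n_channels]) / n_channels)
            for i in range(0, len(samples), n_channels)
        ),
    )
    if sys.byteorder == "big":
        mono.byteswap()  # back to little-endian for writing

    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(2)
        dst.setframerate(frame_rate)  # keep the original sample rate
        dst.writeframes(mono.tobytes())
```

Averaging keeps every sample inside the 16-bit range, so no clipping step is needed; the per-sample Python loop is slow for long recordings and is meant only to show the idea.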
### **torun.py**
This file runs the speaker diarization pipeline on an input audio file, whose path must be set in the `audio_path` variable inside the file. Other settings that can be changed in this file include whether to perform vocal extraction, the faster_whisper model version, the language and the device.

The code is not adapted to run on the server, but it can be run locally. It runs, in order: vocal extraction, transcription, speaker diarization, text-to-speaker mapping and, finally, cleanup of the temporary results.
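
The run order above can be sketched as a small driver. This is a minimal sketch assuming each stage is a callable supplied by the caller; `PipelineSettings`, `run_pipeline` and the stage parameter names are hypothetical, not the names used in `torun.py`. Two choices are illustrative assumptions: when vocal extraction is disabled the raw audio is used instead, and cleanup runs in a `finally` block so `temp_outputs` is removed even if a stage fails.

```python
import shutil
from dataclasses import dataclass
from pathlib import Path
from typing import Any, Callable, Optional


@dataclass
class PipelineSettings:
    audio_path: str
    model_version: str  # faster_whisper model version
    vocal_extraction: bool = True
    language: Optional[str] = None  # None: let the model detect the language
    device: str = "cpu"
    temp_dir: str = "temp_outputs"


def run_pipeline(
    settings: PipelineSettings,
    extract_vocals: Callable[[str, Path], str],
    transcribe: Callable[[str, Optional[str], str, str], Any],
    diarize: Callable[[Path, str], Any],
    map_text_to_speakers: Callable[[str, Any, Any], Any],
) -> Any:
    """Run the stages in the order described in the README, then clean up."""
    temp_dir = Path(settings.temp_dir)
    temp_dir.mkdir(parents=True, exist_ok=True)
    try:
        # 1. Optional vocal extraction; otherwise fall back to the raw audio.
        if settings.vocal_extraction:
            vocal_target = extract_vocals(settings.audio_path, temp_dir)
        else:
            vocal_target = settings.audio_path
        # 2. Transcription (assumed here to use the same audio as diarization).
        whisper_results = transcribe(
            vocal_target, settings.language, settings.model_version, settings.device
        )
        # 3. Speaker diarization on the vocal target.
        speaker_segments = diarize(temp_dir, vocal_target)
        # 4. Text-to-speaker mapping produces the final output.
        return map_text_to_speakers(
            settings.audio_path, whisper_results, speaker_segments
        )
    finally:
        # 5. Remove temporary results, even if a stage raised an exception.
        shutil.rmtree(temp_dir, ignore_errors=True)
```

Passing the stages in as callables keeps the ordering logic testable without GPUs or model downloads; the functions from helpers.py and diarize.py would be adapted to these signatures.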
### **Transcript_diarize.ipynb**
This notebook contains the whole pipeline and can be run on, for example, Google Colab, which provides the option of using a GPU when none is available locally. Using the notebook can also sidestep dependency issues that may arise when running the pipeline locally, which makes it convenient for quick tests.
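
When the same code runs both on Colab and locally, the device can be chosen at runtime. A minimal sketch, assuming PyTorch may or may not be installed; the hypothetical `pick_device` helper pairs `cuda` with `float16` and `cpu` with `int8`, following the examples in the faster-whisper README.

```python
def pick_device():
    """Return a (device, compute_type) pair for faster_whisper.

    Uses CUDA with float16 when PyTorch sees a GPU, otherwise CPU with int8.
    """
    try:
        import torch
    except ImportError:  # PyTorch not installed: assume CPU only
        return "cpu", "int8"
    if torch.cuda.is_available():
        return "cuda", "float16"
    return "cpu", "int8"


if __name__ == "__main__":
    device, compute_type = pick_device()
    print(f"device={device}, compute_type={compute_type}")
```

The pair can then be passed to faster_whisper, e.g. `WhisperModel(model_version, device=device, compute_type=compute_type)`.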
## **Package Installation**
The following commands should install the pipeline's dependencies:
```
pip install torch
pip install faster_whisper
pip install pydub
pip install wget
pip install nemo_toolkit[asr]==1.22.0
pip install -U git+https://github.com/facebookresearch/demucs#egg=demucs
pip install cython
pip install transformers -U
```
Faster Whisper requires a tokenizers version >=0.13 and <0.16:

`pip install tokenizers==0.15.2`
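
Because other installs (for example `pip install transformers -U`) may upgrade tokenizers past this range, a quick runtime check can help. A minimal sketch using `importlib.metadata` from the standard library; the bounds are the ones stated above, the helper names are hypothetical, and the parsing only handles plain numeric versions such as `0.15.2` (pre-release suffixes would need the `packaging` library).

```python
from importlib.metadata import PackageNotFoundError, version

LOWER = (0, 13)  # inclusive bound
UPPER = (0, 16)  # exclusive bound


def parse_version(text):
    """Turn '0.15.2' into (0, 15, 2); non-numeric parts are not supported."""
    return tuple(int(part) for part in text.split(".")[:3])


def tokenizers_ok(installed):
    """True if the version string satisfies >=0.13, <0.16."""
    return LOWER <= parse_version(installed) < UPPER


def check_installed_tokenizers():
    """Describe whether the installed tokenizers fits faster_whisper's range."""
    try:
        installed = version("tokenizers")
    except PackageNotFoundError:
        return "tokenizers is not installed"
    if tokenizers_ok(installed):
        return f"tokenizers {installed} is compatible"
    return (
        f"tokenizers {installed} is outside >=0.13,<0.16; "
        "try: pip install tokenizers==0.15.2"
    )


if __name__ == "__main__":
    print(check_installed_tokenizers())
```

Tuple comparison does the range check: `(0, 16, 0)` compares greater than `(0, 16)`, so every 0.16.x release is rejected.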

For issues related to libstdc++.so.6.0.30, see https://stackoverflow.com/questions/73317676/importerror-usr-lib-aarch64-linux-gnu-libstdc-so-6-version-glibcxx-3-4-30; the first answer solved the error for me.

## **Usage**
The pipeline can be run locally with `python torun.py`.

Review comment: Add data and config dirs.