Modal Expansion-based Data Generation Approach for Deep Learning-Enabled Sound Source Localization in a Small Enclosure
This repository contains the Python implementation of the dataset generation part of the paper "Modal Expansion-based Data Generation Approach for Deep Learning-Enabled Sound Source Localization in a Small Enclosure".

- Source signals: LibriSpeech
The datasets mentioned above can be downloaded from this OneDrive link.
The data directory structure is as follows:
.
|---data
    |---LibriSpeech
        |---dev-clean
        |---test-clean
        |---train-clean-100
    |---test (generated)
    |---train (generated)
    |---dev (generated)
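Before running the generation step, it can help to confirm that the LibriSpeech subsets are where the tree above expects them. The following is a minimal sketch, not part of the repository: the helper name `missing_librispeech_subsets` is hypothetical, and it assumes the `LibriSpeech` folder sits directly under `data`.

```python
from pathlib import Path

# Subset names taken from the directory tree above.
LIBRISPEECH_SUBSETS = ("dev-clean", "test-clean", "train-clean-100")


def missing_librispeech_subsets(data_root):
    """Return the expected LibriSpeech subset folders that are absent under data_root.

    Hypothetical helper: assumes the layout <data_root>/LibriSpeech/<subset>.
    """
    base = Path(data_root) / "LibriSpeech"
    return [name for name in LIBRISPEECH_SUBSETS if not (base / name).is_dir()]


if __name__ == "__main__":
    missing = missing_librispeech_subsets("data")
    if missing:
        print("Missing LibriSpeech subsets:", ", ".join(missing))
    else:
        print("All LibriSpeech subsets found.")
```

Run it from the repository root. An empty result means all three subsets were found.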
We strongly recommend using VSCode and Docker for this project, as it can save you a lot of time😁! Note that the related configuration is already included in .devcontainer. Detailed information can be found in this Tutorial_for_Vscode&Dokcer.
The related configurations are all saved in config/.
- The dataSIMU.yaml file is used to configure the data generation.
You can change the values of these items as needed.
Note: Do not forget to install webrtcvad.
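
As a quick pre-flight check, the sketch below tests whether `webrtcvad` is importable in the current environment before you start data generation. The helper name `is_installed` is hypothetical and not part of this repository; the check uses only the standard library.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    if not is_installed("webrtcvad"):
        # The package is published on PyPI under the same name.
        print("webrtcvad is not installed; try: pip install webrtcvad")
```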
- Data Generation
Generate the training/val/test data:
bash scripts/datasimu.sh

If you find our work useful in your research, please consider citing:
@article{pi2026modal,
title={Modal expansion-based data generation approach for deep learning-enabled sound source localization in a small enclosure},
author={Pi, Rendong and Yu, Xiang},
journal={Applied Acoustics},
volume={241},
pages={111023},
year={2026},
publisher={Elsevier}
}