Accurately localizing sound sources and mapping them into the environment is crucial for human–robot interaction and augmented reality. Existing methods typically assume that other sensors, such as cameras or LiDAR, have been spatially calibrated with the microphone array, allowing their measured poses (i.e., orientation and translation) to be converted into the microphone array poses. However, estimating the relative pose between sensors is a complex task. In this work, we propose a 3D sound source mapping method using an acoustic camera with unknown relative poses between the camera and the microphone array.
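For context, the conversion that existing methods rely on can be sketched as a composition of homogeneous transforms. This is only an illustration of the calibrated case, not the method proposed in this work, which treats the camera-to-array pose as unknown. The helper names (`make_pose`, `compose`), the yaw-only rotation, and all numeric values are assumptions made for this example.

```python
import math

def make_pose(yaw_rad, translation):
    """Build a 4x4 homogeneous transform from a yaw angle (rotation about z) and a translation."""
    c, s = math.cos(yaw_rad), math.sin(yaw_rad)
    tx, ty, tz = translation
    return [
        [c, -s, 0.0, tx],
        [s, c, 0.0, ty],
        [0.0, 0.0, 1.0, tz],
        [0.0, 0.0, 0.0, 1.0],
    ]

def compose(a, b):
    """Return the matrix product a @ b for 4x4 transforms stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

# Hypothetical values, for illustration only.
T_world_cam = make_pose(math.pi / 2, (1.0, 2.0, 0.0))  # camera pose, e.g. from visual SLAM
T_cam_mic = make_pose(0.0, (0.1, 0.0, 0.0))            # calibrated camera-to-array relative pose

# Microphone array pose in the world frame.
T_world_mic = compose(T_world_cam, T_cam_mic)
print([row[3] for row in T_world_mic[:3]])  # translation of the microphone array
```

When `T_cam_mic` is not available from calibration, this conversion cannot be applied directly, which is the situation this work addresses.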
We have tested the library on Ubuntu 18.04 (ROS1), but it should be easy to compile on other platforms. A more powerful computer will give more stable and accurate results.
We use Pangolin for visualization and the user interface. Download and install instructions can be found at: https://github.com/stevenlovegrove/Pangolin.
We use OpenCV to manipulate images and features. Download and install instructions can be found at: http://opencv.org. Tested with OpenCV 3.2.
Eigen3 is required by g2o (see below). Download and install instructions can be found at: http://eigen.tuxfamily.org. At least version 3.1.0 is required.
We use modified versions of the DBoW2 library to perform place recognition and of the g2o library to perform non-linear optimization. Both modified libraries (which are BSD-licensed) are included in the Thirdparty folder.
We provide examples that process the input of an RGB-D camera using ROS.
./build.sh
# and
./build_ros.sh
Open a terminal:
roscore
rosrun ORB_SLAM2 RGBD Vocabulary/ORBvoc.txt Examples/RGB-D/orbbec335l.yaml
rosrun ORB_SLAM2 SSM.py
4. Play ROSBAG
rosbag play DESK_SOUND_SOURCE.bag /camera/color/image_raw:=/camera/rgb/image_raw /camera/depth/image_raw:=/camera/depth_registered/image_raw
The source code and dataset are released under GPLv3 license.
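The `old:=new` arguments in the `rosbag play` command above use the standard ROS remapping syntax: topics recorded in the bag are republished under the names on the right-hand side. If your bag uses different topic names, a small helper like the following can assemble the command. This is a minimal sketch; the function name `rosbag_play_command` is hypothetical and is not part of this repository.

```python
import shlex

def rosbag_play_command(bag_path, remaps):
    """Return the argument list for `rosbag play` with `old:=new` topic remappings."""
    args = ["rosbag", "play", bag_path]
    for recorded, expected in remaps.items():
        # ROS remapping syntax: <topic name in the bag>:=<topic name to publish>
        args.append("{}:={}".format(recorded, expected))
    return args

# Topic names taken from the command above; adjust the keys to match your own bag.
remaps = {
    "/camera/color/image_raw": "/camera/rgb/image_raw",
    "/camera/depth/image_raw": "/camera/depth_registered/image_raw",
}
command = rosbag_play_command("DESK_SOUND_SOURCE.bag", remaps)
print(" ".join(shlex.quote(a) for a in command))
```

The resulting list can be passed to `subprocess.run(command)` on a machine where ROS is installed and sourced.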
Please cite the paper if you find it helpful for your research.
@INPROCEEDINGS{26SSM,
author={Wang, Jiang and Shi, Runwu and Li, Jiahui and Kong, He and Nakadai, Kazuhiro},
booktitle={ICASSP 2026 - 2026 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
title={Manifold-Optimization-Based 3D Sound Source Mapping with Unknown Camera-Microphone Array Relative Pose},
year={2026},
pages={21026-21030},
keywords={Location awareness;Mobile communication;Protocols;HTTP;Indoor environment;Machine learning;Boosting;Deep learning;Reinforcement learning;Learning systems;Sound source mapping;acoustic camera;optimization},
doi={10.1109/ICASSP55912.2026.11462576}
}


