The robustness of echoic log-surprise auditory saliency detection

publication date

  • November 2018

start page

  • 72083

end page

  • 72093

volume

  • 6

International Standard Serial Number (ISSN)

  • 2169-3536

abstract

  • The concept of saliency describes how relevant a stimulus is for humans. This phenomenon has been studied under different perspectives and modalities, such as audio, visual, or both. It has been employed in intelligent systems to interact with their environment in an attempt to emulate or even outperform human behavior in tasks such as surveillance and alarm systems or even robotics. In this paper, we focus on the aural modality, and our goal is to measure the robustness of Echoic log-surprise in comparison with a set of auditory saliency techniques when tested in noisy environments for the task of saliency detection. The acoustic saliency methods that we have analyzed include Kalinli's saliency model, Bayesian log-surprise, and our proposed algorithm, Echoic log-surprise. This last method combines an unsupervised approach based on the Bayesian log-surprise and the biological concept of echoic or auditory sensory memory by means of a statistical fusion scheme, in which different distance metrics or statistical divergences, such as Rényi's or Jensen-Shannon's among others, are considered. Additionally, for comparison purposes, we have also evaluated some classical onset detection techniques, such as those based on voice activity detection or energy thresholding. Results show that Echoic log-surprise outperforms the detection capabilities of the rest of the techniques analyzed in this paper under a great variety of noises and signal-to-noise ratios, corroborating its robustness in noisy environments. In particular, our algorithm with the Jensen-Shannon fusion scheme produces the best F-scores. With the aim of better understanding the behavior of Echoic log-surprise, we have also studied the influence of its control parameters, depth and memory, at different noise levels.

keywords

  • acoustic saliency; echoic memory; multi-scale; statistical divergence; Jensen-Shannon; acoustic event detection