Echoic log-surprise: A multi-scale scheme for acoustic saliency detection

Articles

publication date

  • December 2018

start page

  • 255

end page

  • 266

volume

  • 114

International Standard Serial Number (ISSN)

  • 0957-4174

Electronic International Standard Serial Number (EISSN)

  • 1873-6793

abstract

  • Perceptual signals such as acoustic or visual cues carry a massive amount of information. Humans cope with this volume of information by means of cognitive mechanisms related to attention. In particular, saliency is the property of certain stimuli that makes them stand out from others, allowing the brain to make decisions about their relevance while exploring the world. It is advantageous for artificial intelligence systems to mimic these mechanisms. Visual saliency algorithms have been successfully employed in tasks such as medical diagnosis, detection of violent scenes, and environment understanding by robots. In contrast, computational models of acoustic saliency mechanisms are less widespread. In this context, we propose a novel acoustic saliency algorithm for intelligent and expert systems facing tasks such as sound detection and classification, early alarm, surveillance, and robotic exploration of the surroundings, among many other applications. This technique, which we term echoic log-surprise, combines an unsupervised statistical approach based on Bayesian log-surprise with the biological concept of echoic memory, or Auditory Sensory Memory. Our algorithm computes several independent log-surprise cues in parallel over a wide range of memory values, with the aim of leveraging saliency information from different temporal scales. We then explore several statistical metrics for combining these multi-scale signals into a single temporal saliency signal, including Rényi entropy, Jensen-Shannon divergence, and the Cramér and Bhattacharyya distances. We adopt Acoustic Event Detection tasks as proxies to test its performance.
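The multi-scale idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the Gaussian model per sliding window, the function names (`gaussian_kl`, `log_surprise`, `multi_scale_saliency`), the memory lengths, and the 1-D feature track are all assumptions made for illustration. Surprise is taken here as the KL divergence between the belief after and before each new sample, and the final fusion is a plain average across scales, used only as a placeholder for the statistical metrics named in the abstract (Rényi entropy, Jensen-Shannon divergence, and so on), whose exact application the abstract does not specify.

```python
import numpy as np

def gaussian_kl(mu_q, var_q, mu_p, var_p):
    """KL(q || p) between two univariate Gaussians."""
    return 0.5 * (np.log(var_p / var_q) + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0)

def log_surprise(x, memory, eps=1e-8):
    """Log-surprise per sample, using the last `memory` samples as the belief.

    Assumption: each window is summarised by a Gaussian (sample mean and variance).
    """
    out = np.zeros(len(x))
    for t in range(memory, len(x)):
        prior = x[t - memory:t]          # belief before observing x[t]
        post = x[t - memory + 1:t + 1]   # belief after observing x[t]
        kl = gaussian_kl(post.mean(), post.var() + eps,
                         prior.mean(), prior.var() + eps)
        out[t] = np.log1p(kl)            # log(1 + surprise), an illustrative choice
    return out

def multi_scale_saliency(x, memories=(8, 32, 128)):
    """Compute one log-surprise cue per memory length, then fuse them.

    The mean across scales is a placeholder fusion, not one of the
    metrics explored in the paper.
    """
    cues = np.stack([log_surprise(x, m) for m in memories])  # shape: (scales, time)
    return cues, cues.mean(axis=0)

# Hypothetical feature track: low-level noise with a short burst starting at sample 200.
rng = np.random.default_rng(0)
x = 0.1 * rng.standard_normal(400)
x[200:210] += 5.0

cues, saliency = multi_scale_saliency(x)
peak = int(np.argmax(saliency))  # index of the most salient sample
```

In this toy setting, short memories react sharply to the burst onset while long memories respond more weakly, which is the kind of complementary temporal information a multi-scale combination is meant to exploit.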

keywords

  • acoustic saliency; echoic memory; multi-scale; statistical divergence; Jensen-Shannon; acoustic event detection