Synchrony-Based Feature Extraction for Robust Automatic Speech Recognition
Overview
published in
- IEEE Signal Processing Letters
publication date
- August 2017
start page
- 1158
end page
- 1162
issue
- 8
volume
- 24
Digital Object Identifier (DOI)
International Standard Serial Number (ISSN)
- 1070-9908
Electronic International Standard Serial Number (EISSN)
- 1558-2361
abstract
- This letter discusses the application of models of the temporal patterns of auditory-nerve firings to enhance the robustness of automatic speech recognition systems. Most conventional feature extraction schemes (such as mel-frequency cepstral coefficients and perceptual linear prediction coefficients) are based on the short-time energy in each frequency band, and the temporal patterns of auditory-nerve activity are discarded. We compare the impact on speech recognition accuracy of several feature extraction schemes based on the putative synchrony of auditory-nerve activity, including feature extraction based on a modified version of the generalized synchrony detector proposed by Seneff and a modified version of the averaged localized synchrony response proposed by Young and Sachs. Experiments using multiple standard speech databases show that features based on auditory-nerve synchrony can indeed improve speech recognition accuracy in the presence of additive noise. Recognition accuracy obtained with the synchrony-based features increases further if some form of noise removal is applied to the signal before the synchrony measure is estimated. For this purpose, signal processing based on the noise-suppression component of PNCC feature extraction is more effective than conventional spectral subtraction.
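To give a feel for the kind of measure the abstract refers to, the sketch below implements a much-simplified synchrony ratio in the spirit of Seneff's generalized synchrony detector: a band-limited signal is compared with a copy of itself delayed by one period of the channel's characteristic frequency, and the ratio of the mean magnitudes of the sum and difference indicates how strongly the signal is phase-locked to that frequency. This is an illustrative assumption-laden toy, not the modified detectors evaluated in the letter; the names `gsd_measure`, `fs`, and `cf` are invented for the example.

```python
import numpy as np

def gsd_measure(u, fs, cf):
    """Simplified GSD-style synchrony ratio (illustrative only).

    u  : 1-D array, output of a band-pass auditory channel
    fs : sampling rate in Hz
    cf : characteristic frequency of the channel in Hz

    A signal periodic at 1/cf makes the delayed difference nearly
    zero, so the ratio becomes large; aperiodic noise yields a
    ratio near one.
    """
    tau = int(round(fs / cf))            # delay of one characteristic period, in samples
    s = u[tau:] + u[:-tau]               # sum of signal and delayed signal
    d = u[tau:] - u[:-tau]               # difference of signal and delayed signal
    eps = 1e-8                           # avoid division by zero for perfectly periodic input
    return np.mean(np.abs(s)) / (np.mean(np.abs(d)) + eps)

# A 500 Hz tone is strongly "synchronous" at cf = 500 Hz; white noise is not.
fs = 16000
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 500 * t)
noise = np.random.default_rng(0).standard_normal(fs)
```

In a full front end, a ratio like this would be computed per channel of an auditory filter bank and passed through a saturating nonlinearity before being used as a feature; the letter's experiments additionally apply noise suppression before estimating the synchrony measure.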
Classification
keywords
- auditory modeling; auditory synchrony; feature extraction; physiological modeling; robust speech recognition; auditory-nerve fibers