An attention Long Short-Term Memory based system for automatic classification of speech intelligibility

publication date

  • November 2020

start page

  • 1

end page

  • 8

issue

  • 103976

volume

  • 96

International Standard Serial Number (ISSN)

  • 0952-1976

Electronic International Standard Serial Number (EISSN)

  • 1873-6769

abstract

  • Speech intelligibility can be degraded by multiple factors, such as noisy environments, technical difficulties or biological conditions. This work focuses on the development of an automatic non-intrusive system for predicting the speech intelligibility level in the latter case. The main contribution of our research on this topic is the use of Long Short-Term Memory (LSTM) networks with log-mel spectrograms as input features for this purpose. In addition, this LSTM-based system is further enhanced by the incorporation of a simple attention mechanism that is able to determine the frames most relevant to this task. The proposed models are evaluated on the UA-Speech database, which contains dysarthric speech with different degrees of severity. Results show that the attention LSTM architecture outperforms both a reference Support Vector Machine (SVM)-based system with hand-crafted features and an LSTM-based system with mean pooling.
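
The difference between mean pooling and attention pooling over LSTM outputs can be illustrated with a minimal NumPy sketch. The abstract does not specify the attention formulation, so this assumes a common simple variant: each frame's hidden state is scored by a dot product with a learned vector, and the scores are normalised with a softmax. The names `mean_pooling`, `attention_pooling`, `hidden_states` and `w`, as well as the array sizes, are hypothetical and are not taken from the paper.

```python
import numpy as np


def mean_pooling(hidden_states):
    """Average the per-frame LSTM outputs; every frame gets equal weight."""
    return hidden_states.mean(axis=0)


def attention_pooling(hidden_states, w):
    """Weighted average of per-frame LSTM outputs.

    hidden_states: array of shape (T, D), one D-dim vector per frame.
    w: scoring vector of shape (D,) (assumed learned during training).
    Returns the utterance-level vector (D,) and the frame weights (T,).
    """
    scores = hidden_states @ w            # one scalar score per frame
    scores = scores - scores.max()        # shift for numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum()     # softmax over the time axis
    return weights @ hidden_states, weights


# Stand-in data: 50 frames of 16-dim LSTM outputs (random, for illustration).
rng = np.random.default_rng(0)
H = rng.standard_normal((50, 16))
w = rng.standard_normal(16)

utterance_vec, alpha = attention_pooling(H, w)
uniform_vec, _ = attention_pooling(H, np.zeros(16))
```

With an all-zero scoring vector every frame receives the same weight, so attention pooling reduces to mean pooling; a trained `w` would instead let the model emphasise the frames it finds most informative.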

keywords

  • speech intelligibility; dysarthria; long short-term memory (LSTM); attention model; machine learning