- IEEE Access Journal
- May 2019
- Individuals with sensory impairment (hearing or visual) encounter serious communication barriers in society and in the world around them. These barriers hinder communication and turn access to information into an obstacle they must overcome daily. In this context, one of the most common complaints from television (TV) users with sensory impairment is the lack of synchronism between audio and subtitles in some types of programs. Synchronization also remains one of the most significant factors in how audiences perceive the quality of live-originated TV subtitles for the deaf and hard of hearing. This paper introduces Sub-Sync, a framework for the automatic synchronization of audiovisual content and subtitles that takes advantage of well-known techniques for aligning symbol sequences. In this case, the symbol sequences are the subtitles produced by the broadcaster's subtitling system and the word flow generated by an automatic speech recognition procedure. The goal of Sub-Sync is to address the lack of synchronism that arises when subtitles are produced during the broadcast of live TV programs or of other programs that contain improvised parts. It also aims to resolve the problematic transitions between synchronized and unsynchronized parts of mixed-type programs. In addition, the framework can synchronize subtitles even when they do not correspond literally to the original audio or when the audio cannot be completely transcribed by an automatic process. Sub-Sync has been successfully tested on different live broadcasts, including mixed programs in which synchronized (recorded, scripted) parts are interspersed with desynchronized (improvised) ones.
- accessibility; tv broadcasting; algorithm design and analysis; automatic speech recognition; deep neural-networks; speech; text; system
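
The alignment idea in the abstract, matching the subtitle word sequence against the word flow from a speech recognizer, can be illustrated with a minimal sketch. The abstract does not say which alignment algorithm Sub-Sync uses, so the Needleman-Wunsch global alignment below is an assumption chosen as one well-known technique. The function names (`align_words`, `estimate_delay`), the scoring values, and the use of a median offset are all hypothetical and serve only as illustration.

```python
from statistics import median


def align_words(subs, asr, match=2, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment of two word lists.

    Returns (subtitle_index, asr_index) pairs for words that were
    aligned and are identical. Scoring values are illustrative only.
    """
    n, m = len(subs), len(asr)
    # score[i][j] = best score aligning subs[:i] with asr[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if subs[i - 1] == asr[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + s,  # align the two words
                score[i - 1][j] + gap,    # subtitle word with no ASR word
                score[i][j - 1] + gap,    # ASR word with no subtitle word
            )
    # Trace back from the bottom-right corner to recover matched pairs.
    pairs = []
    i, j = n, m
    while i > 0 and j > 0:
        s = match if subs[i - 1] == asr[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + s:
            if subs[i - 1] == asr[j - 1]:
                pairs.append((i - 1, j - 1))
            i -= 1
            j -= 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return pairs


def estimate_delay(sub_words, sub_ms, asr_words, asr_ms):
    """Median subtitle delay in milliseconds over matched words (assumed heuristic)."""
    pairs = align_words(sub_words, asr_words)
    if not pairs:
        return None
    return median(sub_ms[i] - asr_ms[j] for i, j in pairs)


# Hypothetical data: the subtitles drop the word "and" and appear 4 s late.
asr_words = ["good", "evening", "and", "welcome", "to", "the", "news"]
asr_ms = [0, 400, 900, 1100, 1500, 1700, 1900]
sub_words = ["good", "evening", "welcome", "to", "the", "news"]
sub_ms = [4000, 4400, 5100, 5500, 5700, 5900]

print(estimate_delay(sub_words, sub_ms, asr_words, asr_ms))
```

Because the alignment tolerates gaps and mismatches, it still finds anchor words when the subtitles paraphrase the audio or the recognizer misses words, which is the situation the abstract describes. A real system would likely need fuzzy word matching and a windowed, streaming alignment rather than one global pass; those are outside this sketch.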