Sub-Sync: automatic synchronization of subtitles in the broadcasting of true live programs in Spanish
Articles
Overview
published in
- IEEE Access Journal
publication date
- May 2019
start page
- 60968
end page
- 60983
volume
- 7
Digital Object Identifier (DOI)
Electronic International Standard Serial Number (EISSN)
- 2169-3536
abstract
- Individuals with sensory impairment (hearing or visual) encounter serious communication barriers within society and the world around them. These barriers hinder communication and make access to information an obstacle they must overcome daily. In this context, one of the most common complaints made by television (TV) users with sensory impairment is the lack of synchronism between audio and subtitles in some types of programs. In addition, synchronization remains one of the most significant factors in audience perception of quality in live-originated TV subtitles for the deaf and hard of hearing. This paper introduces the Sub-Sync framework for the automatic synchronization of audio-visual content and subtitles, taking advantage of well-known techniques for aligning symbol sequences. In this case, the symbol sequences are the subtitles produced by the broadcaster's subtitling system and the word flow generated by an automatic speech recognition procedure. The goal of Sub-Sync is to address the lack of synchronism that occurs when subtitles are produced during the broadcast of live TV programs or other programs that have some improvised parts. It also aims to resolve the problematic transitions between synchronized and unsynchronized parts of mixed-type programs. In addition, the framework is able to synchronize the subtitles even when they do not correspond literally to the original audio and/or the audio cannot be completely transcribed by an automatic process. Sub-Sync has been successfully tested in different live broadcasts, including mixed programs, in which the synchronized parts (recorded, scripted) are interspersed with desynchronized (improvised) ones.
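The abstract describes aligning two word sequences: the broadcaster's subtitles and the word flow from speech recognition. The following is a minimal sketch of that general idea, not the Sub-Sync implementation. It assumes a plain Needleman-Wunsch global alignment as a stand-in for the unspecified alignment technique, and the function names (`align_words`, `estimate_subtitle_time`), the scoring values, and the sample data are all hypothetical.

```python
from typing import List, Optional, Tuple

Pair = Tuple[Optional[int], Optional[int]]


def align_words(subtitle: List[str], asr: List[str],
                match: int = 2, mismatch: int = -1, gap: int = -1) -> List[Pair]:
    """Globally align two word lists (Needleman-Wunsch).

    Returns (subtitle_index, asr_index) pairs; None marks a gap.
    Scoring values are illustrative assumptions, not taken from the paper.
    """
    n, m = len(subtitle), len(asr)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if subtitle[i - 1] == asr[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + s,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right cell to recover one optimal alignment.
    pairs: List[Pair] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            s = match if subtitle[i - 1] == asr[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + s:
                pairs.append((i - 1, j - 1))
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            pairs.append((i - 1, None))  # subtitle word with no ASR counterpart
            i -= 1
        else:
            pairs.append((None, j - 1))  # ASR word missing from the subtitle
            j -= 1
    pairs.reverse()
    return pairs


def estimate_subtitle_time(subtitle: List[str], asr: List[str],
                           asr_times: List[float]) -> Optional[float]:
    """Return the ASR timestamp of the first exactly matched word, if any."""
    for si, aj in align_words(subtitle, asr):
        if si is not None and aj is not None and subtitle[si] == asr[aj]:
            return asr_times[aj]
    return None


# Hypothetical data: the subtitle is not a literal transcript of the audio.
subtitle_words = ["buenas", "tardes", "a", "todos"]
asr_words = ["hola", "buenas", "tardes", "todos"]
asr_times = [0.0, 0.4, 0.9, 1.5]  # seconds, one per recognized word
print(estimate_subtitle_time(subtitle_words, asr_words, asr_times))
```

Because gaps are allowed on both sides, the alignment tolerates subtitles that paraphrase the audio as well as words the recognizer missed. These are the two conditions the abstract says the framework must handle.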
Classification
keywords
- accessibility; tv broadcasting; algorithm design and analysis; automatic speech recognition; deep neural-networks; speech; text; system