dc.contributor.author | Sachdeva, Shubam | |
dc.contributor.author | Ruan, Haoyao | |
dc.contributor.author | Hamarneh, Ghassan | |
dc.contributor.author | Behne, Dawn Marie | |
dc.contributor.author | Jongman, Allard | |
dc.contributor.author | Sereno, Joan | |
dc.contributor.author | Wang, Yue | |
dc.date.accessioned | 2023-05-15T13:44:15Z | |
dc.date.available | 2023-05-15T13:44:15Z | |
dc.date.created | 2023-01-09T22:58:04Z | |
dc.date.issued | 2023 | |
dc.identifier.citation | International Journal of Speech Technology. 2023, 26, 163-184. | en_US |
dc.identifier.issn | 1381-2416 | |
dc.identifier.uri | https://hdl.handle.net/11250/3067981 | |
dc.description.abstract | Clearly articulated speech, relative to plain-style speech, has been shown to improve intelligibility. We examine whether visible speech cues in video alone can be systematically modified to enhance clear-speech visual features and improve intelligibility. We extract clear-speech visual features of English words varying in vowels produced by multiple male and female talkers. Via a frame-by-frame, image-warping-based video generation method with a controllable parameter (displacement factor), we apply the extracted clear-speech visual features to videos of plain speech to synthesize clear-speech videos. We evaluate the generated videos using a robust, state-of-the-art AI Lip Reader as well as human intelligibility testing. The contributions of this study are: (1) we successfully extract relevant visual cues for video modifications across speech styles, and have achieved enhanced intelligibility for AI; (2) this work suggests that universal talker-independent clear-speech features may be utilized to modify any talker’s visual speech style; (3) we introduce “displacement factor” as a way of systematically scaling the magnitude of displacement modifications between speech styles; and (4) the generated videos are high definition, making them ideal candidates for human-centric intelligibility and perceptual training studies. | en_US |
dc.language.iso | eng | en_US |
dc.publisher | Springer | en_US |
dc.rights | Attribution 4.0 International | * |
dc.rights.uri | http://creativecommons.org/licenses/by/4.0/deed.no | * |
dc.title | Plain-to-clear speech video conversion for enhanced intelligibility | en_US |
dc.title.alternative | Plain-to-clear speech video conversion for enhanced intelligibility | en_US |
dc.type | Peer reviewed | en_US |
dc.type | Journal article | en_US |
dc.description.version | publishedVersion | en_US |
dc.source.pagenumber | 163-184 | en_US |
dc.source.volume | 26 | en_US |
dc.source.journal | International Journal of Speech Technology | en_US |
dc.identifier.doi | 10.1007/s10772-023-10018-z | |
dc.identifier.cristin | 2103692 | |
cristin.ispublished | true | |
cristin.fulltext | original | |
cristin.qualitycode | 1 | |