
dc.contributor.author: Sachdeva, Shubam
dc.contributor.author: Ruan, Haoyao
dc.contributor.author: Hamarneh, Ghassan
dc.contributor.author: Behne, Dawn Marie
dc.contributor.author: Jongman, Allard
dc.contributor.author: Sereno, Joan
dc.contributor.author: Wang, Yue
dc.date.accessioned: 2023-05-15T13:44:15Z
dc.date.available: 2023-05-15T13:44:15Z
dc.date.created: 2023-01-09T22:58:04Z
dc.date.issued: 2023
dc.identifier.citation: International Journal of Speech Technology. 2023, 26 163-184. [en_US]
dc.identifier.issn: 1381-2416
dc.identifier.uri: https://hdl.handle.net/11250/3067981
dc.description.abstract: Clearly articulated speech, relative to plain-style speech, has been shown to improve intelligibility. We examine whether visible speech cues in video only can be systematically modified to enhance clear-speech visual features and improve intelligibility. We extract clear-speech visual features of English words varying in vowels produced by multiple male and female talkers. Via a frame-by-frame, image-warping-based video generation method with a controllable parameter (displacement factor), we apply the extracted clear-speech visual features to videos of plain speech to synthesize clear-speech videos. We evaluate the generated videos using a robust, state-of-the-art AI Lip Reader as well as human intelligibility testing. The contributions of this study are: (1) we successfully extract relevant visual cues for video modifications across speech styles, and have achieved enhanced intelligibility for AI; (2) this work suggests that universal talker-independent clear-speech features may be utilized to modify any talker’s visual speech style; (3) we introduce “displacement factor” as a way of systematically scaling the magnitude of displacement modifications between speech styles; and (4) the high definition of the generated videos makes them ideal candidates for human-centric intelligibility and perceptual training studies. [en_US]
dc.language.iso: eng [en_US]
dc.publisher: Springer [en_US]
dc.rights: Attribution 4.0 International [*]
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/deed.no [*]
dc.title: Plain-to-clear speech video conversion for enhanced intelligibility [en_US]
dc.title.alternative: Plain-to-clear speech video conversion for enhanced intelligibility [en_US]
dc.type: Peer reviewed [en_US]
dc.type: Journal article [en_US]
dc.description.version: publishedVersion [en_US]
dc.source.pagenumber: 163-184 [en_US]
dc.source.volume: 26 [en_US]
dc.source.journal: International Journal of Speech Technology [en_US]
dc.identifier.doi: 10.1007/s10772-023-10018-z
dc.identifier.cristin: 2103692
cristin.ispublished: true
cristin.fulltext: original
cristin.qualitycode: 1



Except where otherwise noted, this item is licensed under Attribution 4.0 International.