Post-processing Automatic Speech Recognition Transcriptions: A Study for Investigative Interviews

Moe, Marit Kristine

dc.contributor.advisor	Porter, Kyle
dc.contributor.advisor	Beka, Thomas
dc.contributor.author	Moe, Marit Kristine
dc.date.accessioned	2023-09-20T17:21:27Z
dc.date.available	2023-09-20T17:21:27Z
dc.date.issued	2023
dc.identifier	no.ntnu:inspera:146715749:35266978
dc.identifier.uri	https://hdl.handle.net/11250/3090935
dc.description.abstract	Politiet er interessert i å effektivisere transkripsjonen av avhør. I dag transkriberes avhør manuelt eller kun et sammendrag blir skrevet. Håpet er å kunne forbedre leseligheten og kvaliteten av automatisk talegjenkjennings transkripsjoner ved å konvertere dem fra muntlig til skriftlig format slik at de kan bli anvendt av Politiet for å gjøre prosessen mindre manuell. I masteroppgaven vil det bli undersøkt hvordan man kan legge til store forbokstaver i egennavn, legge til komma til lister av egennavn og konvertere tall og forkortelser til det skriftlige domenet. For å kunne legge til stor forbokstav og komma til lister, blir en ny fremgangsmåte ved å benytte seg av en modell for navnegjenkjenning utprøvd. For å konvertere tall og forkortelser blir en invers tekstnormaliseringstilnærming vurdert ved å ta i bruk Stortingskorpusets standardiseringsskript. Resultatene viser en svak forbedring på 1,38 % gjennomsnittlig ordfeilrate og 1,69 % gjennomsnittlig tegnfeilrate på Utstillingskorpuset. Den teksten med flest egennavn fikk en forbedring på nesten 10 % ordfeilrate, mens den teksten som inneholdt flest tall ble forbedret med 13 % tegnfeilrate. Leseligheten til transkripsjonene ble forbedret, men dersom Politiet vil anvende den automatiske talegjenkjennings modellen og etterbehandlingsstegene fra masteroppgaven må det påregnes en manuell korrigering.
dc.description.abstract	The Norwegian Police are interested in making the transcription of investigative interviews more efficient. Today, interviews are either transcribed manually or only summarized. The aim is to improve the readability and quality of Automatic Speech Recognition (ASR) transcriptions by converting them from spoken to written form, so that the Police can use them to make the transcription process less manual. This master's thesis investigates capitalizing proper nouns, adding commas to lists of proper nouns, and converting numbers and abbreviations to the written domain. For capitalization and comma insertion, a new approach based on Named Entity Recognition is tested. For the conversion of numbers and abbreviations, an Inverse Text Normalization approach is considered, using the standardization scripts of the Norwegian Parliamentary Speech Corpus. The evaluation shows a slight improvement of 1.38% in average Word Error Rate (WER) and 1.69% in average Character Error Rate (CER) on the Exhibition Corpus. For the text in the dataset with the most proper nouns, the WER improved by almost 10%, while the text with the most numbers improved by over 13% in CER. The readability of the transcriptions is improved, but if the Police are to apply the ASR model and the post-processing steps from the thesis, manual correction must still be expected.
dc.language	eng
dc.publisher	NTNU
dc.title	Post-processing Automatic Speech Recognition Transcriptions: A Study for Investigative Interviews
dc.type	Master thesis

Associated file(s)

Filename: no.ntnu:inspera:146715749:3526 ...
Size: 7.284Mb
Format: PDF

This item appears in the following collection(s)

Institutt for informasjonssikkerhet og kommunikasjonsteknologi
