
dc.contributor.author: Rugayan, Janine Lizbeth Cabrera
dc.contributor.author: Svendsen, Torbjørn Karl
dc.contributor.author: Salvi, Giampiero
dc.date.accessioned: 2023-02-17T09:09:16Z
dc.date.available: 2023-02-17T09:09:16Z
dc.date.created: 2022-09-28T08:06:57Z
dc.date.issued: 2022
dc.identifier.issn: 2308-457X
dc.identifier.uri: https://hdl.handle.net/11250/3051822
dc.description.abstract: Evaluation metrics are important for quantifying the performance of Automatic Speech Recognition (ASR) systems. However, the widely used word error rate (WER) captures errors at the word level only and weighs each error equally, which makes it insufficient to discern ASR system performance for downstream tasks such as Natural Language Understanding (NLU) or information retrieval. In this paper, we explore a more robust and discriminative evaluation metric for Norwegian ASR systems through the use of semantic information modeled by a transformer-based language model. We propose Aligned Semantic Distance (ASD), which employs dynamic programming to quantify the similarity between the reference and hypothesis text. First, embedding vectors are generated using the NorBERT model. Afterwards, the minimum global distance of the optimal alignment between these vectors is obtained and normalized by the sequence length of the reference embedding vector. In addition, we present results using Semantic Distance (SemDist) and compare them with ASD. Results show that, for the same WER, ASD and SemDist values can vary significantly, illustrating that not all recognition errors can be considered equally important. We investigate the resulting data and present examples that demonstrate the nuances of both metrics in evaluating various transcription errors. [en_US]
dc.language.iso: eng [en_US]
dc.publisher: International Speech Communication Association [en_US]
dc.title: Semantically Meaningful Metrics for Norwegian ASR Systems [en_US]
dc.title.alternative: Semantically Meaningful Metrics for Norwegian ASR Systems [en_US]
dc.type: Peer reviewed [en_US]
dc.type: Journal article [en_US]
dc.description.version: acceptedVersion [en_US]
dc.source.journal: Interspeech (USB) [en_US]
dc.identifier.doi: 10.21437/Interspeech.2022-817
dc.identifier.cristin: 2056124
dc.relation.project: Norges forskningsråd: 322964 [en_US]
dc.relation.project: The EEA and Norway Grants Fund for Regional Cooperation: CZ-RESEARCH-0022 [en_US]
cristin.ispublished: true
cristin.fulltext: original
cristin.fulltext: postprint
cristin.qualitycode: 1
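
Illustrative note on the abstract: the ASD computation it describes (pairwise distances between reference and hypothesis embedding vectors, a dynamic-programming alignment with minimum global distance, and normalization by the reference sequence length) can be sketched roughly as below. This is a minimal sketch under stated assumptions, not the authors' implementation. The abstract does not specify the local distance or the alignment step pattern, so cosine distance and the standard dynamic time warping recursion are assumed here. The function names are hypothetical, and small hand-made arrays stand in for NorBERT token embeddings.

```python
import numpy as np


def cosine_distance_matrix(ref, hyp):
    """Pairwise cosine distances between rows of ref (n x d) and hyp (m x d).

    Assumption: cosine distance is used as the local cost; the abstract
    does not name the distance function.
    """
    ref_unit = ref / np.linalg.norm(ref, axis=1, keepdims=True)
    hyp_unit = hyp / np.linalg.norm(hyp, axis=1, keepdims=True)
    return 1.0 - ref_unit @ hyp_unit.T


def aligned_semantic_distance(ref, hyp):
    """Minimum global alignment cost, normalized by the reference length.

    Assumption: the alignment uses the standard DTW recursion
    (diagonal, vertical, and horizontal steps with equal weight).
    """
    cost = cosine_distance_matrix(ref, hyp)
    n, m = cost.shape
    acc = np.full((n + 1, m + 1), np.inf)  # accumulated cost table
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best_previous = min(acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1])
            acc[i, j] = cost[i - 1, j - 1] + best_previous
    return acc[n, m] / n


# Toy 2-dimensional "embeddings" standing in for NorBERT token vectors.
reference = np.array([[1.0, 0.0], [0.0, 1.0]])
same_hypothesis = np.array([[1.0, 0.0], [0.0, 1.0]])
swapped_hypothesis = np.array([[0.0, 1.0], [1.0, 0.0]])

asd_same = aligned_semantic_distance(reference, same_hypothesis)
asd_swapped = aligned_semantic_distance(reference, swapped_hypothesis)
```

With these toy vectors, a hypothesis identical to the reference yields a distance of zero, while the hypothesis with the two vectors swapped yields a larger value. This mirrors the abstract's point that hypotheses can differ in semantic distance even when a word-level count treats their errors alike.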

