Coarse-to-Fine Speech Retrieval Using Framewise Phoneme Probabilities

Liavaag, Harald

Master's thesis

Permanent link

http://hdl.handle.net/11250/250053

Publication date

2006

Collections

Institutt for datateknologi og informatikk

Abstract

As archives of digital audio and video grow, people need to find specific information within them, which calls for a highly efficient method of searching recorded media. The metadata currently used to tag audio, such as title, date of recording, subject, or person, is not sufficient for accurate and rapid retrieval of specifically requested information. The field of media retrieval has long received relatively little attention, but interest has increased lately, and new techniques to support content-based access to archives of digital audio and video are evolving and drawing attention from the research community. Recently, a novel technique for speech retrieval was presented. It consists of a method to represent speech as a sequence of framewise phoneme probabilities and a new search method that uses these probabilities to determine the segment of speech that most closely matches a spoken query. This thesis first looks at methods to improve the retrieval performance of the proposed dynamic programming algorithm. On our test set of 1,132 speech files, the algorithm finds 65% of the wanted hits among the top 10 results. The thesis then deals with ways of increasing the speed of the search. The proposed method gives somewhat promising results, reducing the response time by 11% without affecting retrieval effectiveness.
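
To make the idea of searching over framewise phoneme probabilities concrete, the following is a minimal sketch of a generic subsequence dynamic-programming alignment. It is not the algorithm from the thesis, whose details are not given in this abstract. The function names, the three-phoneme toy alphabet, and the frame distance (negative log of the inner product of two probability vectors) are all assumptions chosen for illustration.

```python
import math


def frame_distance(p, q):
    """Assumed distance between two phoneme probability vectors:
    the negative log of their inner product (small when both frames
    put their mass on the same phoneme)."""
    dot = sum(a * b for a, b in zip(p, q))
    return -math.log(max(dot, 1e-12))


def best_matching_segment(query, target):
    """Align the whole query against any contiguous segment of target.

    Both arguments are lists of frames; each frame is a list of
    phoneme probabilities. Returns (cost, start, end), where start and
    end are inclusive frame indices into target.
    """
    m = len(target)
    # Row for the first query frame: a match may start at any target frame.
    prev_cost = [frame_distance(query[0], target[j]) for j in range(m)]
    prev_start = list(range(m))

    for i in range(1, len(query)):
        cur_cost = [math.inf] * m
        cur_start = [0] * m
        for j in range(m):
            # Predecessors: same target frame, or the previous target
            # frame in either the previous or the current query row.
            best, start = prev_cost[j], prev_start[j]
            if j > 0:
                if prev_cost[j - 1] < best:
                    best, start = prev_cost[j - 1], prev_start[j - 1]
                if cur_cost[j - 1] < best:
                    best, start = cur_cost[j - 1], cur_start[j - 1]
            cur_cost[j] = best + frame_distance(query[i], target[j])
            cur_start[j] = start
        prev_cost, prev_start = cur_cost, cur_start

    # The match may end at any target frame; pick the cheapest one.
    end = min(range(m), key=prev_cost.__getitem__)
    return prev_cost[end], prev_start[end], end


# Toy example with a hypothetical three-phoneme alphabet.
A = [0.90, 0.05, 0.05]
B = [0.05, 0.90, 0.05]
C = [0.05, 0.05, 0.90]
target = [C, C, A, B, C, C]
query = [A, B]
cost, start, end = best_matching_segment(query, target)
print(start, end)
```

In a retrieval setting, one plausible use is to compute this cost for every speech file and rank the files by it. The sketch does not normalize for path length or restrict step patterns, both of which a practical system would likely need.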

Publisher

Institutt for datateknikk og informasjonsvitenskap