Audiovisual Contents Segmentation
MetadataShow full item record
The objective of this thesis is to detect high level semantic ideas to help to impose a structure on television talk shows. Indexing TV-shows is a subject that, to our knowledge, is rarely talked about in the scientific community.There is no common understanding of what this imposed structure should look like. We can say that the purpose is to organise the audiovisual content into sections that convey a specific information. It thus encompasses issues as diverse as scene segmentation, speech noise detection, speaker identification, etc. The basic problem of structuring is the gap between the information extracted from visual data flow and human interpretation made by the user of these data. Numerous studies have examined the organisation of highly structured video content. Thus, the state of the art has many studies on sport or newscast transmissions. Our goal is to detect key audiovisual events using a variety of descriptors and generic classifiers. We propose a generic approach that is able to assess all TV-show indexing problems. This enables an operator to use one single tool to infer a logical structure. Our approach can be considered as ``semi-automatic'' in the sense that the training data is collected on the fly by the operator who is asked to arbitrarily select one video excerpt of each concept involved. We have assessed a wide selection of audio and video features, used MKL as a feature selection algorithm and then built various content detectors and segmentors useful for imposing broad semantic classes on television data.This master's thesis was set forth by TELECOM ParisTech and was begun there March 1, 2010. This final report was submitted to TELECOM ParisTech, NTNU and Institute EURECOM August 29, 2010.