Using the structural content of documents to automatically generate quality metadata
Abstract
Over the last decades, document sharing has become vastly more accessible to the general public, with large document collections made available on the internet and, within organizations, on intranets. In addition, each of us has an ever-increasing archive of private digital documents. At the same time, efforts to enable more efficient document retrieval have had only marginal success. Finding the right document is therefore like looking for a needle in a haystack, except that the haystack is now bigger. Because of this lack of overview of existing document resources, large amounts of scarce human resources are still spent creating resources similar to those that already exist.
A key reason why we face this challenge is that few documents receive metadata descriptions sufficient to enable efficient retrieval. Too often, document metadata is insufficient or even incorrect, and few document creators are aware of the need to describe their documents with metadata. Trained librarians and archivists can help authors create and publish metadata, but this is a costly and time-consuming process. Advanced metadata formats, such as IEEE Learning Object Metadata (LOM), enable detailed and precise metadata descriptions, but they are challenging to use, and their potential is often not leveraged. Document formats that require such metadata, e.g. SCORM Learning Objects (LOs), are not used to their full potential because of the difficulty of creating that metadata.
This thesis shows how Automatic Metadata Generation (AMG) can serve as a foundation for creating, publishing and discovering document resources with rich and correct metadata descriptions. It shows how high-quality metadata can be created automatically from the documents themselves and from contextual data sources. Finally, it shows how these metadata descriptions can be combined with the original document to create SCORM LOs, enabling the sharing of educational resources together with educational metadata descriptions.
The main contributions of this thesis are:
C1: Establishing an overview of the research literature, projects and products that use AMG, and of the quality of the metadata they generate.
C2: Establishing that AMG efforts can be combined not only to expand the range of elements and entities that can be generated, but also to increase the quality of the generated entities.
C3: Establishing that AMG efforts can generate high-quality metadata from non-homogeneous document collections, vastly expanding the practical usefulness of AMG.
C4: Establishing that AMG efforts can contribute extensively to knowledge sharing through the creation of sharable SCORM LOs that contain both the educational resources themselves and extensive metadata descriptions enabling efficient location and use.
Included papers
Edvardsen, Lars Fredrik Høimyr; Sølvberg, Ingeborg Torvik. Metadata challenges in introducing the global IEEE learning object metadata (LOM) standard in a local environment. Proceedings of the Third International Conference on Web Information Systems and Technologies, Vol. SeBeG/eL (Society, e-Business and e-Government, e-Learning): 427-432, 2007.
Edvardsen, Lars Fredrik Høimyr; Sølvberg, Ingeborg Torvik; Aalberg, Trond; Trætteberg, Hallvard. Automatically Generating High Quality Metadata by Analyzing the Document Code of Common File Types. Proceedings of the 2009 ACM/IEEE Joint Conference on Digital Libraries: 29-38, 2009. 10.1145/1555400.1555406.
Edvardsen, Lars Fredrik Høimyr; Sølvberg, Ingeborg Torvik; Aalberg, Trond; Trætteberg, Hallvard. Using the Structural Content of Documents to Automatically Generate Quality Metadata. Proceedings of the Fifth International Conference on Web Information Systems and Technologies: 354-363, 2009.
Edvardsen, Lars Fredrik Høimyr; Sølvberg, Ingeborg Torvik. Could Automatic Metadata Generation be a digital solution for speedier and easier document publishing? IEEE International Conference on Digital Ecosystems and Technologies (ISSN 2150-4938), 4: 216-221, 2010.
Edvardsen, Lars Fredrik Høimyr; Sølvberg, Ingeborg Torvik; Aalberg, Trond; Trætteberg, Hallvard. Using Automatic Metadata Generation to reduce the knowledge and time requirements for making SCORM Learning Objects. Proceedings of IEEE DEST 2009: 364-369, 2009. 10.1109/DEST.2009.5276729.
Sølvberg, Ingeborg Torvik; Edvardsen, Lars Fredrik Høimyr. Creating Metadata is a Costly Manual Process - And It can be Automated. Digital Libraries and Knowledge Organization: 356-362, 2012.