XSL Conversion of Bibliographic Records from BIBSYS-MARC to FRBR
Abstract
In the work described in this report, we have looked at the use of Extensible Stylesheet Language (XSL) for complex metadata conversion. As more and more data is stored in Extensible Markup Language (XML), XSL is an interesting alternative to conversion applications written in ordinary programming languages. To examine this, we used the conversion of bibliographic records as our transformation case. Such records describe, for instance, works of literature, music, and theatre. Today, most records are encoded in the MAchine-Readable Cataloging (MARC) record format, which is employed by many library systems, including the Norwegian BIBSYS database. Functional Requirements for Bibliographic Records (FRBR) is a proposed conceptual metadata framework for bibliographic records, whose purpose is to introduce a more modern terminology for the use of bibliographic data in information systems. A Norwegian project called "FRBR i bibliotekskataloger" studied the conversion of bibliographic records from BIBSYS-MARC into FRBR concepts. In this master's thesis we have analysed this conversion, and designed and implemented an equivalent XSL implementation of the main steps wherever possible. In addition, we have looked at ways to use XSL to improve metadata quality. From a computing perspective, the conversion poses two major challenges: it is a one-to-many conversion involving complex processing steps with relationship handling, and it operates on large data volumes. XSL has a more limited processing model than, for instance, object-oriented languages. In spite of this, it offers useful transformation features that we identified as applicable. The features we explored in particular were the use of an extension function, the use of an external web servlet, recursion, and XSL's modularisation possibilities. Recursion and XPath functions from both version 1.0 and 2.0 proved to be very useful.
In addition, an extension function written in Java compensated both for functionality missing in XSL and for the complexity that a pure XSL implementation would have required. The extension function turned out to be very time-consuming, but this is due to our Java implementation rather than to XSL's extension function feature. As for modularisation, we developed XSLT stylesheets for the conversion steps and executed them in two ways: stand-alone and multiphased. In the stand-alone conversion, the conversion stylesheets were executed sequentially, with the output of one used as the input for the next. In the multiphased conversion, a control stylesheet ran as a single process and called the conversion templates in turn. Our experiments showed that XSL can convert BIBSYS-MARC record collections as large as those handled by the Java application from the Norwegian project. The stand-alone conversion also handled larger BIBSYS-MARC record collections than the multiphased conversion. This is because the multiphased conversion needs to keep intermediate results in memory, and hence requires more memory. The execution times for both the stand-alone and the multiphased conversion were reasonable. Our project showed that all steps except one could be solved with XSL.
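
The difference between the two execution modes can be illustrated with a minimal sketch. It is written in Python rather than XSLT and is not the thesis implementation: the stage functions `normalise` and `wrap_in_work`, the file names, and the sample record are hypothetical stand-ins for the actual conversion stylesheets. The sketch only shows why a single controlling process that retains every intermediate result needs more memory than a chain of separate runs that hand results over through files.

```python
import tempfile
from pathlib import Path
from typing import Callable, List, Tuple

Stage = Callable[[str], str]


def normalise(doc: str) -> str:
    """Hypothetical stage 1: collapse runs of whitespace."""
    return " ".join(doc.split())


def wrap_in_work(doc: str) -> str:
    """Hypothetical stage 2: wrap the record in a <work> element."""
    return f"<work>{doc}</work>"


STAGES: List[Stage] = [normalise, wrap_in_work]


def run_stand_alone(source: str, stages: List[Stage], workdir: Path) -> str:
    """Run each stage as a separate step; results are handed over via files."""
    current = workdir / "step0.xml"
    current.write_text(source, encoding="utf-8")
    for index, stage in enumerate(stages, start=1):
        target = workdir / f"step{index}.xml"
        # Only one stage's input and output are in memory at a time.
        target.write_text(stage(current.read_text(encoding="utf-8")), encoding="utf-8")
        current = target
    return current.read_text(encoding="utf-8")


def run_multiphased(source: str, stages: List[Stage]) -> Tuple[str, int]:
    """Run all stages from one controlling process, keeping every result in memory."""
    intermediates = [source]
    for stage in stages:
        intermediates.append(stage(intermediates[-1]))
    # Return the final result and the number of documents held in memory.
    return intermediates[-1], len(intermediates)


if __name__ == "__main__":
    record = "<record>  a   b </record>"
    with tempfile.TemporaryDirectory() as tmp:
        print(run_stand_alone(record, STAGES, Path(tmp)))
    print(run_multiphased(record, STAGES))
```

Both functions produce the same final document. In this sketch the multiphased variant holds the source and every intermediate result at once, which mirrors the memory argument made above.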