Development of a Demand Driven Dom Parser

Alvestad, Gaute Odin; Gausnes, Ole Martin; Kråkenes, Ole-Jakob

dc.contributor.advisor	Aalberg, Trond	nb_NO
dc.contributor.author	Alvestad, Gaute Odin	nb_NO
dc.contributor.author	Gausnes, Ole Martin	nb_NO
dc.contributor.author	Kråkenes, Ole-Jakob	nb_NO
dc.date.accessioned	2014-12-19T13:33:21Z
dc.date.available	2014-12-19T13:33:21Z
dc.date.created	2010-09-03	nb_NO
dc.date.issued	2006	nb_NO
dc.identifier	348178	nb_NO
dc.identifier	ntnudaim:1428	nb_NO
dc.identifier.uri	http://hdl.handle.net/11250/251070
dc.description.abstract	XML is a tremendously popular markup language in Internet applications as well as a storage format. XML documents are often accessed through an API, and perhaps the most important of these is the W3C DOM. The W3C recommendation defines a number of interfaces that let a developer access and manipulate XML documents, but it does not prescribe how an implementation should work behind those interfaces. A problem with the W3C DOM approach, however, is that documents are often loaded into memory as a tree of node objects representing the structure of the XML document. This tree is memory-consuming and can take up 4 to 10 times the size of the document. Lazy processing has been proposed, building the node tree only as new parts of the document are accessed. However, once the whole document has been accessed, the overhead compared to traditional parsers is high, both in memory usage and in performance. This thesis introduces a new, alternative approach that combines well-known indexing schemes for XML, basic techniques for reducing memory consumption, and principles of memory management from operating systems. By using a memory cache repository for DOM nodes while simultaneously applying the principles of lazy processing, the proposed implementation has full control over memory consumption. The proposed prototype is called the Demand Driven Dom Parser, D3P. When the cache exceeds its memory limit, the least recently used nodes are removed from memory. This allows D3P to process a document with low memory requirements. An advantage of this approach is that the parser can process documents larger than main memory, which is impossible with traditional approaches. The implementation is evaluated and compared with other implementations, both lazy parsers and traditional parsers that build the entire tree in memory on load. 
The proposed implementation performs well when memory usage is the bottleneck, because the user can set the amount of memory the XML node tree may use. On the other hand, as the coverage of the document increases, the time spent processing the node tree grows beyond that of traditional approaches.	nb_NO
dc.language	eng	nb_NO
dc.publisher	Institutt for datateknikk og informasjonsvitenskap	nb_NO
dc.subject	ntnudaim	no_NO
dc.subject	SIF2 datateknikk	no_NO
dc.subject	Data- og informasjonsforvaltning	no_NO
dc.title	Development of a Demand Driven Dom Parser	nb_NO
dc.type	Master thesis	nb_NO
dc.source.pagenumber	160	nb_NO
dc.contributor.department	Norges teknisk-naturvitenskapelige universitet, Fakultet for informasjonsteknologi, matematikk og elektroteknikk, Institutt for datateknikk og informasjonsvitenskap	nb_NO
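The abstract's central mechanism, a bounded cache of DOM nodes that are built lazily on demand and evicted in least-recently-used order once a memory limit is exceeded, can be sketched as follows. This is a minimal illustration under stated assumptions, not D3P's actual implementation: the `Node` and `NodeCache` classes, the `load_node` callback (standing in for re-reading a node from the indexed document), and the per-node size model are all hypothetical.

```python
from collections import OrderedDict
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    """Hypothetical lightweight DOM node identified by an index number."""
    node_id: int
    name: str
    child_ids: List[int] = field(default_factory=list)

    def size(self) -> int:
        # Assumed cost model: fixed per-node overhead plus the name length.
        return 64 + len(self.name)


class NodeCache:
    """Hypothetical LRU cache of DOM nodes bounded by a memory budget."""

    def __init__(self, budget: int, load_node: Callable[[int], Node]):
        self.budget = budget
        self.load_node = load_node  # rebuilds a node from the source on a miss
        self.nodes: "OrderedDict[int, Node]" = OrderedDict()
        self.used = 0
        self.loads = 0  # number of lazy (re)loads, i.e. cache misses

    def get(self, node_id: int) -> Node:
        if node_id in self.nodes:
            # Hit: mark the node as most recently used.
            self.nodes.move_to_end(node_id)
            return self.nodes[node_id]
        # Miss: build the node lazily and add it as most recently used.
        node = self.load_node(node_id)
        self.loads += 1
        self.nodes[node_id] = node
        self.used += node.size()
        # Evict least recently used nodes until back within the budget,
        # but never evict the node that was just requested.
        while self.used > self.budget and len(self.nodes) > 1:
            _, evicted = self.nodes.popitem(last=False)
            self.used -= evicted.size()
        return node
```

With a budget that fits two toy nodes, requesting nodes 0, 1, 0 and then 2 evicts node 1 (the least recently used), and requesting node 1 again is a miss that reloads it. This trade-off mirrors the one the abstract reports: memory stays bounded, but repeated coverage of evicted parts of the document costs extra processing time.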

Associated file(s)

Filename: 348178_COVER01.pdf
Size: 47.57 KB
Format: PDF

Filename: 348178_FULLTEXT01.pdf
Size: 936.0 KB
Format: PDF

This item appears in the following collection(s)

Institutt for datateknologi og informatikk [6544]