Query-based Software Integration for Industry

Bakken, Magnus

dc.contributor.advisor	Soylu, Ahmed
dc.contributor.advisor	Rutle, Adrian
dc.contributor.advisor	Sælid, Steinar
dc.contributor.author	Bakken, Magnus
dc.date.accessioned	2023-12-22T09:52:46Z
dc.date.available	2023-12-22T09:52:46Z
dc.date.issued	2023
dc.identifier.isbn	978-82-326-7350-6
dc.identifier.issn	2703-8084
dc.identifier.uri	https://hdl.handle.net/11250/3108758
dc.description.abstract	Industrial information models are ways of representing information in industrial settings. They are often established as a core part of industrial digitalization. When information models are standardised, they have the potential to make it easier to scale and reuse software across the portfolio of a company or an industry. Knowledge graphs represent knowledge about the world, such as knowledge about industrial assets, as nodes and the edges between them. This thesis seeks to advance software engineering in industry through the use of information models encoded as knowledge graphs and queries over them. To do so, we consider three related challenges in the use of knowledge graphs for industry. The first challenge is that knowledge graph construction tools and languages provide little visibility and interactivity. Many data scientists prefer interactive computational notebooks for data analysis, citing the ease with which analyses can be written by interspersing code, documentation, and outputs. These benefits have, however, not been available for knowledge graph construction. Existing approaches rely on atomic, end-to-end execution in which intermediate stages cannot be inspected, and they integrate with de facto standard data engineering tooling only through files. Once knowledge graphs encoding standardised information models have been constructed, they can be leveraged to engineer software for the assets they describe. Industrial assets are highly modular, a feature that should be exploited to engineer reusable software. Queries are important for finding recurring patterns of modules in knowledge graphs. In this thesis, we focus on two kinds of such software: analytical software that consumes large amounts of time-series data, and real-time digital shadows for discrete production. Using queries over knowledge graphs to build such software raises additional challenges. The second challenge we address relates to analytical applications that use time-series data.
Knowledge graphs can facilitate uniform access to time-series data as an application is scaled, using a conceptual language close to the domain. Graph databases, however, are typically not suited to time-series data. Existing approaches to querying contextualised time-series data are either not general or not declarative, or they assume the existence of an SQL-based integrator, which is not typically available in on-premises settings. The third challenge we address relates to digital shadows, which are systems that maintain a model of the real world. A real-time digital shadow can maintain the context necessary for smart factories, which perform production in a flexible and individualised way that allows for mass customization. The standard approach is agent-based: it conceptualises on-line manufacturing as a quorum of experts that discuss and agree on an outcome. Like the discussions they emulate, such systems have been found to have unpredictable performance and outcomes, which limits their use in industry. Instead of relying on open-ended reasoning and communication, we can use queries to define the inputs of modules that maintain parts of a real-time digital shadow, and assemble these modules into a working system. This approach requires extensions to asset-oriented industrial knowledge graphs as well as a principled way of defining module assembly. Our first contribution is maplib, which makes interactive, literate knowledge graph construction possible and which outperforms current approaches on a challenging benchmark from the literature involving graph construction and querying. Our second contribution is a query rewriting approach called chrontext, which works on hybrid architectures consisting of a graph database and an arbitrary time-series database. Chrontext addresses the challenge of query-based analytical applications. We show that chrontext correctly processes a large fragment of SPARQL and that it is much faster and more scalable than the incumbent.
Our third contribution is a modular, query-based approach to defining a real-time digital shadow for discrete production. We extend an asset-oriented knowledge graph with information about the objects in a production process, and we use queries over this knowledge graph, together with annotations, to define the input and permitted output of what we call event interpreters. This conceptualization allows us to use queries to define real-time digital shadows for discrete production in a way that exploits the modular structure of production processes, and to determine the communication patterns of the on-line system. If event interpreters are defined in a general way, the approach allows us to engineer real-time digital shadows from information models represented as knowledge graphs, so that the digital shadows can be reused without change across a portfolio of factories and as factories are reconfigured. Using an example from the literature, we show that the approach achieves real-time performance. We are not aware of other approaches to the model-based engineering of such real-time digital shadows. In summary, our first contribution makes it possible to use interactive computational notebooks to construct knowledge graphs, which may transfer the benefits of such tools to knowledge graph construction and, in turn, to knowledge-graph-based software engineering for industry. Our second contribution, chrontext, allows queries to define the inputs of analytical applications on infrastructures not covered by existing approaches, and with higher performance than the incumbent. We assume that queries defined in a conceptual language that abstracts away the technical details of storage make it easier to access data. By making it easier to access time-series data, chrontext makes it easier to engineer time-series-intensive analytical applications. Our third contribution makes it possible to use information models to define real-time digital shadows for discrete production.
If modular, model-based software engineering indeed makes it easier to scale and maintain software, we have extended these benefits to the engineering of real-time digital shadows for discrete production.	en_US
dc.language.iso	eng	en_US
dc.publisher	NTNU	en_US
dc.relation.ispartofseries	Doctoral theses at NTNU;2023:325
dc.title	Query-based Software Integration for Industry	en_US
dc.type	Doctoral thesis	en_US
dc.subject.nsi	VDP::Teknologi: 500::Informasjons- og kommunikasjonsteknologi: 550	en_US
dc.description.localcode	Fulltext not available	en_US

Files in this item

Name: Magnus Bakken.pdf
Size: 11.90 MB
Format: PDF


This item appears in the following Collection(s)

Institutt for datateknologi og informatikk (Department of Computer Science)
