Performance and Interpretability of Entity Matching with Deep Learning

Barlaug, Nils

dc.contributor.advisor	Gulla, Jon Atle
dc.contributor.advisor	Nørvåg, Kjetil
dc.contributor.advisor	Gleim, Alexander
dc.contributor.author	Barlaug, Nils
dc.date.accessioned	2024-05-16T07:31:21Z
dc.date.available	2024-05-16T07:31:21Z
dc.date.issued	2024
dc.identifier.isbn	978-82-326-7937-9
dc.identifier.issn	2703-8084
dc.identifier.uri	https://hdl.handle.net/11250/3130650
dc.description.abstract	Entity matching is the problem of identifying which records refer to the same real-world entity. It is a key data integration task and, despite decades of research, is still challenging. In recent years, deep learning has emerged as the new state-of-the-art paradigm to tackle entity matching. This new paradigm brings about new strengths, weaknesses, trade-offs, and characteristics compared to classical methods. In this thesis, we explore the use of deep learning for entity matching with the goal of gaining insight into what these new methods contribute to the task, how they differ from classical methods, and what their current limitations are. We put special focus on interpretability and blocking because these are, in our opinion, the aspects that highlight the contrasts the most. Through a combination of literature analysis and experimental work, this thesis provides three main contributions: 1. Insight into, and an overview of, how new deep learning methods compare to classical methods for entity matching. 2. A state-of-the-art model-agnostic explainability method tailored to entity matching. 3. A state-of-the-art blocking method based on set similarity joins. We hope that these contributions are valuable to practitioners and the research community and further the development of deep learning for entity matching.	en_US
dc.language.iso	eng	en_US
dc.publisher	NTNU	en_US
dc.relation.ispartofseries	Doctoral theses at NTNU;2024:173
dc.relation.haspart	Paper A: Barlaug, Nils; Gulla, Jon Atle. Neural Networks for Entity Matching: A Survey. - This is the author's version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record was published in ACM Transactions on Knowledge Discovery from Data 2021, Vol. 15, No. 3, pp. 1-37 https://doi.org/10.1145/3442200	en_US
dc.relation.haspart	Paper B: Barlaug, Nils. LEMON: Explainable Entity Matching. IEEE Transactions on Knowledge and Data Engineering Volume: 35 Issue: 8 https://doi.org/10.1109/TKDE.2022.3200644 © 2023 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.	en_US
dc.relation.haspart	Paper C: Barlaug, Nils. ShallowBlocker: Improving Set Similarity Joins for Blocking arXiv:2312.15835v1	en_US
dc.title	Performance and Interpretability of Entity Matching with Deep Learning	en_US
dc.type	Doctoral thesis	en_US
dc.subject.nsi	VDP::Technology: 500::Information and communication technology: 550::Computer technology: 551	en_US

Associated file(s)

Filename: Nils Barlaug_PhD.pdf
Size: 3.364Mb
Format: PDF

This item appears in the following collection(s)

Institutt for datateknologi og informatikk [6788]
