Performance and Interpretability of Entity Matching with Deep Learning
Abstract
Entity matching is the problem of identifying which records refer to the same real-world entity. It is a key data integration task and, despite decades of research, is still challenging. In recent years, deep learning has emerged as the new state-of-the-art paradigm to tackle entity matching. This new paradigm brings about new strengths, weaknesses, trade-offs, and characteristics compared to classical methods.
In this thesis, we explore the use of deep learning for entity matching with the goal of gaining insight into what these new methods contribute to the task, how they differ from classical methods, and what their current limitations are. We put special focus on interpretability and blocking because these are, in our opinion, aspects that highlight the contrasts the most.
Through a combination of literature analysis and experimental work, this thesis provides three main contributions:
1. Insight into, and an overview of, how new deep learning methods compare to classical methods for entity matching.
2. A state-of-the-art model-agnostic explainability method tailored to entity matching.
3. A state-of-the-art blocking method based on set similarity joins.
We hope that these contributions are valuable to practitioners and the research community and further the development of deep learning for entity matching.
Has parts
Paper A: Barlaug, Nils; Gulla, Jon Atle. Neural Networks for Entity Matching: A Survey. - This is the author's version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record was published in ACM Transactions on Knowledge Discovery from Data 2021, Vol. 15, No. 3, pp. 1-37 https://doi.org/10.1145/3442200
Paper B: Barlaug, Nils. LEMON: Explainable Entity Matching. IEEE Transactions on Knowledge and Data Engineering Volume: 35 Issue: 8 https://doi.org/10.1109/TKDE.2022.3200644 © 2023 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Paper C: Barlaug, Nils. ShallowBlocker: Improving Set Similarity Joins for Blocking. arXiv:2312.15835v1