Reclaiming Data Ownership: Differential Privacy in a Decentralized Setting

In privacy-preserving data mining, the common practice has been to gather data from users, centralize it in a single database, and apply anonymization techniques to protect the personally identifiable information it contains. Both theoretical analyses and real-world data breaches have shown that these methods have severe shortcomings in protecting an individual's privacy. A major breakthrough came in 2006, when differential privacy was proposed as a mathematical guarantee for the privacy of each record in a data set. Since then, one avenue of research has been to make this concept work in a distributed setting.
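
For reference, the guarantee mentioned above is usually stated as follows. This is the standard textbook formulation of epsilon-differential privacy; the symbols M, D, D', S and epsilon are our notation and are not taken from the thesis itself.

```latex
% A randomized mechanism M satisfies \varepsilon-differential privacy if,
% for every pair of data sets D and D' that differ in a single record,
% and for every set S of possible outputs:
\[
  \Pr[\,M(D) \in S\,] \;\le\; e^{\varepsilon} \, \Pr[\,M(D') \in S\,]
\]
% A smaller \varepsilon forces the two output distributions closer together,
% which means stronger privacy and typically lower utility.
```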

In this thesis we propose a decentralized framework that allows users to perform classification after aggregating their locally trained models in a privacy-preserving manner. We describe a series of experiments on the tuning of each major parameter involved, and show how each affects the privacy-utility trade-off. We also compare our classification performance with results reported in the literature and show that it is competitive.
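
To make the idea of aggregating locally trained models under differential privacy more concrete, here is a minimal sketch. It is not the framework proposed in the thesis. It assumes that each user's model is a plain weight vector, that the vectors are clipped to a fixed L1 norm, and that the averaging step itself can be trusted (for example through a secure-sum protocol), so that Laplace noise is added once to the average. All function names and parameters (`clip_l1`, `laplace_noise`, `private_average`, `bound`) are hypothetical.

```python
import random


def clip_l1(weights, bound):
    """Scale a weight vector so that its L1 norm is at most `bound`."""
    norm = sum(abs(w) for w in weights)
    if norm <= bound:
        return list(weights)
    return [w * bound / norm for w in weights]


def laplace_noise(scale, rng):
    """Draw one sample from Laplace(0, scale).

    The difference of two independent exponential variables with mean
    `scale` follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_average(local_models, epsilon, bound, rng):
    """Average clipped local models and add Laplace noise to each coordinate.

    Assumption: replacing one user's clipped vector changes the average by
    at most 2 * bound / n in L1 norm, so that value is used as sensitivity.
    """
    clipped = [clip_l1(w, bound) for w in local_models]
    n = len(clipped)
    dim = len(clipped[0])
    scale = (2.0 * bound / n) / epsilon
    return [
        sum(w[j] for w in clipped) / n + laplace_noise(scale, rng)
        for j in range(dim)
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical local models: 100 users, 5 weights each.
    models = [[rng.gauss(0.0, 1.0) for _ in range(5)] for _ in range(100)]
    exact = private_average(models, epsilon=1e12, bound=1.0, rng=rng)
    for eps in (0.1, 1.0, 10.0):
        noisy = private_average(models, epsilon=eps, bound=1.0, rng=rng)
        error = sum(abs(a - b) for a, b in zip(noisy, exact))
        print(f"epsilon={eps}: L1 distance to near-exact average = {error:.4f}")
```

Running the demo with several values of `epsilon` gives a rough feel for the privacy-utility trade-off: on average, smaller values add more noise to the aggregated model.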

Based on our results, we have produced a set of criteria for applying differential privacy to a machine learning application, and we identify two business sectors where we see potential for a successful system. We hope that our research will pave the way for distributed applications where users maintain control of their own data and use it for learning without giving up their privacy.

Publisher

NTNU