K-Anonymity as a Service for Mobility Analytics - Optimizing the value of k in case of k-anonymity
Master thesis
Permanent link
http://hdl.handle.net/11250/2571690
Publication date
2018
Abstract
Mobile network operators collect data about their users from mobile communication networks. This data is referred to as mobility data and records the users' spatial positions within the network over time. A longstanding assumption holds that aggregating mobility data is enough to protect the privacy of the users. However, recent studies have shown that sensitive information, including individual trajectories, can be recovered from aggregated mobility data.
To protect the users contained in mobility data, additional privacy techniques must be used. K-anonymity is a widely used location privacy technique which requires that each user be indistinguishable from at least k - 1 other users. The appropriate value of k is highly dynamic and changes with the characteristics of the mobility data.
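The definition can be made concrete with a small check. In this minimal sketch each record is a dictionary, and a chosen set of fields act as quasi-identifiers. The field names `cell` and `hour` and the helper names are hypothetical and do not come from the thesis. Records that share the same quasi-identifier values form a group, and a dataset is k-anonymous when every group contains at least k records. The largest k a dataset achieves is therefore the size of its smallest group.

```python
from collections import Counter
from typing import Any, Mapping, Sequence


def group_sizes(records: Sequence[Mapping[str, Any]], quasi_ids: Sequence[str]) -> Counter:
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_ids) for r in records)


def achieved_k(records: Sequence[Mapping[str, Any]], quasi_ids: Sequence[str]) -> int:
    """Largest k the records satisfy: the size of the smallest group (0 if empty)."""
    sizes = group_sizes(records, quasi_ids)
    return min(sizes.values()) if sizes else 0


def is_k_anonymous(records: Sequence[Mapping[str, Any]], quasi_ids: Sequence[str], k: int) -> bool:
    """True when every user is indistinguishable from at least k - 1 others."""
    return achieved_k(records, quasi_ids) >= k


# Hypothetical example: three users seen in cell A at 08:00, two in cell B at 09:00.
records = [
    {"user": 1, "cell": "A", "hour": 8},
    {"user": 2, "cell": "A", "hour": 8},
    {"user": 3, "cell": "A", "hour": 8},
    {"user": 4, "cell": "B", "hour": 9},
    {"user": 5, "cell": "B", "hour": 9},
]
print(achieved_k(records, ["cell", "hour"]))  # 2
```

Here the data satisfies 2-anonymity but not 3-anonymity, because the cell B group contains only two users.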
This master's thesis aims to dynamically determine the optimal value of k for k-anonymity across a variety of mobility datasets. To achieve this, we propose a system that first attempts to recover individual trajectories using a modified version of the Hungarian algorithm. The mobility data is then further protected by applying different levels of k-anonymity until the percentage of recovered trajectories is close to zero.
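The recovery step rests on a general idea: link each partial trajectory to one location in the next time slot so that the total movement cost is minimal. That is an assignment problem, which the Hungarian algorithm solves. The minimal sketch below makes several assumptions. First, it assumes the aggregated data has already been expanded into one point per user per time slot. Second, it uses plain Euclidean distance as the cost. Third, it uses a textbook Kuhn-Munkres implementation. The thesis's own modification of the algorithm is not described here and is not reproduced. The function names `hungarian` and `link_trajectories` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def hungarian(cost: Sequence[Sequence[float]]) -> List[int]:
    """Minimum-cost perfect matching on a square cost matrix (Kuhn-Munkres with
    potentials, O(n^3)). Returns assignment[row] = column."""
    n = len(cost)
    inf = float("inf")
    u = [0.0] * (n + 1)  # row potentials (1-based)
    v = [0.0] * (n + 1)  # column potentials (1-based)
    p = [0] * (n + 1)    # p[j] = row matched to column j; p[0] is scratch
    way = [0] * (n + 1)  # previous column on the alternating path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (n + 1)
        used = [False] * (n + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            # Shift potentials so at least one new column becomes tight.
            for j in range(n + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:  # reached a free column: augmenting path found
                break
        while j0:  # flip matches along the augmenting path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assignment = [0] * n
    for j in range(1, n + 1):
        assignment[p[j] - 1] = j - 1
    return assignment


def link_trajectories(slots: Sequence[Sequence[Point]]) -> List[List[Point]]:
    """Chain points across consecutive time slots, one assignment per slot.
    Assumes every slot holds the same number of points (one per user)."""
    trajectories = [[pt] for pt in slots[0]]
    for points in slots[1:]:
        if len(points) != len(trajectories):
            raise ValueError("each time slot must contain one point per user")
        # Cost of extending trajectory t with a point: distance from t's last position.
        cost = [[math.dist(t[-1], pt) for pt in points] for t in trajectories]
        for t, col in zip(trajectories, hungarian(cost)):
            t.append(points[col])
    return trajectories


# Two users whose points arrive unordered in each time slot.
slots = [[(0, 0), (10, 10)], [(10, 11), (0, 1)], [(1, 1), (11, 11)]]
print(link_trajectories(slots))
# [[(0, 0), (0, 1), (1, 1)], [(10, 10), (10, 11), (11, 11)]]
```

Linking slot by slot keeps each step a small square assignment problem. It also means an early wrong match is never revisited. This is one reason published recovery attacks add extra cost terms or a later matching stage.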
To evaluate the method, aggregated mobility data covering three weeks from four different locations in Norway was used. Because collecting and storing individual trajectories would breach privacy regulations, the recovered trajectories cannot be checked against real individual trajectories. To cope with this, synthetic trajectories, computationally generated to behave like human trajectories, were added to the original mobility dataset before the trajectory recovery algorithm was run.
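Injecting known synthetic trajectories makes a recovery rate measurable. The minimal sketch below shows one way to compute that rate and to search for a k that pushes it toward zero. Several choices in it are assumptions and not the thesis's procedure: exact matching of whole trajectories, the threshold of 1 percent, and the names `recovery_rate` and `smallest_sufficient_k`. The caller supplies `attack_rate(k)`, which is expected to apply k-anonymity at level k, rerun the recovery attack, and return the resulting percentage.

```python
from typing import Callable, Iterable, List, Optional, Sequence, Tuple

Point = Tuple[float, float]
Trajectory = List[Point]


def recovery_rate(synthetic: Sequence[Trajectory], recovered: Sequence[Trajectory]) -> float:
    """Percentage of injected synthetic trajectories found unchanged among the
    recovered ones (exact match is an assumed, deliberately strict criterion)."""
    if not synthetic:
        return 0.0
    recovered_set = {tuple(t) for t in recovered}
    hits = sum(1 for t in synthetic if tuple(t) in recovered_set)
    return 100.0 * hits / len(synthetic)


def smallest_sufficient_k(
    candidate_ks: Iterable[int],
    attack_rate: Callable[[int], float],
    threshold: float = 1.0,
) -> Optional[int]:
    """Try k values in increasing order and return the first whose recovery
    rate is at or below `threshold` percent, or None if none qualifies."""
    for k in sorted(candidate_ks):
        if attack_rate(k) <= threshold:
            return k
    return None


# Hypothetical numbers: one of two synthetic trajectories is recovered exactly.
synthetic = [[(0, 0), (0, 1)], [(5, 5), (5, 6)]]
recovered = [[(0, 0), (0, 1)], [(5, 5), (6, 6)]]
print(recovery_rate(synthetic, recovered))  # 50.0
```

If the rate barely changes as k grows, the loop returns `None` or stops only at a large k. That is the pattern to look for when judging whether the choice of k matters for a given dataset.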
The results of the proposed system imply that the choice of k for k-anonymity is insignificant, as the percentage of recovered trajectories stays relatively constant when different levels of k are applied. Further analysis suggests that the characteristics of the mobility data used are by themselves enough to protect it from privacy attacks, because the data produces common movement patterns rather than individual trajectories.