Data Visualization for Discovery of Digital Evidence in Email
Digital records of our activities are constantly being stored and processed in information systems, and are becoming increasingly valuable for investigating illegal activities. A problem today is the amount of manual labor required to look through this ever-growing amount of data. The process is time-consuming and error-prone. Tools and methods for speeding it up are needed, and they must be capable of aiding both identification and correlation of evidence. An indirect benefit is a reduction in human error, achieved by freeing up human cognitive capacity. The topic of this thesis is email investigation and how visualization techniques can support it. Data visualizations can help humans spot trends, correlations and anomalies in the data. Anomalies are of particular interest, based on the assumption that illegal behavior can be detected indirectly through outlier characteristics of metadata. Two sources of email data are used. The primary source is the Enron dataset, consisting of approximately 150 corporate accounts. The second is a private Gmail account synchronized via Microsoft Outlook. The research methodology comprises design, implementation and testing of a modular framework for email investigation. The framework has three main parts: standardization of email format, extraction of metadata, and a web-based interface. Preliminary testing comprises demonstration of the designed techniques using the Enron dataset, verification of email parsing using the Gmail account and, lastly, verification of visualization results using the commercial tool Tableau. Standardizing on a common format before metadata extraction eases the process of adding support for new or changing email storage formats. Important steps in metadata extraction are removal of duplicates, determination of each account owner's email address(es) and determination of message direction. Deleted messages that still remain in other user accounts can be recovered.
Email accounts that have not been collected can partly be rebuilt based on the same principle. The extracted metadata have been imported into the commercial tool Tableau, which proves to be an efficient environment for prototyping visualizations. An important finding for interaction and visualization is the benefit of interlinking visualizations with the underlying data, both in time and across email accounts. The sending direction of a message is important when visualizing the time of day or the number of messages per day, since sent messages correlate more strongly with user actions than received messages do. Moving analysis of email from a desktop application to a web portal opens up new ways to collaborate. Investigators can see which messages have been read by others, and important messages can be added to a shared case timeline. A working prototype with support for Microsoft Outlook *.pst files has been prepared and can be used for further research.
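The metadata extraction steps named in the abstract (removal of duplicates, determination of the owner's address, determination of direction) can be illustrated with a minimal sketch. It is not the thesis implementation. The `Message` record and its field names are hypothetical, using the Message-ID as the duplicate key is an assumption, and so is the heuristic that the account owner is the address involved in the largest number of messages.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    # Hypothetical standardized record; the field names are assumptions.
    message_id: str
    sender: str
    recipients: tuple


def deduplicate(messages):
    """Keep the first message seen for each Message-ID (assumed duplicate key)."""
    seen = set()
    unique = []
    for msg in messages:
        if msg.message_id not in seen:
            seen.add(msg.message_id)
            unique.append(msg)
    return unique


def guess_owner(messages):
    """Assumed heuristic: the owner is the address involved in the most messages.

    Expects a non-empty list of messages.
    """
    counts = Counter()
    for msg in messages:
        # Count each address at most once per message.
        for addr in {msg.sender, *msg.recipients}:
            counts[addr.lower()] += 1
    return counts.most_common(1)[0][0]


def direction(msg, owner):
    """Label a message 'sent' if the owner is the sender, otherwise 'received'."""
    return "sent" if msg.sender.lower() == owner else "received"


# Made-up example mailbox; the second entry duplicates the first.
mailbox = [
    Message("<1@example.com>", "alice@example.com", ("bob@example.com",)),
    Message("<1@example.com>", "alice@example.com", ("bob@example.com",)),
    Message("<2@example.com>", "carol@example.com", ("alice@example.com",)),
    Message("<3@example.com>", "alice@example.com",
            ("carol@example.com", "bob@example.com")),
]

unique = deduplicate(mailbox)
owner = guess_owner(unique)
labels = [direction(m, owner) for m in unique]
```

With the made-up mailbox above, the duplicate is dropped, `alice@example.com` is chosen as owner, and each remaining message is labeled as sent or received relative to that address. A real account may have several owner addresses, which this sketch does not handle.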