Visualization Techniques for Interactive Visual Analysis of Multidimensional Big Data
Visual analysis has been used in many fields of research, such as health, biology, chemistry, social science, astronomy, and physics, to solve data-driven problems. Visualization is an effective tool to communicate, understand, extract information, and interact with data. However, the physical limitations of human visual perception prevent the direct visualization and understanding of multidimensional data. Projecting the data into a lower-dimensional space with a variety of dimensionality reduction techniques or mapping the data to parallel coordinates are two of the most widely used methods for visualizing multidimensional data. Interactive visual analysis plays an essential role in visual analytics to integrate human intelligence into visualization for knowledge discovery. The amount of multidimensional data available in various fields of research has been growing at a tremendous rate. Multidimensional big data brings new challenges and opportunities to visualization techniques for supporting interactive visual analysis. This thesis presents a systematic review and two practical studies in the field of data visualization, focusing on interactive visual analysis of multidimensional big data. Specifically, it presents two scalable lightweight visualization techniques to solve the scalability challenge of parallel coordinates and dimensionality reduction techniques for supporting interactive visual analysis of multidimensional big data. The research for this thesis is based on several widely used benchmark multidimensional datasets obtained from public data repositories and synthesized datasets with hundreds of dimensions and millions of data points. In the systematic review, I propose a novel taxonomy of state-of-the-art visual analytics applications based on the dimensionality of data and visualization, and the types of interactions, and summarize the challenges and future directions for interactive visual analysis of multidimensional big data. 
The results of the systematic review lead to the two practical studies. In the first practical study, I propose a scalable lightweight bundling method to address the challenge of interactive visual analysis of multidimensional big data using parallel coordinates. It accelerates the clustering process of the data and helps users discover trends and detect outliers in the data by integrating human intelligence into the two-dimensional data binning using novel interactions. It uses a frequency-based representation to render the clusters as histogram-like bundles, which reveals the distribution of the data, eliminates visual clutter and overplotting in parallel coordinates, and accelerates the rendering process. In the second practical study, I propose a scalable method, named ColorPCA, to address the challenge of automatically colorizing unlabeled multidimensional big data for discovering classes in the data. It combines principal component analysis and ray casting to compute the composite RGBA color of the data. It provides a fast way to enhance the visualization of the data in lower-dimensional space and helps users find suitable parameters of dimensionality reduction algorithms to balance the running time and the projection results. Based on the two proposed visualization techniques, I have developed two web-based applications to support interactive visual analysis of multidimensional big data with parallel coordinates and lower-dimensional projections. The usefulness and effectiveness of the two proposed visualization techniques were demonstrated by case studies and user studies using the applications with benchmark datasets. The scalability of the two proposed visualization techniques was evaluated via scalability analysis with synthesized datasets. The experimental results show that the two proposed visualization techniques scale well to multidimensional big data.
For example, the bundling method supports real-time interactions for clustering millions of multidimensional data records without pre-computation of the data, and real-time visualization of the bundling result in a web-based parallel coordinates plot without hardware-accelerated rendering. With a one-time preprocessing of the data, ColorPCA can colorize millions of multidimensional data points in real time without hardware acceleration.
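As a rough illustration of the frequency-based idea behind the bundling method, the sketch below bins the records between two adjacent parallel-coordinate axes into a small grid and counts how many records fall into each cell. It is not the thesis implementation: the function names (`bin_index`, `bin_axis_pair`), the equal-width binning, and the toy records are all assumptions made for the example. Each non-empty cell could then be drawn as one bundle whose width or opacity reflects its count, so the number of drawn primitives depends on the grid size rather than on the number of records.

```python
from collections import Counter
from typing import Sequence, Tuple


def bin_index(value: float, lo: float, hi: float, bins: int) -> int:
    """Map a value in [lo, hi] to an equal-width bin index in [0, bins - 1]."""
    if hi == lo:
        return 0  # degenerate axis: every value shares one bin
    idx = int((value - lo) / (hi - lo) * bins)
    return min(max(idx, 0), bins - 1)  # clamp so value == hi lands in the last bin


def bin_axis_pair(
    records: Sequence[Sequence[float]], dim_a: int, dim_b: int, bins: int = 4
) -> Counter:
    """Count records per (bin on axis dim_a, bin on axis dim_b) cell.

    Hypothetical helper: each non-empty cell stands for one bundle between
    the two adjacent axes, weighted by its frequency.
    """
    col_a = [r[dim_a] for r in records]
    col_b = [r[dim_b] for r in records]
    lo_a, hi_a = min(col_a), max(col_a)
    lo_b, hi_b = min(col_b), max(col_b)

    counts: Counter = Counter()
    for a, b in zip(col_a, col_b):
        cell: Tuple[int, int] = (
            bin_index(a, lo_a, hi_a, bins),
            bin_index(b, lo_b, hi_b, bins),
        )
        counts[cell] += 1
    return counts


# Toy data (made up for illustration): five records with two dimensions.
records = [(0.0, 10.0), (0.1, 9.0), (0.9, 1.0), (1.0, 0.0), (0.95, 0.5)]
bundles = bin_axis_pair(records, 0, 1, bins=2)
for cell, count in sorted(bundles.items()):
    print(cell, count)
```

With the toy data, the five records collapse into two bundles, one going from the low bin to the high bin and one in the opposite direction. The interactive, human-in-the-loop adjustment of the binning described in the abstract is outside the scope of this sketch.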
Consists of
Paper 1: Cui, Wenqiang. Visual Analytics: A Comprehensive Overview. IEEE Access 2019; Volume 7, pp. 81555-81573. https://doi.org/10.1109/ACCESS.2019.2923736 This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY)
Paper 2: Cui, Wenqiang; Strazdins, Girts; Wang, Hao. Web-based Scalable Visual Exploration of Large Multidimensional Data Using Human-in-the-Loop Edge Bundling in Parallel Coordinates. CEUR Workshop Proceedings 2020; Volume 2578. https://ceur-ws.org/Vol-2578/BigVis8.pdf This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY)
Paper 3: Cui, Wenqiang; Strazdins, Girts; Wang, Hao. Visual Analysis of Multidimensional Big Data: A Scalable Lightweight Bundling Method for Parallel Coordinates. IEEE Transactions on Big Data 2021. https://doi.org/10.1109/TBDATA.2021.3123982
Paper 4: Cui, Wenqiang. ColorPCA: A Scalable Method for Colorizing Unlabeled Multidimensional Big Data