Flexible Subspace Clustering: A Joint Feature Selection and K-Means Clustering Framework

As an important computing paradigm, cloud computing addresses large, distributed databases and comparatively simple computation. In this paradigm, data mining is one of the most important and fundamental problems. Sensors and other intelligent devices generate large amounts of data, and mining these data is crucial in many applications. K-means clustering is a typical technique that groups similar data into the same cluster and is commonly used in data mining. However, it still struggles with data containing a large amount of noise, outliers, and redundant features. In this paper, we propose a robust K-means clustering algorithm, namely flexible subspace clustering. The proposed method incorporates feature selection and K-means clustering into a unified framework, which selects refined features and improves clustering performance. Moreover, to enhance robustness, a norm parameterized by p is embedded into the objective function. An appropriate p can be chosen flexibly for different data, which yields more robust performance. Experimental results on benchmark databases show that the presented method is more robust and performs better than existing approaches.
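To illustrate why a tunable exponent p can make K-means less sensitive to outliers, the following is a minimal sketch and not the authors' algorithm. It assumes the cost of a cluster is the sum of Euclidean distances raised to the power p (0 < p <= 2) and updates centroids by iteratively reweighted averaging; the exact norm and the feature selection step of the paper are not given in this abstract and are omitted here. The function name `robust_kmeans` and all parameters are hypothetical.

```python
import numpy as np

def robust_kmeans(X, init_centers, p=1.0, n_iter=50, eps=1e-6):
    """Illustrative K-means variant minimising sum ||x - c||_2^p (assumed cost).

    p = 2 reduces to ordinary K-means; smaller p down-weights distant points.
    This is a sketch under stated assumptions, not the published method.
    """
    X = np.asarray(X, dtype=float)
    centers = np.array(init_centers, dtype=float)
    k = len(centers)
    for _ in range(n_iter):
        # Assignment step: nearest centroid (monotone in distance for any p > 0).
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # Update step: one reweighted mean with weights ||x - c||^(p - 2).
        for j in range(k):
            members = X[labels == j]
            if len(members) == 0:
                continue  # keep the previous centroid for an empty cluster
            d = np.maximum(np.linalg.norm(members - centers[j], axis=1), eps)
            w = d ** (p - 2.0)
            centers[j] = (w[:, None] * members).sum(axis=0) / w.sum()
    # Final assignment against the last centroids.
    dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return dist.argmin(axis=1), centers
```

With p = 2 every weight equals 1 and the update is the usual cluster mean, so a single distant point can drag the centroid; with p = 1 distant points receive smaller weights and the centroid stays closer to the bulk of the cluster.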

Publisher

Elsevier

Journal

Big Data Research

Unless otherwise stated, this entry is licensed under Attribution-NonCommercial-NoDerivatives 4.0 International