Representation Learning for Long Sequential Data with Circular Dilated Convolutional Neural Networks
Abstract
Sequential data is prevalent in many application domains. To learn meaningful representations from long sequences effectively and efficiently, neural networks need properties such as a full receptive field, scalability, broad information flow, and symmetry. However, conventional neural networks often struggle to capture long-range dependencies while remaining efficient, which makes long sequential data challenging to process.
This research project aims to develop scalable neural networks for representation learning from long sequential data. We propose four novel neural networks; this thesis focuses specifically on Circular Dilated Convolutional Neural Networks (CDIL-CNNs). CDIL-CNNs employ symmetric convolutions with exponentially increasing dilations and circular padding. This thesis demonstrates that each output element of the final layer in a CDIL-CNN has an equal chance to receive information from all input elements, at a computational complexity of O(N log2 N), where N is the sequence length.
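To make the architecture concrete, the following minimal PyTorch sketch illustrates the idea rather than the thesis's exact implementation: symmetric kernel-size-3 convolutions with circular padding and dilations that double per layer, so that roughly log2(N) layers connect every output position to every input position. The class names CDILBlock and CDILSketch, and choices such as the kernel size, channel count, and activation, are illustrative assumptions rather than details taken from the papers.

import math
import torch
import torch.nn as nn

class CDILBlock(nn.Module):
    """One symmetric dilated convolution with circular padding (illustrative)."""
    def __init__(self, channels, dilation, kernel_size=3):
        super().__init__()
        # padding = dilation * (kernel_size - 1) // 2 keeps the output the same
        # length as the input; padding_mode='circular' wraps the sequence ends.
        self.conv = nn.Conv1d(
            channels, channels, kernel_size,
            dilation=dilation,
            padding=dilation * (kernel_size - 1) // 2,
            padding_mode='circular',
        )
        self.act = nn.ReLU()

    def forward(self, x):  # x: (batch, channels, length)
        return self.act(self.conv(x))

class CDILSketch(nn.Module):
    """Stack of blocks with exponentially increasing dilations 1, 2, 4, ...
    About log2(N) layers suffice for every output position to see every input
    position, and each layer costs O(N), giving O(N log2 N) overall.
    """
    def __init__(self, channels, seq_len, kernel_size=3):
        super().__init__()
        num_layers = max(1, math.ceil(math.log2(seq_len)))
        self.blocks = nn.Sequential(*[
            CDILBlock(channels, 2 ** i, kernel_size) for i in range(num_layers)
        ])

    def forward(self, x):
        return self.blocks(x)

x = torch.randn(8, 16, 1024)              # batch of 8 sequences, 16 channels, length 1024
y = CDILSketch(channels=16, seq_len=1024)(x)
print(y.shape)                             # torch.Size([8, 16, 1024])

The circular padding keeps every position the same distance, on average, from the sequence "ends", which is what gives each output element an equal chance to gather information from all input elements.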
Additionally, we incorporate CDIL-CNNs into a self-supervised learning framework to leverage large amounts of unlabeled sequences. This thesis employs masked learning, a technique that uses the unmasked parts of a sequence to predict the masked parts during the pre-training stage. This approach enables our method to learn contextual information within the data and extract meaningful features from unlabeled sequences, thereby enhancing its generalization capability for various downstream tasks.
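The sketch below shows one possible form of such a masked pre-training step, assuming zero-masking of randomly chosen positions and a mean-squared reconstruction loss computed only on the masked positions. The function name masked_pretrain_step, the masking ratio, and the stand-in encoder are illustrative assumptions, not the exact procedure used in the thesis.

import torch
import torch.nn as nn

def masked_pretrain_step(encoder, head, x, mask_ratio=0.15):
    """One illustrative pre-training step: hide random positions and train the
    encoder to reconstruct them from the unmasked context."""
    batch, channels, length = x.shape
    # True marks positions that are hidden from the encoder.
    mask = torch.rand(batch, 1, length, device=x.device) < mask_ratio
    x_masked = x.masked_fill(mask, 0.0)          # simple zero-masking scheme
    features = encoder(x_masked)                 # contextual representations
    recon = head(features)                       # per-position reconstruction
    # Mean-squared error on the masked positions only.
    err = (recon - x) ** 2 * mask
    return err.sum() / (mask.sum() * channels).clamp(min=1)

# Stand-in encoder; in practice the CDIL-CNN sketch above would be plugged in here.
encoder = nn.Sequential(
    nn.Conv1d(16, 16, kernel_size=3, padding=1, padding_mode='circular'),
    nn.ReLU(),
)
head = nn.Conv1d(16, 16, kernel_size=1)          # maps features back to input values
x = torch.randn(8, 16, 1024)                     # a batch of unlabeled sequences
loss = masked_pretrain_step(encoder, head, x)
loss.backward()                                  # gradients for encoder and head

After pre-training, the encoder is kept and fine-tuned on labeled data for the downstream task.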
We have tested CDIL-CNNs across various long sequential datasets, including synthetic data, images, text, time series, and DNA sequences. The experimental results demonstrate that our neural network architecture outperforms many state-of-the-art approaches.
Has parts
Paper A: Khalitov, Ruslan; Yu, Tong; Cheng, Lei; Yang, Zhirong. Sparse factorization of square matrices with application to neural attention modeling. Neural Networks 2022; Volume 152, pp. 160-168. https://doi.org/10.1016/j.neunet.2022.04.014 This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
Paper B: Yu, Tong; Khalitov, Ruslan; Cheng, Lei; Yang, Zhirong. Paramixer: Parameterizing Mixing Links in Sparse Factors Works Better than Dot-Product Self-Attention. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022. Open Access version provided by the Computer Vision Foundation.
Paper C: Khalitov, Ruslan; Yu, Tong; Cheng, Lei; Yang, Zhirong. ChordMixer: A Scalable Neural Attention Model for Sequences with Different Length. The Eleventh International Conference on Learning Representations; 2023 https://openreview.net/pdf?id=E8mzu3JbdR
Paper D: Cheng, Lei; Khalitov, Ruslan; Yu, Tong; Zhang, Jing; Yang, Zhirong. Classification of long sequential data using circular dilated convolutional neural networks. Neurocomputing 2023; Volume 518, pp. 50-59. https://doi.org/10.1016/j.neucom.2022.10.054 This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).