dc.contributor.author | Khalitov, Ruslan | |
dc.contributor.author | Yu, Tong | |
dc.contributor.author | Cheng, Lei | |
dc.contributor.author | Yang, Zhirong | |
dc.date.accessioned | 2022-05-12T14:25:44Z | |
dc.date.available | 2022-05-12T14:25:44Z | |
dc.date.created | 2022-05-05T14:22:31Z | |
dc.date.issued | 2022 | |
dc.identifier.citation | Neural Networks. 2022, 152, 160-168. | en_US |
dc.identifier.issn | 0893-6080 | |
dc.identifier.uri | https://hdl.handle.net/11250/2995499 | |
dc.description.abstract | Square matrices appear in many machine learning problems and models. Optimization over a large square matrix is expensive in memory and in time; therefore, an economical approximation is needed. Conventional approximation approaches factorize the square matrix into a number of matrices of much lower rank. However, the low-rank constraint is a performance bottleneck if the approximated matrix is intrinsically high-rank or close to full rank. In this paper, we propose to approximate a large square matrix with a product of sparse full-rank matrices. In the approximation, our method needs only N(log N)^2 non-zero numbers for an N × N full matrix. Our new method is especially useful for scalable neural attention modeling. Unlike conventional scaled dot-product attention methods, we train neural networks to map input data to the non-zero entries of the factorizing matrices. The sparse factorization method is tested on various square matrices, and the experimental results demonstrate that our method gives a better approximation when the approximated matrix is sparse and high-rank. As an attention module, our new method outperforms the Transformer and several of its variants for long sequences on synthetic data sets and on the Long Range Arena benchmarks. Our code is publicly available. | en_US |
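To make the abstract's core idea concrete, the following is a minimal sketch, not the paper's exact algorithm: it approximates an N × N matrix by a product of two sparse factors with fixed random sparsity patterns, fitting only the non-zero entries by gradient descent on the Frobenius reconstruction error. The function names (random_support, fit_sparse_factors), the support choice, and all hyperparameters are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)

def random_support(n, nnz_per_row):
    """Fixed sparsity pattern: nnz_per_row non-zero positions in each row."""
    mask = np.zeros((n, n), dtype=bool)
    for i in range(n):
        cols = rng.choice(n, size=nnz_per_row, replace=False)
        mask[i, cols] = True
    return mask

def fit_sparse_factors(M, nnz_per_row=8, lr=1e-2, steps=2000):
    """Fit M ~= W1 @ W2 where W1, W2 keep fixed sparse supports (illustrative)."""
    n = M.shape[0]
    mask1, mask2 = random_support(n, nnz_per_row), random_support(n, nnz_per_row)
    W1 = rng.standard_normal((n, n)) * mask1 * 0.1
    W2 = rng.standard_normal((n, n)) * mask2 * 0.1
    for _ in range(steps):
        R = W1 @ W2 - M                 # reconstruction residual
        g1 = (R @ W2.T) * mask1         # gradient restricted to the support of W1
        g2 = (W1.T @ R) * mask2         # gradient restricted to the support of W2
        W1 -= lr * g1
        W2 -= lr * g2
    return W1, W2

# Usage on a sparse, roughly full-rank target matrix (assumed example data).
n = 64
M = (rng.standard_normal((n, n)) > 1.5).astype(float)
W1, W2 = fit_sparse_factors(M)
err = np.linalg.norm(W1 @ W2 - M) / np.linalg.norm(M)
print(f"relative Frobenius error: {err:.3f}")

The paper's method differs in that the non-zero entries are produced by neural networks from the input data rather than fitted per matrix; the sketch only illustrates the sparse-product approximation itself.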
dc.language.iso | eng | en_US |
dc.publisher | Elsevier | en_US |
dc.rights | Attribution 4.0 International (CC BY 4.0) | * |
dc.rights.uri | http://creativecommons.org/licenses/by/4.0/deed.no | * |
dc.title | Sparse factorization of square matrices with application to neural attention modeling | en_US |
dc.title.alternative | Sparse factorization of square matrices with application to neural attention modeling / Published by Neural Networks https://doi.org/10.1016/j.neunet.2022.04.014 | en_US |
dc.type | Peer reviewed | en_US |
dc.type | Journal article | en_US |
dc.description.version | publishedVersion | en_US |
dc.source.pagenumber | 160-168 | en_US |
dc.source.volume | 152 | en_US |
dc.source.journal | Neural Networks | en_US |
dc.identifier.doi | https://doi.org/10.1016/j.neunet.2022.04.014 | |
dc.identifier.cristin | 2021815 | |
cristin.ispublished | true | |
cristin.fulltext | original | |
cristin.qualitycode | 2 | |