dc.contributor.author | Khalitov, Ruslan | |
dc.contributor.author | Yu, Tong | |
dc.contributor.author | Cheng, Lei | |
dc.contributor.author | Yang, Zhirong | |
dc.date.accessioned | 2022-05-12T14:25:44Z | |
dc.date.available | 2022-05-12T14:25:44Z | |
dc.date.created | 2022-05-05T14:22:31Z | |
dc.date.issued | 2022 | |
dc.identifier.citation | Neural Networks. 2022, 152, 160-168. | en_US |
dc.identifier.issn | 0893-6080 | |
dc.identifier.uri | https://hdl.handle.net/11250/2995499 | |
dc.description.abstract | Square matrices appear in many machine learning problems and models. Optimization over a large square matrix is expensive in memory and in time; therefore, an economical approximation is needed. Conventional approximation approaches factorize the square matrix into a number of matrices of much lower rank. However, the low-rank constraint is a performance bottleneck if the approximated matrix is intrinsically high-rank or close to full rank. In this paper, we propose to approximate a large square matrix with a product of sparse full-rank matrices. In the approximation, our method needs only N(log N)^2 non-zero numbers for an N × N full matrix. Our new method is especially useful for scalable neural attention modeling. Unlike conventional scaled dot-product attention methods, we train neural networks to map input data to the non-zero entries of the factorizing matrices. The sparse factorization method is tested on various square matrices, and the experimental results demonstrate that our method gives a better approximation when the approximated matrix is sparse and high-rank. As an attention module, our new method outperforms the Transformer and several of its variants for long sequences on synthetic data sets and on the Long Range Arena benchmarks. Our code is publicly available. | en_US |
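To make the abstract's core idea concrete, the following is a minimal sketch, not the paper's exact algorithm: it approximates an N × N matrix by a product of two sparse factors with fixed random sparsity patterns, fitting only the non-zero entries by gradient descent on the Frobenius reconstruction error. The function names (random_support, fit_sparse_factors), the support choice, and all hyperparameters are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)

def random_support(n, nnz_per_row):
    """Fixed sparsity pattern: nnz_per_row non-zero positions in each row."""
    mask = np.zeros((n, n), dtype=bool)
    for i in range(n):
        cols = rng.choice(n, size=nnz_per_row, replace=False)
        mask[i, cols] = True
    return mask

def fit_sparse_factors(M, nnz_per_row=8, lr=1e-2, steps=2000):
    """Fit M ~= W1 @ W2 where W1, W2 keep fixed sparse supports (illustrative)."""
    n = M.shape[0]
    mask1, mask2 = random_support(n, nnz_per_row), random_support(n, nnz_per_row)
    W1 = rng.standard_normal((n, n)) * mask1 * 0.1
    W2 = rng.standard_normal((n, n)) * mask2 * 0.1
    for _ in range(steps):
        R = W1 @ W2 - M                 # reconstruction residual
        g1 = (R @ W2.T) * mask1         # gradient restricted to the support of W1
        g2 = (W1.T @ R) * mask2         # gradient restricted to the support of W2
        W1 -= lr * g1
        W2 -= lr * g2
    return W1, W2

# Usage on a sparse, roughly full-rank target matrix (assumed example data).
n = 64
M = (rng.standard_normal((n, n)) > 1.5).astype(float)
W1, W2 = fit_sparse_factors(M)
err = np.linalg.norm(W1 @ W2 - M) / np.linalg.norm(M)
print(f"relative Frobenius error: {err:.3f}")

The paper's method differs in that the non-zero entries are produced by neural networks from the input data rather than fitted per matrix; the sketch only illustrates the sparse-product approximation itself.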
dc.language.iso | eng | en_US |
dc.publisher | Elsevier | en_US |
dc.rights | Attribution 4.0 International (CC BY 4.0) | * |
dc.rights.uri | http://creativecommons.org/licenses/by/4.0/deed.no | * |
dc.title | Sparse factorization of square matrices with application to neural attention modeling | en_US |
dc.title.alternative | Sparse factorization of square matrices with application to neural attention modeling / Published by Neural Networks https://doi.org/10.1016/j.neunet.2022.04.014 | en_US |
dc.type | Peer reviewed | en_US |
dc.type | Journal article | en_US |
dc.description.version | publishedVersion | en_US |
dc.source.pagenumber | 160-168 | en_US |
dc.source.volume | 152 | en_US |
dc.source.journal | Neural Networks | en_US |
dc.identifier.doi | https://doi.org/10.1016/j.neunet.2022.04.014 | |
dc.identifier.cristin | 2021815 | |
cristin.ispublished | true | |
cristin.fulltext | original | |
cristin.qualitycode | 2 | |