Show simple item record

dc.contributor.author    Khalitov, Ruslan
dc.contributor.author    Yu, Tong
dc.contributor.author    Cheng, Lei
dc.contributor.author    Yang, Zhirong
dc.date.accessioned    2022-05-12T14:25:44Z
dc.date.available    2022-05-12T14:25:44Z
dc.date.created    2022-05-05T14:22:31Z
dc.date.issued    2022
dc.identifier.citation    Neural Networks. 2022, 152, 160-168.    en_US
dc.identifier.issn    0893-6080
dc.identifier.uri    https://hdl.handle.net/11250/2995499
dc.description.abstract    Square matrices appear in many machine learning problems and models. Optimization over a large square matrix is expensive in memory and in time. Therefore an economic approximation is needed. Conventional approximation approaches factorize the square matrix into a number of matrices of much lower ranks. However, the low-rank constraint is a performance bottleneck if the approximated matrix is intrinsically high-rank or close to full rank. In this paper, we propose to approximate a large square matrix with a product of sparse full-rank matrices. In the approximation, our method needs only N(log N)² non-zero numbers for an N × N full matrix. Our new method is especially useful for scalable neural attention modeling. Different from the conventional scaled dot-product attention methods, we train neural networks to map input data to the non-zero entries of the factorizing matrices. The sparse factorization method is tested for various square matrices, and the experimental results demonstrate that our method gives a better approximation when the approximated matrix is sparse and high-rank. As an attention module, our new method outperforms the Transformer and several of its variants for long sequences in synthetic data sets and in the Long Range Arena benchmarks. Our code is publicly available.    en_US
dc.language.iso    eng    en_US
dc.publisher    Elsevier    en_US
dc.rights    Navngivelse 4.0 Internasjonal (Attribution 4.0 International)
dc.rights.uri    http://creativecommons.org/licenses/by/4.0/deed.no
dc.title    Sparse factorization of square matrices with application to neural attention modeling    en_US
dc.title.alternative    Sparse factorization of square matrices with application to neural attention modeling / Published by Neural Networks https://doi.org/10.1016/j.neunet.2022.04.014    en_US
dc.type    Peer reviewed    en_US
dc.type    Journal article    en_US
dc.description.version    publishedVersion    en_US
dc.source.pagenumber    160-168    en_US
dc.source.volume    152    en_US
dc.source.journal    Neural Networks    en_US
dc.identifier.doi    https://doi.org/10.1016/j.neunet.2022.04.014
dc.identifier.cristin    2021815
cristin.ispublished    true
cristin.fulltext    original
cristin.qualitycode    2
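
The abstract above describes approximating a square matrix by a product of sparse full-rank factors whose non-zero entries are learned. The following is a minimal, hypothetical sketch of that idea, not the authors' published code: it assumes a Chord-like sparsity pattern in which row i of the k-th factor is non-zero at columns i and (i + 2^k) mod N, uses log2(N) factors, and fits the non-zero entries to a target matrix by gradient descent in PyTorch. The pattern, the number of factors, and the plain MSE objective are illustrative assumptions; the paper's exact construction (and its N(log N)² non-zero budget) may differ.

# Illustrative sketch only (assumed details, not the authors' implementation):
# approximate a square matrix M by a product of sparse factors, learning just
# the non-zero entries by gradient descent.
import torch

def chordlike_pattern(N, k):
    # Assumed Chord-like pattern: row i of factor k is non-zero at
    # columns i and (i + 2**k) mod N, i.e. 2N non-zeros per factor.
    i = torch.arange(N)
    rows = i.repeat_interleave(2)
    cols = torch.stack([i, (i + 2 ** k) % N], dim=1).reshape(-1)
    return rows, cols

def build_factor(values, rows, cols, N):
    # Scatter the learnable non-zero entries into a dense N x N factor.
    W = torch.zeros(N, N)
    W[rows, cols] = values
    return W

def fit_sparse_factorization(M, steps=2000, lr=0.05):
    N = M.shape[0]
    K = int(torch.tensor(float(N)).log2().round().item())  # number of factors
    patterns = [chordlike_pattern(N, k) for k in range(K)]
    # 2N learnable numbers per factor -> 2N*K non-zeros in total.
    params = [torch.nn.Parameter(0.1 * torch.randn(2 * N)) for _ in range(K)]
    opt = torch.optim.Adam(params, lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        approx = torch.eye(N)
        for v, (rows, cols) in zip(params, patterns):
            approx = approx @ build_factor(v, rows, cols, N)
        loss = torch.mean((approx - M) ** 2)  # plain MSE fit (assumption)
        loss.backward()
        opt.step()
    return params, patterns, loss.item()

if __name__ == "__main__":
    torch.manual_seed(0)
    N = 64
    M = torch.randn(N, N)  # some target square matrix
    _, _, mse = fit_sparse_factorization(M)
    print(f"final approximation MSE: {mse:.4f}")

With such patterns the number of learnable entries grows far more slowly than the N² entries of a dense matrix, which is the point of the factorization; for the attention module described in the abstract, the non-zero entries are additionally predicted from the input data by neural networks rather than fitted directly as in this sketch.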

