Show simple item record

dc.contributor.author	Khalitov, Ruslan
dc.contributor.author	Yu, Tong
dc.contributor.author	Cheng, Lei
dc.contributor.author	Yang, Zhirong
dc.date.accessioned	2022-05-12T14:25:44Z
dc.date.available	2022-05-12T14:25:44Z
dc.date.created	2022-05-05T14:22:31Z
dc.date.issued	2022
dc.identifier.citation	Neural Networks. 2022, 152, 160-168.	en_US
dc.identifier.issn	0893-6080
dc.identifier.uri	https://hdl.handle.net/11250/2995499
dc.description.abstract	Square matrices appear in many machine learning problems and models. Optimization over a large square matrix is expensive in memory and in time; therefore, an economical approximation is needed. Conventional approximation approaches factorize the square matrix into a number of matrices of much lower rank. However, the low-rank constraint becomes a performance bottleneck if the approximated matrix is intrinsically high-rank or close to full rank. In this paper, we propose to approximate a large square matrix with a product of sparse full-rank matrices. In the approximation, our method needs only O(N log N) non-zero numbers for an N × N full matrix. Our new method is especially useful for scalable neural attention modeling. Unlike conventional scaled dot-product attention methods, we train neural networks to map the input data to the non-zero entries of the factorizing matrices. The sparse factorization method is tested on various square matrices, and the experimental results demonstrate that it gives a better approximation when the approximated matrix is sparse and high-rank. As an attention module, our new method outperforms the Transformer and several of its variants on long sequences in synthetic data sets and in the Long Range Arena benchmarks. Our code is publicly available.	en_US
dc.language.iso	eng	en_US
dc.publisher	Elsevier	en_US
dc.rights	Attribution 4.0 International	*
dc.rights.uri	http://creativecommons.org/licenses/by/4.0/deed.no	*
dc.title	Sparse factorization of square matrices with application to neural attention modeling	en_US
dc.title.alternative	Sparse factorization of square matrices with application to neural attention modeling / Published by Neural Networks https://doi.org/10.1016/j.neunet.2022.04.014	en_US
dc.type	Peer reviewed	en_US
dc.type	Journal article	en_US
dc.description.version	publishedVersion	en_US
dc.source.pagenumber	160-168	en_US
dc.source.volume	152	en_US
dc.source.journal	Neural Networks	en_US
dc.identifier.doi	https://doi.org/10.1016/j.neunet.2022.04.014
dc.identifier.cristin	2021815
cristin.ispublished	true
cristin.fulltext	original
cristin.qualitycode	2
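
The abstract above describes approximating an N × N matrix by a product of sparse full-rank factors that together hold only O(N log N) non-zero numbers. Below is a minimal illustrative sketch, not the authors' released code: it assumes a chord-like sparsity pattern in which factor k has non-zeros on the main diagonal and on a diagonal cyclically shifted by 2**k; the matrix size, the number of factors, and this particular pattern are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 64                      # matrix size (a power of two keeps the sketch simple)
K = int(np.log2(N))         # number of sparse factors, O(log N)

def make_factor(k):
    """Sparse factor: non-zeros on the diagonal and on a diagonal shifted by 2**k (assumed pattern)."""
    F = np.zeros((N, N))
    idx = np.arange(N)
    F[idx, idx] = rng.normal(size=N)                  # N non-zeros on the main diagonal
    F[idx, (idx + 2 ** k) % N] = rng.normal(size=N)   # N non-zeros on the shifted diagonal
    return F

factors = [make_factor(k) for k in range(K)]

# The product of the K sparse factors is a dense N x N matrix, even though
# only 2 * N * K = O(N log N) of the generating numbers are free parameters.
approx = factors[0]
for F in factors[1:]:
    approx = approx @ F

print("free parameters:", 2 * N * K, "vs dense entries:", N * N)
print("rank of the product:", np.linalg.matrix_rank(approx))
```

In the attention application described in the abstract, the non-zero entries are not fixed parameters but are predicted from the input data by neural networks; fitting the non-zero values to a given target matrix (for example, by gradient descent on the Frobenius error) would reuse the same sparse structure.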


Associated file(s)


This item appears in the following collection(s)


Attribution 4.0 International
Except where otherwise noted, this item's license is described as Attribution 4.0 International