Show simple item record

dc.contributor.author: Yu, Tong
dc.contributor.author: Khalitov, Ruslan
dc.contributor.author: Cheng, Lei
dc.contributor.author: Yang, Zhirong
dc.date.accessioned: 2023-02-07T10:05:26Z
dc.date.available: 2023-02-07T10:05:26Z
dc.date.created: 2022-11-16T16:38:17Z
dc.date.issued: 2022
dc.identifier.issn: 2575-7075
dc.identifier.uri: https://hdl.handle.net/11250/3048820
dc.description.abstract: Self-attention is a widely used building block in neural modeling for mixing long-range data elements. Most self-attention networks employ pairwise dot-products to specify the attention coefficients, which requires O(N²) computation for a sequence of length N. Although approximation methods have been introduced to relieve the quadratic cost, the performance of the dot-product approach remains bottlenecked by the low-rank constraint in the attention matrix factorization. In this paper, we propose a novel, scalable, and effective mixing building block called Paramixer. Our method factorizes the interaction matrix into several sparse matrices, where the non-zero entries are parameterized by MLPs that take the data elements as input. The overall computing cost of the new building block is as low as O(N log N). Moreover, all factorizing matrices in Paramixer are full-rank, so it does not suffer from the low-rank bottleneck. We have tested the new method on both synthetic and various real-world long sequential data sets and compared it with several state-of-the-art attention networks. The experimental results show that Paramixer has better performance in most learning tasks. (en_US)
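The abstract's O(N log N) cost follows from replacing the dense N×N interaction matrix with a product of about log₂(N) sparse factors, each with a constant number of non-zeros per row. Below is a minimal NumPy sketch of that idea, assuming a butterfly-style sparsity pattern and fixed 0.5/0.5 mixing weights in place of the paper's MLP-parameterized entries; the function name and weights are illustrative, not the paper's actual implementation.

```python
import numpy as np

def butterfly_mix(x):
    """Mix a length-N sequence (N a power of two) through log2(N)
    sparse stages. Stage k pairs positions i and i XOR 2**k, so each
    stage does O(N) work and, after all stages, every output depends
    on every input: O(N log N) total, with no N x N matrix formed.
    In Paramixer the two non-zero weights per row would be produced
    by MLPs of the data; here they are fixed at 0.5 for clarity."""
    n = len(x)
    out = np.asarray(x, dtype=float).copy()
    for k in range(int(np.log2(n))):
        stride = 2 ** k
        new = np.empty_like(out)
        for i in range(n):
            j = i ^ stride                      # butterfly partner
            new[i] = 0.5 * out[i] + 0.5 * out[j]  # two non-zeros per row
        out = new
    return out
```

With uniform 0.5 weights every output converges to the global mean of the input, which confirms that the sparse-factor chain achieves a full receptive field without any pairwise dot-products.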
dc.language.iso: eng (en_US)
dc.publisher: IEEE (en_US)
dc.title: Paramixer: Parameterizing Mixing Links in Sparse Factors Works Better than Dot-Product Self-Attention (en_US)
dc.title.alternative: Paramixer: Parameterizing Mixing Links in Sparse Factors Works Better than Dot-Product Self-Attention (en_US)
dc.type: Journal article (en_US)
dc.description.version: acceptedVersion (en_US)
dc.rights.holder: © IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. (en_US)
dc.source.journal: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (en_US)
dc.identifier.doi: 10.1109/CVPR52688.2022.00077
dc.identifier.cristin: 2075105
cristin.ispublished: true
cristin.fulltext: postprint

