
dc.contributor.author: Muhammad, Khan
dc.contributor.author: Mustaqeem, .
dc.contributor.author: Ullah, Amin
dc.contributor.author: Imran, Ali Shariq
dc.contributor.author: Sajjad, Muhammad
dc.contributor.author: Kiran, Mustafa Servet
dc.contributor.author: Sannino, Giovanna
dc.contributor.author: Albuquerque, Victor Hugo C., de
dc.date.accessioned: 2022-10-04T06:27:37Z
dc.date.available: 2022-10-04T06:27:37Z
dc.date.created: 2021-07-02T13:32:42Z
dc.date.issued: 2021
dc.identifier.citation: Future generations computer systems. 2021, 125 820-830. [en_US]
dc.identifier.issn: 0167-739X
dc.identifier.uri: https://hdl.handle.net/11250/3023452
dc.description.abstract: Human action recognition in videos is an active area of research in computer vision and pattern recognition. Nowadays, artificial intelligence (AI) based systems are needed for human-behavior assessment and security purposes. The existing action recognition techniques are mainly using pre-trained weights of different AI architectures for the visual representation of video frames in the training stage, which affect the features’ discrepancy determination, such as the distinction between the visual and temporal signs. To address this issue, we propose a bi-directional long short-term memory (BiLSTM) based attention mechanism with a dilated convolutional neural network (DCNN) that selectively focuses on effective features in the input frame to recognize the different human actions in the videos. In this diverse network, we use the DCNN layers to extract the salient discriminative features by using the residual blocks to upgrade the features that keep more information than a shallow layer. Furthermore, we feed these features into a BiLSTM to learn the long-term dependencies, which is followed by the attention mechanism to boost the performance and extract the additional high-level selective action related patterns and cues. We further use the center loss with Softmax to improve the loss function that achieves a higher performance in the video-based action classification. The proposed system is evaluated on three benchmarks, i.e., UCF11, UCF sports, and J-HMDB datasets for which it achieved a recognition rate of 98.3%, 99.1%, and 80.2%, respectively, showing 1%–3% improvement compared to the state-of-the-art (SOTA) methods. [en_US]
dc.language.iso: eng [en_US]
dc.publisher: Elsevier [en_US]
dc.title: Human action recognition using attention based LSTM network with dilated CNN features [en_US]
dc.type: Peer reviewed [en_US]
dc.type: Journal article [en_US]
dc.description.version: publishedVersion [en_US]
dc.rights.holder: This version of the article will not be available due to copyright restrictions by Elsevier [en_US]
dc.source.pagenumber: 820-830 [en_US]
dc.source.volume: 125 [en_US]
dc.source.journal: Future generations computer systems [en_US]
dc.identifier.doi: 10.1016/j.future.2021.06.045
dc.identifier.cristin: 1920069
cristin.ispublished: true
cristin.fulltext: original
cristin.qualitycode: 2
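
The abstract in this record mentions combining a center loss with Softmax but gives no formula. The following is an illustrative sketch only, not the authors' implementation. It assumes the commonly used formulation in which the center loss is half the squared Euclidean distance between a feature vector and its class center, added to the softmax cross-entropy with a weighting factor. The function names, the `lam` parameter, and its default value are assumptions that do not come from the record.

```python
import math


def softmax_cross_entropy(logits, label):
    """Cross-entropy of softmax(logits) against an integer class label."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


def center_loss(feature, center):
    """Half the squared Euclidean distance between a feature and its class center."""
    return 0.5 * sum((f - c) ** 2 for f, c in zip(feature, center))


def joint_loss(features, logits, labels, centers, lam=0.01):
    """Batch mean of softmax cross-entropy plus lam times the center loss.

    lam is a hypothetical weighting; the abstract does not state one.
    centers maps each class label to its center vector.
    """
    total = 0.0
    for x, z, y in zip(features, logits, labels):
        total += softmax_cross_entropy(z, y) + lam * center_loss(x, centers[y])
    return total / len(labels)
```

In this sketch the class centers are passed in as fixed values. In training they would normally be updated alongside the network weights, which is omitted here.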

