Field | Value | Language
--- | --- | ---
dc.contributor.author | Kastrati, Zenun |
dc.contributor.author | Kurti, Arianit |
dc.contributor.author | Imran, Ali Shariq |
dc.date.accessioned | 2022-11-25T07:35:27Z |
dc.date.available | 2022-11-25T07:35:27Z |
dc.date.created | 2020-01-09T10:02:59Z |
dc.date.issued | 2020 |
dc.identifier.issn | 2352-3409 |
dc.identifier.uri | https://hdl.handle.net/11250/3033986 |
dc.description.abstract | In this article, we present a dataset containing word embeddings and document topic distribution vectors generated from the transcripts of MOOC video lectures. Transcripts of 12,032 video lectures from 200 courses were collected from the Coursera learning platform. This large corpus of transcripts was used as input to two well-known NLP techniques, Word2Vec and Latent Dirichlet Allocation (LDA), to generate word embeddings and topic vectors, respectively. We used the Word2Vec and LDA implementations in the Gensim package for Python. The data presented in this article are related to the research article entitled “Integrating word embeddings and document topics with deep learning in a video classification framework” [1]. The dataset is hosted in the Mendeley Data repository. | en_US
dc.language.iso | eng | en_US
dc.publisher | Elsevier | en_US
dc.rights | Attribution 4.0 International | *
dc.rights.uri | http://creativecommons.org/licenses/by/4.0/deed.no | *
dc.title | WET: Word embedding-topic distribution vectors for MOOC video lectures dataset | en_US
dc.title.alternative | WET: Word embedding-topic distribution vectors for MOOC video lectures dataset | en_US
dc.type | Journal article | en_US
dc.type | Peer reviewed | en_US
dc.description.version | publishedVersion | en_US
dc.source.volume | 28 | en_US
dc.source.journal | Data in Brief | en_US
dc.identifier.doi | 10.1016/j.dib.2019.105090 |
dc.identifier.cristin | 1769024 |
cristin.ispublished | true |
cristin.fulltext | original |
cristin.qualitycode | 1 |
Except where otherwise noted, this item is licensed under Attribution 4.0 International.