Tag Prediction in Social Media - Predicting Image Tags with Computer Vision and Word Embedding

2018

Social Media produces vast amounts of user-generated content (UGC) every second, and

images are increasingly part of enriching this content. The need for effective ways to organize

and categorize content is greater than ever. The proliferation of Big Data also offers

new opportunities for utilizing UGC in recommender systems. Given the

noisy and unstructured nature of user-generated text, however, extracting valuable knowledge

from it is not an easy task. Therefore, this thesis looks in the direction of images.

With the goal of extracting usable knowledge from these Social Media images, this

thesis proposes a novel approach to predicting the tags and content of an image from

Social Media with the help of deep convolutional neural networks (deep CNNs) and word

embedding models.

A pre-trained model for computer vision is used to classify an image and extract predictions

of its most likely content, which are then evaluated against the image's tags to discover

the model's tag prediction ability. Each of the predictions is used to retrieve syntactically

and semantically similar information from a word embedding model. Using this aggregated information,

the model's prediction ability is re-evaluated and the performances are compared.

In addition, the predictions are studied qualitatively to understand their degree of relevance.
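
The two-stage evaluation described above can be illustrated with a minimal sketch. It is not the thesis implementation: the tiny embedding table, the helper names (nearest_words, expand, hit) and the "at least one prediction matches a user tag" criterion are all assumptions made for illustration, and the classifier output is replaced by a hard-coded list of labels.

```python
import math

# Hypothetical toy embedding table (assumption); a real system would load
# pre-trained word vectors instead.
TOY_EMBEDDINGS = {
    "dog": (0.9, 0.1, 0.0),
    "puppy": (0.85, 0.2, 0.05),
    "cat": (0.6, 0.6, 0.1),
    "beach": (0.0, 0.2, 0.9),
    "sea": (0.05, 0.1, 0.95),
    "car": (0.1, 0.9, 0.2),
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_words(word, embeddings, k=1):
    """Return the k words most similar to `word`, excluding the word itself."""
    if word not in embeddings:
        return []
    scored = [
        (cosine(embeddings[word], vec), other)
        for other, vec in embeddings.items()
        if other != word
    ]
    scored.sort(reverse=True)
    return [other for _, other in scored[:k]]


def expand(predictions, embeddings, k=1):
    """Append each prediction's nearest neighbours, keeping order and uniqueness."""
    expanded = list(predictions)
    for label in predictions:
        for neighbour in nearest_words(label, embeddings, k):
            if neighbour not in expanded:
                expanded.append(neighbour)
    return expanded


def hit(candidates, tags):
    """Assumed success criterion: at least one candidate is among the user tags."""
    return any(candidate in tags for candidate in candidates)


# Stand-in for the top labels a pre-trained image classifier might return.
predictions = ["dog", "beach"]
user_tags = {"puppy", "sea", "summer"}

hit_before = hit(predictions, user_tags)
expanded = expand(predictions, TOY_EMBEDDINGS, k=1)
hit_after = hit(expanded, user_tags)
```

In this toy case the raw labels miss the user tags, while the expanded set matches them, which is the kind of difference the comparison of the two evaluations is meant to capture.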

The model is evaluated on a subset of the MIRFLICKR25000 data set, which consists

of 25000 images under the Creative Commons licence gathered from the Social Media

platform Flickr. Although image auto-tagging has been thoroughly researched, the task of tag

prediction from images using computer vision and word embedding in this way has not

been done previously. The evaluation of this model on the data subset shows that accuracy

comparable to the state of the art is achieved. Although the results are not groundbreaking in terms of

accuracy, they show a significant increase when queries are expanded using a word embedding

model.

NTNU