Political Profiling of Nepali Twitter Users using Vector Model
Keywords: Nepali language, Political profiling, Vector model, TF-IDF, Word embedding, Word2Vec, Doc2Vec, Cosine similarity
Every day, people on social networks create a huge amount of data as posts, blogs, tweets, articles, and comments, in the form of text, images, audio, and video. The number of social media users, and the volume of data they add to the cloud, grows rapidly day by day. People from all over the globe, from different regions, cultures, languages, and educational backgrounds, including public figures, publish posts and blogs that reflect their views and opinions. Researchers and businesses now use these micro-blogs to assess customer opinion and to infer implicit intention and behavior. Using tweet content, this research classifies a Nepali Twitter user into one of a set of pre-defined classes of political parties in Nepal using a vector space model. In this approach, a set of words is defined as a document class that represents a political party. Several text-preprocessing steps, based on the morphological structure of the Nepali language, are applied to improve the result. TF-IDF and Doc2Vec methods are used to extract features from the terms used in tweets. A similarity measure is then used to match the Twitter user's profile against each political party's class through a similarity matching score. The vector model-based TF-IDF and Doc2Vec methods are compared for their effectiveness in the domain of tweets in the Nepali language.
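The TF-IDF matching step described above can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the function names (`tfidf_vectors`, `cosine`, `classify`), the smoothed IDF formula, the party labels, and the romanized tokens are all assumptions made for illustration, and the Nepali-specific preprocessing and the Doc2Vec variant are not covered.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors for a dict of name -> token list."""
    n_docs = len(docs)
    # Document frequency: number of documents each term appears in.
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))
    vectors = {}
    for name, tokens in docs.items():
        counts = Counter(tokens)
        total = len(tokens)
        # Smoothed IDF (an assumed variant) keeps every weight positive.
        vectors[name] = {
            term: (count / total) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in counts.items()
        }
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def classify(user_tokens, party_docs):
    """Return the best-matching party label and all similarity scores."""
    docs = dict(party_docs)
    docs["__user__"] = user_tokens  # weight the user in the same corpus
    vectors = tfidf_vectors(docs)
    user_vec = vectors.pop("__user__")
    scores = {party: cosine(user_vec, vec) for party, vec in vectors.items()}
    return max(scores, key=scores.get), scores


# Hypothetical, invented word sets; real classes would come from party texts.
party_docs = {
    "party_a": ["kisan", "majdur", "samajbad"],
    "party_b": ["loktantra", "udyog", "bajar"],
}
user_tokens = ["majdur", "kisan", "sarkar"]  # tokens from a user's tweets
best, scores = classify(user_tokens, party_docs)
print(best, scores)
```

Here the user shares terms only with `party_a`, so that class receives the highest score and `party_b` scores zero. A Doc2Vec comparison would replace `tfidf_vectors` with learned document embeddings while keeping the same cosine matching.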