Multimodal Approach to Analyze Disaster Related Information by using Image & Text Classifications Model on Twitter Data

Authors

  • Bansaj Pradhan Department of Electronics and Computer Engineering, Pulchowk Campus, Institute of Engineering, Tribhuvan University, Nepal
  • Sanjeeb Prasad Panday Department of Electronics and Computer Engineering, Pulchowk Campus, Institute of Engineering, Tribhuvan University, Nepal
  • Aman Shakya Department of Electronics and Computer Engineering, Pulchowk Campus, Institute of Engineering, Tribhuvan University, Nepal

Keywords:

Image classification, Text classification, Multimodal fusion, Disasters and analysis, Crisis computing, Social media

Abstract

When a natural disaster occurs, people are eager to learn about it: some are willing to help and donate, while others are simply curious. Multimedia content on social media platforms provides essential information during disasters, including reports of missing or found people, infrastructure damage, and injured or dead people. Although numerous studies have shown how important text and image content is for disaster response, past research has focused mostly on the text modality, with limited success in multimodal approaches. The most recent studies on the multimodal classification of disaster-related tweets make use of fairly simple models such as CNN and VGG-16. To improve the multimodal categorization of disaster-related tweets, this work goes further and uses state-of-the-art text and image classification models. The study focuses on two distinct classification tasks, including determining whether a tweet is informative or not. The multimodal analysis process incorporates various feature extraction techniques for the textual data corpus and preprocessing of the corresponding image corpus. We then train various classification models, compare their performance, and tune their parameters to improve the results. Models such as Bi-LSTM for text classification and ResNet for image classification were trained and examined. According to the results, the Bi-LSTM and ResNet multimodal architecture performs better than models developed using a single modality (ResNet for images or Bi-LSTM for text alone). Additionally, the results demonstrate that, for both classification tasks, our Bi-LSTM and ResNet model outperforms the FastText and VGG-16 baseline model by a respectable margin.
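The abstract describes a late-fusion multimodal architecture combining a Bi-LSTM text encoder with a ResNet image encoder. The sketch below is not the authors' implementation; it is a minimal PyTorch illustration of that fusion idea, where the vocabulary size, embedding and hidden dimensions, and fusion layer sizes are illustrative assumptions rather than values from the paper.

```python
# Minimal sketch (not the authors' code) of the late-fusion idea from the abstract:
# a Bi-LSTM encodes tweet text, a ResNet encodes the attached image, and the
# concatenated features feed a binary "informative vs. not informative" head.
import torch
import torch.nn as nn
from torchvision import models


class MultimodalDisasterClassifier(nn.Module):
    def __init__(self, vocab_size=20000, embed_dim=300, lstm_hidden=128, num_classes=2):
        super().__init__()
        # Text branch: word embedding + bidirectional LSTM
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.bilstm = nn.LSTM(embed_dim, lstm_hidden, batch_first=True, bidirectional=True)
        # Image branch: ResNet-50 backbone with its classification head removed
        # (weights=None here; in practice ImageNet-pretrained weights would typically be loaded)
        resnet = models.resnet50(weights=None)
        self.cnn = nn.Sequential(*list(resnet.children())[:-1])  # outputs (B, 2048, 1, 1)
        # Fusion: concatenate text (2 * lstm_hidden) and image (2048) features
        self.classifier = nn.Sequential(
            nn.Linear(2 * lstm_hidden + 2048, 256),
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(256, num_classes),
        )

    def forward(self, token_ids, images):
        # token_ids: (B, seq_len) integer tensor; images: (B, 3, 224, 224)
        embedded = self.embedding(token_ids)
        _, (h_n, _) = self.bilstm(embedded)               # h_n: (2, B, lstm_hidden)
        text_feat = torch.cat([h_n[0], h_n[1]], dim=1)    # (B, 2 * lstm_hidden)
        img_feat = self.cnn(images).flatten(1)            # (B, 2048)
        return self.classifier(torch.cat([text_feat, img_feat], dim=1))


# Example forward pass on dummy data
model = MultimodalDisasterClassifier()
tokens = torch.randint(1, 20000, (4, 30))   # batch of 4 tweets, 30 tokens each
imgs = torch.randn(4, 3, 224, 224)          # batch of 4 attached images
logits = model(tokens, imgs)                # (4, 2) logits: informative / not informative
```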

Published

2023-04-12

How to Cite

[1]
B. Pradhan, S. P. Panday, and A. Shakya, “Multimodal Approach to Analyze Disaster Related Information by using Image & Text Classifications Model on Twitter Data”, JIE, vol. 17, no. 1, pp. 110-123, Apr. 2023.