COVID-19 and Misinformation: A Large-Scale Lexical Analysis on Twitter

Abstract

Social media is often used by individuals and organisations as a platform to spread misinformation. With the recent coronavirus pandemic we have seen a surge of misinformation on Twitter, posing a danger to public health. In this paper, we compile a large COVID-19 Twitter misinformation corpus and analyse it to discover patterns in vocabulary usage. Among other findings, our analysis reveals that topics and vocabulary are considerably less varied and more negative in tweets related to misinformation than in randomly extracted tweets. In addition to our qualitative analysis, our experimental results show that a simple linear model based only on lexical features is effective in identifying misinformation-related tweets (with accuracy over 80%), providing evidence that the vocabulary used in misinformation differs substantially from that of generic tweets.
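
To make the idea of a linear model over purely lexical features concrete, here is a minimal sketch using only the Python standard library. It is not the model from the paper: a plain perceptron over bag-of-words counts stands in for whichever linear classifier and features the authors used. The names `tokenize`, `train_perceptron`, `predict` and the four-sentence `TOY_EXAMPLES` list are hypothetical and exist only for illustration.

```python
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for a real tweet tokenizer)."""
    return text.lower().split()


def train_perceptron(examples, epochs=10):
    """Train a binary perceptron over bag-of-words counts.

    examples: list of (text, label) pairs with label in {+1, -1}.
    Returns (weights, bias), where weights maps token -> float.
    """
    weights = defaultdict(float)
    bias = 0.0
    for _ in range(epochs):
        for text, label in examples:
            counts = Counter(tokenize(text))
            score = bias + sum(weights[tok] * n for tok, n in counts.items())
            if label * score <= 0:  # misclassified or on the boundary: update
                for tok, n in counts.items():
                    weights[tok] += label * n
                bias += label
    return dict(weights), bias


def predict(text, weights, bias):
    """Return +1 or -1 from the sign of the linear score."""
    counts = Counter(tokenize(text))
    score = bias + sum(weights.get(tok, 0.0) * n for tok, n in counts.items())
    return 1 if score > 0 else -1


# Hypothetical toy data, invented for this sketch (not taken from the corpus).
# +1 marks "misinformation-related", -1 marks "generic".
TOY_EXAMPLES = [
    ("miracle cure hoax exposed", 1),
    ("secret cure they hide", 1),
    ("lovely walk in the park", -1),
    ("new cafe in the city", -1),
]

weights, bias = train_perceptron(TOY_EXAMPLES)
```

After training, `predict("miracle cure", weights, bias)` scores the text using only the learned per-token weights. This is the sense in which such a classifier relies on vocabulary alone: it has no access to word order, user metadata, or network features.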

Publication
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: Student Research Workshop
Dimosthenis Antypas
PhD Student & Teaching Associate
Jose Camacho-Collados
Professor & UKRI Future Leaders Fellow
Alun Preece
Professor & Co-Director Crime and Security Research Institute