COVID-19 and Misinformation: A Large-Scale Lexical Analysis on Twitter

Dimosthenis Antypas, Jose Camacho-Collados, Alun Preece, David Rogers

August 2021

Abstract

Social media is often used by individuals and organisations as a platform to spread misinformation. During the recent coronavirus pandemic, we have seen a surge of misinformation on Twitter, posing a danger to public health. In this paper, we compile a large COVID-19 Twitter misinformation corpus and analyse it to discover patterns in vocabulary usage. Among other findings, our analysis reveals that topics and vocabulary are considerably less varied, and vocabulary usage more negative, in tweets related to misinformation than in randomly extracted tweets. In addition to this qualitative analysis, our experimental results show that a simple linear model based only on lexical features is effective in identifying misinformation-related tweets (with accuracy over 80%), providing evidence that the vocabulary used in misinformation differs considerably from that of generic tweets.
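The abstract's claim that a linear model over lexical features alone can separate misinformation-related tweets from generic ones can be made concrete with a small sketch. This is a minimal illustration only, assuming a bag-of-words representation and a logistic regression trained with stochastic gradient descent. The abstract does not specify the authors' exact features, model, or hyperparameters, so the tokenizer, the hypothetical helper names (`featurize`, `train_logistic_regression`, `predict`), the learning rate, and the toy tweets below are all assumptions and are not drawn from the COVID-19 corpus.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Hypothetical tokenizer (an assumption): lowercase, then keep runs of
    # letters, digits, '#', '@' and apostrophes.
    return re.findall(r"[a-z0-9#@']+", text.lower())


def featurize(text: str) -> Counter:
    # Lexical features only: raw term counts (bag of words).
    return Counter(tokenize(text))


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(weights: dict[str, float], bias: float, feats: Counter) -> float:
    # Linear score: bias + sum of (weight * count) over the tweet's tokens.
    return bias + sum(weights.get(tok, 0.0) * c for tok, c in feats.items())


def train_logistic_regression(texts, labels, epochs=200, lr=0.1):
    # Fit a linear model by stochastic gradient descent on the log loss.
    # Hyperparameters are assumed; label 1 = misinformation-related, 0 = generic.
    weights: dict[str, float] = {}
    bias = 0.0
    data = [(featurize(t), y) for t, y in zip(texts, labels)]
    for _ in range(epochs):
        for feats, y in data:
            err = sigmoid(score(weights, bias, feats)) - y  # d(log loss)/d(score)
            bias -= lr * err
            for tok, c in feats.items():
                weights[tok] = weights.get(tok, 0.0) - lr * err * c
    return weights, bias


def predict(weights: dict[str, float], bias: float, text: str) -> int:
    # Return 1 if the estimated probability of misinformation is >= 0.5, else 0.
    return int(sigmoid(score(weights, bias, featurize(text))) >= 0.5)


# Invented toy tweets for illustration only; NOT taken from the paper's corpus.
texts = [
    "5g towers spread the virus wake up",
    "the vaccine is a hoax they lie",
    "bill gates microchip vaccine hoax",
    "the virus is fake wake up",
    "lovely walk in the park today",
    "watching the football with friends",
    "new coffee shop opened downtown",
    "great dinner with family tonight",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

weights, bias = train_logistic_regression(texts, labels)
print(predict(weights, bias, "vaccine hoax wake up"))
print(predict(weights, bias, "coffee with friends today"))
```

Because the model is linear, sorting `weights` by value shows which terms it associates with each class, which makes this kind of classifier convenient for inspecting how vocabulary differs between misinformation-related and generic tweets.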

Type

Publication

Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: Student Research Workshop

Dimosthenis Antypas

PhD Student & Teaching Associate

Jose Camacho-Collados

Professor & UKRI Future Leaders Fellow

Alun Preece

Professor & Co-Director Crime and Security Research Institute