Each Arabic-speaking community distinguishes between two varieties of the language: a standardized variety known as Modern Standard Arabic (MSA) and its local variety of Dialectal Arabic (DA). While MSA is generally perceived as a variety common to all Arabic speakers, different varieties of DA exist across the vast geographical area over which Arabic speakers are distributed. The sociolinguistic theory of “Dialect Levels” describes the different levels between MSA and DA. In this talk, I will explain our efforts to computationally model this theory, which we operationalized as a continuous variable in the range [0, 1] that we call the “Arabic Level of Dialectness (ALDi)”. I will then demonstrate how ALDi allows for quantitatively studying the different styles that Arab presidents employ in their speeches. Lastly, I will show that ALDi is a good predictor of interannotator agreement when samples are randomly routed to speakers of different dialects.
Invited Speaker: Amr Keleg (University of Edinburgh)
Bio: Amr is a final-year PhD student at the University of Edinburgh. His research has so far focused on improving how the variation among Arabic dialects is modeled in NLP. He has made multiple contributions to this end and received an outstanding paper award at ACL 2024. His work on Arabic dialects has increased his awareness of the cultural differences across Arabic-speaking communities. In the future, he is keen to extend his research to multilingual and multicultural settings.