Contemporary Natural Language Processing is largely rooted in language resources, e.g., for training models in supervised machine learning. Even in few- or zero-shot settings, good-quality data for benchmarking remains paramount. However, most language resources encode majority-driven knowledge, hiding the often informative variation that comes from a diversity of human backgrounds. This is particularly relevant when dealing with highly subjective aspects of natural language, such as irony or undesirable language. A recent line of research proposes to never aggregate annotations [1], but rather to leverage the wealth of knowledge found in disagreement for building models [2] and evaluating them [3]. In this talk, I will present the perspectivist paradigm in NLP and the results of ongoing research focused on building perspective-aware predictive models and automatically extracting human perspectives from annotated data.
Invited Speaker: Valerio Basile (University of Turin, Italy)
Bio: Valerio Basile is an Assistant Professor at the Computer Science Department of the University of Turin, Italy, and a member of the Content-centered Computing group and the Hate Speech Monitoring group. His work spans several areas, including formal representations of meaning, linguistic annotation, natural language generation, commonsense knowledge, semantic parsing, sentiment analysis, and hate speech detection, as well as perspectives and bias in supervised machine learning, from data creation to system evaluation. He is currently PI of the project BREAKhateDOWN “Toxic Language Understanding in Online Communication” and is among the main proponents of the Perspectivist Data Manifesto.
[1] The Perspectivist Data Manifesto. https://pdai.info/
[2] Cabitza et al. 2023. Toward a Perspectivist Turn in Ground Truthing for Predictive Computing. AAAI-23.
[3] Basile et al. 2021. We Need to Consider Disagreement in Evaluation.