Seminar: "Prudent NLG Evaluation with Humans"

Abstract

Every year, research teams spend large amounts of money evaluating the quality of NLG systems (WMT for machine translation, inter alia). We'll first look at how to speed up and improve the quality of annotators' work by pre-filling annotations with automatic quality estimation (ESA, ESAᴬᴵ). In the second part, we'll take the automation a step further and try to determine which segments do not need to be evaluated at all. For this, we borrow methods from psychometrics, originally developed for constructing efficient yet informative test sets for human students. In our case, the students to be tested are NLG systems.
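The psychometric idea of picking only informative test items can be sketched with a toy item-response-theory example. The sketch below uses a Rasch model, where the Fisher information of an item peaks when its difficulty matches the test-taker's ability; all function names, abilities, and difficulties here are illustrative assumptions, not details from the talk.

```python
import math

def rasch_p(theta, b):
    """Probability that a system with ability theta handles an item of difficulty b (Rasch model)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def select_items(difficulties, theta, k):
    """Keep the k most informative items: Fisher information p(1-p) peaks where b is close to theta."""
    scored = [(rasch_p(theta, b) * (1.0 - rasch_p(theta, b)), i)
              for i, b in enumerate(difficulties)]
    top = sorted(scored, reverse=True)[:k]
    return sorted(i for _, i in top)

# Items that are far too easy or far too hard tell us little and can be skipped.
difficulties = [-2.0, -0.5, 0.0, 0.4, 3.0]
print(select_items(difficulties, theta=0.0, k=2))  # → [2, 3]
```

Items 0 and 4 (very easy, very hard) are dropped because nearly every system, strong or weak, would score the same on them; the retained items discriminate best among systems near the assumed ability level.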

Date
Jan 16, 2025, 13:00–14:00
Location
Abacws

Invited Speaker: Vilém Zouhar (ETH Zürich, Switzerland)

Bio: Vilém is a PhD student at ETH Zürich working on both human and automatic evaluation of MT/NLG systems, balancing costs, quality, and bias.

Fernando Alva-Manchego
Lecturer

My research interests include text adaptation, evaluation of natural language generation, and NLP for education.