Seminar: "SPA-Bench: A Comprehensive Benchmark for SmartPhone Agent Evaluation"

Abstract

Smartphone control agents, based on (Multimodal) Large Language Models, operate smartphones in a human-like manner by observing the screen and performing actions such as tapping or typing. These agents hold great promise for assisting users with everyday tasks, from setting alarms to booking hotels. In this talk, I will present SPA-Bench, our newly proposed benchmark for systematic smartphone agent evaluation, accepted as an ICLR 2025 Spotlight. SPA-Bench supports agent interaction with Android devices across a wide range of realistic tasks in both English and Chinese, and features a scalable, automated evaluation pipeline. I will share insights from evaluating eleven popular existing agents and discuss how SPA-Bench helps identify their strengths and limitations, offering a foundation for future research and real-world deployment.

Date
Mar 27, 2025 13:00 — 14:00
Location
Abacws

Invited Speaker: Jingxuan Chen (Huawei)

Nedjma Ousidhoum
Lecturer