In this talk, I will first introduce MERT, an acoustic music understanding model based on large-scale self-supervised training, which is akin to the “BERT moment” in NLP, but for music. We have successfully trained a family of MERT models (with sizes of 95M, 330M, and 1B parameters), which demonstrate excellent performance on 14 Music Information Retrieval (MIR) tasks. Since its release on HuggingFace in mid-2023, MERT has received over 500K downloads. To address the significant absence of a universal, community-driven benchmark for music understanding, we further developed MARBLE, a universal MIR benchmark. MARBLE facilitates the benchmarking of pre-trained music models on 18 tasks (with more being added) across 12 publicly available datasets, offering an easy-to-use, extendable, and reproducible evaluation suite for this burgeoning community.
Invited Speaker: Chenghua Lin (University of Manchester)
Bio: Chenghua Lin is Professor of Natural Language Processing in the Department of Computer Science at The University of Manchester. His research focuses on integrating NLP and machine learning for language generation and understanding, with current key interests including AI for science, robustness in LLMs, evaluation methods and benchmarks, metaphor processing, and representation learning for music. He currently serves as Secretary of the ACL SIGGEN Board, as a member of the IEEE Speech and Language Processing Technical Committee, and as a founding advisor for the Multimodal Art Projection community. He has received several awards for his research, including the CIKM Test-of-Time Award and the INLG Best Paper Runner-up Award. He has also held numerous program and chairing roles for *ACL conferences, and is the lead organiser of the 1st and 2nd editions of the Lay Summarisation shared task, co-located with the BioNLP Workshop.