Transformer language models, the cornerstone of the recently popular large language models (LLMs), have revolutionized the fields of artificial intelligence (AI) and natural language processing (NLP). However, we still understand relatively little about their computational mechanisms and the reasons for their effectiveness. The internal workings of these models remain enigmatic ‘black boxes’. Nevertheless, we are living in an exciting era of interpretability research. We are on the cusp of unlocking these black boxes and translating interpretability into practical impact: from controlling reasoning trajectories to unlocking performance gains and steering models toward safer behavior. In this talk, I will focus on a central transition that has begun to take shape: from attempts to explain LLMs through pseudo-cognitive theories to interpreting models on their own terms. This transition has led to exciting new developments in areas such as circuit discovery and representational geometry, and has spawned a range of promising applications that improve the safety, robustness, and even performance of contemporary LLMs. Taken together, these developments point to a promising new direction of composable modular intelligence, in which ‘naturally’ emerging structures are identified and used as modular building blocks for composition and targeted modification. This points toward a future in which AI systems are not only powerful, but also controllable and interpretable.
Invited Speaker: Jingcheng (Frank) Niu (UKP, TU Darmstadt)
Short Bio: Jingcheng (Frank) Niu is a Postdoctoral Researcher at the UKP Lab at TU Darmstadt, where he works on mechanistic interpretability to uncover the internal structures and computational mechanisms that enable large language models (LLMs) to perform a wide range of tasks. He received his PhD in Computer Science from the University of Toronto. His research has been recognized with multiple awards, including an ACL 2025 Outstanding Paper Award and a TMLR J2C Certification.