Invited Speaker: Anthony Cohn (University of Leeds)

In this talk I will present some initial results on evaluating the spatial reasoning capabilities of Large Language Models (LLMs). Whilst LLMs have shown remarkable apparent abilities in many areas of question answering, their ability to perform reasoning is less clear. I will present results, focussing in particular on qualitative spatial representations and reasoning, that show the extent of their capabilities. The approaches include: (1) the use of fixed benchmarks; (2) the use of synthetic worlds in which arbitrary configurations can be set up and the correct answer easily determined; (3) conducting an extended conversation (which we call “dialectical evaluation”) to probe the limits of an LLM's capabilities.
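To make approach (2) concrete, a minimal Python sketch of the synthetic-world idea follows. The object names, the left-of relation, and the `query_llm` function are all illustrative placeholders, not the speaker's actual experimental setup; the point is only that, because each configuration is generated, the ground-truth answer is computable by construction.

```python
import random

# Illustrative sketch of the synthetic-world approach: generate an arbitrary
# spatial configuration whose correct answer is known by construction, then
# compare the LLM's answer against it.

OBJECTS = ["cup", "book", "lamp", "phone"]  # illustrative object names

def make_world(rng: random.Random) -> list[str]:
    """Place the objects in a random left-to-right order on a line."""
    world = OBJECTS[:]
    rng.shuffle(world)
    return world

def ground_truth_left_of(world: list[str], a: str, b: str) -> bool:
    """The correct answer is trivially computable from the configuration."""
    return world.index(a) < world.index(b)

def query_llm(prompt: str) -> str:
    """Hypothetical placeholder: substitute any LLM interface here."""
    raise NotImplementedError

def evaluate(n_trials: int = 100, seed: int = 0) -> float:
    """Score the LLM on randomly generated left-of questions."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(n_trials):
        world = make_world(rng)
        a, b = rng.sample(OBJECTS, 2)
        scene = ", then ".join(world)
        prompt = (f"Objects on a table from left to right: {scene}. "
                  f"Is the {a} to the left of the {b}? Answer yes or no.")
        expected = "yes" if ground_truth_left_of(world, a, b) else "no"
        answer = query_llm(prompt)  # hypothetical LLM call
        correct += answer.strip().lower().startswith(expected)
    return correct / n_trials
```

The same pattern generalises to richer qualitative spatial relations (e.g. containment or adjacency): any relation that can be computed from the generated configuration yields a ground truth against which the model's answers can be scored.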