Large language models (LLMs) are highly effective when adapted to various downstream NLP tasks. However, pre-training requires access to large compute resources. In this talk, I will present our work on (1) speeding up pre-training with simpler objectives than the widely used masked language modelling, (2) how the choice of pre-training objective affects LLMs' ability to capture linguistic information, and (3) how we can use an unlimited vocabulary with only a relatively small number of parameters.
Invited Speaker: Nikos Aletras