Improving Embeddings Representations for Comparing Higher Education Curricula: A Use Case in Computing


We propose an approach for comparing curricula of study programs in higher education. Pre-trained word embeddings are fine-tuned in a study program classification task, where each curriculum is represented by the names and content of its courses. By combining metric learning with a novel course-guided attention mechanism, our method obtains more accurate curriculum representations than strong baselines. Experiments on a new dataset with curricula of computing programs demonstrate the intuitive power of our approach via attention weights, topic modeling, and embedding visualizations. We also present a use case comparing computing curricula from the USA and Latin America to showcase the capabilities of our improved embedding representations.

Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
Fernando Alva-Manchego

My research interests include text adaptation, evaluation of natural language generation, and NLP for education.