Title: Modelling Computer-Based Formative Assessment Data With Graph Neural Networks
Authors: Benjamin Garzón Jimenez De Cisneros, Lisi Qarkaxhija, Vincenzo Perri, Ingo Scholtes, Martin J. Tomasik
Abstract: Computer-based formative assessment (CBFA) systems are software tools that support data collection and performance evaluation in the classroom, with the aim of providing feedback and informing instructional decisions. CBFA systems make it possible to acquire large-scale datasets that can be used to study academic abilities and their development in everyday settings, as opposed to the less ecologically valid conditions of standardised assessments. In the present work, we model a dataset obtained from the MINDSTEPS CBFA system, which serves a population of tens of thousands of students in Northwestern Switzerland. The system includes an item bank covering topics and competences across several years of mandatory schooling, from grade 3 to grade 9. The items span four school subjects (mathematics, German, English and French), each further subdivided into competence domains (e.g., German grammar or German reading). The dataset analysed contains over 20 million responses from approximately 89,000 students to approximately 18,000 different items, forming a large and highly sparse student-item response matrix. A natural representation in this scenario is a graph in which nodes stand for students or items, and an edge between a student and an item represents a particular response of that student to that item, labelled as correct or incorrect. Our task is to learn to predict unobserved edge labels from observed ones. For this purpose, we use graph neural networks (GNNs), a class of machine learning methods recently developed to model graph-structured data for node-level or edge-level prediction tasks.
GNNs constitute a more expressive alternative to other techniques typically used for test scoring (e.g., item response theory), which may not be flexible enough to capture the complexity of large-scale datasets. The specific model we use consists of an encoder module with two graph convolutional layers, followed by a decoder module that outputs the probability of a correct response. Nodes and edges are represented as embeddings in a multidimensional space, and the model can also incorporate features at the student (e.g., gender, mother tongue), item (e.g., competence domain) and response (e.g., age when responding) levels. After fitting the GNN, the learned item embeddings recover properties of the school curriculum: item embeddings of competence domains that belong to the same subject tend to cluster together, as do the embeddings of sub-competences within a competence domain. The main source of variation in these item embeddings corresponds to item difficulty. In addition, the dimension accounting for most of the variation in edge embeddings shows a common age progression across subjects, revealing the increase in ability over time. The model parameters, which capture both the structure of the academic curriculum and the evolution of abilities, can thus be used to inform curriculum development in a data-driven manner and to examine learning trajectories. We conclude by discussing advantages and disadvantages of the proposed approach with respect to more established alternatives.
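
To make the encoder-decoder structure described above more concrete, the following is a minimal NumPy sketch of a forward pass on a toy student-item graph. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the type of graph convolution or the form of the decoder, so a GCN-style normalised adjacency and a dot-product decoder are assumed here. The toy responses, the embedding dimension and the function names `encode` and `decode` are hypothetical, the weights are random rather than fitted, and student, item and response features are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy response data (hypothetical): columns are student index, item index, correct (1/0).
responses = np.array([
    [0, 0, 1],
    [0, 1, 0],
    [1, 1, 1],
    [1, 2, 1],
    [2, 0, 0],
    [2, 2, 1],
])
n_students, n_items, dim = 3, 3, 4
n_nodes = n_students + n_items

# Bipartite adjacency: students are nodes 0..n_students-1, items come after them.
A = np.zeros((n_nodes, n_nodes))
for s, i, _ in responses:
    A[s, n_students + i] = 1.0
    A[n_students + i, s] = 1.0

# Symmetric normalisation with self-loops (GCN-style; an assumption of this sketch).
A_hat = A + np.eye(n_nodes)
d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
A_norm = d_inv_sqrt[:, None] * A_hat * d_inv_sqrt[None, :]

def encode(X, W1, W2):
    """Encoder: two graph convolutional layers with a ReLU in between."""
    H = np.maximum(A_norm @ X @ W1, 0.0)
    return A_norm @ H @ W2

def decode(Z, students, items):
    """Decoder: probability of a correct response for each student-item pair."""
    logits = np.sum(Z[students] * Z[n_students + items], axis=1)
    return 1.0 / (1.0 + np.exp(-logits))

# Random initial node embeddings and layer weights (untrained).
X = rng.normal(size=(n_nodes, dim))
W1 = rng.normal(size=(dim, dim)) * 0.5
W2 = rng.normal(size=(dim, dim)) * 0.5

Z = encode(X, W1, W2)
p_correct = decode(Z, responses[:, 0], responses[:, 1])

# Binary cross-entropy against the observed edge labels; fitting would minimise this.
y = responses[:, 2]
eps = 1e-12
bce = -np.mean(y * np.log(p_correct + eps) + (1 - y) * np.log(1 - p_correct + eps))
```

In this sketch the edge labels enter only as training targets in the loss. Predicting an unobserved response amounts to calling `decode` on a student-item pair that has no edge in the observed graph.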