Title: To Split or to Mix? Uncovering Group Structures with Trees and Finite Mixture Models
Authors: Hannah Frick (1), Carolin Strobl (2), Achim Zeileis (1)
Affiliation: (1) Universität Innsbruck, (2) Universität Zürich
Abstract: Measurement invariance (or, more generally, parameter stability) is an important assumption in psychometric models. A large variety of tests and other approaches have been developed to check this assumption. However, most approaches require the researcher to specify in advance the subject groups to be checked. Model-based recursive partitioning and finite mixture models are two ways to establish such groups in a data-driven way. Finite mixture models can be used to detect latent classes that are associated with different sets of parameters. Class memberships can be estimated from the data, even in the absence of any covariates providing information about which observation belongs to which class. However, if covariate information is available, it can be leveraged to improve estimates of the class membership through a concomitant variable model. In contrast, model-based recursive partitioning conducts binary sample splits as long as parameter instability is detected with respect to the available covariates. In the resulting tree structure, one set of model parameters holds for each terminal node, i.e., each cluster of subjects. So while both methods can be used to establish groups of subjects for which a common set of parameters holds, they differ in the way potential covariates are utilized to establish those groups: from a covariate-free plain mixture, to a smooth transition between latent classes in a mixture with a concomitant variable model, to sample splits in trees. In addition, the number of classes in a mixture model is typically selected via information criteria, while the number of terminal nodes of a tree is usually determined by the significance tests used in each splitting decision.
Given that both mixtures and trees are options for uncovering potential groupings, a natural question is: How do they compare? We assess their relative performance in Bradley-Terry models for paired comparisons using various patterns and intensities of differences in the preference scales between the latent classes: How does the power to detect groups with different scales compare to the false alarm rate in the absence of groups? How well are the groups recovered? How well are the parameters recovered? Relative advantages and disadvantages of the two approaches are investigated in a simulation study and illustrated using empirical examples.
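For readers unfamiliar with the Bradley-Terry model, the following minimal sketch shows what "different preference scales between groups" means in practice. It assumes the standard form of the model without ties, in which the probability of preferring object i over object j is the worth of i divided by the sum of both worths. The function name `bt_prob`, the object labels, and all worth values are hypothetical and chosen purely for illustration; they are not taken from the study.

```python
def bt_prob(worth_i: float, worth_j: float) -> float:
    """Bradley-Terry probability that object i is preferred over object j.

    Assumes strictly positive worth parameters and no ties.
    """
    return worth_i / (worth_i + worth_j)


# Hypothetical worth parameters (preference scales) for three objects
# in two groups of subjects; the groups rank the objects in opposite order.
group_a = {"A": 0.5, "B": 0.3, "C": 0.2}
group_b = {"A": 0.2, "B": 0.3, "C": 0.5}

for name, worth in (("group_a", group_a), ("group_b", group_b)):
    p = bt_prob(worth["A"], worth["C"])
    print(f"{name}: P(A preferred over C) = {p:.3f}")
```

Under these illustrative values, the two groups give complementary probabilities for the same comparison. Both a mixture and a tree aim to recover such group-specific scales; they differ in how the groups themselves are found.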