Title: Are Longer Verbal Expressions Really Semantically More Similar to Each Other? An Investigation of the Elaboration-Bias in Vector-Based Models of Word Meaning

Authors: Forthmann, B., Günther, F., Hass, R., Benedek, M., Doebler, P.

Affiliation: Institut für Psychologie in Bildung und Erziehung, WWU Münster

Abstract: Vector-based models of word meaning such as latent semantic analysis (LSA), the hyperspace analogue to language (HAL) model, and the continuous bag-of-words (CBOW) model are widely used in cognitive psychology and related disciplines to quantify the semantic similarity of verbal expressions. However, applications of LSA to automatic text-grading systems and research on the scoring of divergent thinking tests have revealed that semantic similarity can be biased for longer verbal expressions: semantic similarity computed from vector cosines is a monotonically increasing function of the number of words (the elaboration bias). By means of a simulation study, we demonstrate that this bias also occurs for HAL and CBOW semantic spaces. We further show that rank or inverse quantile transformations of the semantic spaces yield unbiased semantic similarities. Importantly, cosines based on the transformed semantic spaces yielded comparable or better validity on several benchmarks for both English- and German-language semantic spaces. The suggested transformations were implemented in the statistical software R, and the transformed spaces can easily be used for various computations via the R package LSAfun.
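
The following minimal R sketch (base R only, toy data) illustrates the two ideas summarized in the abstract: additive composition of expression vectors followed by cosine similarity, and a column-wise inverse quantile (normal scores) transformation of the semantic space. The toy space, its shared positive component, the chosen expression lengths, and the column-wise application of the transformation are illustrative assumptions made for this sketch; they are not the authors' actual simulation design or scoring procedure, and the R package LSAfun is not needed to run it.

    set.seed(1)
    n_words <- 200
    n_dims  <- 50

    ## Toy "semantic space": rows are words, columns are dimensions. A shared
    ## positive component (+ 0.3) is added so that the toy space shows a
    ## length-related bias; this is an assumption of the sketch, not a claim
    ## about how the bias arises in real LSA/HAL/CBOW spaces.
    space <- matrix(rnorm(n_words * n_dims) + 0.3, nrow = n_words,
                    dimnames = list(paste0("word", seq_len(n_words)), NULL))

    cosine  <- function(a, b) sum(a * b) / sqrt(sum(a^2) * sum(b^2))

    ## Compose an expression vector by summing its word vectors.
    compose <- function(words, tvectors) colSums(tvectors[words, , drop = FALSE])

    ## Mean cosine between pairs of non-overlapping random "expressions"
    ## of a given length.
    mean_cos <- function(len, tvectors, n_pairs = 500) {
      mean(replicate(n_pairs, {
        ws <- sample(rownames(tvectors), 2 * len)
        cosine(compose(ws[seq_len(len)], tvectors),
               compose(ws[-seq_len(len)], tvectors))
      }))
    }

    lens <- c(1, 2, 5, 10, 20)

    ## Elaboration bias in the raw toy space: cosines grow with length.
    round(sapply(lens, mean_cos, tvectors = space), 2)

    ## Column-wise inverse quantile (normal scores) transformation of the
    ## space; applying it per dimension is an assumption for this sketch.
    space_iq <- apply(space, 2, function(x) qnorm((rank(x) - 0.5) / length(x)))
    rownames(space_iq) <- rownames(space)

    ## After the transformation the mean cosine is roughly flat across lengths.
    round(sapply(lens, mean_cos, tvectors = space_iq), 2)

In this toy setting, the raw-space cosines increase with expression length while the transformed-space cosines stay near zero for all lengths; the paper itself should be consulted for the exact transformations, simulation design, and benchmark evaluations.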