Title: Implementing Latent Semantic Analysis Functions in R
Author: Fritz Günther
Affiliation: University of Tuebingen
Abstract: Latent Semantic Analysis (LSA; Landauer & Dumais, 1997) belongs to a class of models of semantics called Distributional Semantic Models. In such a model, words and documents are represented as vectors in a high-dimensional vector space derived from a large corpus of text documents. This representation allows the semantic similarity of words and documents to be computed formally, for example as the cosine of the angle between two such vectors. The presented package comes with three different 300-dimensional LSA spaces based on blog entries, newspaper articles, or literature, each built from about 50,000 German-language documents. It also provides a collection of functions for computations on these spaces. For instance, the semantic neighbourhood of a given word or document can be computed, and such a neighbourhood can be plotted in a three-dimensional approximation of the corresponding part of the LSA space. Furthermore, pairwise similarities can be computed for two given vectors of words, as can the similarity between two whole text documents and the similarities between a whole text document and a vector of words. The latter can be used, for example, for an automatic characterization of books from a list of keywords. The package also includes functions for computing a document's coherence (adapted from Landauer & Dumais, 1997) and for several methods of computing vectors for two-word phrases (adapted from Mitchell & Lapata, 2008), including the predication process of Kintsch (2001). Such two-word phrase vectors can in turn be used in the other functions of the package.
References:
Kintsch, W. (2001). Predication. Cognitive Science, 25, 173-202.
Landauer, T. K., & Dumais, S. T. (1997). A solution to Plato's problem: The Latent Semantic Analysis theory of acquisition, induction and representation of knowledge. Psychological Review, 104, 211-240.
Mitchell, J., & Lapata, M. (2008). Vector-based models of semantic composition. In J. Moore, S. Teufel, J. Allan, & S. Furui (Eds.), Proceedings of ACL-08: HLT (pp. 236-244). Columbus, OH: ACL Press.
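
Illustration: The computations named in the abstract (cosine similarity, semantic neighbourhoods, and phrase vectors) can be sketched in a few lines of base R. This is a minimal sketch under stated assumptions, not the package's implementation: the three-dimensional space and its values are invented for illustration, and the function names cosine, nearest, compose_add and compose_mult are hypothetical and not taken from the package. The additive and multiplicative compositions are two of the simple models discussed by Mitchell & Lapata (2008).

```r
# Toy "semantic space": rows are words, columns are latent dimensions.
# All values are made up for illustration; the spaces described in the
# abstract have 300 dimensions and are derived from a corpus.
space <- rbind(
  dog = c(0.9, 0.1, 0.0),
  cat = c(0.8, 0.2, 0.1),
  car = c(0.1, 0.9, 0.2),
  red = c(0.2, 0.3, 0.9)
)

# Cosine of the angle between two numeric vectors (hypothetical helper).
cosine <- function(x, y) {
  sum(x * y) / (sqrt(sum(x^2)) * sqrt(sum(y^2)))
}

# Semantic neighbourhood: the n rows of the space with the highest
# cosine similarity to a target vector (hypothetical helper).
nearest <- function(target, space, n = 3) {
  sims <- apply(space, 1, cosine, y = target)
  head(sort(sims, decreasing = TRUE), n)
}

# Two simple ways to build a vector for a two-word phrase:
# element-wise sum and element-wise product of the word vectors.
compose_add  <- function(u, v) u + v
compose_mult <- function(u, v) u * v

# Word-to-word similarities.
sim_dog_cat <- cosine(space["dog", ], space["cat", ])
sim_dog_car <- cosine(space["dog", ], space["car", ])

# A phrase vector can be used like any other vector in the space.
red_car <- compose_add(space["red", ], space["car", ])
neighbours_red_car <- nearest(red_car, space, n = 2)
```

In this toy space, "dog" comes out closer to "cat" than to "car", and the additive phrase vector for "red car" has its two constituent words as nearest neighbours. With a real LSA space only the matrix changes; the computations stay the same.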