Title: Simulated and Real Data Examples for Teaching Psychometrics

Authors: Patricia Martinkova, Adela Drabinova

Affiliation: Institute of Computer Science, Czech Academy of Sciences; Faculty of Education, Charles University; Faculty of Mathematics and Physics, Charles University

Abstract: Teaching psychometric concepts and methods to groups of various backgrounds can be a challenge. Interesting and relevant examples illustrating important concepts may help reinforce understanding, keep students' attention, or even motivate research into more flexible methods. In this talk we describe several such examples. First, we introduce two cases pointing to the importance of differential item functioning (DIF) detection (Martinkova, Drabinova, et al., 2017). While some practitioners have tried to base proofs of test fairness on the insignificance of differences in total scores, we provide an example showing that, hypothetically, two groups may have an identical distribution of total scores while a DIF item, and thus a potentially unfair item, is present in the data. Conversely, we provide a real-data example in which the two groups differ significantly in their overall ability, yet no item is detected as DIF. Second, we describe a real-data example motivating the development of a more flexible model-based estimate of inter-rater reliability (IRR) (Martinkova, Goldhaber, & Erosheva, 2018). In this example, IRR calculated on stratified data is not able to detect group differences, while a more flexible model-based approach shows a significant difference in IRR. The examples will be demonstrated with the R package ShinyItemAnalysis (Martinkova & Drabinova, 2018), which provides interactive features useful for teaching psychometrics.

References:

Martinkova P, Drabinova A, Liaw Y-L, Sanders EA, McFarland JL, & Price RM (2017). Checking equity: Why DIF analysis should be a routine part of developing conceptual assessments. CBE Life Sciences Education, 16(2), rm2. http://dx.doi.org/10.1187/cbe.16-10-0307

Martinkova P, Goldhaber D, & Erosheva E (2018). Disparities in ratings of internal and external applicants: A case for model-based inter-rater reliability. PLOS ONE, 13(10), e0203002. https://doi.org/10.1371/journal.pone.0203002

Martinkova P, & Drabinova A (2018). ShinyItemAnalysis for teaching psychometrics and to enforce routine analysis of educational tests. The R Journal. Accepted.
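The hypothetical DIF scenario mentioned in the abstract can be illustrated with a minimal simulation sketch. This is not the authors' actual example; it is a toy construction (hypothetical item probabilities, two items only) in which two groups have mirrored success probabilities on two items, so their total-score distributions coincide by construction, yet each item clearly functions differently across groups:

```python
import random

random.seed(42)
n = 10_000  # hypothetical number of examinees per group

# Hypothetical two-item test: group A finds item 1 easier, group B item 2.
# Because the probabilities are mirrored, the total-score distributions
# of the two groups are identical by construction, yet each item shows DIF.
p_A = {"item1": 0.8, "item2": 0.4}
p_B = {"item1": 0.4, "item2": 0.8}

def simulate(p):
    """Simulate binary (correct/incorrect) responses to the two items."""
    return [
        [int(random.random() < p["item1"]), int(random.random() < p["item2"])]
        for _ in range(n)
    ]

scores_A = simulate(p_A)
scores_B = simulate(p_B)

# Total scores are indistinguishable between groups (both means ~1.2)...
mean_total_A = sum(sum(r) for r in scores_A) / n
mean_total_B = sum(sum(r) for r in scores_B) / n

# ...but item-level success rates differ sharply (~0.8 vs ~0.4 on item 1).
item1_A = sum(r[0] for r in scores_A) / n
item1_B = sum(r[0] for r in scores_B) / n

print(f"mean total score: A {mean_total_A:.2f}, B {mean_total_B:.2f}")
print(f"item 1 proportion correct: A {item1_A:.2f}, B {item1_B:.2f}")
```

A test-level comparison of total scores would find no group difference here, while an item-level (DIF) analysis immediately flags both items, which is the point the abstract makes about why DIF detection cannot be replaced by comparing total scores.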