Title: Simulated and Real Data Examples for Teaching Psychometrics

Authors: Patricia Martinkova, Adela Drabinova

Affiliation: Institute of Computer Science, Czech Academy of Sciences; Faculty of Education, Charles University; Faculty of Mathematics and Physics, Charles University

Abstract: Teaching psychometric concepts and methods to groups of various backgrounds can be a challenge. Interesting and relevant examples illustrating important concepts may help reinforce understanding, keep students' attention, or even motivate research into more flexible methods. In this talk we describe several such examples. First, we introduce two cases pointing to the importance of differential item functioning (DIF) detection (Martinkova, Drabinova, et al., 2017). While some practitioners have tried to base proofs of test fairness on the insignificance of differences in total scores, we provide an example showing that, hypothetically, two groups may have an identical distribution of total scores while a DIF item, and thus a potentially unfair item, is present in the data. Conversely, we provide a real-data example in which the two groups differ significantly in their overall ability, yet no item is detected as DIF. Second, we describe a real-data example motivating the development of a more flexible model-based estimate of inter-rater reliability (IRR) (Martinkova, Goldhaber, & Erosheva, 2018). In this example, IRR calculated on stratified data is not able to detect group differences, while a more flexible model-based approach shows a significant difference in IRR. The examples will be demonstrated with the R package ShinyItemAnalysis (Martinkova & Drabinova, 2018), which provides interactive features useful for teaching psychometrics.

References:

Martinkova P, Drabinova A, Liaw Y-L, Sanders EA, McFarland JL, & Price RM (2017). Checking equity: Why DIF analysis should be a routine part of developing conceptual assessments. CBE Life Sciences Education, 16(2), rm2. http://dx.doi.org/10.1187/cbe.16-10-0307

Martinkova P, Goldhaber D, & Erosheva E (2018). Disparities in ratings of internal and external applicants: A case for model-based inter-rater reliability. PLOS ONE, 13(10), e0203002. https://doi.org/10.1371/journal.pone.0203002

Martinkova P, & Drabinova A (2018). ShinyItemAnalysis for teaching psychometrics and to enforce routine analysis of educational tests. The R Journal. Accepted.
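The hypothetical DIF scenario mentioned in the abstract can be illustrated with a minimal simulation sketch. This is not the authors' actual example; it is a toy construction (hypothetical item probabilities, two items only) in which two groups have mirrored success probabilities on two items, so their total-score distributions coincide by construction, yet each item clearly functions differently across groups:

```python
import random

random.seed(42)
n = 10_000  # hypothetical number of examinees per group

# Hypothetical two-item test: group A finds item 1 easier, group B item 2.
# Because the probabilities are mirrored, the total-score distributions
# of the two groups are identical by construction, yet each item shows DIF.
p_A = {"item1": 0.8, "item2": 0.4}
p_B = {"item1": 0.4, "item2": 0.8}

def simulate(p):
    """Simulate binary (correct/incorrect) responses to the two items."""
    return [
        [int(random.random() < p["item1"]), int(random.random() < p["item2"])]
        for _ in range(n)
    ]

scores_A = simulate(p_A)
scores_B = simulate(p_B)

# Total scores are indistinguishable between groups (both means ~1.2)...
mean_total_A = sum(sum(r) for r in scores_A) / n
mean_total_B = sum(sum(r) for r in scores_B) / n

# ...but item-level success rates differ sharply (~0.8 vs ~0.4 on item 1).
item1_A = sum(r[0] for r in scores_A) / n
item1_B = sum(r[0] for r in scores_B) / n

print(f"mean total score: A {mean_total_A:.2f}, B {mean_total_B:.2f}")
print(f"item 1 proportion correct: A {item1_A:.2f}, B {item1_B:.2f}")
```

A test-level comparison of total scores would find no group difference here, while an item-level (DIF) analysis immediately flags both items, which is the point the abstract makes about why DIF detection cannot be replaced by comparing total scores.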