ORCID
Constanza Mardones-Segovia: https://orcid.org/0000-0001-8204-4426
Shiyu Wang: https://orcid.org/0000-0001-7747-1028
Allan S. Cohen: https://orcid.org/0000-0002-8776-9378
Abstract
Natural language processing (NLP) has become an increasingly popular approach for analyzing textual responses in educational assessments. An important part of NLP involves cleaning and structuring examinees' written responses to create input data that preserves the syntax, semantics, and pragmatics of the words, thereby enabling the extraction of these features. This paper provides foundational knowledge on the steps needed to use NLP in educational measurement tasks, guiding researchers and practitioners through text preprocessing, feature extraction, and the analysis of textual data from constructed-response items. Additionally, an R-based example using Latent Dirichlet Allocation is provided to illustrate each step in the pipeline.
Recommended Citation
Mardones-Segovia, Constanza; Wang, Shiyu; and Cohen, Allan S. (2025) "Natural Language Processing Pipeline for Assessment Data: An R-Based Tutorial," Chinese/English Journal of Educational Measurement and Evaluation | 教育测量与评估双语期刊: Vol. 6: Iss. 2, Article 3.
DOI: https://doi.org/10.59863/SDYZ2049
Available at: https://www.ce-jeme.org/journal/vol6/iss2/3