Natural Language Processing Pipeline for Assessment Data: An R-Based Tutorial

ORCID

Constanza Mardones-Segovia: https://orcid.org/0000-0001-8204-4426

Shiyu Wang: https://orcid.org/0000-0001-7747-1028

Allan S. Cohen: https://orcid.org/0000-0002-8776-9378

Abstract

Natural language processing (NLP) has become an increasingly popular approach for analyzing textual responses in educational assessments. An important part of NLP involves cleaning and structuring examinees' written responses to create input data that preserves the syntax, semantics, and pragmatics of the words, so that these features can be extracted. This paper provides foundational knowledge on the steps needed to use NLP in educational measurement tasks, guiding researchers and practitioners through text preprocessing, feature extraction, and the analysis of textual data from constructed-response items. Additionally, an R-based example using Latent Dirichlet Allocation illustrates each step in the pipeline.

Recommended Citation

Mardones-Segovia, Constanza; Wang, Shiyu; and Cohen, Allan S. (2025) "Natural Language Processing Pipeline for Assessment Data: An R-Based Tutorial," Chinese/English Journal of Educational Measurement and Evaluation | 教育测量与评估双语期刊: Vol. 6: Iss. 2, Article 3.
DOI: https://doi.org/10.59863/SDYZ2049
Available at: https://www.ce-jeme.org/journal/vol6/iss2/3

Included in

Quantitative Psychology Commons
