Lessons Learned about Evaluating Fairness from a Data Challenge to Automatically Score NAEP Reading Items

Maggie Beiting-Parrish, The Federation of American Scientists
John Whitmer, The Federation of American Scientists

Abstract

Natural language processing (NLP) is widely used to predict human scores for open-ended student assessment responses in various content areas (Johnson et al., 2022). Ensuring that these algorithms are fair across student demographic backgrounds is crucial (Madnani et al., 2017). This study presents a fairness analysis of six top-performing entries from a data challenge involving 20 NAEP reading comprehension items that were initially analyzed for fairness based on race/ethnicity and gender. This study describes additional fairness evaluations based on English Language Learner (ELL) status, Individual Education Plan status, and Free/Reduced-Price Lunch status. Several items showed lower accuracy for predicted scores, particularly for ELLs. This study recommends that fairness evaluations of automated scoring include additional demographic factors and that fairness analyses consider multiple factors and contexts.

Recommended Citation

Beiting-Parrish, Maggie and Whitmer, John (2023) "Lessons Learned about Evaluating Fairness from a Data Challenge to Automatically Score NAEP Reading Items," Chinese/English Journal of Educational Measurement and Evaluation | 教育测量与评估双语期刊: Vol. 4: Iss. 3, Article 5.
DOI: https://doi.org/10.59863/NKCJ9608
Available at: https://www.ce-jeme.org/journal/vol4/iss3/5

Included in

Accessibility Commons, Educational Assessment, Evaluation, and Research Commons, Educational Methods Commons, Educational Psychology Commons, Educational Technology Commons, Language and Literacy Education Commons
