ORCID
Yue Huang: https://orcid.org/0000-0003-2175-9852
Corey Palermo: https://orcid.org/0000-0003-1921-5127
Abstract
Automated writing evaluation (AWE) has long supported assessment and instruction, yet existing systems struggle to capture deeper rhetorical and pedagogical aspects of student writing. Recent advances in generative language models (GLMs) such as GPT and Llama present new opportunities, but their effectiveness remains uncertain. This review synthesizes 29 studies on automated essay scoring and 14 on automated writing feedback generation, examining how GLMs are applied through prompting, fine-tuning, and adaptation. Findings show GLMs can approximate human scoring and deliver richer, rubric-aligned feedback, but fairness, validity, and ethical issues remain largely unaddressed. We conclude that GLMs hold promise to enhance AWE, provided that future work establishes robust evaluation frameworks and safeguards to ensure responsible, equitable use.
Recommended Citation
Huang, Yue; Palermo, Corey; Liu, Ruitao; and He, Yong (2025) "An Early Review of Generative Language Models in Automated Writing Evaluation: Advancements, Challenges, and Future Directions for Automated Essay Scoring and Feedback Generation," Chinese/English Journal of Educational Measurement and Evaluation | 教育测量与评估双语期刊: Vol. 6: Iss. 2, Article 5.
DOI: https://doi.org/10.59863/FAMJ7696
Available at: https://www.ce-jeme.org/journal/vol6/iss2/5
Included in
Educational Assessment, Evaluation, and Research Commons, Educational Technology Commons