Computational methods can significantly predict humans’ scoring of story recall details

Poster Session D - Monday, March 9, 2026, 8:00 – 10:00 am PDT, Fairview/Kitsilano Ballroom
Also presenting in Data Blitz Session 1 - Saturday, March 7, 2026, 10:30 am – 12:00 pm PST, Salon ABC.

Sevda Hasanli¹, Mete Ismayilzada², Vanessa Taler¹, Patrick Davidson¹; ¹School of Psychology, University of Ottawa, ²EPFL - Swiss Federal Technology Institute of Lausanne

Story recall remains a primary method of assessing episodic memory, but manual scoring of participants’ responses can be laborious and time-consuming. Computational methods offer an efficient, scalable, automated alternative for recall scoring. We asked whether two computational approaches could match traditional human scoring of story recall. The first approach employed a classical natural language processing (NLP) pipeline using rule-based techniques, including n-gram extraction, named entity recognition, and dependency parsing (Python Natural Language Toolkit). The second approach leveraged the advanced language understanding capabilities of large language models (LLMs). We used the classical NLP method and two LLMs (OpenAI’s GPT-4o and Google’s Gemini 2.0/2.5) separately to score the immediate and delayed recall of n = 160 young and n ≈ 95 older adults for pairs of stories created by Taler et al. (2021). Responses were classified as veridical (i.e., word-for-word), gist (i.e., general idea), or distorted (i.e., clearly erroneous). Remarkably, both approaches closely matched the human scoring of veridical recall. The classical NLP pipeline failed to capture gist and distortion scores, whereas the LLMs demonstrated moderate correlations with human ratings on both dimensions. We explored different LLM prompting methods and obtained similar results each time. Computational methods are approaching human-level performance in scoring veridical recall, though they still need improvement in capturing the subtleties of memory errors. Further development of these automated tools will help make cognitive and neuropsychological testing more efficient and accessible, and may be especially useful with large datasets.
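
To illustrate how n-gram matching might support veridical (word-for-word) scoring, the following is a minimal sketch and not the authors' pipeline. It uses only the Python standard library in place of the Natural Language Toolkit, and the function names (`tokenize`, `ngrams`, `veridical_score`), the example story details, and the example recall are all hypothetical. Each story detail is scored by the fraction of its n-grams that appear verbatim in the participant's recall.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def ngrams(tokens, n):
    """Return the set of contiguous n-token tuples in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def veridical_score(detail, recall, n=2):
    """Fraction of the detail's n-grams found verbatim in the recall."""
    detail_tokens = tokenize(detail)
    # Fall back to shorter n-grams when the detail has fewer than n tokens.
    n = min(n, len(detail_tokens))
    if n == 0:
        return 0.0
    target = ngrams(detail_tokens, n)
    found = ngrams(tokenize(recall), n)
    return len(target & found) / len(target)


# Hypothetical story details and recall, invented for illustration only.
details = ["Anna drove to the market", "bought three apples"]
recall = "Anna drove to the store and she bought three apples."
scores = [veridical_score(d, recall) for d in details]
print(scores)
```

In this sketch the first detail scores below 1 because "market" was recalled as "store", which hints at why exact matching suits veridical recall but cannot by itself separate gist from distortion.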

Topic Area: LONG-TERM MEMORY: Episodic

March 7 – 10, 2026