Structured, flexible, and robust: benchmarking and improving large language models towards more human-like behavior in out-of-distribution reasoning tasks
Project Overview
The document examines the role of generative AI, specifically large language models (LLMs), in education, emphasizing their potential to replicate human-like reasoning in tasks such as planning and generating explanations. It also acknowledges a key limitation: LLMs struggle with novel, out-of-distribution problems, which undermines their reliability in educational applications. To address this shortcoming, the document advocates a hybrid approach that pairs LLMs with symbolic reasoning techniques, arguing that this integration could make teaching and learning more adaptive and personalized and improve educational outcomes, provided such systems are implemented carefully enough to remain reliable and accurate.
Key Applications
Hybrid Parse-and-Solve model
Context: Educational settings focused on reasoning tasks involving planning and explanations, targeting students and researchers in AI.
Implementation: A benchmark task was created to evaluate human and LLM performance in generating plans and explanations. The LLM was augmented with a symbolic reasoning module to improve adaptability to complex problems.
Outcomes: The hybrid model showed improved robustness in generating solutions compared to LLMs alone, particularly in constrained conditions requiring novel responses.
Challenges: LLMs struggle with common-sense reasoning and generating coherent solutions for novel problems, leading to significant performance gaps compared to human responses.
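The hybrid model described above can be sketched as a two-stage pipeline: a language model translates a natural-language problem into a symbolic goal, and a symbolic search procedure then produces a plan. The sketch below is purely illustrative, not the authors' code; the function names (`parse_to_goal`, `solve`), the toy "reach a number" domain, and the stubbed parsing step are all assumptions standing in for the LLM and planner used in the paper.

```python
from collections import deque

def parse_to_goal(problem_text):
    """Stand-in for the LLM parsing stage: map free text to a symbolic goal.
    In the hybrid model, this translation would come from the language model."""
    # Toy grammar: "reach <number>" -> target integer state (assumption)
    return int(problem_text.strip().split()[-1])

def solve(start, goal, moves=(1, 3)):
    """Symbolic stage: breadth-first search over integer states.
    Returns the shortest sequence of moves taking start to goal."""
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        state, plan = frontier.popleft()
        if state == goal:
            return plan
        for m in moves:
            nxt = state + m
            if nxt not in seen and nxt <= goal:
                seen.add(nxt)
                frontier.append((nxt, plan + [m]))
    return None  # goal unreachable with the given moves

goal = parse_to_goal("reach 7")
plan = solve(0, goal)
print(plan)  # [1, 3, 3]
```

The point of the division of labor is that the symbolic stage is exhaustive and verifiable, so even when the problem is novel, the search cannot hallucinate an invalid plan; the language model's job is reduced to the translation step it is best suited for.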
Implementation Barriers
Technical
Large language models (LLMs) are markedly less robust on tasks that require novel, out-of-distribution reasoning than on tasks with familiar, predictable response patterns.
Proposed Solutions: Hybrid models that combine statistical language processing with symbolic reasoning to enhance problem-solving capabilities.
Project Team
Katherine M. Collins
Researcher
Catherine Wong
Researcher
Jiahai Feng
Researcher
Megan Wei
Researcher
Joshua B. Tenenbaum
Researcher
Contact Information
For information about the paper, please contact the authors.
Authors: Katherine M. Collins, Catherine Wong, Jiahai Feng, Megan Wei, Joshua B. Tenenbaum
Source Publication: View Original Paper
Project Contact: Dr. Jianhua Yang
LLM Model Version: gpt-4o-mini-2024-07-18
Analysis Provider: OpenAI