
Structured, flexible, and robust: benchmarking and improving large language models towards more human-like behavior in out-of-distribution reasoning tasks

Project Overview

This project examines the role of generative AI, specifically large language models (LLMs), in education, emphasizing their potential to replicate human-like reasoning in tasks such as planning and explanation generation. LLMs remain limited, however: they struggle with novel, out-of-distribution problems, which can undermine their effectiveness in educational applications. To address these shortcomings, the project advocates a hybrid approach that pairs LLMs with symbolic reasoning techniques, arguing that this integration can improve robustness and, with it, learning outcomes. With careful implementation to ensure reliability and accuracy, such systems could make teaching and learning more adaptive and personalized.

Key Applications

Hybrid Parse-and-Solve model

Context: Educational settings focused on reasoning tasks involving planning and explanations, targeting students and researchers in AI.

Implementation: A benchmark task was created to evaluate human and LLM performance in generating plans and explanations. The LLM was augmented with a symbolic reasoning module to improve adaptability to complex problems.

Outcomes: The hybrid model showed improved robustness in generating solutions compared to LLMs alone, particularly in constrained conditions requiring novel responses.

Challenges: LLMs struggle with common-sense reasoning and generating coherent solutions for novel problems, leading to significant performance gaps compared to human responses.
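The parse-and-solve idea above can be sketched in miniature. The paper's actual benchmark and solver are not reproduced here; this is a hypothetical toy illustration in which the LLM's role (translating a natural-language request into a symbolic goal) is replaced by a hard-coded stub, `parse_to_symbolic`, and the symbolic reasoning module is a small breadth-first search over a toy block-stacking action space. All names and the action space are invented for illustration.

```python
from collections import deque

def parse_to_symbolic(prompt: str) -> tuple:
    # Stand-in for the LLM "parse" step: map a phrasing to a symbolic
    # goal state. A real hybrid system would call a language model here.
    if "swap" in prompt:
        return ("B", "A")  # goal: block B on the bottom, A on top
    return ("A", "B")

def solve(start: tuple, goal: tuple) -> list:
    # The symbolic "solve" half: breadth-first search over stack states.
    # The single toy action moves the top block to the bottom of the stack.
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, plan = queue.popleft()
        if state == goal:
            return plan  # shortest plan found by BFS
        nxt = (state[-1],) + state[:-1]  # rotate: top block goes to bottom
        if nxt not in seen:
            seen.add(nxt)
            queue.append((nxt, plan + [f"move {state[-1]} to bottom"]))
    return []  # goal unreachable in this action space

goal = parse_to_symbolic("swap the two blocks")
plan = solve(("A", "B"), goal)
print(plan)  # a sequence of symbolic moves reaching the parsed goal
```

Because the solver operates on the parsed symbolic form rather than on raw text, novel goal phrasings only stress the parse step; the search itself behaves identically on familiar and unfamiliar problems, which is the robustness property the hybrid model targets.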

Implementation Barriers

Technical

Large language models (LLMs) are less robust at solving tasks that require novel reasoning beyond predictable responses.

Proposed Solutions: Implementing hybrid models that combine statistical language processing with symbolic reasoning to enhance problem-solving capabilities.

Project Team

Katherine M. Collins

Researcher

Catherine Wong

Researcher

Jiahai Feng

Researcher

Megan Wei

Researcher

Joshua B. Tenenbaum

Researcher

Contact Information

For information about the paper, please contact the authors.

Authors: Katherine M. Collins, Catherine Wong, Jiahai Feng, Megan Wei, Joshua B. Tenenbaum

Source Publication: View Original Paper

Project Contact: Dr. Jianhua Yang

LLM Model Version: gpt-4o-mini-2024-07-18

Analysis Provider: OpenAI
