Prompting Science Report 2: The Decreasing Value of Chain of Thought in Prompting
Project Overview
The report examines Chain-of-Thought (CoT) prompting in large language models (LLMs), with a focus on educational applications. It finds that CoT prompting can significantly improve the average performance of non-reasoning models, but it also increases answer variability and can introduce errors, raising concerns about reliability. For reasoning models, the gains from CoT prompting are minimal and often do not offset the additional processing time and token cost. The report therefore advocates a careful, balanced approach to using these techniques in educational settings, weighing potential gains in reasoning performance against the added cost and variability.
Key Applications
Chain-of-Thought (CoT) prompting in large language models (LLMs)
Context: Educational testing with PhD-level questions across biology, physics, and chemistry
Implementation: Implemented across multiple trials using the GPQA Diamond dataset, comparing various prompting methods.
Outcomes: Improved average performance in non-reasoning models, with notable improvements in certain models like Gemini Flash 2.0 and Sonnet 3.5. However, reasoning models showed only marginal gains.
Challenges: Increased response times and potential decrease in accuracy due to variability introduced by CoT prompting.
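The trial setup described above can be sketched as a minimal evaluation harness that compares direct and CoT prompts over repeated runs. The question format, the `ask_model` stub, and the scoring logic are hypothetical placeholders for illustration, not the authors' actual code:

```python
import statistics

def ask_model(prompt: str) -> str:
    # Hypothetical stand-in for a real LLM API call; here it always
    # answers "B" so the sketch runs end to end without a network call.
    return "B"

def build_prompt(question: str, choices: list[str], use_cot: bool) -> str:
    """Build a direct or Chain-of-Thought prompt for a multiple-choice question."""
    options = "\n".join(f"{label}. {text}" for label, text in zip("ABCD", choices))
    instruction = (
        "Think step by step, then give your final answer as a single letter."
        if use_cot
        else "Answer with a single letter only."
    )
    return f"{question}\n{options}\n{instruction}"

def evaluate(questions, use_cot: bool, trials: int = 5):
    """Return mean accuracy and spread over repeated trials.

    Repeating each question exposes the run-to-run variability that
    CoT prompting can introduce.
    """
    accuracies = []
    for question, choices, answer in questions:
        prompt = build_prompt(question, choices, use_cot)
        correct = sum(
            ask_model(prompt).strip().startswith(answer) for _ in range(trials)
        )
        accuracies.append(correct / trials)
    return statistics.mean(accuracies), statistics.pstdev(accuracies)

# Illustrative question, not from the GPQA Diamond dataset.
questions = [
    ("Which gas is most abundant in Earth's atmosphere?",
     ["Oxygen", "Nitrogen", "Argon", "Carbon dioxide"], "B"),
]
mean_acc, spread = evaluate(questions, use_cot=True)
```

Comparing `evaluate(questions, use_cot=True)` against `evaluate(questions, use_cot=False)` on the same question set gives both an average-accuracy difference and a variability difference, which is the comparison the report's analysis rests on.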
Implementation Barriers
Technical
The requirement for more tokens and longer processing times when using CoT prompting.
Proposed Solutions: Evaluate the balance between desired accuracy improvements and acceptable response latency.
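One way to make that balance concrete is a simple decision rule comparing accuracy gain against added latency. The threshold value below is purely illustrative and does not come from the report:

```python
def cot_worth_it(accuracy_gain: float, extra_latency_s: float,
                 min_gain_per_second: float = 0.01) -> bool:
    """Decide whether a CoT prompt's accuracy gain justifies its added latency.

    accuracy_gain: accuracy improvement from CoT (e.g., 0.05 for +5 points).
    extra_latency_s: additional response time CoT introduces, in seconds.
    min_gain_per_second: illustrative threshold for acceptable gain per second.
    """
    if extra_latency_s <= 0:
        # No latency cost: any positive gain is worth taking.
        return accuracy_gain > 0
    return accuracy_gain / extra_latency_s >= min_gain_per_second
```

For example, a 4-point gain for 3 extra seconds clears a threshold of 1 point per second, while a 1-point gain for 5 extra seconds does not; the right threshold depends on the application's latency budget.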
Effectiveness
CoT prompting may not yield significant benefits for reasoning models, raising questions about its utility.
Proposed Solutions: Use tailored CoT prompts aimed at specific problems rather than generic approaches.
Project Team
Lennart Meincke
Researcher
Ethan Mollick
Researcher
Lilach Mollick
Researcher
Dan Shapiro
Researcher
Contact Information
For information about the paper, please contact the authors.
Authors: Lennart Meincke, Ethan Mollick, Lilach Mollick, Dan Shapiro
Source Publication: View Original Paper
Project Contact: Dr. Jianhua Yang
LLM Model Version: gpt-4o-mini-2024-07-18
Analysis Provider: OpenAI