
Prompting Science Report 2: The Decreasing Value of Chain of Thought in Prompting

Project Overview

The document explores the role of generative AI, specifically large language models (LLMs), in education, focusing on the effectiveness of Chain-of-Thought (CoT) prompting. It finds that while CoT prompting can significantly improve the average performance of non-reasoning models, it also increases run-to-run variability and can introduce errors, raising concerns about its reliability. For reasoning models, the gains from CoT prompting are minimal and often do not offset the additional processing time and token cost. The report therefore advocates a careful, balanced approach to integrating AI in educational settings, weighing the potential improvements in reasoning performance against these costs and inconsistencies.

Key Applications

Chain-of-Thought (CoT) prompting in large language models (LLMs)

Context: Educational testing with PhD-level questions across biology, physics, and chemistry

Implementation: Evaluated across multiple trials on the GPQA Diamond dataset, comparing several prompting methods.

Outcomes: Improved average performance in non-reasoning models, with notable improvements in certain models like Gemini Flash 2.0 and Sonnet 3.5. However, reasoning models showed only marginal gains.

Challenges: Increased response times and potential decrease in accuracy due to variability introduced by CoT prompting.
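To make the comparison concrete, the sketch below contrasts the two prompting conditions the report evaluates: a direct "answer only" prompt and a Chain-of-Thought prompt. The question, options, and prompt wording are illustrative assumptions, not the report's exact templates.

```python
# Sketch of the two prompting conditions: a direct "answer only" prompt
# versus a Chain-of-Thought (CoT) prompt that asks for step-by-step
# reasoning first. Wording is hypothetical, not the report's templates.

def build_direct_prompt(question: str, options: list[str]) -> str:
    """Ask for the answer letter only, with no intermediate reasoning."""
    opts = "\n".join(f"{chr(65 + i)}. {o}" for i, o in enumerate(options))
    return (
        f"{question}\n{opts}\n"
        "Respond with only the letter of the correct answer."
    )

def build_cot_prompt(question: str, options: list[str]) -> str:
    """Ask the model to reason step by step before answering."""
    opts = "\n".join(f"{chr(65 + i)}. {o}" for i, o in enumerate(options))
    return (
        f"{question}\n{opts}\n"
        "Think step by step, then state the letter of the correct answer."
    )

# Illustrative multiple-choice item in the style of GPQA questions.
question = "Which particle mediates the electromagnetic force?"
options = ["Gluon", "Photon", "W boson", "Graviton"]

direct = build_direct_prompt(question, options)
cot = build_cot_prompt(question, options)
```

The CoT variant typically elicits a longer response, which is the source of the extra tokens and latency noted above.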

Implementation Barriers

Technical

CoT prompting requires more tokens and longer processing times.

Proposed Solutions: Evaluate the balance between desired accuracy improvements and acceptable response latency.

Effectiveness

CoT prompting may not yield significant benefits for reasoning models, raising questions about its utility.

Proposed Solutions: Use tailored CoT prompts aimed at specific problems rather than generic approaches.
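A minimal sketch of what a tailored (rather than generic) CoT prompt might look like, assuming hypothetical subject-specific reasoning hints; the hint text and subject keys are illustrative, not drawn from the report.

```python
# Hypothetical tailored-CoT prompt builder: instead of a generic
# "think step by step" instruction, attach a reasoning hint matched
# to the problem's subject area. Hints here are illustrative.

SUBJECT_HINTS = {
    "biology": "Work through the relevant pathway or mechanism step by step.",
    "physics": "Identify the governing equations and check limiting cases.",
    "chemistry": "Balance any reactions and check units before answering.",
}

def build_tailored_cot_prompt(question: str, subject: str) -> str:
    """Prefix a subject-specific hint; fall back to a generic CoT cue."""
    hint = SUBJECT_HINTS.get(subject, "Think step by step.")
    return f"{question}\n{hint}\nThen state your final answer."
```

The design choice here is the fallback: unknown subjects still get a generic CoT cue, so the tailored version never performs worse structurally than the generic one.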

Project Team

Lennart Meincke

Researcher

Ethan Mollick

Researcher

Lilach Mollick

Researcher

Dan Shapiro

Researcher

Contact Information

For information about the paper, please contact the authors.

Authors: Lennart Meincke, Ethan Mollick, Lilach Mollick, Dan Shapiro

Source Publication: View Original Paper

Project Contact: Dr. Jianhua Yang

LLM Model Version: gpt-4o-mini-2024-07-18

Analysis Provider: OpenAI
