
Enhancing LLMs for Physics Problem-Solving using Reinforcement Learning with Human-AI Feedback

Project Overview

This project explores the use of Reinforcement Learning with Human and AI Feedback (RLHAIF) to improve the ability of Large Language Models (LLMs) to solve complex physics problems in an educational setting. It identifies the reasoning challenges LLMs face in physics and proposes RLHAIF, which combines human and AI feedback, as a way to raise their performance. The findings show that the Mistral-PPO model improved both physics reasoning accuracy and overall reasoning scores, demonstrating the potential of generative AI in education. This research highlights the promising role of AI in improving educational outcomes by addressing the specific cognitive challenges students face when mastering difficult subjects such as physics.

Key Applications

Reinforcement Learning with Human and AI Feedback (RLHAIF)

Context: High school physics education, targeting students and educators

Implementation: LLMs were fine-tuned on a dataset of physics questions and answers; a Reward Model was trained on combined human and AI feedback, and the policy was then optimized with reinforcement learning (Proximal Policy Optimization, as reflected in the Mistral-PPO model name).

Outcomes: Improved reasoning accuracy and engagement in solving physics problems, with Mistral-PPO achieving a METEOR score of 58.67 and a reasoning score of 0.74.

Challenges: High computational costs, reliance on quality human feedback, and difficulties in generalizing to diverse physics topics.
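The RLHAIF training loop described above can be illustrated with a deliberately simplified sketch. The snippet below is not the paper's implementation: the action space, reward values, and the REINFORCE-style update are all invented stand-ins for the real LLM policy and PPO optimizer. It shows only the core idea that a reward model trained on human/AI preferences steers the policy toward preferred answer styles.

```python
import random

random.seed(0)

# Toy RLHAIF loop (illustrative only; actions and rewards are hypothetical).
# A "policy" picks one of several candidate solution styles for a physics
# question; a reward model distilled from human + AI preference feedback
# scores the choice, and the policy is nudged toward higher-reward actions.
# This is a simplified stand-in for PPO fine-tuning of an LLM.

ACTIONS = ["step_by_step", "formula_only", "final_answer_only"]

# Hypothetical reward model: preference data favours step-by-step reasoning.
REWARD = {"step_by_step": 1.0, "formula_only": 0.4, "final_answer_only": 0.1}


def softmax(logits):
    m = max(logits)
    exps = [2.718281828459045 ** (x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


def train(steps=500, lr=0.1):
    logits = [0.0, 0.0, 0.0]  # start from a uniform policy
    for _ in range(steps):
        probs = softmax(logits)
        # Sample an action from the current policy.
        a = random.choices(range(len(ACTIONS)), weights=probs)[0]
        reward = REWARD[ACTIONS[a]]
        # Baseline = expected reward under the current policy.
        baseline = sum(p * REWARD[ACTIONS[i]] for i, p in enumerate(probs))
        advantage = reward - baseline
        # REINFORCE-style update: grad of log pi(a) is one_hot(a) - probs.
        for i in range(len(logits)):
            grad = (1.0 if i == a else 0.0) - probs[i]
            logits[i] += lr * advantage * grad
    return softmax(logits)


final_probs = train()
best = ACTIONS[final_probs.index(max(final_probs))]
print(best)
```

In a full-scale setup the tabular policy is replaced by the LLM's token-level distribution and the update by PPO's clipped objective, but the feedback-driven gradient signal is the same in spirit.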

Implementation Barriers

Technical Barrier

High computational costs associated with training models using reinforcement learning techniques.

Proposed Solutions: Implementing more efficient training methods and exploring computational optimizations.

Resource Barrier

Dependency on high-quality human feedback makes the process resource-intensive.

Proposed Solutions: Utilizing AI-generated feedback to augment human inputs and reduce the burden on human evaluators.
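One way to realize this solution is to let AI judges label preference pairs where no human label exists. The sketch below is a hypothetical labeling rule (the function name, vote format, and example data are all invented for illustration): human labels, when present, are treated as authoritative; otherwise a majority vote among AI judges decides, with ties excluded from reward-model training.

```python
# Hypothetical sketch of augmenting scarce human labels with AI-judge votes
# to build preference pairs ("A" vs "B") for reward-model training.

def preference_label(human_label, ai_votes):
    """Return 'A', 'B', or None for a pair of candidate answers.

    A human label, when present, overrides the AI judges; otherwise fall
    back to a majority vote among AI judges, abstaining on ties.
    """
    if human_label in ("A", "B"):
        return human_label
    a = ai_votes.count("A")
    b = ai_votes.count("B")
    if a > b:
        return "A"
    if b > a:
        return "B"
    return None  # ambiguous pair: exclude from training


pairs = [
    (None, ["A", "A", "B"]),  # no human label -> AI majority picks A
    ("B",  ["A", "A", "A"]),  # human label overrides the AI judges
    (None, ["A", "B"]),       # tie among judges -> pair is dropped
]
labels = [preference_label(h, v) for h, v in pairs]
print(labels)  # ['A', 'B', None]
```

Rules like this shift most of the labeling burden to AI judges while reserving human effort for the pairs that matter most.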

Generalization Barrier

Challenges in scaling the approach across diverse physics topics and problems.

Proposed Solutions: Developing broader training datasets and refining model architectures to enhance adaptability.

Project Team

Avinash Anand

Researcher

Kritarth Prasad

Researcher

Chhavi Kirtani

Researcher

Ashwin R Nair

Researcher

Mohit Gupta

Researcher

Saloni Garg

Researcher

Anurag Gautam

Researcher

Snehal Buldeo

Researcher

Rajiv Ratn Shah

Researcher

Contact Information

For information about the paper, please contact the authors.

Authors: Avinash Anand, Kritarth Prasad, Chhavi Kirtani, Ashwin R Nair, Mohit Gupta, Saloni Garg, Anurag Gautam, Snehal Buldeo, Rajiv Ratn Shah

Source Publication: View Original Paper

Project Contact: Dr. Jianhua Yang

LLM Model Version: gpt-4o-mini-2024-07-18

Analysis Provider: OpenAI
