ARES: Alternating Reinforcement Learning and Supervised Fine-Tuning for Enhanced Multi-Modal Chain-of-Thought Reasoning Through Diverse AI Feedback
Project Overview
This document presents ARES, a hybrid algorithm that alternates Reinforcement Learning (RL) and Supervised Fine-Tuning (SFT) to strengthen multi-modal chain-of-thought reasoning in educational AI applications. By leveraging sentence-level feedback from advanced AI models such as GPT-4 and Claude 3 Opus, ARES improves the quality of rationale reasoning on multi-modal educational datasets such as ScienceQA and A-OKVQA. The approach addresses the instability of existing reinforcement learning methods and yields measurable gains in reasoning quality and inference accuracy. Overall, the findings suggest that generative AI techniques like ARES can support educational tools with more accurate, contextually relevant responses, enriching the learning experience and helping educators deliver personalized instruction.
Key Applications
ARES: Alternating Reinforcement Learning and Supervised Fine-Tuning
Context: Educational settings utilizing multi-modal datasets such as ScienceQA and A-OKVQA, targeting students from elementary to high school levels.
Implementation: The ARES algorithm alternates between RL using sentence-level feedback from an AI model and SFT for correcting errors, stabilizing the model's outputs.
Outcomes: Achieved an approximately 70% win rate in rationale reasoning quality against baseline models and a 2.5% increase in inference answer accuracy.
Challenges: Hyperparameter tuning during RL can produce repetitive or truncated sentences, and correction feedback is needed to stabilize the model.
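The alternation described above can be sketched as a training loop: an RL phase nudges the model using sentence-level rewards, then an SFT phase fine-tunes on corrected rationales to stabilize it. The scoring and update functions below are hypothetical stand-ins for the paper's AI-feedback calls (e.g., GPT-4 scoring each sentence) and for real gradient updates, not the authors' implementation:

```python
# Minimal sketch of ARES-style alternation between RL and SFT.
# All functions here are illustrative toys, not the authors' code.

def score_sentences(rationale):
    """Stand-in for sentence-level AI feedback (e.g., an API call
    asking GPT-4 to rate each sentence). Returns rewards in [0, 1]."""
    return [min(1.0, len(sentence.split()) / 10) for sentence in rationale]

def rl_step(model, rationale):
    """Toy RL phase: shift a scalar 'weight' by the advantage of the
    mean sentence-level reward over a running baseline."""
    rewards = score_sentences(rationale)
    mean_reward = sum(rewards) / len(rewards)
    model["weight"] += 0.1 * (mean_reward - model["baseline"])
    model["baseline"] = 0.9 * model["baseline"] + 0.1 * mean_reward
    return mean_reward

def sft_step(model, corrected_rationale):
    """Toy SFT phase: pull the model toward a target derived from the
    corrected rationale, stabilizing it after the RL phase."""
    target = len(" ".join(corrected_rationale)) / 100
    model["weight"] += 0.5 * (target - model["weight"])

def ares_round(model, rationale, corrected_rationale):
    """One ARES round: RL on sentence-level feedback, then SFT."""
    reward = rl_step(model, rationale)
    sft_step(model, corrected_rationale)
    return reward
```

In the actual method, the RL update would be a policy-gradient step on the language model and the SFT step a supervised pass on AI-corrected rationales; the sketch only shows the alternating control flow.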
Implementation Barriers
Technical barrier
Reinforcement Learning is unstable and requires extensive hyperparameter tuning; without it, outputs degrade into repetitive or truncated sentences.
Proposed Solutions: Using Supervised Fine-Tuning after RL to correct errors and stabilize the model.
Resource barrier
Cost and usage limits associated with API access to advanced AI models for feedback.
Proposed Solutions: Developing public or lower-cost alternatives for accessing necessary AI feedback.
Knowledge barrier
Difficulty in addressing complex tasks requiring external knowledge beyond the capabilities of the model.
Proposed Solutions: Future research to integrate external knowledge sources into the model.
Project Team
Ju-Seung Byun
Researcher
Jiyun Chun
Researcher
Jihyung Kil
Researcher
Andrew Perrault
Researcher
Contact Information
For information about the paper, please contact the authors.
Authors: Ju-Seung Byun, Jiyun Chun, Jihyung Kil, Andrew Perrault
Source Publication: View Original Paper
Project Contact: Dr. Jianhua Yang
LLM Model Version: gpt-4o-mini-2024-07-18
Analysis Provider: OpenAI