REMOR: Automated Peer Review Generation with LLM Reasoning and Multi-Objective Reinforcement Learning
Project Overview
The document explores the use of generative AI in scholarly publishing through the development of REMOR, an AI-driven system designed to generate peer reviews that are deeper and more constructive. It addresses significant challenges faced by existing AI peer review systems, including the tendency to provide superficial feedback and the presence of biases. By combining large language model (LLM) reasoning with multi-objective reinforcement learning, REMOR aims to automate the peer review process while improving the relevance and quality of the reviews produced. The paper details the training methodology, the evaluation criteria, and REMOR's comparative performance against both human-written and other AI-generated reviews. Overall, the findings suggest that REMOR could meaningfully improve the scientific peer review process, offering a promising complement to traditional human review.
Key Applications
REMOR: Automated Peer Review Generation
Context: Academic peer review process for scientific manuscripts, targeting researchers and reviewers in the scientific community.
Implementation: The system uses a multi-objective reinforcement learning approach, fine-tuning a reasoning LLM on a novel dataset called PeerRT, which includes enriched peer reviews and reasoning traces.
Outcomes: REMOR generates peer reviews that score higher on the paper's automated quality metrics than both human-written reviews and those from existing AI systems, helping to address reviewer fatigue and superficial feedback.
Challenges: Limitations include a narrow focus in the dataset, potential biases in training data, and the reliance on a constructed reward function rather than direct human feedback.
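The multi-objective reward described above can be sketched as a weighted combination of per-aspect scores that is then used as the scalar training signal in RL fine-tuning. This is a minimal illustration only: the aspect names, weights, and scoring functions below are assumptions for exposition, not REMOR's actual reward design.

```python
# Hypothetical sketch of a multi-objective reward for review generation.
# Aspect names and weights are illustrative assumptions, not REMOR's
# actual constructed reward function.

def multi_objective_reward(scores: dict[str, float],
                           weights: dict[str, float]) -> float:
    """Collapse per-aspect reward components into one scalar via a weighted sum."""
    return sum(weights[k] * scores[k] for k in weights)

# Example: aspect scores that would come from (hypothetical) learned scorers
# applied to a generated review.
scores = {"specificity": 0.8, "constructiveness": 0.6, "relevance": 0.9}
weights = {"specificity": 0.4, "constructiveness": 0.3, "relevance": 0.3}

reward = multi_objective_reward(scores, weights)  # scalar fed to the RL update
```

A weighted sum is the simplest way to scalarize multiple objectives; real systems may instead use constrained or Pareto-based formulations when objectives conflict.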
Implementation Barriers
Technical Barrier
The lack of a comprehensive dataset for training AI systems leads to biases and limited contextual understanding.
Proposed Solutions: Future work aims to enhance the dataset by incorporating multimodal information such as citations and figures.
Evaluation Barrier
Evaluating the quality of AI-generated peer reviews is complex and subjective, relying on constructed reward metrics rather than direct human feedback.
Proposed Solutions: A blinded human evaluation by experts is proposed to confirm the findings and enhance the evaluation process.
Project Team
Pawin Taechoyotin
Researcher
Daniel Acuna
Researcher
Contact Information
For information about the paper, please contact the authors.
Authors: Pawin Taechoyotin, Daniel Acuna
Source Publication: View Original Paper
Project Contact: Dr. Jianhua Yang
LLM Model Version: gpt-4o-mini-2024-07-18
Analysis Provider: OpenAI