
REMOR: Automated Peer Review Generation with LLM Reasoning and Multi-Objective Reinforcement Learning

Project Overview

This document summarizes REMOR, an AI-driven system designed to generate peer reviews that are deeper and more constructive than those produced by existing automated approaches. It addresses significant weaknesses of current AI peer review systems, including a tendency toward superficial feedback and the presence of biases. By combining a reasoning large language model (LLM) with multi-objective reinforcement learning, REMOR aims to automate the peer review of scientific manuscripts while improving the relevance and quality of the reviews produced. The paper details the training methodology, the evaluation criteria, and REMOR's comparative performance against both human-written and other AI-generated reviews. Overall, the findings suggest that REMOR could meaningfully improve the peer review process, offering a promising complement to traditional reviewing.

Key Applications

REMOR: Automated Peer Review Generation

Context: Academic peer review process for scientific manuscripts, targeting researchers and reviewers in the scientific community.

Implementation: The system uses a multi-objective reinforcement learning approach, fine-tuning a reasoning LLM on a novel dataset called PeerRT, which includes enriched peer reviews and reasoning traces.
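The multi-objective setup described above can be illustrated with a minimal sketch. This is not the authors' code: the objective names (e.g. "criticism", "specificity", "helpfulness") and the weighted-sum aggregation are illustrative assumptions about how several review-quality signals might be collapsed into the single scalar reward that reinforcement learning fine-tuning requires.

```python
# Hedged sketch (not REMOR's actual implementation): combining several
# review-quality objectives into one scalar reward for RL fine-tuning.
# Objective names and weights below are hypothetical.

def multi_objective_reward(scores: dict[str, float],
                           weights: dict[str, float]) -> float:
    """Weighted sum of per-objective scores; objectives missing a score count as 0."""
    return sum(weights[k] * scores.get(k, 0.0) for k in weights)

# Hypothetical per-objective scores for one generated review.
scores = {"criticism": 0.8, "specificity": 0.6, "helpfulness": 0.9}
weights = {"criticism": 0.4, "specificity": 0.3, "helpfulness": 0.3}

reward = multi_objective_reward(scores, weights)
print(round(reward, 2))  # 0.4*0.8 + 0.3*0.6 + 0.3*0.9 = 0.77
```

A weighted sum is the simplest scalarization of a multi-objective problem; the weights encode how much each aspect of review quality should influence the policy update.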

Outcomes: REMOR generates peer reviews that score higher on the quality metrics than both human reviewers and existing AI systems, addressing reviewer fatigue and the superficiality of automated feedback.

Challenges: Limitations include a narrow focus in the dataset, potential biases in training data, and the reliance on a constructed reward function rather than direct human feedback.

Implementation Barriers

Technical Barrier

The lack of a comprehensive dataset for training AI systems leads to biases and limited contextual understanding.

Proposed Solutions: Future work aims to enhance the dataset by incorporating multimodal information such as citations and figures.

Evaluation Barrier

Evaluating the quality of AI-generated peer reviews is complex and subjective, relying on constructed reward metrics rather than direct human feedback.

Proposed Solutions: A blinded human evaluation by experts is proposed to confirm the findings and enhance the evaluation process.

Project Team

Pawin Taechoyotin

Researcher

Daniel Acuna

Researcher

Contact Information

For information about the paper, please contact the authors.

Authors: Pawin Taechoyotin, Daniel Acuna

Source Publication: View Original Paper

Project Contact: Dr. Jianhua Yang

LLM Model Version: gpt-4o-mini-2024-07-18

Analysis Provider: OpenAI
