Re²: A Consistency-ensured Dataset for Full-stage Peer Review and Multi-turn Rebuttal Discussions
Project Overview
The document examines how generative AI can support the academic peer review process, particularly in AI and computer science research. It identifies the shortcomings of existing peer review datasets and introduces the Re2 dataset as a more robust alternative: one that covers the full review cycle starting from initial submissions and represents rebuttals as multi-turn conversations. This design aims to improve the quality of feedback generated by Large Language Models (LLMs), enabling them to offer more constructive insights to both authors and reviewers. The document argues that datasets like Re2 can help refine peer review practice, foster clearer communication between authors and reviewers, and ultimately raise the quality of scholarly work.
Key Applications
Re2 dataset for peer review and rebuttal discussions
Context: Academic peer review process, targeting researchers and authors in AI and computer science fields.
Implementation: The Re2 dataset was created by crawling publicly accessible papers and their review records from OpenReview, ensuring data consistency by using only initial submissions.
Outcomes: Improved ability of LLMs to assist in peer review, enhancing feedback quality and reducing author resubmission rates.
Challenges: Limited data diversity and quality from existing datasets; challenges in standardizing review formats across different conferences.
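The collection step above can be sketched as follows. This is a hypothetical illustration, not the authors' actual pipeline: the endpoint path and the note fields (`id`, `forum`, `replyto`, `invitation`) follow OpenReview's public API v1 note schema, where the submission is the root note of a forum and official reviews are direct replies to it.

```python
# Hypothetical sketch of crawling one paper's review record from
# OpenReview. Field names follow the API v1 note schema; the paper's
# real collection code may differ.
from urllib.parse import urlencode

API_BASE = "https://api.openreview.net/notes"  # public API v1 notes endpoint


def notes_url(forum_id: str) -> str:
    """Build the query URL for every note (submission, reviews, replies)
    attached to one paper's forum."""
    return f"{API_BASE}?{urlencode({'forum': forum_id})}"


def split_forum(notes: list) -> tuple:
    """Separate the root submission note from its official reviews.

    The submission is the note whose id equals the forum id; official
    reviews are direct replies to it whose invitation contains 'Review'.
    """
    root = next(n for n in notes if n["id"] == n["forum"])
    reviews = [
        n for n in notes
        if n.get("replyto") == root["id"]
        and "Review" in n.get("invitation", "")
    ]
    return root, reviews
```

Keeping only the root note of each forum is what guarantees the "initial submission" property: later revisions live in separate revision records, so reviews stay consistent with the version they actually evaluated.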
Implementation Barriers
Data Quality
Existing peer review datasets are often built from revised submissions rather than the initial versions reviewers actually evaluated, creating inconsistencies between papers and their reviews.
Proposed Solutions: The Re2 dataset ensures that all data consists of initial submissions, improving data reliability.
Data Diversity
Current datasets often lack diversity in data sources, limiting their usefulness for training models.
Proposed Solutions: Re2 includes data from 24 conferences and 21 workshops to enhance diversity.
Complexity of Rebuttal Processes
Many existing datasets do not effectively capture the rebuttal and discussion stages of peer review.
Proposed Solutions: Re2 treats rebuttals as multi-turn conversations, aiming to provide a more realistic training environment for LLMs.
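The multi-turn framing above can be illustrated with a small sketch. This is an assumption-laden example, not the dataset's exact schema: each note is assumed to carry a `text` field, a creation timestamp `cdate`, and an OpenReview-style `signatures` list, with any signature containing "Authors" treated as an author turn.

```python
# Hypothetical sketch: render one review thread as ordered chat-style
# turns, the format LLMs are typically fine-tuned on. Field names
# (text, cdate, signatures) are assumptions, not the dataset's schema.
from typing import Dict, List


def rebuttal_as_conversation(review: Dict, replies: List[Dict]) -> List[Dict]:
    """Flatten a review plus its rebuttal replies into alternating
    reviewer/author messages, ordered by posting time."""

    def role(note: Dict) -> str:
        # Author replies are signed by the paper's Authors group;
        # everything else in the thread is treated as a reviewer turn.
        return "author" if any("Authors" in s for s in note["signatures"]) else "reviewer"

    turns = [{"role": "reviewer", "content": review["text"]}]
    for note in sorted(replies, key=lambda n: n["cdate"]):
        turns.append({"role": role(note), "content": note["text"]})
    return turns
```

Training on threads in this form, rather than on isolated review texts, is what lets a model learn how reviewer concerns are raised, answered, and resolved over several turns.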
Project Team
Daoze Zhang
Researcher
Zhijian Bao
Researcher
Sihang Du
Researcher
Zhiyi Zhao
Researcher
Kuangling Zhang
Researcher
Dezheng Bao
Researcher
Yang Yang
Researcher
Contact Information
For information about the paper, please contact the authors.
Authors: Daoze Zhang, Zhijian Bao, Sihang Du, Zhiyi Zhao, Kuangling Zhang, Dezheng Bao, Yang Yang
Source Publication: View Original Paper
Project Contact: Dr. Jianhua Yang
LLM Model Version: gpt-4o-mini-2024-07-18
Analysis Provider: OpenAI