Generating Java Methods: An Empirical Assessment of Four AI-Based Code Assistants
Project Overview
The document examines the role of generative AI in education, focusing on four AI-based code assistants (GitHub Copilot, Tabnine, ChatGPT, and Google Bard) and their ability to generate Java methods. It highlights the potential of these tools to improve software-development productivity through code suggestions, with GitHub Copilot emerging as the most effective of the four. The study also identifies significant challenges, including difficulty handling complex code dependencies and a high rate of incorrect or invalid generated code. These findings suggest that collaboration among different AI tools could improve code quality, since the generated output often falls short of developer standards. Overall, the paper underscores both the promise and the limitations of generative AI in educational settings, particularly for teaching programming and software development skills.
Key Applications
AI-based code assistants (GitHub Copilot, Tabnine, ChatGPT, Google Bard)
Context: Software development, targeting programmers and developers
Implementation: Tools were assessed by generating Java methods from real-world GitHub projects and analyzing the results for correctness, complexity, efficiency, and adherence to developer standards.
Outcomes: Copilot generated the most correct methods (32%), followed by ChatGPT (23%), Bard (15%), and Tabnine (13%). The generated code sometimes improved on the developers' implementations but often struggled with dependencies.
Challenges: Generated code frequently had correctness issues, especially when dealing with inter-class dependencies, and it often failed to match the developers' coding style.
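The inter-class dependency problem mentioned above can be illustrated with a minimal, hypothetical example (the class and method names below are not from the paper): a target method is only correct if it delegates to another project-specific class, which an assistant cannot infer from the method's signature alone.

```java
// Hypothetical project-specific helper class. An assistant asked to
// generate Cart.total() from its signature alone has no way to know
// that a correct implementation must call this class.
class PriceCalculator {
    static double withTax(double net) {
        return net * 1.25; // illustrative project-specific tax rate
    }
}

public class Cart {
    private final double net;

    Cart(double net) {
        this.net = net;
    }

    // Target method: correct only if it delegates to PriceCalculator.
    // An assistant that ignores the rest of the codebase will likely
    // return a plausible but wrong implementation (e.g. just `net`).
    double total() {
        return PriceCalculator.withTax(net);
    }

    public static void main(String[] args) {
        System.out.println(new Cart(100.0).total());
    }
}
```

This is why methods with dependencies on other classes fail far more often than self-contained ones: the required context simply is not visible at the generation site.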
Implementation Barriers
Technical barrier
AI-based code assistants struggle to generate code that depends on other parts of the codebase, leading to high rates of incorrect or invalid output: even for the best-performing assistant, 53% of generated methods were non-functional.
Proposed Solutions: Improving the algorithms to better handle context and dependencies, exploring collaboration between multiple AI assistants, and implementing rigorous testing and validation frameworks to enhance the training data of AI models.
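One of the proposed validation steps can be sketched concretely. The fragment below is a minimal, illustrative take (not the paper's actual framework) on filtering generated code before accepting it: it uses the JDK's standard javax.tools API to check whether a candidate method compiles at all. The class names and source strings are made up for the example.

```java
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class CompileCheck {
    // Returns true if the given source compiles. A validation framework
    // could use this as a cheap first filter before running tests on
    // assistant-generated code. Requires a JDK (not a bare JRE).
    static boolean compiles(String className, String source) throws IOException {
        Path dir = Files.createTempDirectory("gen");
        Path file = dir.resolve(className + ".java");
        Files.writeString(file, source);
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        // run() returns 0 on successful compilation.
        return compiler.run(null, null, null, file.toString()) == 0;
    }

    public static void main(String[] args) throws IOException {
        // Illustrative "generated" candidates: one valid, one broken.
        String ok  = "class Gen { int add(int a, int b) { return a + b; } }";
        String bad = "class Gen { int add(int a, int b) { return a + ; } }";
        System.out.println(compiles("Gen", ok));   // true
        System.out.println(compiles("Gen", bad));  // false
    }
}
```

A compile check alone cannot catch semantic errors, which is why the proposed solutions also call for test-based validation on top of it.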
Cultural barrier
AI-generated code often does not adhere to the coding styles and practices preferred by developers.
Proposed Solutions: Developing AI models that can learn and adapt to specific coding styles of projects.
Project Team
Vincenzo Corso
Researcher
Leonardo Mariani
Researcher
Daniela Micucci
Researcher
Oliviero Riganelli
Researcher
Contact Information
For information about the paper, please contact the authors.
Authors: Vincenzo Corso, Leonardo Mariani, Daniela Micucci, Oliviero Riganelli
Source Publication: View Original Paper
Project Contact: Dr. Jianhua Yang
LLM Model Version: gpt-4o-mini-2024-07-18
Analysis Provider: OpenAI