
Generating Java Methods: An Empirical Assessment of Four AI-Based Code Assistants

Project Overview

The document examines the role of generative AI in education, focusing on four AI-based code assistants (GitHub Copilot, Tabnine, ChatGPT, and Google Bard) in the context of generating Java methods. It highlights the potential of these tools to enhance productivity in software development through code suggestions, with GitHub Copilot emerging as the most effective of the four. However, the study also points out significant challenges: the assistants struggle with complex code dependencies, and a high proportion of the generated code is incorrect or invalid. These findings suggest that collaboration among different AI tools could improve code quality, as the generated outputs often fall short of developer standards. Overall, the paper underscores both the promise and the limitations of generative AI in educational settings, particularly for teaching programming and software development skills.

Key Applications

AI-based code assistants (GitHub Copilot, Tabnine, ChatGPT, Google Bard)

Context: Software development, targeting programmers and developers

Implementation: Tools were assessed by generating Java methods from real-world projects on GitHub, analyzing correctness, complexity, efficiency, and adherence to developer standards.
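To make the assessment concrete, a correctness check of this kind typically runs the assistant-generated method against the behavior of the developer's original implementation. The sketch below is illustrative only (the method and class names are hypothetical, not taken from the paper); it shows a generated body being compared against the reference on a small set of inputs.

```java
// Minimal, hypothetical sketch of a correctness check for a generated method:
// the AI-produced implementation is run against the developer's original
// behavior on sample inputs. Names are illustrative, not from the paper.
public class GeneratedMethodCheck {

    // Developer's original method, taken as the reference behavior.
    static int referenceClamp(int value, int min, int max) {
        return Math.max(min, Math.min(max, value));
    }

    // Stand-in for an assistant-generated implementation of the same signature.
    static int generatedClamp(int value, int min, int max) {
        if (value < min) return min;
        if (value > max) return max;
        return value;
    }

    public static void main(String[] args) {
        int[][] cases = { {5, 0, 3}, {-1, 0, 3}, {2, 0, 3} };
        for (int[] c : cases) {
            int expected = referenceClamp(c[0], c[1], c[2]);
            int actual = generatedClamp(c[0], c[1], c[2]);
            if (expected != actual) {
                throw new AssertionError("Mismatch for input " + c[0]);
            }
        }
        System.out.println("generated method matches reference on all cases");
    }
}
```

In the study itself, correctness was judged against real methods from GitHub projects rather than a toy clamp function, but the comparison principle is the same.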

Outcomes: Copilot generated the most correct methods (32%), while ChatGPT (23%), Bard (15%), and Tabnine (13%) were less effective. The code generated sometimes improved upon developer implementations but often struggled with dependencies.

Challenges: Generated code frequently had correctness issues, especially when it involved inter-class dependencies, and the generated code often failed to match the developers' coding style.

Implementation Barriers

Technical barrier

AI-based code assistants struggle to generate code that depends on other parts of the codebase, leading to high rates of incorrect or invalid output. More than half (53%) of the methods generated by even the best-performing assistant were non-functional.
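A hedged illustration of why such dependencies are hard (the classes and methods below are hypothetical, not from the paper): the correct body of a method may hinge on the API of another class that never appears in the assistant's prompt context, so the assistant tends to invent a value or a nonexistent call instead.

```java
// Hypothetical example of an inter-class dependency. The correct body of
// applyDiscount depends on PricingPolicy, a class an assistant completing
// Order in isolation may never see.
class PricingPolicy {
    double discountRate(String tier) {
        return "gold".equals(tier) ? 0.20 : 0.05;
    }
}

public class Order {
    private final PricingPolicy policy = new PricingPolicy();

    // Without PricingPolicy in context, an assistant is likely to hard-code
    // a rate or call a method that does not exist, instead of the line below.
    double applyDiscount(double price, String tier) {
        return price * (1.0 - policy.discountRate(tier));
    }

    public static void main(String[] args) {
        Order o = new Order();
        System.out.println(o.applyDiscount(100.0, "gold"));
    }
}
```

This is the scenario behind the dependency-related failures reported above: the needed information exists in the project, but outside the snippet the assistant sees.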

Proposed Solutions: Improving the algorithms to better handle context and dependencies, exploring collaboration between multiple AI assistants, and applying rigorous testing and validation frameworks to improve the data used to train AI models.

Cultural barrier

AI-generated code often does not adhere to the coding styles and practices preferred by developers.

Proposed Solutions: Developing AI models that can learn and adapt to specific coding styles of projects.

Project Team

Vincenzo Corso

Researcher

Leonardo Mariani

Researcher

Daniela Micucci

Researcher

Oliviero Riganelli

Researcher

Contact Information

For information about the paper, please contact the authors.

Authors: Vincenzo Corso, Leonardo Mariani, Daniela Micucci, Oliviero Riganelli

Source Publication: View Original Paper

Project Contact: Dr. Jianhua Yang

LLM Model Version: gpt-4o-mini-2024-07-18

Analysis Provider: OpenAI
