
LLM-Gomoku: A Large Language Model-Based System for Strategic Gomoku with Self-Play and Reinforcement Learning

Project Overview

This page summarizes a study that built a Gomoku AI system on top of a large language model (LLM). The system is designed to mimic how humans learn the game: it works from an understanding of Gomoku's rules and strategies, then refines its play through self-play and reinforcement learning. The study targets two weaknesses of LLM-driven play, the generation of illegal moves and weak strategic decision-making, and reports gains in the efficiency and accuracy of move selection. The authors place the work within the broader progress of LLMs in natural language processing and their applications in education, and argue that generative AI can support educational tools and games that adapt and learn in a human-like way.

Key Applications

Gomoku AI system based on LLMs

Context: Gaming education and strategic decision-making for learners and AI enthusiasts

Implementation: The AI learns to play Gomoku through self-play, reinforcement learning, and understanding game strategies and rules.

Outcomes: Significantly faster decision-making and more accurate move selection, with the illegal-move generation problem resolved.

Challenges: Time-consuming self-play process, limitations in strategy selection depth, and the need for sophisticated evaluation frameworks.
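The summary does not say how the system rejects illegal moves, so the following is only a minimal sketch of one common approach: validate each move proposed by the model against the board and fall back to a random legal move when the proposal is invalid. The 15x15 board size, the function names, and the random fallback are all assumptions made for illustration, not details taken from the paper.

```python
import random
from typing import List, Optional, Tuple

BOARD_SIZE = 15  # assumption: the standard Gomoku board; the summary gives no size
EMPTY = 0

Board = List[List[int]]
Move = Tuple[int, int]


def new_board(size: int = BOARD_SIZE) -> Board:
    """Return an empty size x size board."""
    return [[EMPTY] * size for _ in range(size)]


def is_legal(board: Board, move: Move) -> bool:
    """A move is legal if it is on the board and the cell is empty."""
    row, col = move
    size = len(board)
    return 0 <= row < size and 0 <= col < size and board[row][col] == EMPTY


def legal_moves(board: Board) -> List[Move]:
    """List every empty cell."""
    size = len(board)
    return [(r, c) for r in range(size) for c in range(size) if board[r][c] == EMPTY]


def choose_move(board: Board, proposed: Optional[Move], rng: random.Random) -> Move:
    """Accept the model's proposal if legal; otherwise pick a random legal move.

    The random fallback is a hypothetical policy, not the paper's method.
    """
    if proposed is not None and is_legal(board, proposed):
        return proposed
    candidates = legal_moves(board)
    if not candidates:
        raise ValueError("board is full")
    return rng.choice(candidates)


def is_five_in_a_row(board: Board, move: Move, player: int) -> bool:
    """Check whether the stone at `move` is part of a line of five or more."""
    size = len(board)
    if board[move[0]][move[1]] != player:
        return False
    for dr, dc in ((0, 1), (1, 0), (1, 1), (1, -1)):
        count = 1
        for sign in (1, -1):  # walk in both directions along the line
            r, c = move[0] + sign * dr, move[1] + sign * dc
            while 0 <= r < size and 0 <= c < size and board[r][c] == player:
                count += 1
                r, c = r + sign * dr, c + sign * dc
        if count >= 5:
            return True
    return False
```

In a self-play loop, `choose_move` would sit between the model's reply and the board update, so an invalid coordinate never reaches the game state.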

Implementation Barriers

Technical Barrier

The self-play process is time-consuming, limiting the model's ability to quickly learn basic rules and strategies.

Proposed Solutions: Future research aims to combine multiple strategies and analytical approaches for more comprehensive evaluation, and to explore more advanced reinforcement learning models.

Operational Barrier

Frequent interruptions during API calls lead to loss of information and require restarting self-play sessions, reducing efficiency.

Proposed Solutions: Implementing a state-action-reward database that saves progress in real time, so that a self-play session can be recovered after an interruption instead of restarted.
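One way such a state-action-reward store could work is sketched below. The summary does not name a database or schema, so the use of SQLite, the `transitions` table, and every function name here are assumptions chosen for illustration. Each transition is committed as soon as it is produced, which means an interrupted API call loses at most the move in flight.

```python
import json
import sqlite3
from typing import List, Optional, Tuple


def open_log(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the transition log. The schema is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS transitions (
               game_id INTEGER NOT NULL,
               step    INTEGER NOT NULL,
               state   TEXT    NOT NULL,  -- board serialized as JSON
               action  TEXT    NOT NULL,  -- [row, col] serialized as JSON
               reward  REAL    NOT NULL,
               PRIMARY KEY (game_id, step)
           )"""
    )
    conn.commit()
    return conn


def record(conn: sqlite3.Connection, game_id: int, step: int,
           state: List[List[int]], action: Tuple[int, int], reward: float) -> None:
    """Persist one transition immediately so it survives an interruption."""
    conn.execute(
        "INSERT OR REPLACE INTO transitions VALUES (?, ?, ?, ?, ?)",
        (game_id, step, json.dumps(state), json.dumps(action), reward),
    )
    conn.commit()


def resume_point(conn: sqlite3.Connection) -> Optional[Tuple[int, int]]:
    """Return (game_id, step) of the last saved transition, or None if empty."""
    row = conn.execute(
        "SELECT game_id, step FROM transitions "
        "ORDER BY game_id DESC, step DESC LIMIT 1"
    ).fetchone()
    return None if row is None else (row[0], row[1])


def load_game(conn: sqlite3.Connection, game_id: int):
    """Reload every saved transition of one game, in step order."""
    rows = conn.execute(
        "SELECT state, action, reward FROM transitions "
        "WHERE game_id = ? ORDER BY step",
        (game_id,),
    ).fetchall()
    return [(json.loads(s), tuple(json.loads(a)), r) for s, a, r in rows]
```

After a restart, a self-play driver could call `resume_point` to find the last saved step and `load_game` to rebuild the board before continuing. Passing a file path instead of `":memory:"` is what makes the log outlive the process.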

Project Team

Hui Wang

Researcher

Contact Information

For information about the paper, please contact the authors.

Authors: Hui Wang

Source Publication: View Original Paper

Project Contact: Dr. Jianhua Yang

LLM Model Version: gpt-4o-mini-2024-07-18

Analysis Provider: OpenAI
