Crafting Narrative Closures: Zero-Shot Learning with SSM Mamba for Short Story Ending Generation
Project Overview
This project explores the use of the generative AI tool SSM Mamba in education, specifically to aid authors facing writer's block by generating conclusions for short stories from user prompts. The tool draws on current AI methods, including fine-tuning of state-space models alongside large language models such as GPT-3.5, and was trained on a diverse dataset of short stories to support creative writing and personal storytelling. Beyond individual use, the tool is publicly available on HuggingFace, contributing to the open-source community and encouraging collaborative learning. The findings indicate that the tool enhances the writing process for students and educators while fostering creativity and innovation in storytelling. Overall, SSM Mamba illustrates how generative AI can be used in educational settings to promote both personal expression and collective engagement in creative work.
Key Applications
SSM Mamba for Short Story Ending Generation
Context: The tool is designed for authors experiencing writer's block and for parents creating personalized bedtime stories.
Implementation: The tool uses four different models, including a fine-tuned SSM-Mamba model and GPT-3.5, trained on a dataset of five-sentence stories.
Outcomes: The tool effectively generates coherent conclusions for short stories, improving the storytelling process and enhancing creativity.
Challenges: Challenges include resource limitations during training, such as GPU availability and memory constraints, which affected the model's capacity for fine-tuning.
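Since the models were trained on five-sentence stories to generate a fitting final sentence, the training data plausibly pairs the first four sentences (as the prompt) with the fifth (as the target). The sketch below shows one way such pairs might be formatted; the function name and example story are illustrative, not taken from the paper.

```python
def make_example(story_sentences):
    """Split a five-sentence story into a four-sentence prompt
    and the final sentence as the generation target."""
    if len(story_sentences) != 5:
        raise ValueError("expected exactly five sentences")
    prompt = " ".join(story_sentences[:4])
    target = story_sentences[4]
    return {"prompt": prompt, "target": target}

# Illustrative story, not from the paper's dataset.
story = [
    "Mia found an old kite in the garage.",
    "She patched its torn wing with tape.",
    "At the park, the wind picked up.",
    "She let out the string little by little.",
    "The kite soared, and Mia cheered.",
]
example = make_example(story)
```

Each such pair can then feed either fine-tuning (prompt as input, target as label) or evaluation (compare the generated ending against the held-out fifth sentence).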
Implementation Barriers
Resource Limitation
Limited access to GPU resources hindered the ability to train larger models and conduct extensive fine-tuning.
Proposed Solutions: Checkpointing model state during training and chaining SLURM job dependencies so that successive time-limited jobs could resume where the previous one stopped, making the most of scarce GPU allocations.
Technical Challenges
Designing effective prompts and choosing fine-tuning configurations proved difficult during model training.
Proposed Solutions: Iterative testing and adjustment of prompts and configurations helped refine model performance.
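Iterative prompt refinement is easier when variants are generated systematically rather than edited ad hoc. A minimal sketch, assuming nothing about the paper's actual prompts: keep a list of candidate templates and render them all for the same story so the resulting endings can be compared side by side.

```python
from string import Template

# Hypothetical prompt variants; the prompts actually used
# in the paper are not specified here.
TEMPLATES = [
    Template("Story: $body\nWrite a one-sentence ending."),
    Template("Continue this four-sentence story with a fitting final sentence:\n$body"),
]

def render_prompts(body):
    """Render every template variant for one story so generated
    endings can be compared across prompt designs."""
    return [t.substitute(body=body) for t in TEMPLATES]

variants = render_prompts(
    "Sam baked bread. It burned. He tried again. It rose perfectly."
)
```

Scoring the endings produced by each variant (manually or with automatic metrics) then turns prompt design into a repeatable comparison rather than guesswork.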
Project Team
Divyam Sharma
Researcher
Divya Santhanam
Researcher
Contact Information
For information about the paper, please contact the authors.
Authors: Divyam Sharma, Divya Santhanam
Source Publication: View Original Paper
Project Contact: Dr. Jianhua Yang
LLM Model Version: gpt-4o-mini-2024-07-18
Analysis Provider: OpenAI