
Crafting Narrative Closures: Zero-Shot Learning with SSM Mamba for Short Story Ending Generation

Project Overview

This document describes SSM Mamba, a generative AI tool applied in education to help authors overcome writer's block by generating conclusions for short stories from user prompts. The project fine-tunes state-space models and compares them with large language models such as GPT-3.5, training on a diverse dataset of short stories to support creative writing and personal storytelling. The tool is publicly available on HuggingFace, contributing to the open-source community and encouraging collaborative learning. The findings indicate that the tool both streamlines the writing process for students and educators and fosters creativity and innovation in storytelling. Overall, SSM Mamba illustrates how generative AI can be used in educational settings to promote personal expression and collective engagement in creative work.

Key Applications

SSM Mamba for Short Story Ending Generation

Context: The tool is designed for authors experiencing writer's block and for parents creating personalized bedtime stories.

Implementation: The tool uses four different models, including GPT-3.5 and an SSM-Mamba model fine-tuned on a dataset of five-sentence short stories.

Outcomes: The tool effectively generates coherent conclusions for short stories, improving the storytelling process and enhancing creativity.

Challenges: Challenges include resource limitations during training, such as GPU availability and memory constraints, which affected the model's capacity for fine-tuning.
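The five-sentence framing above can be sketched as a simple prompt builder: the model sees the first four sentences of a story and is asked for a concluding fifth. This is a minimal illustration; the sentences, the prompt wording, and the commented-out generate call are assumptions, not the authors' actual interface.

```python
def build_ending_prompt(story_sentences):
    """Format the first four sentences of a five-sentence story
    as a prompt asking the model for a concluding fifth sentence."""
    context = " ".join(story_sentences[:4])
    return f"Story: {context}\nWrite a one-sentence ending:"

# Hypothetical call to a fine-tuned model (interface is illustrative):
# ending = model.generate(build_ending_prompt(sentences))

sentences = [
    "Mia found an old kite in the attic.",
    "She carried it to the windy hill behind her house.",
    "The string snapped on her first try.",
    "A neighbor offered her a spare spool of twine.",
]
prompt = build_ending_prompt(sentences)
```

The same prompt shape works for any of the four models; only the generation call differs.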

Implementation Barriers

Resource Limitation

Limited access to GPU resources hindered the ability to train larger models and conduct extensive fine-tuning.

Proposed Solutions: Utilizing checkpointing techniques for training while managing SLURM dependencies to maximize GPU usage.
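The checkpointing idea can be sketched as a training loop that saves its state after every epoch and resumes from the last save on restart, so a chain of short, time-limited SLURM jobs makes cumulative progress. This is a generic sketch using only the standard library; the file name, state layout, and placeholder training step are assumptions, not the authors' training code.

```python
import os
import pickle

CKPT = "train_state.pkl"

def load_checkpoint():
    """Resume from the last saved state if a checkpoint exists."""
    if os.path.exists(CKPT):
        with open(CKPT, "rb") as f:
            return pickle.load(f)
    return {"epoch": 0, "loss_history": []}

def save_checkpoint(state):
    """Persist progress so a follow-up SLURM job can resume."""
    with open(CKPT, "wb") as f:
        pickle.dump(state, f)

def train(max_epochs):
    state = load_checkpoint()
    for epoch in range(state["epoch"], max_epochs):
        loss = 1.0 / (epoch + 1)  # placeholder for a real training step
        state["epoch"] = epoch + 1
        state["loss_history"].append(loss)
        save_checkpoint(state)  # saved each epoch: a killed job loses at most one
    return state
```

Jobs can then be chained with `sbatch --dependency=afterany:<jobid>` so each submission picks up where the previous one was cut off.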

Technical Challenges

The need for efficient prompt design and fine-tuning configurations posed difficulties during model training.

Proposed Solutions: Iterative testing and adjustment of prompts and configurations helped refine model performance.
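Iterative prompt testing can be sketched as a loop that generates an ending from each candidate template and keeps the one scoring highest. The scoring heuristic, template strings, and generate stub below are illustrative assumptions standing in for whatever human or automatic review the authors used.

```python
def score_ending(ending, story):
    """Hypothetical heuristic: reward endings that reuse story tokens
    and penalize multi-sentence endings (a stand-in for real review)."""
    story_words = set(" ".join(story).lower().split())
    overlap = sum(w in story_words for w in ending.lower().split())
    return overlap - 5 * ending.count(".")

def best_template(templates, story, generate):
    """Try each prompt template and return the highest-scoring one."""
    scored = []
    for tpl in templates:
        prompt = tpl.format(story=" ".join(story))
        ending = generate(prompt)
        scored.append((score_ending(ending, story), tpl))
    return max(scored)[1]
```

In practice the winning template would then be fixed and the fine-tuning configuration adjusted in the same try-score-adjust cycle.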

Project Team

Divyam Sharma

Researcher

Divya Santhanam

Researcher

Contact Information

For information about the paper, please contact the authors.

Authors: Divyam Sharma, Divya Santhanam

Source Publication: View Original Paper

Project Contact: Dr. Jianhua Yang

LLM Model Version: gpt-4o-mini-2024-07-18

Analysis Provider: OpenAI
