Most Language Models can be Poets too: An AI Writing Assistant and Constrained Text Generation Studio
Project Overview
This document surveys advances in Constrained Natural Language Generation (CNLG) and presents two AI tools, the Constrained Text Generation Studio (CTGS) and Gadsby, aimed at creative writing and education. CTGS lets users generate text under specific lexical, semantic, and phonetic constraints, making it a useful resource for students and educators who want to sharpen their writing and engage with linguistics creatively. The authors evaluate the tools on a dataset named 'Lipogram-e' and show that applying constraints at generation time improves performance metrics. The findings suggest that these AI applications can meaningfully enrich creative writing and educational outcomes, though the document also notes open challenges and areas requiring further research. Overall, it highlights the potential of generative AI to transform educational practice and foster creativity among learners.
Key Applications
Constrained Text Generation Platform
Context: Educational settings for creative writers, poets, and linguists, providing accessible platforms for experimentation with text generation under specified lexical, semantic, and phonetic constraints.
Implementation: A GUI tool (CTGS) and a companion web application (Gadsby) that let users generate text while specifying various constraints, showcasing model robustness through user-friendly interfaces for experimentation.
Outcomes: Improved generation quality, coherence, and adherence to the specified constraints. The platforms demonstrate what the models can do and give users a simple interface for experimenting with different filters and constraints.
Challenges: Potential for models to generate gibberish or irrelevant content when too many constraints are applied, and limitations in features when compared to more comprehensive tools.
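The core mechanism behind such a platform can be sketched as filtering the model's candidate next tokens before selection, so that disallowed tokens can never be emitted. The snippet below is a minimal, self-contained illustration of that idea using a toy vocabulary with made-up scores (the function and score values are assumptions for demonstration; a real system would take next-token scores from a neural language model at every step):

```python
def next_token_logits(prefix):
    # Toy stand-in for a language model's next-token scores.
    # (Hypothetical values; a real LM would compute these from the prefix.)
    return {"the": 2.0, "end": 1.8, "dog": 1.5, "ran": 1.2, "fast": 1.0, "a": 0.5}

def greedy_decode(steps=4, banned_letter="e"):
    out = []
    for _ in range(steps):
        logits = dict(next_token_logits(out))
        # Lexical (lipogram) constraint: drop every candidate containing
        # the banned letter BEFORE picking a token, so violations are
        # impossible by construction.
        logits = {t: s for t, s in logits.items() if banned_letter not in t}
        # Toy repetition penalty so the demo does not loop on one word.
        for t in out:
            if t in logits:
                logits[t] -= 1.0
        out.append(max(logits, key=logits.get))
    return out

print(greedy_decode())  # every emitted token avoids the letter "e"
```

Because the filter runs before token selection rather than relying on the model to obey an instruction, the constraint is hard: "the" and "end" are removed from consideration at every step, no matter how high the model scores them.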
Implementation Barriers
Technical
Language models often ignore constraints, even when trained to follow them.
Proposed Solutions: Creating datasets with hard lexical, semantic, or phonetic constraints to measure and improve model adherence.
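Measuring adherence to a hard constraint can be reduced to a simple violation rate over generated text. The helper below is a sketch of such a metric for a lipogram-style lexical constraint (the function name and the word-level granularity are assumptions for illustration, not the paper's exact evaluation protocol):

```python
def lipogram_violation_rate(texts, banned_letter="e"):
    # Fraction of generated words that violate the hard lexical
    # constraint (i.e. contain the banned letter).
    words = [w for t in texts for w in t.lower().split()]
    if not words:
        return 0.0
    return sum(banned_letter in w for w in words) / len(words)

samples = ["a dog ran fast", "the dog went home"]
print(lipogram_violation_rate(samples))  # 0.375 (3 of 8 words contain "e")
```

A dataset with hard constraints then gives a direct, model-agnostic score: generations from an unconstrained model can be compared against filtered generations by running both through the same metric.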
Implementation
Subword tokenization complicates the application of constraints.
Proposed Solutions: Development of new subword tokenization schemes or using models with larger vocabularies that do not rely on subword tokenization.
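The difficulty is that subword tokenizers emit word fragments, so a word-level constraint cannot be checked against any single generated piece; the filter must buffer pieces until a word boundary. The sketch below assumes a SentencePiece-style convention where a leading "▁" marks the start of a new word (the piece list is a hypothetical segmentation for illustration):

```python
def words_from_pieces(stream):
    # Reassemble full words from subword pieces. A word-level constraint
    # (e.g. "no word may contain the letter e") can only be evaluated
    # once a complete word is available, which is why subword
    # tokenization complicates constraint filtering.
    word = ""
    for piece in stream:
        if piece.startswith("\u2581"):  # "▁" marks a word boundary
            if word:
                yield word
            word = piece[1:]
        else:
            word += piece
    if word:
        yield word

pieces = ["\u2581gener", "ation", "\u2581is", "\u2581pow", "er", "ful"]
print(list(words_from_pieces(pieces)))  # ['generation', 'is', 'powerful']
```

Note that the first piece, "▁gener", contains no banned letter "e" only by accident of segmentation; the assembled word "generation" does, which is exactly why piece-by-piece filtering is insufficient without buffering or a character-level vocabulary.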
Project Team
Allen Roush
Researcher
Sanjay Basu
Researcher
Akshay Moorthy
Researcher
Dmitry Dubovoy
Researcher
Contact Information
For information about the paper, please contact the authors.
Authors: Allen Roush, Sanjay Basu, Akshay Moorthy, Dmitry Dubovoy
Source Publication: View Original Paper
Project Contact: Dr. Jianhua Yang
LLM Model Version: gpt-4o-mini-2024-07-18
Analysis Provider: OpenAI