
Most Language Models can be Poets too: An AI Writing Assistant and Constrained Text Generation Studio

Project Overview

This paper surveys advances in Constrained Natural Language Generation (CNLG) and presents two tools, the Constrained Text Generation Studio (CTGS) and Gadsby, aimed at creative writing and education. CTGS lets users generate text under specific lexical, semantic, and phonetic constraints, making it a useful resource for students and educators who want to sharpen their writing and engage with linguistics creatively. The authors evaluate the tools on a dataset named 'Lipogram-e', finding that applying constraints during generation improves performance metrics. These results suggest that such AI applications can enrich creative writing and educational outcomes, though the paper also notes open challenges and areas needing further research. Overall, it highlights the potential of generative AI to transform educational practice and foster creativity among learners.

Key Applications

Constrained Text Generation Platform

Context: Educational settings for creative writers, poets, and linguists, providing accessible platforms for experimentation with text generation under specified lexical, semantic, and phonetic constraints.

Implementation: A GUI tool and web application that allows users to generate text while specifying various constraints, showcasing model robustness and offering user-friendly interfaces for experimentation.

Outcomes: Improved text generation results, coherence, and adherence to specified constraints. The platforms demonstrate model capabilities and provide a simplified interface for users to experiment with various filters and constraints in text generation.

Challenges: Potential for models to generate gibberish or irrelevant content when too many constraints are applied, and limitations in features when compared to more comprehensive tools.
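The filtering approach described above can be sketched in a few lines. The following is a minimal, illustrative example with a toy vocabulary and made-up scores, not the actual CTGS implementation: at each decoding step, tokens that violate the active constraint are masked out before the next token is selected.

```python
# Minimal sketch of constrained decoding by vocabulary filtering
# (illustrative only; CTGS applies similar filters to real LM logits).

def lipogram_filter(token: str, banned_letter: str = "e") -> bool:
    """Allow a token only if it avoids the banned letter."""
    return banned_letter not in token.lower()

def constrained_pick(scores: dict, allow) -> str:
    """Greedily pick the highest-scoring token that passes the filter."""
    candidates = {tok: s for tok, s in scores.items() if allow(tok)}
    if not candidates:
        raise ValueError("All tokens violate the constraint at this step.")
    return max(candidates, key=candidates.get)

# Toy next-token scores, standing in for model logits.
scores = {"the": 0.9, "a": 0.5, "quick": 0.4, "brown": 0.3}
print(constrained_pick(scores, lipogram_filter))  # "the" contains 'e', so -> "a"
```

Because the filter runs at every step, the model can never emit a violating token, which is why hard constraints of this kind hold by construction rather than by hoping the model obeys a prompt.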

Implementation Barriers

Technical

Language models often ignore constraints specified in the prompt, even when trained to follow them.

Proposed Solutions: Creating datasets with hard lexical, semantic, or phonetic constraints to measure and improve model adherence.
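Such a dataset enables a simple adherence measurement: the fraction of generated words that break the hard constraint. A hedged sketch (the function name and metric definition here are illustrative, not taken from the paper), using a lipogram on the letter 'e' as the constraint:

```python
# Illustrative adherence metric for a hard lexical constraint (lipogram on 'e').
# The exact metric used in the paper may differ; this shows the general idea.

def violation_rate(text: str, banned_letter: str = "e") -> float:
    """Fraction of words containing the banned letter."""
    words = text.lower().split()
    if not words:
        return 0.0
    violations = sum(1 for w in words if banned_letter in w)
    return violations / len(words)

# The opening of the novel "Gadsby", famously written without the letter 'e':
print(violation_rate("If Youth, throughout all history, had had a champion"))  # 0.0
```

A model that truly adheres to the constraint should score 0.0; unconstrained generation typically scores much higher, giving a concrete number to compare against.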

Implementation

Subword tokenization complicates the application of constraints.

Proposed Solutions: Developing new subword tokenization schemes, or using models with word-level vocabularies that do not rely on subword tokenization.
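A toy sketch makes the tokenization problem concrete (the merge vocabulary below is invented for illustration; real BPE vocabularies have tens of thousands of entries): a banned letter can hide inside multi-character subwords, and a word can be split so that the constraint only becomes visible across token boundaries.

```python
# Toy illustration of why subword tokenization complicates lexical constraints.
# The vocabulary here is invented; it is not a real BPE merge table.

vocab = ["th", "e", "en", "ing", "run", "walk", "s"]

def tokenize(word: str, vocab: list) -> list:
    """Greedy longest-match segmentation (a crude stand-in for BPE)."""
    pieces, i = [], 0
    while i < len(word):
        for size in range(len(word) - i, 0, -1):
            piece = word[i:i + size]
            if piece in vocab:
                pieces.append(piece)
                i += size
                break
        else:
            pieces.append(word[i])  # fall back to a single character
            i += 1
    return pieces

# A filter that bans tokens containing 'e' would block the token "e" here,
# yet the model has already committed to "th" and cannot finish the word:
print(tokenize("the", vocab))      # ['th', 'e']
print(tokenize("running", vocab))  # ['run', 'n', 'ing']
```

This is why a token-level filter must track partial-word state, and why word-level vocabularies sidestep the problem entirely: each token is a whole word, so the constraint can be checked in one lookup.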

Project Team

Allen Roush

Researcher

Sanjay Basu

Researcher

Akshay Moorthy

Researcher

Dmitry Dubovoy

Researcher

Contact Information

For information about the paper, please contact the authors.

Authors: Allen Roush, Sanjay Basu, Akshay Moorthy, Dmitry Dubovoy

Source Publication: View Original Paper

Project Contact: Dr. Jianhua Yang

LLM Model Version: gpt-4o-mini-2024-07-18

Analysis Provider: OpenAI
