
TF1-EN-3M: Three Million Synthetic Moral Fables for Training Small, Open Language Models

Project Overview

This page summarizes the development of TF1-EN-3M, a dataset of three million synthetic moral fables generated with instruction-tuned language models. The dataset addresses the scarcity of moral storytelling resources by providing structured narratives intended to teach moral lessons to young audiences. The summary covers how the dataset was generated, including prompt design and model evaluation, and emphasizes that smaller, more accessible models can produce high-quality educational content. The findings point to the dataset's potential for educational AI applications and to the role of moral reasoning in narrative generation, suggesting that such resources can help teach ethical values in an engaging way.

Key Applications

TF1-EN-3M Dataset

Context: Moral education through storytelling for young readers (ages 4-7).

Implementation: Generated with instruction-tuned models using structured prompt design, and assessed with a hybrid evaluation pipeline.

Outcomes: Produced three million diverse moral fables that are coherent and age-appropriate, facilitating moral reasoning in educational settings.

Challenges: Keeping stories diverse while maintaining coherence, and avoiding repetitive narratives caused by over-reliance on templates.
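
As an illustration of the structured prompt design and the repetition risk noted above, the following is a minimal Python sketch, not the authors' pipeline. The slot names, slot values, template wording, and the helpers `build_prompts` and `distinct_n` are all hypothetical; the summary does not specify the actual prompt vocabulary or diversity metric. The sketch fills a fixed template from every combination of slot values, then computes the share of unique word n-grams across a set of texts as a rough repetition check.

```python
import itertools
import random

# Hypothetical slot values; the source does not list the actual vocabulary.
SLOTS = {
    "character": ["a fox", "a tortoise"],
    "trait": ["greedy", "patient"],
    "setting": ["a forest", "a riverbank"],
    "moral": ["Honesty builds trust.", "Patience pays off."],
}

# Hypothetical template wording, assumed for illustration only.
TEMPLATE = (
    "Write a short fable for readers aged 4-7. "
    "The main character is {character} who is {trait}. "
    "The story takes place in {setting}. "
    "End with the moral: {moral}"
)


def build_prompts(slots, template, limit=None, seed=0):
    """Fill the template with every combination of slot values."""
    keys = list(slots)
    combos = [
        dict(zip(keys, values))
        for values in itertools.product(*(slots[k] for k in keys))
    ]
    # Shuffle deterministically so a truncated sample is not biased
    # toward the first values of each slot.
    random.Random(seed).shuffle(combos)
    if limit is not None:
        combos = combos[:limit]
    return [template.format(**combo) for combo in combos]


def distinct_n(texts, n=2):
    """Return unique word n-grams divided by total word n-grams."""
    total = 0
    unique = set()
    for text in texts:
        tokens = text.lower().split()
        ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
        total += len(ngrams)
        unique.update(ngrams)
    return len(unique) / total if total else 0.0


prompts = build_prompts(SLOTS, TEMPLATE)
print(len(prompts), "prompts generated")
print("distinct-2 over prompts:", round(distinct_n(prompts, n=2), 3))
```

A low `distinct_n` score over generated stories would indicate that many of them share the same phrasing, which is the template-driven repetition described in the challenges. In practice the score would be computed over model outputs rather than over the prompts themselves.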

Implementation Barriers

Technical Barrier

Running large models to generate quality narratives requires substantial computational resources.

Proposed Solutions: Using smaller, instruction-tuned models to generate high-quality content on consumer-grade hardware.

Cultural Barrier

The dataset primarily reflects Western moral traditions, which may not generalize across different cultural contexts.

Proposed Solutions: Incorporating moral principles from diverse philosophical traditions into the dataset.

Project Team

Mihai Nadas

Researcher

Laura Diosan

Researcher

Andrei Piscoran

Researcher

Andreea Tomescu

Researcher

Contact Information

For information about the paper, please contact the authors.

Authors: Mihai Nadas, Laura Diosan, Andrei Piscoran, Andreea Tomescu

Source Publication: View Original Paper

Project Contact: Dr. Jianhua Yang

LLM Model Version: gpt-4o-mini-2024-07-18

Analysis Provider: OpenAI
