
Keeping LLMs Aligned After Fine-tuning: The Crucial Role of Prompt Templates

Project Overview

This document examines the role of generative AI in education, focusing on how prompt templates affect the fine-tuning of large language models (LLMs) for both safety and performance. It introduces the 'Pure Tuning, Safe Testing' (PTST) strategy, which preserves safety alignment during fine-tuning by reserving safety prompts for testing alone. The document also underscores the importance of reproducibility in AI research within educational contexts, advocating open access to data and code, comprehensive experimental details, and ethical practices. It calls for transparency in reporting, including clear statements of statistical significance and resource requirements, so that findings are reliable and usable in educational applications. Together, these insights highlight the potential of generative AI to enhance educational practice while maintaining safety and integrity in research.

Key Applications

Fine-tuning Llama 2 and GPT-3.5 Turbo on GSM8K and related datasets

Context: Educational contexts focused on enhancing mathematical problem-solving capabilities for students and educators, involving grade school math and assessment tasks.

Implementation: Fine-tuned Llama 2 (6 epochs) and GPT-3.5 Turbo (1 epoch) on the GSM8K dataset using a variety of prompt templates, including chat-mode prompts. The models were trained to improve their ability to generate helpful and safe responses in math problem-solving scenarios.

Outcomes: Significant improvement in helpfulness and safety alignment through careful prompt engineering and safety prompt transitions.

Challenges: Risk of safety degradation and potential unsafe behaviors if the same prompt template is used for both fine-tuning and inference.
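The fine-tuning setup described above can be sketched as follows. This is a minimal illustration of building supervised fine-tuning records from GSM8K-style question/answer pairs; the template string is an assumption for illustration, not the paper's exact chat-mode template.

```python
# Minimal sketch: format a GSM8K-style QA pair into a fine-tuning record.
# TRAIN_TEMPLATE is a hypothetical plain template (no safety preamble),
# illustrative of the templates varied in the experiments.

TRAIN_TEMPLATE = "[INST] {question} [/INST]"

def format_example(question: str, answer: str) -> dict:
    """Build one supervised fine-tuning record from a question/answer pair."""
    return {
        "prompt": TRAIN_TEMPLATE.format(question=question),
        "completion": " " + answer,
    }

record = format_example(
    "A robe takes 2 bolts of blue fiber and half that much white fiber. "
    "How many bolts in total does it take?",
    "2 + 2 / 2 = 3. The answer is 3.",
)
print(record["prompt"])
```

In practice each record would be serialized (e.g. to JSONL) and fed to the model's fine-tuning pipeline; the key point is that the template wrapping the question is an explicit, controllable choice.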

Fine-tuning Llama 2 for medical consultation

Context: Simulating a medical chatbot based on real-world patient-physician conversations in educational settings for medical students and training purposes.

Implementation: Fine-tuned Llama 2 on a dataset of 100k conversations for 3 epochs, focusing on enhancing the model's ability to generate accurate and contextually appropriate medical advice.

Outcomes: Enhanced semantic similarity of responses to human-written answers, improving medical consultation capabilities for educational and training contexts.

Challenges: Maintaining safety while providing accurate medical advice and ensuring the model does not generate harmful or misleading information.

Fine-tuning Llama 2 on OpenOrca

Context: Improving reasoning and comprehension capabilities for educational contexts, applicable to various subjects and learning scenarios.

Implementation: Fine-tuned Llama 2 on the OpenOrca dataset selected from the FLAN collection to enhance reasoning tasks and comprehension abilities across different educational applications.

Outcomes: Significant improvements in reasoning tasks while preserving safety and providing useful educational outputs.

Challenges: Balancing the trade-off between helpfulness and safety in responses generated for educational purposes.

Implementation Barriers

Technical/Safety Barrier

Fine-tuning can degrade safety even on benign datasets: models are more likely to generate unsafe responses when the same prompt template is used for both fine-tuning and inference.

Proposed Solutions: Adopt the PTST strategy, fine-tuning and testing with different prompt templates to mitigate safety degradation; additionally, add diverse safety examples to the fine-tuning datasets.
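The PTST idea can be sketched in a few lines: fine-tune with a plain template, then prepend a safety system prompt only at inference time. The template markers and safety text below are illustrative assumptions (loosely modeled on Llama 2 conventions), not the paper's exact strings.

```python
# Sketch of Pure Tuning, Safe Testing (PTST): the safety prompt appears
# only in the inference-time template, never in the fine-tuning template.
# SAFETY_PROMPT and the [INST]/<<SYS>> markers are illustrative.

SAFETY_PROMPT = (
    "You are a helpful, respectful and honest assistant. "
    "Always answer as helpfully as possible, while being safe."
)

def build_prompt(user_msg: str, *, inference: bool) -> str:
    """Return the prompt for one user message; PTST adds safety only at inference."""
    if inference:
        return f"<<SYS>>\n{SAFETY_PROMPT}\n<</SYS>>\n[INST] {user_msg} [/INST]"
    return f"[INST] {user_msg} [/INST]"  # pure tuning: plain template

train_prompt = build_prompt("Solve 2 + 2.", inference=False)
test_prompt = build_prompt("Solve 2 + 2.", inference=True)
print(SAFETY_PROMPT not in train_prompt and SAFETY_PROMPT in test_prompt)
```

The design point is the deliberate template mismatch: the model never sees the safety prompt during tuning, so its safety-conditioned behavior is not overwritten by the task data.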

Data Barrier

Insufficient coverage of safety examples in training datasets may lead to unsafe behaviors.

Proposed Solutions: Adding diverse safety examples to the fine-tuning datasets.
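The proposed mitigation can be sketched as a simple dataset-mixing step. The data below is entirely hypothetical; the point is only the mechanics of interleaving refusal demonstrations with task examples.

```python
import random

# Minimal sketch (hypothetical data): mix a small number of safety
# demonstrations -- refusals of harmful requests -- into the task
# fine-tuning set so the tuned model retains refusal behavior.

task_data = [
    {"prompt": f"Solve practice problem {i}.", "completion": f"Answer {i}."}
    for i in range(95)
]
safety_data = [
    {"prompt": "Explain how to bypass a door lock.",
     "completion": "I can't help with that. If you're locked out, "
                   "please contact a licensed locksmith."},
] * 5  # placeholder refusal examples, duplicated for illustration

mixed_dataset = task_data + safety_data
random.shuffle(mixed_dataset)  # interleave so safety examples appear throughout
print(len(mixed_dataset))  # 100
```

In a real pipeline the safety examples would be diverse (many harm categories, varied phrasings) rather than duplicates, and the mixing ratio would be a tuned hyperparameter.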

Reproducibility Barrier

Reproducibility may be difficult due to closed-source models or limited access, hindering other researchers from verifying results.

Proposed Solutions: Authors are encouraged to describe their methods for ensuring reproducibility.

Project Team

Kaifeng Lyu

Researcher

Haoyu Zhao

Researcher

Xinran Gu

Researcher

Dingli Yu

Researcher

Anirudh Goyal

Researcher

Sanjeev Arora

Researcher

Contact Information

For information about the paper, please contact the authors.

Authors: Kaifeng Lyu, Haoyu Zhao, Xinran Gu, Dingli Yu, Anirudh Goyal, Sanjeev Arora

Source Publication: View Original Paper

Project Contact: Dr. Jianhua Yang

LLM Model Version: gpt-4o-mini-2024-07-18

Analysis Provider: OpenAI
