Keeping LLMs Aligned After Fine-tuning: The Crucial Role of Prompt Templates
Project Overview
The document explores the role of generative AI in education, focusing on how prompt templates affect the safety and performance of fine-tuned large language models (LLMs). It introduces the 'Pure Tuning, Safe Testing' (PTST) strategy: fine-tune the model without a safety prompt, then apply the safety prompt only at inference time, which preserves safety alignment without sacrificing the gains from fine-tuning. The document also underscores the importance of reproducibility in AI research within educational contexts, advocating for open access to data and code, comprehensive experimental details, and ethical practices. It stresses the need for transparency in reporting, including statistical significance and resource requirements, so that findings are reliable and can be applied in educational settings. Collectively, these insights highlight the potential of generative AI to enhance educational practice while maintaining safety and integrity in research.
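The PTST idea described above can be sketched in a few lines. This is an illustrative example only: the template strings below are simplified stand-ins loosely modeled on the Llama 2 chat format, and the helper names (`format_for_training`, `format_for_inference`) are hypothetical, not taken from the paper's code.

```python
# Sketch of 'Pure Tuning, Safe Testing' (PTST):
# fine-tune with a plain template, but prepend a safety (system) prompt
# only at inference time. Template strings are simplified stand-ins
# loosely modeled on the Llama 2 chat format.

SAFETY_PROMPT = (
    "You are a helpful, respectful and honest assistant. "
    "Always answer as helpfully as possible, while being safe."
)

def format_for_training(question: str, answer: str) -> str:
    """Pure tuning: no safety prompt in the fine-tuning template."""
    return f"[INST] {question} [/INST] {answer}"

def format_for_inference(question: str) -> str:
    """Safe testing: the safety prompt appears only at inference."""
    return f"[INST] <<SYS>>\n{SAFETY_PROMPT}\n<</SYS>>\n\n{question} [/INST]"

# The same question is formatted differently in the two phases.
train_example = format_for_training("What is 7 * 8?", "7 * 8 = 56.")
test_prompt = format_for_inference("What is 7 * 8?")
```

The key point is the asymmetry: the safety prompt never appears in the fine-tuning data, so the model's safety behavior tied to that prompt is not overwritten during tuning.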
Key Applications
Fine-tuning Llama 2 and GPT-3.5 Turbo on GSM8K and related datasets
Context: Educational contexts focused on enhancing mathematical problem-solving capabilities for students and educators, involving grade school math and assessment tasks.
Implementation: Fine-tuned both Llama 2 and GPT-3.5 Turbo on the GSM8K dataset using various prompt templates and chat-mode prompts, for 6 epochs (Llama 2) and 1 epoch (GPT-3.5 Turbo). The models were trained to generate helpful and safe responses in math problem-solving scenarios.
Outcomes: Significant improvement in helpfulness, with safety alignment preserved by fine-tuning without a safety prompt and adding it only at inference (the PTST strategy).
Challenges: Risk of safety degradation and potential unsafe behaviors if the same prompt template is used for both fine-tuning and inference.
Fine-tuning Llama 2 for medical consultation
Context: Simulating a medical chatbot based on real-world patient-physician conversations in educational settings for medical students and training purposes.
Implementation: Fine-tuned Llama 2 on a dataset of 100k conversations for 3 epochs, focusing on enhancing the model's ability to generate accurate and contextually appropriate medical advice.
Outcomes: Enhanced semantic similarity of responses to human-written answers, improving medical consultation capabilities for educational and training contexts.
Challenges: Maintaining safety while providing accurate medical advice and ensuring the model does not generate harmful or misleading information.
Fine-tuning Llama 2 on OpenOrca
Context: Improving reasoning and comprehension capabilities for educational contexts, applicable to various subjects and learning scenarios.
Implementation: Fine-tuned Llama 2 on the OpenOrca dataset selected from the FLAN collection to enhance reasoning tasks and comprehension abilities across different educational applications.
Outcomes: Significant improvements in reasoning tasks while preserving safety and providing useful educational outputs.
Challenges: Balancing the trade-off between helpfulness and safety in responses generated for educational purposes.
Implementation Barriers
Technical/Safety Barrier
Fine-tuning can lead to safety degradation even when using benign datasets. Models may generate unsafe responses if trained and tested with the same prompt template due to insufficient coverage of safety examples in training datasets.
Proposed Solutions: Implementing the PTST strategy, i.e., using different prompt templates for fine-tuning and inference so that safety degradation during tuning does not carry over to deployment. Adding diverse safety examples to the fine-tuning datasets.
Data Barrier
Insufficient coverage of safety examples in training datasets may lead to unsafe behaviors.
Proposed Solutions: Adding diverse safety examples to the fine-tuning datasets.
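One simple way to realize this solution is to interleave a small fraction of safety demonstrations (e.g., refusals to harmful requests) into the task data before fine-tuning. The sketch below is a hypothetical illustration: the datasets, the `mix_datasets` helper, and the 5% mixing fraction are assumptions for the example, not values reported in the paper.

```python
# Illustrative sketch: augment a fine-tuning set with safety examples.
# `task_data` and `safety_data` are toy placeholders for a task dataset
# (e.g., GSM8K-style pairs) and a set of refusal demonstrations.
import random

task_data = [
    {"prompt": "What is 12 + 30?", "response": "12 + 30 = 42."},
]
safety_data = [
    {"prompt": "How do I pick a lock?", "response": "I can't help with that."},
]

def mix_datasets(task, safety, safety_fraction=0.05, seed=0):
    """Interleave a small fraction of safety demonstrations into the task data."""
    n_safety = max(1, int(len(task) * safety_fraction))  # at least one example
    rng = random.Random(seed)
    mixed = task + rng.choices(safety, k=n_safety)
    rng.shuffle(mixed)
    return mixed

mixed = mix_datasets(task_data, safety_data)
```

Keeping the safety fraction small matters: the goal is to remind the model of its refusal behavior without diluting the task signal.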
Reproducibility Barrier
Reproducibility may be difficult due to closed-source models or limited access, hindering other researchers from verifying results.
Proposed Solutions: Encouraging authors to release data and code, report comprehensive experimental details, and describe their methods for ensuring reproducibility.
Project Team
Kaifeng Lyu
Researcher
Haoyu Zhao
Researcher
Xinran Gu
Researcher
Dingli Yu
Researcher
Anirudh Goyal
Researcher
Sanjeev Arora
Researcher
Contact Information
For information about the paper, please contact the authors.
Authors: Kaifeng Lyu, Haoyu Zhao, Xinran Gu, Dingli Yu, Anirudh Goyal, Sanjeev Arora
Source Publication: View Original Paper
Project Contact: Dr. Jianhua Yang
LLM Model Version: gpt-4o-mini-2024-07-18
Analysis Provider: OpenAI