
A Comparison of LLM Finetuning Methods & Evaluation Metrics with Travel Chatbot Use Case

Project Overview

This document explores the integration of generative AI, particularly large language models (LLMs), in education, drawing parallels with their application in the travel industry for personalized experiences. It highlights advances in fine-tuning techniques such as Quantized Low-Rank Adaptation (QLoRA) and Retrieval Augmented Fine Tuning (RAFT), which improve model performance and data-processing efficiency, with specific focus on the LLaMa and Mistral model families and how they can be adapted for educational purposes, providing tailored insights and support for learners.

The findings indicate that while Mistral RAFT outperforms the other fine-tuning methods compared, traditional evaluation metrics often fail to capture the effectiveness of these models as perceived by humans, underscoring the importance of human evaluation in assessing AI capabilities. The discussion also addresses challenges, including data quality and the need for diverse training datasets. Despite these hurdles, the document concludes that generative AI has significant potential to transform educational practice through personalized, context-aware applications that simulate real-world interactions and enhance learning experiences.

Key Applications

Travel Data Processing and Insights Generation

Context: Educational applications in AI research, focusing on travel industry data to provide personalized recommendations and insights for travelers. This includes leveraging user-generated content from platforms like Reddit to enhance understanding and engagement in travel-related topics.

Implementation: Fine-tuning LLaMa 2 and Mistral models using QLoRA and RAFT methodologies. This involves sourcing datasets from travel discussions, employing techniques to augment data quality and training efficiency, followed by Reinforcement Learning from Human Feedback (RLHF) to optimize model performance.
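A QLoRA setup of the kind described above can be sketched with Hugging Face's `transformers` and `peft` libraries. This is a minimal configuration sketch, not the paper's actual setup: the base checkpoint, adapter rank, and target modules below are illustrative assumptions.

```python
# Hypothetical QLoRA configuration: 4-bit quantized base model + low-rank adapters.
# Checkpoint name, rank (r), and target_modules are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # quantize base weights to 4 bits
    bnb_4bit_quant_type="nf4",              # NormalFloat4 quantization
    bnb_4bit_compute_dtype=torch.bfloat16,  # run compute in bfloat16
)

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",            # assumed base checkpoint
    quantization_config=bnb_config,
    device_map="auto",
)

lora_config = LoraConfig(
    r=16,                                   # adapter rank (assumption)
    lora_alpha=32,                          # scaling factor
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],    # attention projections (assumption)
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the small adapter matrices train
```

Because the frozen base weights are stored in 4-bit precision and only the adapter matrices receive gradients, a 7B-parameter model can be fine-tuned on a single consumer GPU.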

Outcomes: Significantly improved user engagement and recommendation accuracy, enhanced model performance in generating relevant travel insights, faster processing times, and reduced inference costs. The Mistral RAFT RLHF model was recognized as the best performing in its category.

Challenges: Dependence on high-quality training data, computational resource demands, ensuring accurate and contextually relevant responses, and the need for diverse training datasets.

Implementation Barriers

Technical & Resource Limitations

Traditional NLP metrics do not capture the complexities of LLMs and often misalign with human evaluations. Additionally, high computational costs and hardware requirements for fine-tuning large models present challenges.

Proposed Solutions: Using human evaluation alongside LLM-based judging (e.g., OpenAI's GPT-4) to assess model performance, and adopting efficient fine-tuning methods such as QLoRA to reduce the computational burden.
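The misalignment between traditional n-gram metrics and human judgment is easy to demonstrate with a minimal unigram-overlap score (a rough stand-in for ROUGE-1; the example sentences are invented): an answer a human would rate as equivalent to the reference can score far lower than a verbatim copy.

```python
import re

def unigram_f1(candidate: str, reference: str) -> float:
    """Rough ROUGE-1-style F1: unique-unigram overlap between candidate and reference."""
    cand = set(re.findall(r"[a-z0-9]+", candidate.lower()))
    ref = set(re.findall(r"[a-z0-9]+", reference.lower()))
    overlap = len(cand & ref)
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

reference = "Visit Kyoto in spring to see the cherry blossoms."
verbatim = "Visit Kyoto in spring to see the cherry blossoms."
paraphrase = "Springtime is ideal for viewing sakura around Kyoto."

print(unigram_f1(verbatim, reference))    # 1.0 — exact copy scores perfectly
print(unigram_f1(paraphrase, reference))  # ~0.12 — same meaning, near-zero overlap
```

A human rater would judge both answers correct, yet the overlap metric heavily penalizes the paraphrase, which is why LLM-based or human evaluation is needed to capture semantic equivalence.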

Data Quality

Effective fine-tuning depends on high-quality, diverse training datasets, which are difficult to source and curate at scale.

Proposed Solutions: Implementing strict quality standards in data collection and possibly utilizing web scraping for real-time data.
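A minimal quality filter of the sort such a data-collection pipeline might apply to scraped Reddit posts could look like the sketch below. The field names (`selftext`, `score`) follow Reddit's public JSON schema, but the thresholds are illustrative assumptions, not the paper's actual criteria.

```python
def passes_quality_filter(post: dict, min_score: int = 5, min_words: int = 20) -> bool:
    """Keep only posts with enough community upvotes and substantive text.

    Thresholds are illustrative assumptions, not the paper's actual criteria.
    """
    text = post.get("selftext", "") or ""
    if text in ("[deleted]", "[removed]"):   # scrubbed content carries no signal
        return False
    if post.get("score", 0) < min_score:     # weak community endorsement
        return False
    return len(text.split()) >= min_words    # too short to be a useful example

posts = [
    {"selftext": "[removed]", "score": 40},                          # scrubbed
    {"selftext": "Week in Lisbon: " + "details " * 30, "score": 12}, # keeps
    {"selftext": "nice", "score": 100},                              # too short
]
kept = [p for p in posts if passes_quality_filter(p)]
print(len(kept))  # 1
```

In practice such rule-based filters are only a first pass; deduplication and manual spot-checks would still be needed to meet strict quality standards.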

Model Complexity

Challenges in tuning hyperparameters and ensuring efficient processing without noise in outputs.

Proposed Solutions: Experimenting with embedding models, pre-writing prompt templates, and other strategies to improve model predictions.
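Pre-written prompt templates of the kind mentioned above can be as simple as a function that slots retrieved passages into a fixed instruction. The wording below is an illustrative assumption in the spirit of a retrieval-augmented travel-assistant prompt, not the paper's actual template.

```python
def build_prompt(question: str, retrieved_docs: list[str]) -> str:
    """Assemble a retrieval-augmented prompt from a fixed template.

    Template wording is an illustrative assumption.
    """
    context = "\n\n".join(
        f"[Document {i + 1}]\n{doc}" for i, doc in enumerate(retrieved_docs)
    )
    return (
        "You are a helpful travel assistant. Answer the question using only "
        "the documents below; if they are insufficient, say so.\n\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )

prompt = build_prompt(
    "When is the best time to visit Kyoto?",
    ["Kyoto's cherry blossoms peak in early April.",
     "Autumn foliage in Kyoto peaks in late November."],
)
print(prompt)
```

Constraining the model to the retrieved documents and giving it an explicit refusal path helps reduce noisy, off-context outputs.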

Project Team

Sonia Meyer

Researcher

Shreya Singh

Researcher

Bertha Tam

Researcher

Christopher Ton

Researcher

Angel Ren

Researcher

Contact Information

For information about the paper, please contact the authors.

Authors: Sonia Meyer, Shreya Singh, Bertha Tam, Christopher Ton, Angel Ren

Source Publication: View Original Paper

Project Contact: Dr. Jianhua Yang

LLM Model Version: gpt-4o-mini-2024-07-18

Analysis Provider: OpenAI
