
Quality Assurance of Generative Dialog Models in an Evolving Conversational Agent Used for Swedish Language Practice

Project Overview

This document outlines the development and implementation of a generative dialog model (GDM) called Emely, a conversational agent that helps newcomers to Sweden practice Swedish. It underscores the critical role of quality assurance (QA) in AI-driven language tools, reporting findings from action research that elicited the essential requirements for the GDM. The research identifies key criteria for the model's success, proposes automated test cases to evaluate its performance, and addresses the challenges of keeping the GDM reliable as it evolves. Overall, the initiative highlights the potential of generative AI in educational settings, particularly language acquisition, by providing personalized, responsive practice tailored to individual users.

Key Applications

Emely - a conversational agent for Swedish language practice

Context: Support for newcomers in Sweden to practice Swedish and improve job interview skills

Implementation: Action research involving a multidisciplinary team to elicit requirements and design automated test cases for QA of the GDM.

Outcomes: Established 37 requirements for the GDM, developed automated test cases, and reported on the performance of different model versions.

Challenges: Complexity of natural language processing, ensuring meaningful user interactions, and the need for continuous model evaluation and selection.

Implementation Barriers

Technical Barrier

Quality assurance of generative dialog models is complex due to the variability of natural language and the challenges in defining quality metrics. Frequent retraining of models complicates the QA process, as it can change the model's behavior unpredictably.

Proposed Solutions: Develop automated frameworks for testing and benchmarking generative dialog models. Implement continuous integration practices and MLOps for managing model deployment and updates.
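The automated-testing idea above can be sketched as a small QA harness that runs a fixed set of conversational probes against the model and checks each reply against simple requirement checks. This is an illustrative sketch only: the `generate_reply` stand-in and the three checks are assumptions for demonstration, not the actual Emely requirements or test suite.

```python
# Hypothetical sketch of automated QA for a generative dialog model.
# generate_reply() is a stand-in for a call to the deployed model;
# the requirement checks are illustrative, not Emely's real 37 requirements.

def generate_reply(user_turn: str) -> str:
    """Stand-in for the dialog model under test (canned replies)."""
    canned = {
        "Hej!": "Hej! Hur mår du idag?",
        "Vad heter du?": "Jag heter Emely. Vad heter du?",
    }
    return canned.get(user_turn, "Kan du berätta mer?")

def check_non_empty(reply: str) -> bool:
    """Requirement: the model always produces a reply."""
    return bool(reply.strip())

def check_max_length(reply: str, limit: int = 200) -> bool:
    """Requirement: replies stay short enough for a learner to follow."""
    return len(reply) <= limit

def check_no_heavy_repetition(reply: str) -> bool:
    """Requirement: replies are not dominated by repeated words."""
    words = reply.lower().split()
    return len(words) < 3 or len(set(words)) / len(words) > 0.5

REQUIREMENTS = [check_non_empty, check_max_length, check_no_heavy_repetition]

def run_test_suite(turns):
    """Run every probe turn through the model and apply all checks."""
    results = []
    for turn in turns:
        reply = generate_reply(turn)
        passed = all(check(reply) for check in REQUIREMENTS)
        results.append((turn, reply, passed))
    return results

results = run_test_suite(["Hej!", "Vad heter du?"])
```

In a continuous-integration setup, a suite like this would run automatically on every retrained model version, so that unpredictable behavior changes are caught before deployment.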

Project Team

Markus Borg (Researcher)

Johan Bengtsson (Researcher)

Harald Österling (Researcher)

Alexander Hagelborn (Researcher)

Isabella Gagner (Researcher)

Piotr Tomaszewski (Researcher)

Contact Information

For information about the paper, please contact the authors.

Authors: Markus Borg, Johan Bengtsson, Harald Österling, Alexander Hagelborn, Isabella Gagner, Piotr Tomaszewski

Source Publication: View Original Paper

Project Contact: Dr. Jianhua Yang

LLM Model Version: gpt-4o-mini-2024-07-18

Analysis Provider: OpenAI
