
G-SciEdBERT: A Contextualized LLM for Science Assessment Tasks in German

Project Overview

G-SciEdBERT is a contextualized language model built to score German-written student responses to science assessment items. It builds on the general-purpose German BERT model (G-BERT). The authors further pre-trained it on German science responses and then fine-tuned it for scoring, which markedly improved scoring accuracy over G-BERT. The results show that G-SciEdBERT handles the language of student science writing better than a general-purpose model. They also suggest that domain-specific contextualized language models can make automated scoring of student responses more accurate and reliable.

Key Applications

G-SciEdBERT: A contextualized large language model for scoring German-written science responses

Context: Science education, for scoring written responses from secondary-school students in Germany, particularly students taking part in the PISA (Programme for International Student Assessment) assessment.

Implementation: Further pre-trained on a corpus of 30,000 German-written science responses, then fine-tuned for scoring on an additional 20,000 student responses.
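The pre-training stage can be pictured with the standard BERT masked-language-modeling (MLM) objective. The source does not say which objective was used, so this is a minimal sketch that assumes the usual BERT recipe. About 15% of token positions become prediction targets. Of those, 80% are replaced by [MASK], 10% by a random token, and 10% are left unchanged. The function name, the whitespace tokenization, and the example sentence are hypothetical. Real BERT models use WordPiece subword tokens.

```python
import random
from typing import List, Optional, Tuple

MASK_TOKEN = "[MASK]"  # BERT's mask token string


def mask_for_mlm(
    tokens: List[str],
    vocab: List[str],
    mask_prob: float = 0.15,
    rng: Optional[random.Random] = None,
) -> Tuple[List[str], List[Optional[str]]]:
    """Hypothetical helper: BERT-style masking for one tokenized response.

    Each position becomes a prediction target with probability `mask_prob`.
    A target is replaced by [MASK] 80% of the time, by a random vocabulary
    token 10% of the time, and kept unchanged 10% of the time. Labels hold
    the original token at target positions and None elsewhere (ignored by
    the loss).
    """
    if rng is None:
        rng = random.Random()
    inputs = list(tokens)
    labels: List[Optional[str]] = [None] * len(tokens)
    for i, token in enumerate(tokens):
        if rng.random() >= mask_prob:
            continue  # not a prediction target
        labels[i] = token  # the model must recover this original token
        roll = rng.random()
        if roll < 0.8:
            inputs[i] = MASK_TOKEN
        elif roll < 0.9:
            inputs[i] = rng.choice(vocab)
        # else: keep the original token, so the model cannot rely on [MASK]
    return inputs, labels


# Hypothetical example response, split on whitespace for simplicity.
response = "Die Pflanze braucht Licht für die Photosynthese".split()
masked, targets = mask_for_mlm(response, vocab=response, rng=random.Random(0))
print(masked)
print(targets)
```

During pre-training, the model is trained to predict the `labels` entries at target positions from the `masked` input. This pushes it to learn the vocabulary and phrasing typical of student science responses.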

Outcomes: Improved scoring accuracy over G-BERT by 10.2%, measured as quadratic weighted kappa (QWK) agreement with human scores, indicating a better grasp of scientific language and context.
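Automated-scoring studies commonly report agreement with human raters as quadratic weighted kappa (QWK). This metric penalizes each disagreement by the squared distance between score levels. The following is a minimal sketch that assumes integer scores on a known ordinal scale. The function name and the example scores are hypothetical and are not taken from the study.

```python
from typing import Sequence


def quadratic_weighted_kappa(
    rater_a: Sequence[int],
    rater_b: Sequence[int],
    min_score: int,
    max_score: int,
) -> float:
    """Cohen's kappa with quadratic weights for ordinal integer scores.

    Returns 1.0 for perfect agreement, about 0.0 for chance-level
    agreement, and negative values for systematic disagreement.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("score sequences must be non-empty and equal in length")
    if max_score <= min_score:
        raise ValueError("max_score must exceed min_score")
    k = max_score - min_score + 1
    n = len(rater_a)

    # Observed confusion matrix and each rater's score histogram.
    observed = [[0] * k for _ in range(k)]
    hist_a = [0] * k
    hist_b = [0] * k
    for a, b in zip(rater_a, rater_b):
        if not (min_score <= a <= max_score and min_score <= b <= max_score):
            raise ValueError("score outside [min_score, max_score]")
        observed[a - min_score][b - min_score] += 1
        hist_a[a - min_score] += 1
        hist_b[b - min_score] += 1

    # Weighted observed disagreement vs. disagreement expected by chance.
    observed_dis = 0.0
    expected_dis = 0.0
    for i in range(k):
        for j in range(k):
            weight = (i - j) ** 2 / (k - 1) ** 2
            observed_dis += weight * observed[i][j]
            expected_dis += weight * hist_a[i] * hist_b[j] / n
    if expected_dis == 0:
        raise ValueError("kappa is undefined when both raters give every response the same score")
    return 1.0 - observed_dis / expected_dis


# Hypothetical human scores vs. model scores on a 0-2 scale.
human = [0, 1, 2, 2, 1, 0]
model = [0, 1, 2, 1, 1, 0]
print(round(quadratic_weighted_kappa(human, model, 0, 2), 3))  # prints 0.857
```

A comparison like the one reported above would compute QWK for each model against the same human scores on each item. The differences between the two models are then compared.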

Challenges: The complexity of scientific language and the need for domain-specific knowledge posed challenges for general-purpose language models like G-BERT.

Implementation Barriers

Technical barrier

General-purpose language models are trained on broad text corpora and therefore struggle to score domain-specific responses accurately.

Proposed Solutions: Developing domain-specific large language models such as G-SciEdBERT that are pre-trained on relevant educational and scientific data.

Project Team

Ehsan Latif

Researcher

Gyeong-Geon Lee

Researcher

Knut Neumann

Researcher

Tamara Kastorff

Researcher

Xiaoming Zhai

Researcher

Contact Information

For information about the paper, please contact the authors.

Authors: Ehsan Latif, Gyeong-Geon Lee, Knut Neumann, Tamara Kastorff, Xiaoming Zhai

Source Publication: View Original Paper

Project Contact: Dr. Jianhua Yang

LLM Model Version: gpt-4o-mini-2024-07-18

Analysis Provider: OpenAI
