Datafication of Historical Sources with AI: a Practical Workshop

Image: Microsoft Copilot

Are you tired of routine data collection tasks, such as extracting specific facts from hundreds of pages of sources? Have you long had a research idea that relies on large-scale data, only to be held back by the sheer volume of manual work? Do you think Artificial Intelligence (AI) could be a valuable research assistant, but don’t know where to start? Join our workshop on the datafication of sources using AI and find some answers.

Date: Thursday, February 5, 2026

Time: 10:30 am – 5:00 pm

Place: A03, George Green Library, University of Nottingham

The workshop is designed to introduce postgraduate students with little or no coding experience to the potential of AI for research in the humanities and social sciences. We will use Large Language Models (LLMs), such as OpenAI’s ChatGPT and Google’s Gemini, to prepare sources for analysis. Our practical examples will use historical datasets provided by the Science Museum, but the approaches are applicable across disciplines.

The workshop will begin with an introductory talk on the advantages, limitations, and relevance of LLMs for research, delivered by Dr. Federico Nanni (The Alan Turing Institute). This will be followed by a roundtable featuring several short presentations on the practical use of AI in historical research, offering participants diverse perspectives and examples. Roundtable participants:

Daniel Belteki, Science Museum, ‘LLM Experiments with the Science Museum Collections’
Jacob Forward, University of Cambridge, ‘LLMs for Emotion Classification in US Presidential Speeches’
Liudmila Lyagushkina, University of Nottingham, ‘LLM as a Research Assistant: Experimenting with Historical Classifications’
Joseph Nockels, University of Sheffield, ‘Making the Past Readable - Advancing AI Transcription Approaches for Library Collections At-Scale’

In the afternoon, there will be two hands-on sessions. In the first, you will learn how to use AI to extract structured data, such as dates, biographical details, and concepts, from textual sources. In the second, you will choose between applying these methods to your own sources, under the guidance of the workshop organisers and moderators, and adapting scripts to the provided datasets. These sessions will be co-delivered by Dr. Daniel Belteki (Science Museum) and Liudmila Lyagushkina (University of Nottingham).

By the end of the workshop, you will:

1) Have a basic conceptual understanding of AI’s opportunities and limitations for research.

2) Understand how LLMs can be applied to textual data to extract structured information.

3) Have practical experience in adapting ready-made scripts to your own or sample datasets.

Participation is free, but places are limited. Priority will be given to ESRC-funded PhD students, though all are welcome to apply.

Participants must bring their own laptops, but no software installation is required.

Coffee breaks and lunch will be provided. Limited travel support is available for those without institutional funding. If you need travel support, please indicate this in the registration form. For questions about funding or other details, please contact the workshop organisers, Liudmila Lyagushkina (ahxll4@nottingham.ac.uk) and Finn Cadell (ahyfc3@nottingham.ac.uk).