
The 'Trabajadores' digitisation project: How was it done?

The Modern Records Centre holds many key archives relating to modern social, political and economic history, including a major collection belonging to the Trades Union Congress (TUC). The TUC archive doesn't just contain material relating to the day-to-day administration of the British trade union movement, but includes a huge range of sources collected by the organisation on almost every topic that could affect the lives of working men and women in Britain. The organisation's contacts within the British government, the Labour Party, and the international labour movement, as well as within the trade union movement itself, gave it a network of informed sources and access to privileged information.

The 1930s saw unrest across the globe, culminating in the outbreak of the Second World War in September 1939. The international issue that caused most debate within the British labour movement during this period was the Spanish Civil War. Started by a military revolt against a democratically elected left-wing government, the conflict encapsulated many of the wider geopolitical issues of the time, from the role of the European dictators (Communist and Fascist) to the British policy of appeasement. The TUC archive contains more than 40 files about the conflict, and the regular use of these files by researchers (and the comparative lack of other significant online archival resources in English) made the Spanish Civil War an obvious choice for our pilot digitisation project.

Work on the project, which was funded by the University of Warwick, began in April 2011 and was completed in April/May 2012.

 

Some facts and figures:

  • The contents of 46 files from the Trades Union Congress archive, together with more than 100 publications from 4 other archive collections, have been digitised.
  • Over 4,000 documents, containing more than 13,000 pages, have been made available online.
  • The project took a total of 13 months to complete.
  • 9 members of staff (full and part-time) worked on the project (4 Project Assistants, 1 Archives Manager, 1 Assistant Archivist, 1 Data Services and Digital Production Manager, and 2 Metadata Librarians).

 

What has been digitised?

We have digitised 46 files from the archive collection of the Trades Union Congress. These contain correspondence, minutes, reports, memoranda and propaganda material produced by members of the British and Spanish governments; political groups; international, British and Spanish trade unions; pressure groups, aid organisations, and other interested parties. The subjects covered are diverse, and the collection includes material relating to the response of organised labour and the British left to the conflict, the social conditions and political situation in Spain, German and Italian intervention, the attitude of the British and French governments, the International Brigade, and medical aid and the care of refugees.

Whilst most of the material deals solely with the Spanish Civil War, we have also made available files which have an indirect connection with the conflict: one file on conditions in Spain in 1934/5, after a left-wing revolt against the government was violently crushed, and another file which deals with preparations for the 1936 Peoples' Olympiad. The Peoples' Olympiad was to be held in Barcelona in July 1936, as an alternative to the 1936 Olympic Games held in Nazi Germany. The outbreak of war five days before the event was due to start led to its cancellation.

In addition to the Trades Union Congress files, we have also digitised more than 100 publications and items of ephemera from the archive collections of Henry Sara and Frank Maitland, Hugo Dewar, Paul Tofahrn, and the Socialist Party. Many of these documents were written from a communist or anarchist point of view, and so provide a different view of the conflict to many of the TUC sources.

Although this project has resulted in over 13,000 pages of material being made available online, the Modern Records Centre still holds many undigitised sources on the Spanish Civil War. More information about these is included in our guide to sources.

 

What does digitisation involve (and why does it take so long)?

A common view of digitisation is that once a document has been scanned, it is ready to go online. Unfortunately, actually scanning the document is only a small part of the whole process. The project required us to:

  • Individually number every piece of paper, so that each item has its own individual reference number or code.
  • Digitally scan each item - the scanner that we used is capable of copying documents up to A2 size and photographs the documents from above, helping to minimise any damage to the archive material.
  • Produce metadata ("data about data") for each item and image. This effectively creates a new catalogue entry for every individual item - recording what it is, who produced it (and, if it is a letter, who received it), and when it was produced. Technical data about each digital image (jpg) was also recorded.
  • Produce a transcription of every item - the online database uses these to identify key words used in searches. We used optical character recognition (OCR) software to "read" the images and convert them into text files. Unfortunately, OCR is unable to recognise handwriting and can struggle to read 1930s printed or typescript text accurately. We therefore had to go through and correct every individual OCR text file (more than 13,000) to ensure that it was as accurate a transcription of the original document as possible.
  • Attempt to identify copyright owners, and contact them to request permission to publish. According to UK copyright law, individual intellectual ownership of a "literary work" doesn't expire until 70 years after the creator's death. Many of the archives being digitised are therefore still in copyright. We contacted numerous organisations in order to request permission to reproduce the archive material in which they held copyright, and attempted to track down descendants of individual authors. Although we managed to contact many of the current copyright holders and obtain permission to publish, we have been unable to identify everyone - please contact us if you think that you may be the current copyright holder of any of the digitised material.
  • Upload the images, transcriptions and metadata on to digital collection management software, so that the information can be made available online.
  • And finally, write webpages which put the documents in context and explain some of the background to the Spanish Civil War.
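
For readers who find code a helpful illustration, the steps above can be pictured as building one record per item (a reference number, catalogue-style metadata, and a corrected transcription for each scanned page) and then indexing the transcriptions so that key words can be searched. The sketch below is only an illustration under that assumption: it is not the collection management software used for the project, and every name, reference code and sample text in it is hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Page:
    image_file: str      # hypothetical file name of the scanned image (jpg)
    transcription: str   # OCR text after manual correction


@dataclass
class Item:
    reference: str                   # hypothetical individual reference code
    description: str                 # what the item is
    creator: str                     # who produced it
    date: str                        # when it was produced
    recipient: Optional[str] = None  # only recorded if the item is a letter
    pages: List[Page] = field(default_factory=list)


def build_index(items: List[Item]) -> Dict[str, Set[str]]:
    """Map each lower-cased word in the transcriptions to the references of items containing it."""
    index: Dict[str, Set[str]] = {}
    for item in items:
        for page in item.pages:
            for word in re.findall(r"[a-z]+", page.transcription.lower()):
                index.setdefault(word, set()).add(item.reference)
    return index


def search(index: Dict[str, Set[str]], keyword: str) -> List[str]:
    """Return the sorted references of items whose transcription contains the key word."""
    return sorted(index.get(keyword.lower(), set()))


# Invented sample records, for illustration only.
sample_items = [
    Item(
        reference="EXAMPLE/1/1",
        description="Letter",
        creator="Example Aid Committee",
        date="1937",
        recipient="Example Union",
        pages=[Page("example_1_1_001.jpg", "Appeal for medical aid")],
    ),
    Item(
        reference="EXAMPLE/1/2",
        description="Report",
        creator="Example Delegation",
        date="1938",
        pages=[Page("example_1_2_001.jpg", "Report on refugees and medical supplies")],
    ),
]

sample_index = build_index(sample_items)
print(search(sample_index, "medical"))
```

Because the index is built from the transcriptions alone, a search can only find words that were transcribed correctly, which is why correcting each OCR text file mattered so much.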