Data management planning
At the start of any new project it is important to consider the issues relating to the data that you will create, gather and use in the course of the project.
What is Research Data Management
Data can have a longer lifespan than the research project that creates or collects it. You may continue to work on your data after funding has ceased, follow-up projects may analyse or add to the data, and data may be re-used by other researchers. Managing your data properly throughout its whole lifecycle is therefore increasingly important.
You should ideally make plans for your data before you start to create and collect it. Many funders now ask you to do this as part of their application process. Planning at an early stage can help you make the right decisions about creating, storing and sharing your data. But it is never too late to create a data management plan, even if your project is already underway.
Make sure you know about your funders' expectations
Data planning involves deciding at the outset of your research:
- Which software and file formats to use
- How to organise, store and manage your data
- What to include in the consent agreements you negotiate
These will all affect what it’s possible to do with your data in the future. Data planning is best done by writing a Data Management Plan.
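As one illustration of a decision about how to organise your data, the sketch below builds consistent file names from a project name, a short description, a date and a version number. The function name `build_filename` and the naming pattern itself are assumptions made for this example, not a convention prescribed by this guidance or by any funder; adapt the pattern to whatever your project or institution agrees on.

```python
import re
from datetime import date


def build_filename(project: str, description: str, created: date,
                   version: int, extension: str) -> str:
    """Build a file name of the form project_description_YYYY-MM-DD_vNN.ext.

    The pattern is a hypothetical convention chosen for illustration only.
    """
    def slug(text: str) -> str:
        # Lowercase the text and replace runs of other characters with a hyphen
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

    # Normalise the extension so ".CSV" and "csv" give the same result
    ext = extension.lstrip(".").lower()
    return f"{slug(project)}_{slug(description)}_{created.isoformat()}_v{version:02d}.{ext}"


if __name__ == "__main__":
    print(build_filename("Oral History", "Interview Notes", date(2024, 3, 5), 2, ".CSV"))
```

Putting the date in ISO format (year, month, day) means files sort chronologically by name, and a zero-padded version number keeps versions in order.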
Research data can be thought of as the building blocks of any research.
It is the information that you use to answer your research questions, whether or not your research is funded. Research data is often arranged, formatted or designed in such a way as to facilitate communication, interpretation and further processing.
The Digital Curation Centre defines research data as "a reinterpretable representation of information in a formalized manner suitable for communication, interpretation, or processing."
Information becomes data through the context of research activity. Consider the following two examples:
- Tweets and other social media posts can be considered informal communication. But for researchers tracking public engagement with an event, election or TV programme, they become research data
- A 19th-century newspaper advert in a historical archive is an archived image. But as soon as a researcher uses it to study the history and development of advertising, or gender attitudes over time, it becomes research data for that researcher
Types of research data
Much research data is created ‘new’ for a specific project because it answers a novel question. It may also be research data from a previous project that has been transformed, adjusted or reinterpreted to fit the needs of the new project. Five commonly used data types are:
- Observational: data captured in real time that is usually unique and irreplaceable. For example, remote sensing data, survey data, field recordings, sample data
- Experimental: data captured from lab equipment that is often reproducible. For example, gene sequences, chromatograms, magnetic field data
- Models or simulation: data generated from test models where model and metadata may be more important than output data from the model. For example, climate models, economic models
- Derived or compiled: resulting from processing or combining ‘raw’ data. For example, text and data mining, compiled databases, 3D models
- Reference or canonical: a static or organic conglomeration or collection of datasets, probably published and curated. For example, gene sequence databanks, collection of letters or archive of historical images
Examples of research data
Research data can be electronic or in hardcopy (e.g. paper) and it may include the following:
- Documents (text, Word, PDF), spreadsheets
- Laboratory notebooks, field notebooks, diaries
- Questionnaires, transcripts, codebooks
- Audiotapes, videotapes, photographs, films
- Test responses
- Slides, artefacts, specimens, samples
- Collection of digital objects acquired and generated during the process of research (including digitised archive material)
- Database contents (video, audio, text, images)
- Models, algorithms, scripts
- Contents of an application (input, output, logfiles for analysis software, simulation software, schemas)
- Methodologies, workflows and protocols
List adapted from Leeds University
Research data is generally considered not to include incidental or administrative information generated in the course of personal activities, desktop or mailbox backups, or data produced by non-research activities.