Organising research data
Organising your data is the most basic of all of the research data management functions. With very little planning or effort you can make your data files easy to store, find and use. Failing to organise your data can make it unusable even by the people who created the data!
Data storage
The best place to store, secure and ensure the long term life of your research data while you gather and use it is in one of the quality storage options available from IT Services. These options are designed to be flexible and to fit a range of use cases and needs. Get in touch with the storage team if you want to explore them further.
Data formats
When you’re planning your research project it is essential that you consider the file formats in which you will store your data. Your choice of file format will affect the usability and long term accessibility of your files and data. As technology changes, you should also plan for both hardware and software obsolescence.
File formats more likely to be accessible in the future have the following characteristics:
- Non-proprietary
- Open, documented standard
- Common usage by research community
- Standard representation (ASCII, Unicode)
- Unencrypted
- Uncompressed
Examples of preferred file format choices include:
- ODF, RTF or TXT, not Word (.doc or .docx)
- ASCII, not Excel (.xls or .xlsx)
- MPEG-4, not QuickTime
- TIFF, PNG or JPEG2000, not GIF or JPG
- XML or RDF, not RDBMS
If you are using proprietary software consider migrating your data into a format with the above characteristics, in addition to keeping a copy in the original software format.
- Cornell University has a great guide on common file formats and when to use each
- Further guidance can be found from the Modern Records Centre
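As a rough illustration of the advice on migrating out of proprietary formats, the sketch below flags files whose extensions suggest a discouraged format and names the preferred alternatives from the list above. The function name `suggest_migrations` and the mapping are hypothetical, and two details are assumptions rather than part of the guidance: `.mov` is used as the extension for QuickTime, and CSV is given as an example of an ASCII format.

```python
from pathlib import Path

# Hypothetical mapping from discouraged extensions to the preferred
# alternatives listed in the guidance. Assumptions: ".mov" stands for
# QuickTime, and CSV is offered as one example of an ASCII format.
PREFERRED_ALTERNATIVES = {
    ".doc": "ODF, RTF or TXT",
    ".docx": "ODF, RTF or TXT",
    ".xls": "ASCII text (for example CSV)",
    ".xlsx": "ASCII text (for example CSV)",
    ".mov": "MPEG-4",
    ".gif": "TIFF, PNG or JPEG2000",
    ".jpg": "TIFF, PNG or JPEG2000",
    ".jpeg": "TIFF, PNG or JPEG2000",
}


def suggest_migrations(filenames):
    """Return (filename, suggestion) pairs for files in discouraged formats."""
    suggestions = []
    for name in filenames:
        # Compare extensions case-insensitively so "plot.JPG" is caught too.
        suffix = Path(name).suffix.lower()
        if suffix in PREFERRED_ALTERNATIVES:
            suggestions.append((name, PREFERRED_ALTERNATIVES[suffix]))
    return suggestions


if __name__ == "__main__":
    for name, suggestion in suggest_migrations(["survey.xlsx", "notes.txt", "plot.JPG"]):
        print(f"{name}: consider also keeping a copy as {suggestion}")
```

A check like this only looks at file extensions, so it is a prompt for a conversation about formats rather than a conversion tool.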
File naming and folders
How many times have you looked for a document and then found that you can’t remember which folder you stored it in? Imagine you needed to find a file among the files of a research partner: where would you start? Starting a project with a strategy for the consistent naming of both files and folders helps to stop research data becoming disorganised. Creating appropriate file and folder structures will save time, avoid loss of data, allow re-use of the data, and help you locate data accurately in the future.
To a certain extent it doesn’t matter what system you choose to use as long as everyone creating data for the project agrees on the system and you are all consistent in using it! Consider also if you will need to include version information in the file name.
- Jisc Digital Media has a guide on choosing file names
- Guide to renaming files and file extensions from Geek Girl's Plain English Computing
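One way to keep file names consistent across a project is to generate them from the same few pieces of information every time. The sketch below is only an example of such a convention, not a recommended standard: the function `build_filename`, the order of the parts (project, description, date, version) and the separators are all assumptions chosen for illustration.

```python
import re
from datetime import date


def build_filename(project, description, created, version, extension):
    """Build a name such as 'soilstudy_field-notes_2024-03-05_v02.txt'."""

    def clean(text):
        # Lower-case the text and replace runs of anything other than
        # letters and digits with a single hyphen.
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

    # ISO dates (YYYY-MM-DD) sort chronologically when sorted by name,
    # and a zero-padded version number keeps v02 ahead of v10.
    return (
        f"{clean(project)}_{clean(description)}_"
        f"{created.isoformat()}_v{version:02d}.{extension.lstrip('.')}"
    )


if __name__ == "__main__":
    print(build_filename("SoilStudy", "Field notes", date(2024, 3, 5), 2, ".txt"))
```

Whatever pattern a project agrees on, writing it down (or encoding it in a small helper like this) makes it easier for everyone to apply it in the same way.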
Documentation and metadata
Good documentation is like a ‘user’s guide’ to your data: it helps make the data understandable, verifiable and reusable. Just making the data available does not make it useful. If you or others come back to your data at a later time, you will need information on when, why and by whom the data was created, what methods were used, and an explanation of any acronyms or jargon.
Research funders demand that researchers make, at the very least, the metadata about their data openly available to facilitate the location and reuse of datasets. Documentation and metadata about a dataset are often mentioned together but can be very different things:
Metadata
This is more structured data about the dataset and will include the following key pieces of information:
| Metadata field | Description |
| --- | --- |
| Title | A name or title by which a resource is known |
| Unique resource identifier | For your working data this could be a project ID or a departmental identifier. Once you publish your data the unique resource identifier will be a persistent URL or DOI (Digital Object Identifier), depending on where you publish your data |
| Description | Description of the data set, like an abstract for a paper |
| Subject | Subject or classification code describing the resource, chosen from one or more authoritative sources |
| Creator(s) | The main researchers involved in producing the data, in priority order |
| Funder | Sources of financial support for the development of the resource, e.g. ESRC or Wellcome Trust |
| Resource Language | Default will be set to 'eng' (English) |
| Publication date | The date when the data was or will be made publicly available |
| Publisher | The name of the entity that holds, archives, publishes, prints, distributes, releases, issues, or produces the resource. For your working data this will be the University of Warwick |
| Contact email address | Person or service with knowledge of how to access, troubleshoot, or otherwise field issues or correspondence related to the data set |
Taken from the DataCite Metadata schema.
- Jisc Digital Media have a list of subject-specific controlled vocabularies which may be useful in describing your data
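To show what "structured" means in practice, the sketch below stores the metadata fields described above in a simple record and reports any that are still empty. The key names (such as `identifier` and `contact_email`) are illustrative assumptions, not the official DataCite property names, and the values are placeholders.

```python
import json

# Hypothetical key names, one per metadata field described above.
REQUIRED_FIELDS = [
    "title",
    "identifier",
    "description",
    "subject",
    "creators",
    "funder",
    "language",
    "publication_date",
    "publisher",
    "contact_email",
]


def missing_fields(record):
    """Return the required fields that are absent or empty in the record."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


if __name__ == "__main__":
    # Placeholder values only; replace them with your project's details.
    record = {
        "title": "Example dataset title",
        "identifier": "EXAMPLE-PROJECT-ID",
        "description": "A short abstract describing the data set.",
        "creators": ["Researcher One", "Researcher Two"],
        "language": "eng",
        "publisher": "University of Warwick",
    }
    print(json.dumps(record, indent=2))
    print("Still to complete:", missing_fields(record))
```

Keeping a record like this alongside the working data means most of the metadata is already in place when the dataset is published.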
Documentation
Documentation can be considered to be a more detailed equivalent of a ‘read me’ file for your data. Like the methodology for your publications this will include the following information and more:
- What hardware and software were used to create the data?
- What methodologies were used to create the data?
- What assumptions were made in your experiments?
- Why are there anomalies in your data?
Much of what you should include here will be found in project-level documentation and is likely to have already been included in the project application. Documentation content, such as the aims and objectives of the project, any hypotheses and the methodologies used, can be created even before the project has begun, so replicating it for publication with the dataset need not be very time consuming.
Backup and security
It is essential during your project that you have plans in place to ensure the safe storage of your data as well as a strategy for regular backups.
The level of risk and thus the level of care you should take with your data will in part depend on the ‘classification’ of the data. The University’s Information Security team have resources and training available about the classification of data and what actions you should take depending on the classification you’ve agreed on. This advice includes information on encryption software available from IT Services if this is necessary for your project.
If you are storing your data in one of the University’s storage options then it is automatically included in the main IT Services backup processes, so this can be an easy way to cover all your backup requirements.
Top tip! Do test your backups to make sure they open as you expect them to!
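If you manage any backups yourself, part of testing them can be automated by comparing checksums of the original files with their backup copies. The sketch below is one possible approach using only the Python standard library; the function names and the idea of two mirrored folders are assumptions for illustration. Matching checksums show that a copy is byte-for-byte identical, so this complements, rather than replaces, opening a few restored files to check they behave as expected.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 checksum of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_backup_problems(original_dir, backup_dir):
    """Return relative paths that are missing from, or differ in, the backup."""
    original_dir, backup_dir = Path(original_dir), Path(backup_dir)
    problems = []
    for source in sorted(original_dir.rglob("*")):
        if not source.is_file():
            continue
        relative = source.relative_to(original_dir)
        copy = backup_dir / relative
        # A file is a problem if the copy is absent or its contents differ.
        if not copy.is_file() or sha256_of(source) != sha256_of(copy):
            problems.append(str(relative))
    return problems
```

Calling `find_backup_problems("my_project", "my_project_backup")` with your own folder names (these two are placeholders) returns an empty list when every original file has an identical copy in the backup.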
Password managers
A password manager is a tool that remembers your passwords for you and in some cases can create more secure passwords for you to use. The idea is that you only need to remember the password for the password manager itself and can then copy and paste all of the rest from the manager.
A couple of examples:
