I look to the UK Data Archive for best practice on describing the background to data and the content of data sets. I acknowledge that their work is behind this page.
For our purposes (student master's projects), the minimum complete metadata for a data set would be:
A. The data in an accessible format, e.g. *csv, *spss, *sas, *xls, *xlsx, or a Stata file. There might be a single file, or several files which students will have to merge. Example A.
B. A data dictionary. Examples B.1, B.2.
C. A statement of how and why the data were collected. This might only be a paragraph, but if possible, a study protocol, or a reference to publications using the data should be given. Example C.
Ideally C would include:
What the study covers: When: Time Period or Dates of Fieldwork; Where: Country; Sampling Unit: e.g. spatial, list of companies, list of patients; Observation Unit: e.g. person, company, county; What type of data: e.g. quantitative, qualitative, images.
The target population or theoretical universe which was sampled: Location of units of observation, population of individuals or items or departments . . . .
The methods: Time dimensions: single measurements or repeated measurements; Sampling procedure: e.g. simple random sample, opportunistic, designed experiment; Number of units: the number of individual people or companies or other units in each data file; Method of data collection: e.g. face-to-face interview, web survey, routine company data, routine official data.
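The items listed under C can be kept as a small structured record, which makes it easy to see what has not yet been filled in. The following is only a sketch: the class name StudyDescription, the field names, the helper missing_items and every example value are invented for illustration and are not taken from any real study or from the UK Data Archive.

```python
from dataclasses import dataclass, asdict


@dataclass
class StudyDescription:
    """Hypothetical record of the items listed under C."""
    time_period: str          # When: time period or dates of fieldwork
    country: str              # Where
    sampling_unit: str        # e.g. spatial, list of companies, list of patients
    observation_unit: str     # e.g. person, company, county
    data_type: str            # e.g. quantitative, qualitative, images
    target_population: str    # the population or universe that was sampled
    time_dimension: str       # single or repeated measurements
    sampling_procedure: str   # e.g. simple random sample, opportunistic
    number_of_units: dict     # number of units in each data file
    collection_method: str    # e.g. face-to-face interview, web survey


def missing_items(description):
    """Return the names of fields that have been left empty."""
    return [name for name, value in asdict(description).items() if not value]


# Placeholder values only; they do not describe a real data set.
example = StudyDescription(
    time_period="2010 to 2012",
    country="United Kingdom",
    sampling_unit="list of patients",
    observation_unit="person",
    data_type="quantitative",
    target_population="adults registered with participating clinics",
    time_dimension="repeated measurements",
    sampling_procedure="",
    number_of_units={"baseline.csv": 500, "followup.csv": 430},
    collection_method="face-to-face interview",
)

print(missing_items(example))
```

Here the sampling procedure was left blank on purpose, so it is the one item reported as still to be supplied.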
So, 'metadata' here means the data itself, together with the information required to make sensible use of the data.
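One quick way to test whether a data set comes with the information needed to use it is to check that every column in the data file has an entry in the data dictionary. The sketch below assumes the dictionary is itself a table with one row per variable and a column called variable; that layout, the function name undocumented_columns and the sample contents are all assumptions made for illustration.

```python
import csv
import io

# Invented stand-ins for a data file (A) and its data dictionary (B).
DATA_CSV = """id,age,smoker
1,34,1
2,51,0
"""

DICTIONARY_CSV = """variable,label,values
id,Participant identifier,
age,Age in years at interview,
"""


def undocumented_columns(data_text, dictionary_text):
    """List data columns that have no row in the data dictionary."""
    # The first row of the data file holds the column names.
    data_columns = next(csv.reader(io.StringIO(data_text)))
    # Collect the variable names that the dictionary describes.
    documented = {
        row["variable"] for row in csv.DictReader(io.StringIO(dictionary_text))
    }
    return [column for column in data_columns if column not in documented]


print(undocumented_columns(DATA_CSV, DICTIONARY_CSV))
```

With real files the two strings would be replaced by the contents of the data file and the dictionary file. Any column reported here needs a dictionary entry before a student can interpret it.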
Note the distinction between 'Missing' and 'Not applicable'.
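The distinction matters in analysis because the two kinds of value are counted differently: a 'Not applicable' value means the question was never meant to be answered by that unit, while a 'Missing' value means an answer was expected but not obtained. The sketch below assumes the codes -9 for missing and -8 for not applicable; these codes, the variable name and the values are invented, and a real data dictionary should state which codes are actually used.

```python
MISSING = -9         # assumed code: the question applied but no answer was recorded
NOT_APPLICABLE = -8  # assumed code: the question did not apply to this unit


def summarise(values):
    """Count answered, missing and not-applicable values separately."""
    answered = [v for v in values if v not in (MISSING, NOT_APPLICABLE)]
    return {
        "answered": len(answered),
        "missing": values.count(MISSING),
        "not_applicable": values.count(NOT_APPLICABLE),
        # Only genuine answers contribute to the mean.
        "mean": sum(answered) / len(answered) if answered else None,
    }


# Invented example: cigarettes per day, asked only of smokers.
# Non-smokers are coded not applicable; one smoker did not answer.
cigarettes_per_day = [10, NOT_APPLICABLE, 20, MISSING, NOT_APPLICABLE]

summary = summarise(cigarettes_per_day)
print(summary)
```

In this invented example three people were eligible to answer and two did, so the non-response is one in three, not three in five. Treating 'Not applicable' as 'Missing' would overstate how much information was lost.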
This has information on Cohort II, a user guide, the main report on the study and some additional reports and information.