Definition: Metadata is "data about data" or "data in context": pieces of information that provide context for data. Metadata helps when researchers re-analyze their own data, use other people's data, reuse existing data for a different project, or collaborate with others. It is becoming increasingly important as the culture of data sharing spreads, but remember that metadata also makes it easier for you to use your own data. When documenting your research data, ask yourself whether it passes the "Ward test": if you disappeared, would someone else be able to access, interpret, and analyze your data? If the answer is "No," you should improve your documentation and metadata.
Creating metadata for your research projects and data leads to increased accessibility, helps data retain its context, accommodates version control (through distinguishing multiple versions), and can satisfy the legal requirements of repositories and funders. Quality metadata also makes data easier to preserve and more persistent over time.
When you create metadata for a piece of data -- whether that data be code, a paper, images, spreadsheets, et cetera -- it can help to answer the following questions:
Metadata standards exist to create consistency across research documentation. There are several "discipline agnostic" metadata standards that researchers can choose. They include DataCite, Project Open Data, and the Data Documentation Initiative (DDI). Standards like these can generally be applied to data in almost any discipline.
Specific disciplines often have their own popular sets of metadata standards. The Digital Curation Centre (DCC) maintains a directory of standards that can be browsed by discipline. The Research Data Alliance (RDA) offers a community-maintained directory that is updated more frequently.
Finally, the choice of repository sometimes determines the metadata schema. For instance, Dash uses the discipline-agnostic DataCite schema by default. Discipline-specific repositories may use others.
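To make the idea of a schema concrete, here is a minimal sketch of checking a record against the properties that DataCite marks as mandatory (schema 4.x). The flat dictionary layout, the function name, and the sample values are assumptions made for illustration; a real DataCite record is more deeply nested, and a repository will normally build it for you from a submission form.

```python
# Mandatory DataCite properties (schema 4.x). The flat layout used here is a
# simplification for illustration, not the full DataCite record structure.
REQUIRED_DATACITE_FIELDS = (
    "identifier",
    "creators",
    "titles",
    "publisher",
    "publicationYear",
    "resourceType",
)


def missing_required_fields(record: dict) -> list[str]:
    """Return the required fields that are absent or empty in the record."""
    return [name for name in REQUIRED_DATACITE_FIELDS if not record.get(name)]


# Hypothetical dataset description; the DOI, names, and title are made up.
record = {
    "identifier": "10.1234/example-doi",
    "creators": ["Doe, Jane"],
    "titles": ["Stream temperature measurements, 2019"],
    "publisher": "Example University",
    "publicationYear": 2019,
    # "resourceType" is intentionally left out to show the check.
}

print(missing_required_fields(record))  # ['resourceType']
```

A check like this is most useful before deposit, when missing fields are still easy to fill in.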
This infographic provides a quick visual overview of different metadata standards by discipline:
If your data is intended for local use -- meaning that it will only be used by you and your co-authors, labmates, or collaborators -- it doesn't matter which standard(s) you use, as long as you apply them consistently. The following are some examples of local data documentation that you can implement immediately in your own projects. In fact, you may already be using some of them without realizing it! You can (and should) also include this documentation when publishing your data in a repository or archive.
Laboratory or Field Notebooks
These are, as the name implies, physical (analog) or digital notebooks in which researchers document information that is relevant to their research process. Maintaining a notebook lets each researcher keep all their relevant information in one place, encourages thoughtful work, and enables other researchers to pick up and continue a line of research if necessary.
Best practices for maintaining an effective lab notebook include dating each entry in a consistent format, listing names and contact information of collaborators, keeping notes from important meetings or discussions, and justifying methods and data source(s). Researchers should also note any corrections, calculations (with units), file names/locations, and the locations of any physical materials.
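If you keep a digital notebook, a small structure can help enforce a consistent date format. The sketch below is one possible approach, not a prescribed format: the class name, field names, and sample entry are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class NotebookEntry:
    """One dated notebook entry; the field names are illustrative only."""
    entry_date: date
    author: str
    notes: str
    file_locations: list[str] = field(default_factory=list)

    def header(self) -> str:
        # ISO 8601 (YYYY-MM-DD) keeps dates consistent and sortable.
        return f"{self.entry_date.isoformat()} | {self.author}"


# Hypothetical entry; the author, note, and path are made up.
entry = NotebookEntry(
    entry_date=date(2021, 3, 4),
    author="J. Doe",
    notes="Recalibrated probe; offset corrected by 0.2 degrees C.",
    file_locations=["data/raw/2021-03-04_probe.csv"],
)
print(entry.header())  # 2021-03-04 | J. Doe
```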
Codebooks
A codebook is a set of codes, definitions, and examples often used as a guide to provide context for and help analyze survey data. Codebooks are:
If your research involves administering surveys, you may use a codebook to ease interpretation and increase accessibility of the survey results. If you download an archived data file, that file often comes with some version of a codebook which explains each variable and its possible values.
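As a rough illustration of how a codebook explains each variable and its possible values, the sketch below stores a tiny codebook as a dictionary and uses it to decode one coded survey response. The variable names, codes, and labels are invented for this example.

```python
# Hypothetical codebook: each variable has a label and a mapping of codes
# to their meanings.
codebook = {
    "q1_satisfaction": {
        "label": "Overall satisfaction with the service",
        "values": {1: "Very dissatisfied", 2: "Dissatisfied", 3: "Neutral",
                   4: "Satisfied", 5: "Very satisfied", 9: "No answer"},
    },
    "q2_employed": {
        "label": "Currently employed",
        "values": {0: "No", 1: "Yes", 9: "No answer"},
    },
}


def decode(response: dict) -> dict:
    """Translate coded survey answers into readable labels using the codebook."""
    decoded = {}
    for variable, code in response.items():
        entry = codebook[variable]
        # Fall back to a visible marker when a code is not documented.
        decoded[entry["label"]] = entry["values"].get(
            code, f"Undocumented code: {code}"
        )
    return decoded


print(decode({"q1_satisfaction": 4, "q2_employed": 9}))
# {'Overall satisfaction with the service': 'Satisfied', 'Currently employed': 'No answer'}
```

Without the codebook, the raw response `{"q1_satisfaction": 4, "q2_employed": 9}` would be nearly impossible for an outside reader to interpret.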
Data Dictionaries
A Data Dictionary is a collection of names, definitions, and attributes about data elements that are being used or captured in a database, information system, or part of a research project. It describes the meanings and purposes of data elements within the context of a project, and provides guidance on interpretation, accepted meanings, and representation. A Data Dictionary also provides metadata about data elements. The metadata included in a Data Dictionary can assist in defining the scope and characteristics of data elements, as well as the rules for their usage and application.
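One way to picture this is a data dictionary kept in machine-readable form, so that the same definitions that guide human readers can also be used to check records. The following is a minimal sketch; the element names, the attributes chosen (`definition`, `type`, `required`), and the sample record are assumptions for illustration only.

```python
# Hypothetical data dictionary: one entry per data element.
data_dictionary = {
    "site_id": {"definition": "Sampling site identifier",
                "type": str, "required": True},
    "temp_c": {"definition": "Water temperature in degrees Celsius",
               "type": float, "required": True},
    "notes": {"definition": "Free-text field observations",
              "type": str, "required": False},
}


def check_record(record: dict) -> list[str]:
    """Return the problems found when comparing a record to the dictionary."""
    problems = []
    for name, rule in data_dictionary.items():
        if name not in record:
            if rule["required"]:
                problems.append(f"missing required element: {name}")
        elif not isinstance(record[name], rule["type"]):
            problems.append(f"{name} should be {rule['type'].__name__}")
    # Flag anything in the record that the dictionary does not describe.
    for name in record:
        if name not in data_dictionary:
            problems.append(f"undocumented element: {name}")
    return problems


print(check_record({"site_id": "A1", "temp_c": "12.5", "depth": 3}))
# ['temp_c should be float', 'undocumented element: depth']
```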
Data Dictionaries are useful for a number of reasons. In short, they:
This is a short (6:30) video explaining the concept and construction of data dictionaries, produced by the University of Wisconsin Data Services.
README Files
As the title implies, a README is a file that users of your data are intended to read first. It explains everything they need to know to understand your data.
Make sure that whatever file naming convention you use associates each README file with the file or files that it references.
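As one example of such a convention, the sketch below derives a README name from the data file's own name, so the two sort next to each other in a folder. The `_README.txt` suffix and the function name are assumptions; any pattern works as long as it is applied consistently.

```python
from pathlib import Path


def readme_name_for(data_file: str) -> str:
    """Build a README file name that visibly pairs with the data file."""
    path = Path(data_file)
    # e.g. "survey_2020.csv" -> "survey_2020_README.txt"
    return str(path.with_name(f"{path.stem}_README.txt"))


print(readme_name_for("survey_2020.csv"))  # survey_2020_README.txt
```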
A brief overview of recommended README file content:
An example README template that you can download and customize to meet your needs is available from Cornell University (https://cornell.app.box.com/v/ReadmeTemplate).
Some data repositories require or recommend that you upload README files along with your data. UC Dash, for example, does not currently preserve the hierarchical structure of files and strongly recommends a README.
These three types of local data documentation -- codebooks, data dictionaries, and README files -- are crucial. They provide context not only for you, the researcher, in the future but also for anyone else who may ever need or want to use your data for any reason. Using other people's data can be either a breeze or a huge headache depending on the quality of the documentation.
If you'd like more information on research data curation and management, please schedule a consultation: