
Research Data Curation Glossary

Here is a glossary of frequently cited terms and resources for research data curation:

Archive: (noun) A data archive is a site where machine-readable materials are stored, preserved, and possibly redistributed to individuals interested in using the materials. (verb) To place or store in an archive. (ICPSR: Glossary of Social Science Terms)

Back-Up: A copy of your file(s) that can be used to restore them if your primary copy (or computer, or server) is destroyed, corrupted, or stolen. Note: back-up and preservation/archiving are not the same thing; just because you have a copy doesn't mean that it will be accessible over the long term. Good practice is to:

  • back up your most important files regularly (daily or weekly, depending on how often they change),
  • keep at least two back-up copies,
  • preferably on different kinds of media (e.g. CD/DVD and portable hard drive or institutional server), and
  • store the back-up copies in a different location (e.g. a different building, or even a different city) from the active versions, to protect against fire, flood, or theft. (Cambridge University Data Management Glossary)
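The practices above are mostly about scheduling and storage choices, but the copy step itself is worth verifying. The sketch below is a minimal illustration, assuming each back-up destination (for example an external drive or an institutional server) is reachable as an ordinary directory; the function name `back_up` and the destination names are hypothetical. It copies one file to several destinations and confirms each copy matches the original byte for byte. Running it daily or weekly would be left to a scheduler such as cron.

```python
import filecmp
import shutil
import tempfile
from pathlib import Path


def back_up(source: Path, destinations: list[Path]) -> list[Path]:
    """Copy `source` into each destination directory and verify every copy.

    Hypothetical helper: returns the paths of the verified copies and raises
    OSError if any copy does not match the original byte for byte.
    """
    copies = []
    for dest_dir in destinations:
        dest_dir.mkdir(parents=True, exist_ok=True)
        # copy2 also preserves modification times, which helps when comparing versions
        copy_path = Path(shutil.copy2(source, dest_dir / source.name))
        # shallow=False forces a full content comparison instead of trusting os.stat()
        if not filecmp.cmp(source, copy_path, shallow=False):
            raise OSError(f"back-up copy {copy_path} does not match {source}")
        copies.append(copy_path)
    return copies


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        original = root / "results.csv"
        original.write_text("sample,value\nA,1.5\n")
        # "drive" and "server" are stand-ins for two different kinds of media
        for copy in back_up(original, [root / "drive", root / "server"]):
            print("verified", copy)
```

Verifying each copy matters because a back-up that was silently truncated or corrupted offers no protection when it is finally needed.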

Born-Digital: Materials that were digital/electronic in their original form rather than scanned or re-entered from a paper version; e.g. most Microsoft Word documents are ‘born digital’ materials, unless they have been transcribed from a handwritten version. (Cambridge University Data Management Glossary)

…: An online file-sharing and cloud storage service. All UC Merced users receive 10 GB of storage space.

Creative Commons: A method of licensing information that encourages re-use. For example, ‘Attribution-NonCommercial’ (CC BY-NC) is a common Creative Commons license; when you mark your file, image, or information with it, anyone can use your information in any way they like, so long as they attribute it to you and don’t use it for commercial purposes. (Cambridge University Data Management Glossary)

Data Dictionary: A data dictionary is a descriptive compendium of all data elements used in a project. More information on data dictionaries can be found here.
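A data dictionary becomes most useful when it is machine-readable, because the same entries that document each variable can also be used to check incoming records. The sketch below assumes a small, hypothetical field-survey dataset; the field names, units, and allowed values are illustrative only, and the `Field` structure is one possible layout rather than a standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    """One data dictionary entry: what a variable means and what values it may take."""
    name: str
    description: str
    dtype: type
    units: str = ""
    allowed: tuple = ()  # empty tuple means any value of the right type is accepted


# Hypothetical dictionary for a small field-survey dataset.
DATA_DICTIONARY = [
    Field("site_id", "Unique code for the sampling site", str),
    Field("temp_c", "Water temperature at collection", float, units="degrees Celsius"),
    Field("habitat", "Habitat classification", str, allowed=("stream", "pond", "lake")),
]


def validate(record: dict) -> list[str]:
    """Return the problems found when checking `record` against the dictionary."""
    problems = []
    for entry in DATA_DICTIONARY:
        if entry.name not in record:
            problems.append(f"missing field: {entry.name}")
            continue
        value = record[entry.name]
        if not isinstance(value, entry.dtype):
            problems.append(f"{entry.name}: expected {entry.dtype.__name__}")
        elif entry.allowed and value not in entry.allowed:
            problems.append(f"{entry.name}: {value!r} not in {entry.allowed}")
    # Fields present in the data but never documented are also worth flagging.
    undocumented = set(record) - {e.name for e in DATA_DICTIONARY}
    problems.extend(f"undocumented field: {name}" for name in sorted(undocumented))
    return problems


print(validate({"site_id": "S01", "temp_c": 14.2, "habitat": "pond"}))  # []
print(validate({"site_id": "S02", "temp_c": "warm", "depth": 3}))
```

The second call reports a wrong type for `temp_c`, a missing `habitat`, and an undocumented `depth` column.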

Data Preservation: Ensuring that data remain intact, accessible, and understandable over time. This requires more than preserving the integrity of the digital files themselves, and can be considerably more complicated. Preservation operations may include preserving the software required to interact with the data or emulating older systems, migrating data to new formats and new media, and ensuring there is sufficient metadata to understand, interpret, manage, and preserve the data. (Cornell University Research Data Management Service Group)
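Preserving file integrity is only the first layer of preservation, but it is the easiest to automate. A common approach, sketched below as an illustration rather than any particular repository's procedure, is a fixity check: record a SHA-256 checksum for every file when data are deposited, then recompute and compare later. The function names `make_manifest` and `check_fixity` are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large data files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to `root`) to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def check_fixity(root: Path, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Compare the files currently under `root` with a previously recorded manifest."""
    current = make_manifest(root)
    return {
        "missing": sorted(set(manifest) - set(current)),
        "added": sorted(set(current) - set(manifest)),
        "changed": sorted(k for k in manifest.keys() & current.keys()
                          if manifest[k] != current[k]),
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "data.csv").write_text("id,value\n1,3.2\n")
        manifest = make_manifest(root)                        # recorded at deposit time
        (root / "data.csv").write_text("id,value\n1,3.3\n")   # simulate a silent change
        print(check_fixity(root, manifest))
```

A clean result only shows that the bits are unchanged; it says nothing about whether the format is still readable or the data still understandable, which is why the other preservation operations listed above are needed.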

Lossless: This term is used in reference to file compression, particularly with images. If a format has ‘lossless’ compression, that means that the file will not lose information when created or re-saved. For example, PNG files use lossless compression. (Cambridge University Data Management Glossary)
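PNG compresses image data with the DEFLATE algorithm, which Python's standard `zlib` module also implements. The sketch below applies it to plain bytes rather than to an actual PNG file, so it illustrates the lossless property only: decompressing returns exactly the original data, however many times the data are compressed and restored. The sample data are invented.

```python
import zlib

# Invented, highly repetitive sample data (compresses well).
original = ("temperature,12.5\n" * 200).encode("utf-8")

compressed = zlib.compress(original, level=9)
restored = zlib.decompress(compressed)
print(len(original), len(compressed), restored == original)

# Simulate "re-saving" several times: lossless compression never degrades the data.
data = original
for _ in range(5):
    data = zlib.decompress(zlib.compress(data))
print(data == original)
```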

Lossy: This term is used in reference to file compression, particularly with images. If a format has ‘lossy’ compression, that means that when files are created, and every time they are re-saved, they will lose information. For example, JPEG uses lossy compression: it is good for efficiently reducing the size of an image, but the image will become grainy and blurred as you edit and re-save it. (Cambridge University Data Management Glossary)
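The core idea behind lossy compression is deliberately discarding precision. JPEG does this by rounding frequency coefficients; the toy sketch below applies the same kind of rounding (scalar quantization) directly to a few hypothetical pixel values, so it is a simplified illustration and not how JPEG actually works. The `quantize` and `dequantize` names are our own.

```python
def quantize(samples: list[float], step: float) -> list[int]:
    """Lossy 'encode': keep only which multiple of `step` each sample is nearest to."""
    return [round(s / step) for s in samples]


def dequantize(codes: list[int], step: float) -> list[float]:
    """'Decode': rebuild approximate samples from the stored integer codes."""
    return [c * step for c in codes]


# Hypothetical 8-bit intensities from one row of an image.
pixels = [12.0, 13.0, 15.0, 200.0, 201.0, 203.0, 90.0, 94.0]
step = 8.0

codes = quantize(pixels, step)
restored = dequantize(codes, step)
errors = [abs(a - b) for a, b in zip(pixels, restored)]

print(codes)        # [2, 2, 2, 25, 25, 25, 11, 12]
print(restored)     # [16.0, 16.0, 16.0, 200.0, 200.0, 200.0, 88.0, 96.0]
print(max(errors))  # 4.0, i.e. at most step / 2
```

The codes are small integers that need fewer bits than the originals, which is where the size saving comes from; the cost is that 12, 13, and 15 all come back as 16, and nothing in the stored codes can recover the difference.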

Metadata: A term that refers to structured data about data. Metadata is an old concept (e.g., card catalogs and indexes), but it is often essential for digital content to be useful and meaningful. Metadata can capture general or specific information about digital content, such as its administrative, technical, or structural characteristics. (ICPSR: Glossary of Social Science Terms)
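In practice, metadata is often stored as a small structured "sidecar" file that travels with the data. The sketch below is one hypothetical layout in JSON, grouped by the administrative, technical, and structural characteristics the definition mentions, plus a descriptive group (title, creator, keywords) that is another commonly used category. The field names and values are illustrative assumptions, not a metadata standard; real repositories usually prescribe a schema such as Dublin Core or DataCite.

```python
import json

# Hypothetical metadata record for one data file; field names are illustrative only.
metadata = {
    "descriptive": {
        "title": "Stream temperature observations",
        "creator": "Example Lab",
        "keywords": ["hydrology", "temperature"],
    },
    "administrative": {
        "license": "CC BY-NC 4.0",
        "date_created": "2023-05-01",
    },
    "technical": {
        "file_name": "temps.csv",
        "format": "text/csv",
        "encoding": "UTF-8",
    },
    "structural": {
        "columns": ["site_id", "temp_c", "habitat"],
        "rows_are": "one observation per site per day",
    },
}

# Serialize as human-readable JSON; this text would be saved next to temps.csv,
# e.g. as temps.metadata.json.
sidecar_text = json.dumps(metadata, indent=2)
print(sidecar_text)
```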

Non-proprietary File Format: In the simplest cases, a non-proprietary format is one that has no restrictions on its use and over which no one (e.g. a company) claims substantial intellectual property rights (IPR). Preservation experts recommend non-proprietary formats for the longer term primarily because a private software company can go out of business or stop producing a compatible version of the software in whose format your data was saved, leaving no one with the rights or knowledge to support that format. (Cambridge University Data Management Glossary)
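CSV is a typical non-proprietary choice for tabular data: it is plain text, documented in RFC 4180, and readable by almost any tool. The sketch below, using invented sample rows, writes a small table with Python's standard `csv` module and reads it back without any spreadsheet software. An in-memory buffer keeps the example self-contained; for a real file you would open it with `open(path, "w", newline="")`.

```python
import csv
import io

# Invented sample rows.
rows = [
    {"site_id": "S01", "temp_c": 14.2},
    {"site_id": "S02", "temp_c": 15.8},
]

# Write the table as CSV text.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["site_id", "temp_c"])
writer.writeheader()
writer.writerows(rows)
text = buffer.getvalue()
print(text)

# Reading it back needs nothing but a CSV parser; note that values return as strings,
# so types and units belong in a data dictionary or README.
restored = list(csv.DictReader(io.StringIO(text)))
print(restored)
```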

Persistent Identifiers: A unique and long-lasting reference that allows for continued access to a digital object. Examples of persistent identifier systems include Digital Object Identifiers (DOIs), handles, and Archival Resource Keys (ARKs). Persistent identifiers support interoperability and the reliable citation of digital content. (Cornell University Research Data Management Service Group)
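A DOI is made actionable by appending it to the resolver at `https://doi.org/`, which redirects to wherever the object currently lives. The sketch below normalizes DOIs written in common forms (`doi:` prefix, old or new resolver URLs) and builds the resolver link. The regular expression is a simplified check that covers most modern DOIs (`10.` + registrant code + `/` + suffix), not the full DOI syntax, and handles and ARKs are not covered. The function names are hypothetical.

```python
import re
from urllib.parse import quote

# Simplified DOI pattern: "10." + 4-9 digit registrant code + "/" + non-space suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

PREFIXES = (
    "https://doi.org/", "http://doi.org/",
    "https://dx.doi.org/", "http://dx.doi.org/",
    "doi:",
)


def normalize_doi(text: str) -> str:
    """Strip a 'doi:' label or resolver URL and return the bare DOI."""
    doi = text.strip()
    for prefix in PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a DOI: {text!r}")
    return doi


def doi_url(text: str) -> str:
    """Build a resolvable link, percent-encoding characters unsafe in a URL path."""
    return "https://doi.org/" + quote(normalize_doi(text), safe="/:;()")


print(doi_url("doi:10.1000/182"))  # https://doi.org/10.1000/182
```

Citing the `https://doi.org/` form rather than the current landing-page URL is what keeps a citation working after the data move to a new server.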

README file: A README file provides information about a data file and is intended to help ensure that the data can be correctly interpreted, by yourself at a later date or by others when sharing or publishing data. (Cornell University Research Data Management Service Group; more information and example README files can be found here.)
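Starting every dataset from the same README skeleton makes it harder to forget key details. The sketch below generates a plain-text template; the section headings are our own assumptions loosely modeled on common README guidance, not a required standard, and `readme_template` is a hypothetical helper. The placeholders in parentheses are meant to be replaced by hand.

```python
from datetime import date


def readme_template(title: str, files: dict[str, str], contact: str) -> str:
    """Return a plain-text README skeleton for a dataset (hypothetical section layout)."""
    lines = [
        f"README for: {title}",
        f"Last updated: {date.today().isoformat()}",
        f"Contact: {contact}",
        "",
        "FILE OVERVIEW",
    ]
    # One line per data file, with a short description of its contents.
    lines += [f"  {name}: {desc}" for name, desc in files.items()]
    lines += [
        "",
        "METHODS",
        "  (How the data were collected and processed.)",
        "",
        "DATA-SPECIFIC INFORMATION",
        "  (Variables, units, and codes used for missing values.)",
        "",
        "SHARING AND ACCESS",
        "  (License and how to cite the data.)",
    ]
    return "\n".join(lines) + "\n"


print(readme_template("Stream temperatures", {"temps.csv": "Daily readings per site"},
                      "lab@example.org"))
```

Saving the result as `README.txt` in plain text, rather than in a word-processor format, keeps the README itself readable over the long term.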

SDSC Cloud: The San Diego Supercomputer Center (SDSC) Cloud Storage service provides academic and research partners with a convenient and affordable way to store, share, and archive data, including extremely large data sets. Its object-based storage system and multiple interface methods make SDSC Cloud easy to use for the average user, while also providing a flexible, configurable, and expandable solution for more demanding applications. (SDSC Cloud Storage)

Standards: Accepted methods or models of practice; these may be formally approved (as in NISO standards), or de facto standards. In the context of data management, standards typically apply to data or file formats, and to metadata. (Cornell University Research Data Management Service Group)