Following best practices for managing your research data helps ensure that it remains available to other researchers in the long term. Not every guideline here will apply to every discipline or project. Overall, however, these guidelines will streamline your data management activities and help prevent data loss.
Choose a consistent organizational structure for all of your project folders. Although it may seem obvious, thinking about the structure of your folders and planning effectively makes navigation much easier. Minimize the number of clicks necessary to reach files. In conjunction with a consistent file naming convention, an efficient structure saves a lot of time.
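One way to keep the structure consistent across projects is to create it from a script rather than by hand. The following is a minimal sketch, not a recommended standard: the folder names in `PROJECT_FOLDERS` and the function `create_project_structure` are hypothetical and should be adapted to your discipline.

```python
from pathlib import Path

# Hypothetical layout (an assumption for illustration); keep it shallow so
# that files stay only a few clicks away from the project root.
PROJECT_FOLDERS = [
    "data/raw",
    "data/processed",
    "docs",
    "scripts",
    "results",
]


def create_project_structure(root, folders=PROJECT_FOLDERS):
    """Create the same folder layout under any project root.

    Existing folders are left untouched, so the function is safe to re-run.
    """
    root = Path(root)
    created = []
    for relative in folders:
        target = root / relative
        target.mkdir(parents=True, exist_ok=True)
        created.append(target)
    return created
```

Calling `create_project_structure("my_project")` at the start of each project gives every project the same layout, which makes files easier to find later.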
Establishing a consistent file naming convention early in your project and maintaining that convention throughout is an underrated but very useful practice. Without a convention, it is easy to accumulate files whose names tell you nothing about their contents. Locating a single file can then take considerable time and effort, or become nearly impossible.
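A convention is easier to maintain when file names are generated and checked by a small helper instead of typed freehand. The sketch below assumes a hypothetical pattern, `YYYYMMDD_project_description_vNN.ext`; the pattern and the names `build_file_name` and `follows_convention` are illustrative assumptions, not a prescribed standard.

```python
import re
from datetime import date

# Hypothetical convention (an assumption): YYYYMMDD_project_description_vNN.ext
NAME_PATTERN = re.compile(r"^\d{8}_[a-z0-9]+_[a-z0-9-]+_v\d{2}\.[a-z0-9]+$")


def slugify(text):
    """Lowercase the text and replace runs of other characters with hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def build_file_name(project, description, version, extension, when=None):
    """Assemble a file name that follows the hypothetical convention."""
    when = when or date.today()
    date_part = when.strftime("%Y%m%d")
    project_part = slugify(project).replace("-", "")
    description_part = slugify(description)
    extension_part = extension.lower().lstrip(".")
    return f"{date_part}_{project_part}_{description_part}_v{version:02d}.{extension_part}"


def follows_convention(file_name):
    """Return True if the name matches the hypothetical convention."""
    return NAME_PATTERN.match(file_name) is not None
```

For example, `build_file_name("Soil Study", "pH readings site A", 3, "CSV", date(2024, 3, 5))` produces `20240305_soilstudy_ph-readings-site-a_v03.csv`, while `follows_convention("final data NEW(2).xlsx")` returns `False`.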
In a perfect world, you would be able to maintain a file naming convention from the very start of a project and never need to make a change. However, you may find that you need to add or remove information from your file names or make other changes. You have two options:
There are several batch renaming programs available. They include, but are not limited to:
The best file formats for research data are non-proprietary, "lossless," and unencrypted/uncompiled.
Researchers may sometimes encounter situations where they absolutely must use a problematic file format. In this case, they should make every possible effort to provide a backup version of the file in a different format. They should also provide documentation explaining how to use the problematic format.
If the program that created a file is the only option for reading or accessing the file, the file format is proprietary or not open. To help ensure that your data and files are accessible by a wide range of users for a long time, choose open, non-proprietary formats whenever possible. With proprietary formats, if the original software becomes unavailable or ceases to function, the files may become unreadable.
Non-proprietary, or open, file formats are ones where the description and/or development of the format are open to the public; they often can be opened by multiple software programs. Open formats are often community-maintained.
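As an illustration of an open format, tabular data can be saved as plain-text CSV, which many programs can read. The following is a minimal sketch using Python's standard `csv` module; the helper names `write_csv` and `read_csv` and the sample column names are hypothetical.

```python
import csv


def write_csv(path, header, rows):
    """Write a header row and data rows to a UTF-8 encoded CSV file."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)


def read_csv(path):
    """Read a CSV file back as a list of rows (every value is a string)."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.reader(handle))
```

Note that CSV stores every value as text, so numeric types and units should be described in accompanying documentation.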
Some file formats compress the information in files. This can be useful because the files take up less disk space. However, for many such formats, the compression causes data from the file to be lost. These formats are "lossy." Formats that can compress files without losing any information are "lossless" and retain the original details of the data.
A "lossless" file that has been compressed can be completely restored to its original state, unchanged. A "lossy" file will be compromised in quality due to the deletion of some information.
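The lossless property can be checked directly: compress some data, decompress it, and compare the result with the original. The sketch below uses Python's standard `zlib` module as one example of a lossless compressor; the sample data and function names are made up for illustration.

```python
import zlib


def compress_losslessly(data: bytes) -> bytes:
    """Compress bytes with zlib at the highest compression level (9)."""
    return zlib.compress(data, 9)


def restore(compressed: bytes) -> bytes:
    """Decompress zlib-compressed bytes back to the original data."""
    return zlib.decompress(compressed)


# Made-up sample data: repetitive text compresses well.
original = b"temperature,21.5\n" * 1000
compressed = compress_losslessly(original)
restored = restore(compressed)

# The round trip returns exactly the same bytes, and the compressed
# form is smaller than the original.
is_identical = restored == original
is_smaller = len(compressed) < len(original)
```

A lossy format offers no equivalent round trip: once information has been discarded, no decompression step can recover it.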
Encrypting or password-locking a file may improve security, but if the encryption key or password is ever lost, the data in the file may also be lost.
Uncompiled source code is easier to re-use and is more likely to last a long time since it can be compiled on a range of architectures/platforms.
Here is a list of some non-proprietary file formats that are generally preferred for different types of files:
The Library of Congress' Sustainability of Digital Formats and Recommended Format Specifications provide more extensive information on formats, including guidance for preserving data sets, geospatial data, and web archives.
If you'd like more information on research data curation and management, please schedule a consultation: