As a general rule, the University of Florida Institutional Repository (IR@UF) can readily accept and preserve formats that are platform-independent, vendor-independent, nonproprietary, stable, open, and well supported.
A list of recommended file types can be found here.
The NIH's Common Data Element (CDE) Resource Portal provides access to NIH-supported CDE initiatives and tools.
Common Data Elements (CDEs) are data elements shared by multiple data sets across different studies. Using CDEs can improve data quality and make it easier to compare and combine data from multiple studies.
NYU Health Sciences Libraries animation showing the importance of proper file formatting
Spend time at the beginning of a project planning both the folder hierarchy and the file naming conventions. Consider how you or others will look for and access files later. Do you think of them by type, location, study, or something else?
Establish a folder hierarchy that aligns with the project. Example: [Project] / [Experiment] / [Instrument or Type of file]
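Creating the agreed folders with a script at the start of a project helps everyone begin from the same layout. The following is a minimal Python sketch using the standard-library `pathlib` module. The function name `create_hierarchy` and the example project, experiment, and folder names are hypothetical placeholders.

```python
from pathlib import Path


def create_hierarchy(root, project, experiments, file_types):
    """Create root/project/experiment/file_type folders and return the leaf paths."""
    leaves = []
    for experiment in experiments:
        for file_type in file_types:
            leaf = Path(root) / project / experiment / file_type
            # parents=True builds intermediate folders; exist_ok=True makes reruns safe
            leaf.mkdir(parents=True, exist_ok=True)
            leaves.append(leaf)
    return leaves
```

For example, `create_hierarchy(".", "ProjectA", ["Exp01", "Exp02"], ["raw", "processed"])` would create four folders such as `ProjectA/Exp01/raw`. Because existing folders are left alone, the script can be rerun safely when a new experiment is added.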
Consider all aspects of the project and develop a file naming scheme that includes important metadata. Example: [Date]_[Run]_[SampleType]
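A naming scheme is easier to follow when names are generated rather than typed by hand. Below is a minimal Python sketch of the [Date]_[Run]_[SampleType] pattern, under stated assumptions: the function name `build_filename`, the `Run` prefix, the three-digit run number, and the use of hyphens in place of spaces are all hypothetical choices, not part of any established standard.

```python
from datetime import date


def build_filename(run_date: date, run: int, sample_type: str, extension: str) -> str:
    """Return a file name following the [Date]_[Run]_[SampleType] pattern."""
    # Replace runs of whitespace with hyphens so the name contains no spaces
    safe_sample = "-".join(sample_type.split())
    # YYYYMMDD date and a zero-padded run number keep names sorting in order
    return f"{run_date:%Y%m%d}_Run{run:03d}_{safe_sample}.{extension.lstrip('.')}"


print(build_filename(date(2024, 3, 7), 2, "Blood Serum", ".csv"))
# 20240307_Run002_Blood-Serum.csv
```

Zero-padding the run number means `Run010` sorts after `Run009` instead of between `Run001` and `Run002`.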
Consider sorting when deciding which element of the file name goes first. File names that start with YYYYMM sort chronologically, while file names that start with MMDDYYYY sort by month and day first, so a January 2024 file will be listed before a March 2023 file.
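The difference is easy to see by sorting the same dates written both ways. This short Python sketch uses three hypothetical acquisition dates and a hypothetical `_scan.tif` suffix; Python's `sorted` compares strings character by character, as most file browsers do for names of equal length.

```python
from datetime import date

# Hypothetical acquisition dates, listed out of order
dates = [date(2024, 1, 10), date(2023, 3, 15), date(2023, 12, 1)]

# Year-first names: alphabetical order matches chronological order
year_first = sorted(f"{d:%Y%m%d}_scan.tif" for d in dates)

# Month-first names: alphabetical order groups by month, ignoring the year
month_first = sorted(f"{d:%m%d%Y}_scan.tif" for d in dates)

print(year_first)
# ['20230315_scan.tif', '20231201_scan.tif', '20240110_scan.tif']
print(month_first)
# ['01102024_scan.tif', '03152023_scan.tif', '12012023_scan.tif']
```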
Make the conventions easy to adopt. For example, set up a shared folder (such as Dropbox) with the folder hierarchy already in place, include a README file that explains the conventions, and add them to the onboarding documentation for new contributors.
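Another low-effort aid to adoption is a quick check that contributors can run on their file names. The Python sketch below assumes a hypothetical `YYYYMMDD_RunNNN_SampleType.ext` convention; the pattern, the `check_name` function, and its messages are illustrative and should be adapted to whatever scheme a project actually agrees on.

```python
import re
from datetime import date
from typing import Optional

# Hypothetical convention: YYYYMMDD_RunNNN_SampleType.ext
NAME_PATTERN = re.compile(r"(\d{8})_Run(\d{3})_([A-Za-z0-9-]+)\.([A-Za-z0-9]+)")


def check_name(filename: str) -> Optional[str]:
    """Return a description of the problem, or None if the name conforms."""
    match = NAME_PATTERN.fullmatch(filename)
    if match is None:
        return "expected YYYYMMDD_RunNNN_SampleType.ext"
    stamp = match.group(1)
    try:
        # date() raises ValueError for impossible dates such as month 13
        date(int(stamp[:4]), int(stamp[4:6]), int(stamp[6:8]))
    except ValueError:
        return f"invalid date {stamp}"
    return None


for name in ["20240307_Run002_Blood-Serum.csv", "20241307_Run002_Serum.csv", "Final data v2.xlsx"]:
    print(name, "->", check_name(name) or "OK")
```

Pairing a check like this with the README keeps the written conventions and the files themselves from drifting apart.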
Check for established file naming conventions. Many disciplines have recommendations, for example: DOE’s Atmospheric Radiation Measurement (ARM) program.
The type of data you generate affects how you will manage it and how long it needs to be preserved.
There are four main types of research data:
Modified from: Texas Advanced Computing Center (2012). "Writing a Data Management Plan: a guide for the perplexed."
File formats in which data is created depend on:
Formatting your data for storage:
These formats are considered relatively stable and are better suited to long-term preservation:
| Type | Examples | File extensions |
|---|---|---|
| Text | Acrobat PDF/A, Comma-Separated Values, Plain Text (US-ASCII, UTF-8), XML | .pdf, .csv, .txt, .xml |
| Image | JPEG, JPEG2000, PNG, TIFF | .jpg, .jp2, .png, .tif, .tiff |
| Audio | AIFF, WAVE | .aif, .aiff, .wav |
| Video | AVI (uncompressed), Motion JPEG2000 | .avi, .mj2, .mjp2 |
Modified from the Recommended File Formats list, University of Texas Libraries.
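Before depositing data, it can help to list any files whose extensions fall outside the preferred formats. The Python sketch below hard-codes the extensions from the table of recommended formats; the function name `flag_nonpreferred` is hypothetical. An extension is only a first filter: it cannot confirm that a .pdf is actually PDF/A, that an .avi is uncompressed, or that a .txt file uses US-ASCII or UTF-8.

```python
from pathlib import Path

# Extensions from the recommended-formats table (text, image, audio, video)
PREFERRED_EXTENSIONS = {
    ".pdf", ".csv", ".txt", ".xml",
    ".jpg", ".jp2", ".png", ".tif", ".tiff",
    ".aif", ".aiff", ".wav",
    ".avi", ".mj2", ".mjp2",
}


def flag_nonpreferred(folder):
    """Return files under folder whose extension is not in the preferred list."""
    return sorted(
        path
        for path in Path(folder).rglob("*")
        # Compare in lowercase so .TIF and .tif are treated the same
        if path.is_file() and path.suffix.lower() not in PREFERRED_EXTENSIONS
    )
```

For example, `flag_nonpreferred("ProjectA")` would return a spreadsheet such as `results.xlsx` as a candidate for conversion to .csv, while leaving .csv and .tif files unflagged.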
Consistent file naming is essential for accessibility and makes data files easier to find and keep track of.
Best practices:
Additional suggestions may be found on Stanford's Best Practices for File Naming.