
Research Data Management at UF: Data Collection

This is a guide on resources available at the University of Florida and beyond on research data management. It includes information about tools for data management planning, data and file sharing, metadata and data standards, and data storage.

File types accepted by IR@UF

As a general rule, platform-independent, vendor-independent, nonproprietary, stable, open, and well-supported formats can be readily accepted and preserved by the University of Florida Institutional Repository (IR@UF).

A list of recommended file types can be found here.

Using Validated Survey Instruments

NIH Common Data Elements (CDE)

The NIH's Common Data Element (CDE) Resource Portal provides access to NIH-supported CDE initiatives and tools. 

Common Data Elements (CDEs) are data elements common to multiple data sets across different studies. Use of CDEs can help improve data quality and the opportunity for comparison and combination of data from multiple studies.

Hazards of Not Planning to Share Data

NYU Health Sciences Libraries animation showing the importance of proper file formatting

File Naming and Folder Hierarchy

Keeping track of research data and documentation is critical. Strategies include:

Spend time planning both the folder hierarchy and the file naming conventions at the beginning of a project. Consider how you or others will look for and access files at a later date. Do you think about them by type, location, study, or something else?

Establish a folder hierarchy that aligns with the project. Example: [Project] / [Experiment] / [Instrument or Type of file]

Consider all aspects of the project and develop a file naming scheme that includes important metadata. Example: [Date]_[Run]_[SampleType]

Consider sorting when deciding which element of the file name goes first. File names that start with a year-first date (YYYYMMDD) sort chronologically, whereas names that start with MMDDYYYY sort by month first, so files from different years end up interleaved.

Provide a method for easy adoption. Consider a shared Dropbox folder with the folder hierarchy already in place, and include a readme file in the onboarding documentation for new contributors.

Check for established file naming conventions. Many disciplines have recommendations, for example: DOE’s Atmospheric Radiation Measurement (ARM) program.
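
The strategies above can be sketched in a few lines of Python. This is only an illustration, not a prescribed convention: the function names (build_file_name, build_path) and the project, experiment, and sample values are hypothetical, chosen to mirror the [Project] / [Experiment] / [Instrument or Type of file] and [Date]_[Run]_[SampleType] examples. The last lines show why a year-first date sorts more usefully than a month-first one.

```python
from datetime import date
from pathlib import PurePosixPath


def build_file_name(run_date: date, run: int, sample_type: str, ext: str = "csv") -> str:
    """Return a name of the form [Date]_[Run]_[SampleType].ext (hypothetical scheme)."""
    # A year-first date (YYYYMMDD) keeps alphabetical order equal to chronological order.
    return f"{run_date:%Y%m%d}_run{run:02d}_{sample_type}.{ext}"


def build_path(project: str, experiment: str, instrument: str, file_name: str) -> PurePosixPath:
    """Return [Project]/[Experiment]/[Instrument or Type of file]/file_name."""
    return PurePosixPath(project) / experiment / instrument / file_name


# Hypothetical project values, used only for illustration.
name = build_file_name(date(2008, 5, 7), 3, "Soil")
path = build_path("VegBiodiv", "Exp01", "Spectrometer", name)
print(path)  # VegBiodiv/Exp01/Spectrometer/20080507_run03_Soil.csv

# Year-first dates sort chronologically; month-first dates do not.
year_first = sorted(["20080507", "20071231", "20080115"])
month_first = sorted(["05072008", "12312007", "01152008"])
print(year_first)   # ['20071231', '20080115', '20080507']
print(month_first)  # ['01152008', '05072008', '12312007']
```

In the month-first list, the December 2007 file sorts after both 2008 files, which is the kind of ordering problem a year-first scheme avoids.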

Data Types

The type of data you generate affects how you will manage the data and how long they need to be preserved.

There are four main types of research data:

  • Observational data: captured in real time, typically cannot be reproduced exactly
  • Experimental data: from labs and equipment, can often be reproduced but may be expensive to do so
  • Simulation data: from models, can typically be reproduced if the input data is known
  • Derived or compiled data: after data mining or statistical analysis has been done, can be reproduced if analysis is documented

Modified from: Texas Advanced Computing Center (2012). "Writing a Data Management Plan: a guide for the perplexed."

File Types

The file formats in which data are created depend on:

  • Software in which research data are created and digitized
  • How researchers plan to analyze data
  • Hardware used
  • Availability of software
  • Discipline-specific conventions

Formatting your data for storage:

  • Store data in nonproprietary software formats (e.g., comma delimited text file, .csv).
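
As a minimal sketch of storing tabular data in a nonproprietary format, the Python example below writes a small table as comma-delimited text using only the standard library. The records, the column names, and the helper name to_csv_text are made up for illustration; real data would come from your own instrument or analysis software.

```python
import csv
import io

# Hypothetical measurements, used only to illustrate the output format.
records = [
    {"sample_id": "S01", "site": "Corvallis", "height_cm": 12.5},
    {"sample_id": "S02", "site": "Corvallis", "height_cm": 9.8},
]


def to_csv_text(rows):
    """Serialize a list of dictionaries as comma-delimited text with a header row."""
    buffer = io.StringIO(newline="")  # let the csv module control line endings
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = to_csv_text(records)
print(csv_text)

# To save the result as a UTF-8 text file, something like this could be used:
# with open("20080507_run03_Soil.csv", "w", newline="", encoding="utf-8") as f:
#     f.write(csv_text)
```

The resulting file can be opened in any text editor or spreadsheet program, which is the main reason plain delimited text is preferred for storage.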

Formats of Data

Formats with the following characteristics are considered relatively stable and better suited to long-term preservation:

  • open documentation
  • support across a range of software platforms
  • wide adoption
  • no compression (or lossless compression)
  • no embedded files or embedded programs/scripts
  • nonproprietary format

Types | Examples | File Extensions
Text | Acrobat PDF/A, Comma-Separated Values, Plain Text (US-ASCII, UTF-8), XML | .pdf, .csv, .txt, .xml
Image | JPEG, JPEG2000, PNG, TIFF | .jpg, .jp2, .png, .tif, .tiff
Audio | AIFF, WAVE | .aif, .aiff, .wav
Video | AVI (uncompressed), Motion JPEG2000 | .avi, .mj2, .mjp2

Modified from the Recommended file formats, Univ of Texas Libraries.
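
One way to use the list of recommended formats is a quick extension check before depositing files. The Python sketch below copies the extensions listed above; the names RECOMMENDED_EXTENSIONS and classify_extension are hypothetical. Note the limits of this approach: an extension alone cannot confirm that a .pdf is actually PDF/A or that an .avi is uncompressed, so this is a first pass rather than format validation.

```python
from pathlib import Path

# Extensions copied from the recommended-formats list above.
RECOMMENDED_EXTENSIONS = {
    "Text": {".pdf", ".csv", ".txt", ".xml"},
    "Image": {".jpg", ".jp2", ".png", ".tif", ".tiff"},
    "Audio": {".aif", ".aiff", ".wav"},
    "Video": {".avi", ".mj2", ".mjp2"},
}


def classify_extension(file_name):
    """Return the data type for a recommended extension, or None if it is not listed."""
    extension = Path(file_name).suffix.lower()  # compare case-insensitively
    for data_type, extensions in RECOMMENDED_EXTENSIONS.items():
        if extension in extensions:
            return data_type
    return None


for candidate in ["scan_20080507.TIF", "interview_v01.wav", "notes.docx"]:
    print(candidate, "->", classify_extension(candidate))
```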

Organizing Files and Folders

Organizing files and folders is essential for accessibility and makes it easier to find and keep track of data files.

Best practices:

  • Develop a system that works for your project
  • Use file names to classify broad types of files 
  • Create meaningful but brief names (“Corvallis_VegBiodiv_2007” is clearer than “Year01” or “Fall03”)
  • Capitalize each word to set it apart.
  • Avoid special characters in file names: \ / : * ? " < > | [ ] & $
  • Use underscores or hyphens (“_” or “-”) instead of periods or spaces
  • Capture place, time, and theme; this is extremely useful, even if done in a highly abbreviated manner
  • Reverse dates so they sort usefully (YYYYMMDD), e.g., filenaming_20080507
  • Capture document version control (v01, v02, v03 instead of filenaming_latest)
  • Be consistent. A naming scheme is only effective if everyone in the group follows the rules consistently
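
Several of these practices can be checked automatically, which helps a group stay consistent. The Python sketch below is one possible checker, not an established tool: the function name check_file_name, the regular expressions, and the decision to report a missing date or version tag as a problem are all assumptions made for illustration, and a real project would adapt them to its own rules.

```python
import re

# Special characters named in the best practices above.
FORBIDDEN_CHARACTERS = set('\\/:*?"<>|[]&$')
# An eight-digit run that is not part of a longer number, read as YYYYMMDD.
DATE_PATTERN = re.compile(r"(?<!\d)(\d{4})(\d{2})(\d{2})(?!\d)")
# A version tag such as v01 that forms its own underscore-separated element.
VERSION_PATTERN = re.compile(r"(?:^|_)v\d{2,}(?:_|$)")


def check_file_name(name: str) -> list:
    """Return a list of problems found in a file name (empty if none)."""
    problems = []
    stem, dot, _extension = name.rpartition(".")
    if not dot:  # no extension present, so the whole name is the stem
        stem = name
    bad = sorted(set(name) & FORBIDDEN_CHARACTERS)
    if bad:
        problems.append("special characters: " + " ".join(bad))
    if " " in name:
        problems.append("contains spaces")
    if "." in stem:
        problems.append("periods inside the name")
    match = DATE_PATTERN.search(stem)
    if not match or not (1 <= int(match[2]) <= 12 and 1 <= int(match[3]) <= 31):
        problems.append("no YYYYMMDD date")
    if not VERSION_PATTERN.search(stem):
        problems.append("no version tag such as v01")
    return problems


print(check_file_name("Corvallis_VegBiodiv_20080507_v01.csv"))
print(check_file_name("filenaming latest.docx"))
```

The first name follows every rule and yields an empty list; the second is flagged for its space, its missing date, and its missing version tag.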


Additional suggestions may be found on Stanford's Best Practices for File Naming.



Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.