Skip to Main Content

MARC Records to UFDC: Home

outline the steps of converting OCLC and Alma records to UF Digital Collection Records.

Overview

Back to Metadata for Digital Collections Page

Updated October 2023 by Xiaoli Ma

Inspired by the libguide titled as "Using MarcEdit to Create an Excel Spreadsheet From a .dat File", the workflow converts MARC data to fit UF Digital Collections' spreadsheet template. 

MARC to UFDC

Steps

1. Organize the data set in spreadsheet. OCLC number of Electronic Records are required. Derive Electronic Records first before running this workflow. For Thesis and Dissertation that include RDTR and RDTS requests, please use this workflow. 

Digital Projects Item Information Sheet prepared by the Collection Managers and Cataloging Unit provides Metadata Unit the OCLC IDs needed to create electronic records. The OCLC IDs included in the Information Sheet could be the ones for Print or Microfiche or Electronic records. Depending on its type, Metadata Unit will start the work from here. For the first two types of records, the Unit creates Electronic records. For existing Electronic records, the Unit will directly apply it to the following steps.  For Alma records, the process of grouping records is different. 

 

2. Search, group and export records. 

  • Pull out the records via OCLC ID search
  • Save the whole batch in Local Save File (To save, use F4). (Please make sure Local Save File has been cleaned before starting a new batch.)
  • Search Local Save File (To open the search box: F3) --> OK --> Choose all files saved in  Local Save File and Export (F5) them as .dat file

3. Locate and rename the dat.file using the date and keywords from the project/batch, for instance, 20230605_Berbiceroyalgazette. This is to make sure all the saved files can be first sorted by the date and then by the project keywords.

.dat files are usually saved at the selected location when setting up OCLC on the work station. 


In MarcEdit

4. Convert  dat.file to txt files

Tools --> Export --> Export Tab Delimited Records

  • Set File Paths
    • Set the path for the .dat file: click the first file folder icon to locate and select the .dat file prepared during Step 2 and 3
    • Set the path for the exported txt file: click the second file folder icon to browse to the folder where the txt should be saved and name the txt file with the same name of the dat. file
    • Set In field delimiter to " ; " 
  • Load the mapping document
    • --> Next --> Settings...--> Load Setting --> choose the map file (latest version: MAPMARCtoUFDC_0826)
    • --> Export --> a pop up window will immediately notify the txt file has been created.

In OpenRefine

5. Create the new project in OpenRefine

  • Locate OpenRefine folder and Click the blue diamond to load OpenRefine in the browser as a new tab
  • In the browser OpenRefine tab, create the project
    • Get data from This Computer --> Choose File --> browse to locate the txt file prepared in Step 4--> Next
    • Parse data as CSV / TSV / separator-based files --> Character encoding --> UTF-8 --> check Trim leading & trailing whitespace from strings --> Create Project

6.Transform and Cleanup 

  • Adjust the number of rows showing on the same page to browse the loaded records

OpenRefine Adding Rows

 

  • Copy the json script to process the current batch
    • Undo/Redo --> Apply --> Paste the json script (filename) --> Perform Operations 
    • Browse to spot outliners, fix issues if any; if no issues, proceed to the next step
    • Export --> Excel 2007 --> tag the filename with "_refined"

In Excel

7. Rename Headers and Clean up values further

  • Open the downloaded Excel --> Copy the content of the first tab to a new tab and name the new tab as "InReady" --> Save as a new file in OneDrive-University of Florida> Documents>General-RDA Metadata Unit>03_UFDCIngestionReady, tag the new filename with "InReady" 
  •  
  • Check the values in 000$06 and 000$07, if "Delete marctarget term" appears in one row, delete the value in 008$22 of the same row and then delete both 000$06 and 000$07 columns
  • Delete dot at the end of YEAR and change YEAR to be text field.
  • Rename the headers as follows
Original Headers Rename to
001 1 OCLC
008$28 GENRE
022$a ISSN
All remaining 050 CLASSIFICATION
222$a, 246$a, 247$a ALTERNATE TITLE
245$c NOTE
250$a EDITION
310$a, 362$a, 440$a, 490$a NOTE
500,502,504, 505, 507, 515$a, 520$a, 540$a, 546, 580 NOTE (502, 504, Thesis and Dissertation Only)

All remaining 600$a, 610$a, 611$a, 630$a, 650$a, 651$a and 653$a

SUBJECT KEYWORD
All 655$a  GENRE
All 690 SUBJECT KEYWORD
All 700 CREATOR, use "Text to column" and split with "Fixed width" to have $e data on its individual column and name it CREATOR ROLE if any data come with $e; 
All 710,711, 720 CREATOR
730 UNIFORM TITLE
751 CREATOR
830 SERIES TITLE
  • If any data missing, for instance, CREATOR ROLE has value, but CREATOR is missing
  • CREATOR ROLE: Add "author" to CREATOR ROLE 1 if missing
  • SUBJECT KEYWORD:
    • Check if values in all terms in SUBJECT KEYWORD are in PROPER case, that is, the first letter of all words of a term are in capital; Use the excel function "=PROPER()" can simplify the process: Insert a new column, use the PROPER() on the first row and then apply the function to the whole column via dragging down the function, copy the value, paste to the original column use "paste value only" and delete the inserted column (a short video?) 
    • Check if any numbers included as a part of SUBJECT KEYWORD miss dot/decimal
    • Dedupe/Replace, for instance, when "Madison (fla)" and "Madison County (fla)" appear for one title, replace them with the latest FAST term "Florida--Madison County" and add a new "SUBJECT AUTHORITY" Column to capture the source of this newly introduced term, that is "fast".  
  • TITLE: For Titles in Foreign Languages, to facilitate the search in UFDC
    • if any accents in TITLE, copy the title to a new column named as "ALTERNATE TITLE" and in the original TITLE, replace the letter with the ones without accents
  • PUBLICATION LOCATION : It's repeatable, make sure one value, one cell. 
  • PUBLISHER:
    • if any accents in PUBLISHER, copy the name to a new column named as "PUBLISHER" and then in the original PUBLISHER replace the letter with accents with the ones without accents
    • It's repeatable, make sure one value, one cell. 
  • YEAR: make sure the spreadsheet field type is "text"; if no value, add "unknown". 
  • GENRE: dedupe based on the following table; sobek produces duplicates when seeing the following values in Genre
Terms in MARC 655 Replaced to
Periodicals, Journal(Publications), Magazines, Periodical publications

periodical as GENRE;

marcgt as GENRE AUTHORITY

 

Poetry, Fiction

poetry as GENRE

marcgt as GENRE AUTHORITY

8.  Add Additional Information

  • Update the InReady spreadsheet to include the following fields and then work on them: MATERIAL TYPE, AGGREGATION CODE, RIGHTS, SOURCE INSTITUTION STATEMENT, HOLDING LOCATION STATEMENT, SOURCE INSTITUTION CODE, HOLDING LOCATION CODE, NOTES
  • MATERIAL TYPE
    • One sheet allows multiple Material Type values. Material Type is a system field where historically was not consciously maintained and was populated by automated ingestion process.  
    • The following values are allowed for Material Type. 
      Aerial
      Archival
      Artifact
      Audio
      Book
      Dataset
      Map
      Mixed Material
      Newspaper
      Photograph
      Serial
      Video
  • For Thesis and Dissertation as well as related materials and projects, we usually use "Mixed Material".
  • Sobek adds Genre terms in the following instances, to avoid creating duplicates, remove the same Genre before the ingestion
MATERIL TYPE Sobek Adds to GENRE After Ingestion Remove from MARC Genre 655
Serial serial ( sobekcm ) Serial
Newspaper newspaper ( sobekcm ) Newspaper (happen most often)

 

  • AGGREGATION CODE, RIGHTS, SOURCE INSTITUTION STATEMENT, HOLDING LOCATION STATEMENT, SOURCE INSTITUTION CODE, HOLDING LOCATION CODE, NOTES ( like funding notes, if any)
    • The information should be available from the Item Information Sheet. 

9. Save the file in Metadata Unit Team Drive: University of Florida > Documents > General --RDA Metadata Unit >03_UFDCIngestionReady, end the filename with "_InReady". 

UFDC Manual Updates

If seeing the following three headers in the ingestion spreadsheet, open up the newly created bibids in UFDC to add values. 

  • NONE- SERIAL RELATED ITEM TYP: PRECEDING ENTRY
  • NONE- SERIAL RELATED ITEM TYPE: SUCCEEDING ENTRY

For these two fields, the updates should take place on Record Information tab under Related Items. Click to open Edit Related Items. Choose the right Relation Type from the drop-down list. For "Preceding Entry", use "Preceding" and for "Succeeding Entry", use "Preceding". 

Copy the Title from the spreadsheet to Title in the Edit Related Items. 

If OCLC number is available, please search in Worldcat with provided OCLC numbers as shown by the screenshots and copy the sharable URLs to UFDC, so live links can be created for UFDC records. 

 

  • NONE-COORDINATE

University of Florida Home Page

This page uses Google Analytics - (Google Privacy Policy)

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.