Skip to Main Content

MARC Records to UFDC: Home

outline the steps of converting OCLC and Alma records to UF Digital Collection Records.

Overview

Back to Metadata for Digital Collections Page

As we shape the new process, this document undergoes constant updates. Last updated in March 2024 by Xiaoli Ma. 

Inspired by the libguide titled as "Using MarcEdit to Create an Excel Spreadsheet From a .dat File", the workflow converts MARC data to fit UF Digital Collections' spreadsheet template. 

MARC to UFDC

Steps

1. Organize the data set in spreadsheet. OCLC number of Electronic Records are required. Derive Electronic Records first before running this workflow. For Thesis and Dissertation that include RDTR and RDTS requests, please use this workflow. 

Digital Projects Item Information Sheet prepared by the Collection Managers and Cataloging Unit provides Metadata Unit the OCLC IDs needed to create electronic records. The OCLC IDs included in the Information Sheet could be the ones for Print or Microfiche or Electronic records. Depending on its type, Metadata Unit will start the work from here. For the first two types of records, the Unit creates Electronic records. For existing Electronic records, the Unit will directly apply it to the following steps.  For Alma records, the process of grouping records is different. 

2. Search, group and export records. 

  • Pull out the records via OCLC ID search
  • Save the whole batch in Local Save File (To save, use F4). (Please make sure Local Save File has been cleaned before starting a new batch.)
  • Search Local Save File (To open the search box: F3) --> OK --> Choose all files saved in  Local Save File and Export (F5) them as .dat file

3. Locate and rename the dat.file using the date and keywords from the project/batch, for instance, 20230605_Berbiceroyalgazette. This is to make sure all the saved files can be first sorted by the date and then by the project keywords.

.dat files are usually saved at the selected location when setting up OCLC on the work station. 


In MarcEdit

4. Convert  dat.file to txt files

Tools --> Export --> Export Tab Delimited Records

  • Set File Paths
    • Set the path for the .dat file: click the first file folder icon to locate and select the .dat file prepared during Step 2 and 3
    • Set the path for the exported txt file: click the second file folder icon to browse to the folder where the txt should be saved and name the txt file with the same name of the dat. file
    • Set In field delimiter and Contextual delimiter to " / " 
  • Load the mapping document
    • --> Next --> Settings...--> Load Setting --> choose the map file (downloadable at the bottom of this guide)
    • --> Export --> a pop up window will immediately notify the txt file has been created.

In OpenRefine

5. Create the new project in OpenRefine

  • Locate OpenRefine folder and Click the blue diamond to load OpenRefine in the browser as a new tab
  • In the browser OpenRefine tab, create the project
    • Get data from This Computer --> Choose File --> browse to locate the txt file prepared in Step 4--> Next
    • Parse data as CSV / TSV / separator-based files --> Character encoding --> UTF-8 --> check Trim leading & trailing whitespace from strings --> Create Project

6.Transform and Cleanup 

  • Adjust the number of rows showing on the same page to browse the loaded records

OpenRefine Adding Rows

  • Copy the json script to process the current batch
    • Undo/Redo --> Apply --> Paste the json script (downloadable at the bottom of the guide) --> Perform Operations 
    • Browse to spot outliners, fix issues if any; if no issues, proceed to the next step
    • Export --> Excel 2007 --> tag the filename with "_refined"

In Excel

7. Rename Headers and Clean up values further

  • Open the downloaded Excel --> Copy the content of the first tab to a new tab and name the new tab as "InReady" --> Save as a new file in OneDrive-University of Florida> Documents>General-RDA Metadata Unit>03_UFDCIngestionReady, tag the new filename with "InReady" 
  • Delete the columns that don't have values
  • Rename the headers, if applicable, as follows. Since blank columns have been removed during the last step, then in this step, the columns in target are the ones have values. 
Original Headers Rename to
111$a CREATOR
245$c NOTE
246$a ALTERNATE TITLE
310$a, 362$a, 440$a, 490$a NOTE
500,502,504, 505, 507, 515$a, 520$a, 540$a, 546, 580,590 NOTE (502, 504, Thesis and Dissertation Only,590 local notes)

All remaining 600$a, 610$a, 611$a, 630$a, 650$a, 651$a and 653$a

SUBJECT KEYWORD
650$v GENRE
All 655$a  GENRE
All 690 SUBJECT KEYWORD
720$a CREATOR
730$a UNIFORM TITLE
751$a CREATOR
  • Check if any data missing, for instance, CREATOR ROLE has value, but CREATOR is missing. If any data missing, check the original OCLC records
  • CREATOR ROLE: Change to Title Case
  • SUBJECT KEYWORD:
    • If SUBJECT KEYWORD exist, please note, we DO NOT add "SUBJECT AUTHORITY" to the data
    • Check if values in all terms in SUBJECT KEYWORD are in PROPER case, that is, the first letter of all words of a term are in capital; Use the excel function "=PROPER()" can simplify the process: Insert a new column, use the PROPER() on the first row and then apply the function to the whole column via dragging down the function, copy the value, paste to the original column use "paste value only" and delete the inserted column (a short video?) 
    • Check if any numbers included as a part of SUBJECT KEYWORD miss dot/decimal
    • Dedupe/Replace, for instance, multiple instances of "Florida" in different cases; or when "Madison (fla)" and "Madison County (fla)" appear for one title, replace them with the latest FAST term "Florida--Madison County" and add a new "SUBJECT AUTHORITY" Column to capture the source of this newly introduced term, that is "fast".  Adding Authority Code only applies to adding new SUBJECT KEYWORD. For SUBJECT KEYWORD pulled out from OCLC, no authority codes needed. 
  • PUBLICATION LOCATION : It's repeatable, make sure one value, one cell. If multiple values separated by ";", use "Split Text to Columns" function to add more columns and name them all "PUBLICATION LOCATION". 
  • PUBLISHER:
    • It's repeatable, make sure one value, one cell. If multiple values separated by ";", use "Split Text to Columns" function to add more columns and name them all "PUBLISHER".
  • YEAR: make sure the spreadsheet field type is "text"; if no value, add "unknown". 
  • GENRE: 
    • If GENRE is available, please note we DO NOT add GENRE AUTHORITY
    • If no GENRE, try to assign one based on our limited knowledge and add the Authority Code; if not confident about the Genre term selection, please discuss with the Metadata Librarian
    • dedupe based on the following table; sobek produces duplicates when seeing the following values in Genre
Terms in MARC 655 Replaced to
Periodicals, Journal(Publications), Magazines, Periodical publications

periodical as GENRE;

marcgt as GENRE AUTHORITY

"Poetry" and "Fiction" appear in one record both as Genre

Remove Fiction

poetry as GENRE

marcgt as GENRE AUTHORITY

Newspapers (newspaper, newspapers etc.) remove from GENRE, add Newspaper as MATERIAL TYPE (Sobek then auto-generates Genre terms)

8.  Add Additional Information

  • Update the InReady spreadsheet to include the following fields and then work on them; if no NOTE needed, please delete it: MATERIAL TYPE, AGGREGATION CODE, RIGHTS, SOURCE INSTITUTION STATEMENT, HOLDING LOCATION STATEMENT, SOURCE INSTITUTION CODE, HOLDING LOCATION CODE, NOTE
  • MATERIAL TYPE
    • One sheet allows multiple Material Type values. Material Type is a system field where historically was not consciously maintained and was populated by automated ingestion process.  
    • The following values are allowed for Material Type (Mixed Material was removed from this list because that term will trigger Sobek to add a redundant Genre term: unknown (sobekcm)). 
      Aerial
      Archival
      Artifact
      Audio
      Book
      Dataset
      Map
      Newspaper
      Photograph
      Serial
      Video
  • For Thesis and Dissertation as well as related materials and projects, we usually use "Mixed Material".
  • Sobek adds Genre terms in the following instances, to avoid creating duplicates, remove the same Genre before the ingestion
MATERIL TYPE Sobek Adds to GENRE After Ingestion Remove from MARC Genre 655
Serial serial ( sobekcm ) Serial
Newspaper newspaper ( sobekcm ) Newspaper (happen most often)

  • AGGREGATION CODE, RIGHTS, SOURCE INSTITUTION STATEMENT, HOLDING LOCATION STATEMENT, SOURCE INSTITUTION CODE, HOLDING LOCATION CODE, NOTES ( like funding notes, if any)
    • The information should be available from the Item Information Sheet. 
    • Please note, no "i" in front of the Source Institution and Holding Location code. 
    • Use lower cases for all aggregation codes. 

9. Save the file in Metadata Unit Team Drive: University of Florida > Documents > General --RDA Metadata Unit >UFDCIngestionReady, end the filename with "_InReady". 

The process of preparing ingestion data ends here. 


In UFDC

Manual Updates, after the records were created in UFDC, that is when bibids are ready, the metadata unit will manually update the information for the following fields. 

  • NONE- SERIAL RELATED ITEM TYP: PRECEDING ENTRY
  • NONE- SERIAL RELATED ITEM TYPE: SUCCEEDING ENTRY

For these two fields, the updates should take place on Record Information tab under Related Items. Click to open Edit Related Items. Choose the right Relation Type from the drop-down list. For "Preceding Entry", use "Preceding" and for "Succeeding Entry", use "Preceding". 

Copy the Title from the spreadsheet to Title in the Edit Related Items. 

If OCLC number is available, please search in Worldcat with provided OCLC numbers as shown by the screenshots and copy the sharable URLs to UFDC, so live links can be created for UFDC records. 

  • NONE-COORDINATE

  • NONE-RELATED ITEM
University of Florida Home Page

This page uses Google Analytics - (Google Privacy Policy)

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.