Skip to Main Content

MARC Records to UFDC: Home

outline the steps of converting OCLC and Alma records to UF Digital Collection Records.

Overview

Back to Metadata for Digital Collections Page

As we shape the new process, this document undergoes constant updates. Last updated in March 2024 by Xiaoli Ma. 

Inspired by the libguide titled as "Using MarcEdit to Create an Excel Spreadsheet From a .dat File", the workflow converts MARC data to fit UF Digital Collections' spreadsheet template. 

MARC to UFDC

Steps

1. Organize the data set in spreadsheet. OCLC number of Electronic Records are required. Derive Electronic Records first before running this workflow. For Thesis and Dissertation that include RDTR and RDTS requests, please use this workflow. 

2. Search, group and export records. 

  • Pull out the records via OCLC ID search
  • Save the whole batch in Local Save File (To save, use F4). (Please make sure Local Save File has been cleaned before starting a new batch.)
  • Search Local Save File (To open the search box: F3) --> OK --> Choose all files saved in  Local Save File and Export (F5) them as .dat file

3. Locate and rename the dat.file using the date and keywords from the project/batch, for instance, 20230605_Berbiceroyalgazette. This is to make sure all the saved files can be first sorted by the date and then by the project keywords.

.dat files are usually saved at the selected location when setting up OCLC on the work station. 


In MarcEdit

4. Convert  dat.file to txt files

Tools --> Export --> Export Tab Delimited Records

  • Set File Paths
    • Set the path for the .dat file: click the first file folder icon to locate and select the .dat file prepared during Step 2 and 3
    • Set the path for the exported txt file: click the second file folder icon to browse to the folder where the txt should be saved and name the txt file with the same name of the dat. file
    • Set In field delimiter and Contextual delimiter to " / " 
  • Load the mapping document
    • --> Next --> Settings...--> Load Setting --> choose the map file (downloadable at the bottom of this guide)
    • --> Export --> a pop up window will immediately notify the txt file has been created.

In OpenRefine

5. Create the new project in OpenRefine

  • Locate OpenRefine folder and Click the blue diamond to load OpenRefine in the browser as a new tab
  • In the browser OpenRefine tab, create the project
    • Get data from This Computer --> Choose File --> browse to locate the txt file prepared in Step 4--> Next
    • Parse data as TSV --> Character encoding --> UTF-8 --> check Trim leading & trailing whitespace from strings --> Create Project

6.Transform and Cleanup 

  • Adjust the number of rows showing on the same page to browse the loaded records

OpenRefine Adding Rows

  • Copy the json script to process the current batch
    • Undo/Redo --> Apply --> Paste the json script (downloadable at the bottom of the guide) --> Perform Operations 
    • Browse to spot outliners, fix issues if any; if no issues, proceed to the next step
    • Export --> Excel 2007 --> tag the filename with "_refined"

In Excel

7. Rename Headers and Clean up values across the sheet

  • Open the downloaded Excel --> Copy the content of the first tab to a new tab and name the new tab as "InReady" --> Save as a new file in OneDrive-University of Florida> Documents>General-RDA Metadata Unit>03_UFDCIngestionReady, please choose the year and type folder based on the info the current batch, append the new filename with "_InReady" 
  • Delete the columns that don't have values
  • Rename the headers, if applicable, as follows. Since blank columns have been removed during the last step, then in this step, the columns in target are the ones have values. 
Original Headers Rename to
111$a CREATOR
245$c NOTE
246$a ALTERNATE TITLE
310$a, 362$a, 440$a, 490$a NOTE
500,502,504, 505, 507, 515$a, 520$a, 540$a, 546, 580,590 NOTE (502, 504, Thesis and Dissertation Only,590 local notes)

600$a, 610$a, 611$a, 630$a, 653$a

SUBJECT KEYWORD
650$a, 651$a NO Renaming needed, they are info for SUBJECT KEYWORD , will be deleted after finishing SUBJECT KEYWORD work
650$2, 651$2 NO Renaming needed, they are info for SUBJECT KEYWORD , will be deleted after finishing SUBJECT KEYWORD work
650$v GENRE
655$a  NO Renaming needed, they are info for GENRE , will be deleted after finishing GENRE work
655$2 NO Renaming needed, they are info for GENRE , will be deleted after finishing GENRE work
All 690 SUBJECT KEYWORD
710$a, 710$b if exist both exist, merge the values, separator "," (comma) and name the column CREATOR
711$a CREATOR
720$a CREATOR
730$a UNIFORM TITLE
751$a CREATOR
  • CREATOR ROLE: Change to Title Case
  • Check if any data missing, for instance, CREATOR ROLE has value, but CREATOR is missing. If any data missing, check the original OCLC records; 
  • Check if any unwanted punctuations and signs, for instance "\\", "$" or subfield letters, and remove them (Find and Replace function is one method for this step.)

8. Add UFDC Specific Information

  • Update the InReady spreadsheet to include the following fields and then work on them; if no NOTE info, please delete it: MATERIAL TYPE, AGGREGATION CODE, RIGHTS, SOURCE INSTITUTION STATEMENT, HOLDING LOCATION STATEMENT, SOURCE INSTITUTION CODE, HOLDING LOCATION CODE, NOTE
  • MATERIAL TYPE
    • One sheet allows multiple Material Type values. Material Type is a system field where historically was not consciously maintained and was populated by automated ingestion process.  
    • The following values are allowed for Material Type (Mixed Material was removed from this list because that term will trigger Sobek to add a redundant Genre term: unknown (sobekcm)). 
      Aerial
      Archival
      Artifact
      Audio
      Book
      Dataset
      Map
      Newspaper
      Photograph
      Serial
      Video
  • For Thesis and Dissertation as well as related materials and projects, we usually use "Mixed Material".
  • Sobek adds Genre terms in the following instances, to avoid creating duplicates, remove the same Genre before the ingestion
MATERIAL TYPE Sobek Adds to GENRE After Ingestion Remove from MARC Genre 655
Serial serial ( sobekcm ) Serial
Newspaper newspaper ( sobekcm ) Newspaper (happen most often)

  • AGGREGATION CODE, RIGHTS, SOURCE INSTITUTION STATEMENT, HOLDING LOCATION STATEMENT, SOURCE INSTITUTION CODE, HOLDING LOCATION CODE, NOTES ( like funding notes, if any)
    • The information should be available from the Item Information Sheet. 
    • Please note, no "i" in front of the Source Institution and Holding Location code. 
    • Use lower cases for all aggregation codes. 

9. Field Group Work

  • YEAR: make sure the spreadsheet field type is "text"; if no value, add "unknown". 
  • SUBJECT KEYWORD (650, 651) and GENRE (655):
    • Records with terms from known authorities (650$2, 650$2, 655$2 are populated with values)
      • Convert Rows to Columns: Copy the values of the extra rows that holds only SUBJECT KEYWORDS/GENRE and SUBJECT/GENRE AUTHORITY to existing blank Columns or add Columns to hold these values, if any
      • OpenRefine scripts have deduped the terms and pull in fast terms directly from Worldcat server, so the focus here is to make sure that most major topical SUBJECT KEYWORDs from OCLC records are included in the sheet. When choosing values during the MarcEdit step, the focus has been placed on major Topical Subjects, the ones captured by $a, so terms in $v,$x,$y, and $z, subdivision portions that emphasize the context of the topical terms, are not included in the process and DO NOT need to be added here either. Most of the info in those subdivision portions has been already captured by other fields.  
        • FAST terms have been listed in the sheet in their own cells followed by authority codes, the ones not in their own cells are terms from other authorities. 
        • locate the terms from other authorities in OCLC records. Frequently terms in OCLC records are hyperlinked and that means they are LC terms. Most of these terms are part of FAST terms, you can locate them in SearchFast and then add SUBJECT AUTHORITY column, adding "fast" as the authority code.  
        • copy and paste from OCLC them to empty cells of existing SUBJECT KEYWORD, if available or to new columns of SUBJECT KEYWORD, if needed
        • add SUBJECT/GENRE AUTHORITY code to existing columns or create new SUBJECT/GENRE AUTHORITY column to hold the code
        • common SUBJECT/GENRE AUTHORITY lists below
          Name Code for Metadata sheet Seen in OCLC records
          Université Laval rvm CaQQLa
          RAMEAU: répertoire d'autorité-matière encyclopédique et alphabétique unifié (Paris: Bibliothèque nationale) ram  
          Thésaurus des descripteurs de genre/forme de l'Université Laval (Québec, Québec: Bibliothèque de l'Université Laval) rvmgt  
          Biblioteca Nacional de Espana abne  

          Links to authority codes: 

Subjects: https://www.loc.gov/standards/sourcelist/subject.html

Genres:  https://www.loc.gov/standards/sourcelist/genre-form.html

  • 650$a, 650$2; 651$a, 651$2 and 655$a, 655$2  hold the original data from OCLC records. By reviewing these columns, we can know quickly the terms that should be deduped and kept. This understanding helps us do further work. 
    • $a lists the terms: this holds many duplicated terms
    • $2 lists the codes: the number of codes in this column can indicate the number of subjects and also if any terms are from other authorities, but some times $2 does not include all the codes. In this case, check $a and OCLC records to form a clear picture of the needs. 
    • if seeing other authority codes, find the terms from those authorities and copy them to empty cells of existing SUBJECT KEYWORD/GENRE columns of that row and add the authority codes to the cells from SUBJECT /GENRE AUTHORITY columns next to the chosen SUBJECT KEYWORD/GENRE columns. 
    • if needed, open OCLC records to identify terms from other authorities. Please note our process does not cover all subjects from OCLC records. When comparing to OCLC records, if seeing very important ones, we could consider add to the sheet, but this is not required. As well, OCLC 650 and 651 sometimes include GENRE terms, those should be moved to GENRE with GENRE AUTHORITY if applicable. 
    • $0 lists the IDs from each authority. This is helpful info. 
  • When Subjects/GENRE from other authorities all reside in their own cells followed by the proper authority codes, delete all 650$a, 650$2, 650$0; 651$a, 651$2, 650$0 and 655$a, 655$2, 655$0 columns. 
  • Records with terms coming with no authority info (650$2, 650$2, 655$2 have no values)
    • If multiple values in 650$a, 651$a, 655$a, copy the columns to a new excel sheet and then use "Split Text to Columns" under "Data" menu, delimiter ";" to split them into multiple columns and name all columns "SUBJECT KEYWORD"; 
    • Copy and Paste all "SUBJECT KEYWORD" columns back to the end of "InReady" sheet
    • Delete the original 650$a, 651$a, 655$a columns
    • Check if these terms are from SearchFast, if they are, add SUBJECT AUTHORITY columns and put in "fast"; if they are not, no further action required. 
  • GENRE, further work : 
    • If GENRE 1 is has values, we DO NOT add GENRE AUTHORITY for this instance;
    • If no values for any Genre fields, try to assign one based on our limited knowledge and add GENRE AUTHORITY columns to hold the proper authority code; if not confident about the Genre term selection, please discuss with the Metadata Librarian; 
    • dedupe based on the following table:  sobek produces duplicates when seeing the following values in Genre
Terms in MARC 655 Replaced to
Periodicals, Journal(Publications), Magazines, Periodical publications

periodical as GENRE;

marcgt as GENRE AUTHORITY

"Poetry" and "Fiction" appear in one record both as Genre

Remove Fiction

poetry as GENRE

marcgt as GENRE AUTHORITY

Newspapers (newspaper, newspapers etc.) remove from GENRE, add Newspaper as MATERIAL TYPE (Sobek then auto-generates Genre terms)

10. Save the file in Metadata Unit Team Drive: University of Florida > Documents > General --RDA Metadata Unit >UFDCIngestionReady, end the filename with "_InReady". 

The process of preparing ingestion data ends here. 


In UFDC

Manual Updates, after the records were created in UFDC, that is when bibids are ready, the metadata unit will manually update the information for the following fields. 

  • NONE- SERIAL RELATED ITEM TYP: PRECEDING ENTRY
  • NONE- SERIAL RELATED ITEM TYPE: SUCCEEDING ENTRY

For these two fields, the updates should take place on Record Information tab under Related Items. Click to open Edit Related Items. Choose the right Relation Type from the drop-down list. For "Preceding Entry", use "Preceding" and for "Succeeding Entry", use "Preceding". 

Copy the Title from the spreadsheet to Title in the Edit Related Items. 

If OCLC number is available, please search in Worldcat with provided OCLC numbers as shown by the screenshots and copy the sharable URLs to UFDC, so live links can be created for UFDC records. 

  • NONE-COORDINATE

  • NONE-RELATED ITEM
University of Florida Home Page

This page uses Google Analytics - (Google Privacy Policy)

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.