Guides @ UF: MARC Records to UFDC: Home

Steps

1. Organize the data set in spreadsheet. OCLC number of Electronic Records are required. Derive Electronic Records first before running this workflow. For Thesis and Dissertation that include RDTR and RDTS requests, please use this workflow.

2. Search, group and export records.

Pull out the records via OCLC ID search
Save the whole batch in Local Save File (To save, use F4). (Please make sure Local Save File has been cleaned before starting a new batch.)
Search Local Save File (To open the search box: F3) --> OK --> Choose all files saved in Local Save File and Export (F5) them as .dat file (make sure the export target has been set as MarcExport.dat as shown in the screenshot below: Tools -- Options

3. Locate and rename the dat.file using the date and keywords from the project/batch, for instance, 20230605_Berbiceroyalgazette. This is to make sure all the saved files can be first sorted by the date and then by the project keywords.

.dat files are usually saved at the selected location when setting up OCLC on the work station.

In MarcEdit

4. Convert dat.file to txt files

Tools --> Export --> Export Tab Delimited Records

Set File Paths
- Set the path for the .dat file: click the first file folder icon to locate and select the .dat file prepared during Step 2 and 3
- Set the path for the exported txt file: click the second file folder icon to browse to the folder where the txt should be saved and name the txt file with the same name of the dat. file
- Set In field delimiter and Contextual delimiter to " / "

Load the mapping document
- --> Next --> Settings...--> Load Setting --> choose the map file (downloadable at the bottom of this guide)
- --> Export --> a pop up window will immediately notify the txt file has been created.

In OpenRefine

5. Create the new project in OpenRefine

Locate OpenRefine folder and Click the blue diamond to load OpenRefine in the browser as a new tab
In the browser OpenRefine tab, create the project
- Get data from This Computer --> Choose File --> browse to locate the txt file prepared in Step 4--> Next
- Parse data as TSV --> Character encoding --> UTF-8 --> check Trim leading & trailing whitespace from strings --> Create Project

6.Transform and Cleanup

Adjust the number of rows showing on the same page to browse the loaded records

OpenRefine Adding Rows

Copy the json script to process the current batch
- Undo/Redo --> Apply --> Paste the json script (downloadable at the bottom of the guide) --> Perform Operations
- Browse to spot outliners, fix issues if any; if no issues, proceed to the next step
- Export --> Excel 2007 --> tag the filename with "_refined"

In Excel

7. Rename Headers and Clean up values across the sheet

Open the downloaded Excel --> Copy the content of the first tab to a new tab and name the new tab as "InReady" --> Save as a new file in OneDrive-University of Florida> Documents>General-RDA Metadata Unit>03_UFDCIngestionReady, please choose the year and type folder based on the info the current batch, append the new filename with "_InReady"
Delete the columns that don't have values
Rename the headers, if applicable, as follows. Since blank columns have been removed during the last step, then in this step, the columns in target are the ones have values.

Original Headers	Rename to
111$a	CREATOR
245$c	NOTE
246$a	ALTERNATE TITLE
310$a, 362$a, 440$a, 490$a	NOTE
500,502,504, 505, 507, 515$a, 520$a, 540$a, 546, 580,590	NOTE (502, 504, Thesis and Dissertation Only,590 local notes)
600$a, 610$a, 611$a, 630$a, 653$a	SUBJECT KEYWORD
650$a, 651$a	NO Renaming needed, they are info for SUBJECT KEYWORD , will be deleted after finishing SUBJECT KEYWORD work
650$2, 651$2	NO Renaming needed, they are info for SUBJECT KEYWORD , will be deleted after finishing SUBJECT KEYWORD work
650$v	GENRE
655$a	NO Renaming needed, they are info for GENRE , will be deleted after finishing GENRE work
655$2	NO Renaming needed, they are info for GENRE , will be deleted after finishing GENRE work
All 690	SUBJECT KEYWORD
710$a, 710$b	if exist both exist, merge the values, separator "," (comma) and name the column CREATOR
711$a	CREATOR
720$a	CREATOR
730$a	UNIFORM TITLE
751$a	CREATOR

CREATOR ROLE: Change to Title Case
Check if any data missing, for instance, CREATOR ROLE has value, but CREATOR is missing. If any data missing, check the original OCLC records;
Check if any unwanted punctuations and signs, for instance "\\", "$" or subfield letters, and remove them (Find and Replace function is one method for this step.)

8. Add UFDC Specific Information

Update the InReady spreadsheet to include the following fields and then work on them; if no NOTE info, please delete it: MATERIAL TYPE, AGGREGATION CODE, RIGHTS, SOURCE INSTITUTION STATEMENT, HOLDING LOCATION STATEMENT, SOURCE INSTITUTION CODE, HOLDING LOCATION CODE, NOTE

MATERIAL TYPE

One sheet allows multiple Material Type values. Material Type is a system field where historically was not consciously maintained and was populated by automated ingestion process.

The following values are allowed for Material Type (Mixed Material was removed from this list because that term will trigger Sobek to add a redundant Genre term: unknown (sobekcm)).

Aerial

Archival

Artifact

Audio

Book

Dataset

Map

Newspaper

Photograph

Serial

Video

~~For Thesis and Dissertation as well as related materials and projects, we usually use "Mixed Material".~~
Sobek adds Genre terms in the following instances, to avoid creating duplicates, remove the same Genre before the ingestion

MATERIAL TYPE	Sobek Adds to GENRE After Ingestion	Remove from MARC Genre 655
Serial	serial ( sobekcm )	Serial
Newspaper	newspaper ( sobekcm )	Newspaper (happen most often)

AGGREGATION CODE, RIGHTS, SOURCE INSTITUTION STATEMENT, HOLDING LOCATION STATEMENT, SOURCE INSTITUTION CODE, HOLDING LOCATION CODE, NOTES ( like funding notes, if any)
- The information should be available from the Item Information Sheet.
- Please note, no "i" in front of the Source Institution and Holding Location code.
- Use lower cases for all aggregation codes.

9. Field Group Work

YEAR: make sure the spreadsheet field type is "text"; if no value, add "unknown".

SUBJECT KEYWORD (650, 651) and GENRE (655):

Records with terms from known authorities (650$2, 650$2, 655$2 are populated with values)

Convert Rows to Columns: Copy the values of the extra rows that holds only SUBJECT KEYWORDS/GENRE and SUBJECT/GENRE AUTHORITY to existing blank Columns or add Columns to hold these values, if any

OpenRefine scripts have deduped the terms and pull in fast terms directly from Worldcat server, so the focus here is to make sure that most major topical SUBJECT KEYWORDs from OCLC records are included in the sheet. When choosing values during the MarcEdit step, the focus has been placed on major Topical Subjects, the ones captured by $a, so terms in $v,$x,$y, and $z, subdivision portions that emphasize the context of the topical terms, are not included in the process and DO NOT need to be added here either. Most of the info in those subdivision portions has been already captured by other fields.

FAST terms have been listed in the sheet in their own cells followed by authority codes, the ones not in their own cells are terms from other authorities.
locate the terms from other authorities in OCLC records. Frequently terms in OCLC records are hyperlinked and that means they are LC terms. Most of these terms are part of FAST terms, you can locate them in SearchFast and then add SUBJECT AUTHORITY column, adding "fast" as the authority code.
copy and paste from OCLC them to empty cells of existing SUBJECT KEYWORD, if available or to new columns of SUBJECT KEYWORD, if needed
add SUBJECT/GENRE AUTHORITY code to existing columns or create new SUBJECT/GENRE AUTHORITY column to hold the code

common SUBJECT/GENRE AUTHORITY lists below

Name	Code for Metadata sheet	Seen in OCLC records
Université Laval	rvm	CaQQLa
RAMEAU: répertoire d'autorité-matière encyclopédique et alphabétique unifié (Paris: Bibliothèque nationale)	ram
Thésaurus des descripteurs de genre/forme de l'Université Laval (Québec, Québec: Bibliothèque de l'Université Laval)	rvmgt
Biblioteca Nacional de Espana	abne

Links to authority codes:

Subjects: https://www.loc.gov/standards/sourcelist/subject.html

Genres: https://www.loc.gov/standards/sourcelist/genre-form.html

650$a, 650$2; 651$a, 651$2 and 655$a, 655$2 hold the original data from OCLC records. By reviewing these columns, we can know quickly the terms that should be deduped and kept. This understanding helps us do further work.
- $a lists the terms: this holds many duplicated terms
- $2 lists the codes: the number of codes in this column can indicate the number of subjects and also if any terms are from other authorities, but some times $2 does not include all the codes. In this case, check $a and OCLC records to form a clear picture of the needs.
- if seeing other authority codes, find the terms from those authorities and copy them to empty cells of existing SUBJECT KEYWORD/GENRE columns of that row and add the authority codes to the cells from SUBJECT /GENRE AUTHORITY columns next to the chosen SUBJECT KEYWORD/GENRE columns.
- if needed, open OCLC records to identify terms from other authorities. Please note our process does not cover all subjects from OCLC records. When comparing to OCLC records, if seeing very important ones, we could consider add to the sheet, but this is not required. As well, OCLC 650 and 651 sometimes include GENRE terms, those should be moved to GENRE with GENRE AUTHORITY if applicable.
- $0 lists the IDs from each authority. This is helpful info.
When Subjects/GENRE from other authorities all reside in their own cells followed by the proper authority codes, delete all 650$a, 650$2, 650$0; 651$a, 651$2, 650$0 and 655$a, 655$2, 655$0 columns.
Records with terms coming with no authority info (650$2, 650$2, 655$2 have no values)
- If multiple values in 650$a, 651$a, 655$a, copy the columns to a new excel sheet and then use "Split Text to Columns" under "Data" menu, delimiter ";" to split them into multiple columns and name all columns "SUBJECT KEYWORD";
- Copy and Paste all "SUBJECT KEYWORD" columns back to the end of "InReady" sheet
- Delete the original 650$a, 651$a, 655$a columns
- Check if these terms are from SearchFast, if they are, add SUBJECT AUTHORITY columns and put in "fast"; if they are not, add "local" as the authority code.
GENRE, further work :
- If GENRE 1 has values, we DO NOT add GENRE AUTHORITY for this instance;
- If no values for any Genre fields, try to assign one based on our limited knowledge and add GENRE AUTHORITY columns to hold the proper authority code; if not confident about the Genre term selection, please discuss with the Metadata Librarian;
- dedupe based on the following table: sobek produces duplicates when seeing the following values in Genre

Terms in MARC 655	Replaced to
Periodicals, Journal(Publications), Magazines, Periodical publications	periodical as GENRE; marcgt as GENRE AUTHORITY
"Poetry" and "Fiction" appear in one record both as Genre	Remove Fiction poetry as GENRE marcgt as GENRE AUTHORITY
Newspapers (newspaper, newspapers etc.)	remove from GENRE, add Newspaper as MATERIAL TYPE (Sobek then auto-generates Genre terms)

10. Save the file in Metadata Unit Team Drive: University of Florida > Documents > General --RDA Metadata Unit >UFDCIngestionReady, end the filename with "_InReady".

The process of preparing ingestion data ends here.

In UFDC

Manual Updates, after the records were created in UFDC, that is when bibids are ready, the metadata unit will manually update the information for the following fields.

NONE- SERIAL RELATED ITEM TYP: PRECEDING ENTRY
NONE- SERIAL RELATED ITEM TYPE: SUCCEEDING ENTRY

For these two fields, the updates should take place on Record Information tab under Related Items. Click to open Edit Related Items. Choose the right Relation Type from the drop-down list. For "Preceding Entry", use "Preceding" and for "Succeeding Entry", use "Preceding".

Copy the Title from the spreadsheet to Title in the Edit Related Items.

If OCLC number is available, please search in Worldcat with provided OCLC numbers as shown by the screenshots and copy the sharable URLs to UFDC, so live links can be created for UFDC records.

NONE-COORDINATE

NONE-RELATED ITEM

MarcEdit_Map
Can be opened up in any Text Editor, notepad and then save as a .txt to be used in MarcEdit
OpenRefineScript-Published Items
Can be opened up in any Text Editor, notepad, e.g., and then copy to OpenRefine.

MARC Records to UFDC: Home

Overview

Steps