1. Organize the data set in spreadsheet. OCLC number of Electronic Records are required.
Digital Projects Item Information Sheet prepared by the Collection Managers and Cataloging Unit provides Metadata Unit the OCLC IDs needed to create electronic records. The OCLC IDs included in the Information Sheet could be the ones for Print or Microfiche or Electronic records. Depending on its type, Metadata Unit will start the work from here. For the first two types of records, the Unit creates Electronic records. For existing Electronic records, the Unit will directly apply it to the following steps. For Alma records, the process of grouping records is different.
OCLC Electronic Records
2. Search, group and export records.
- Pull out the records via OCLC ID search
- Save the whole batch in Local Save File (To save, use F4). (Please make sure Local Save File has been cleaned before starting a new batch.)
- Search Local Save File (To open the search box: F3) --> OK --> Choose all files saved in Local Save File and Export (F5) them as .dat file
3. Locate and rename the dat.file using the date and keywords from the project/batch, for instance, 20230605_Berbiceroyalgazette. This is to make sure all the saved files can be first sorted by the date and then by the project keywords.
.dat files are usually saved at the selected location when setting up OCLC on the work station.
4. Convert dat.file to txt files
Tools --> Export --> Export Tab Delimited Records
- Set File Paths
- Set the path for the .dat file: click the first file folder icon to locate and select the .dat file prepared during Step 2 and 3
- Set the path for the exported txt file: click the second file folder icon to browse to the folder where the txt should be saved and name the txt file with the same name of the dat. file
- Set In field delimiter to " ; "
- Load the mapping document
- --> Next --> Settings...--> Load Setting --> choose the map file (latest version: MAPMARCtoUFDC_0826)
- --> Export --> a pop up window will immediately notify the txt file has been created.
5. Create the new project in OpenRefine
- Locate OpenRefine folder and Click the blue diamond to load OpenRefine in the browser as a new tab
- In the browser OpenRefine tab, create the project
- Get data from This Computer --> Choose File --> browse to locate the txt file prepared in Step 4--> Next
- Parse data as CSV / TSV / separator-based files --> Character encoding --> UTF-8 --> check Trim leading & trailing whitespace from strings --> Create Project
6.Transform and Cleanup
- Adjust the number of rows showing on the same page to browse the loaded records
- Copy the json script to process the current batch
- Undo/Redo --> Apply --> Paste the json script (filename) --> Perform Operations
- Browse to spot outliners, fix issues if any; if no issues, proceed to the next step
- Export --> Excel 2007 --> tag the filename with "_refined"
7. Rename Headers and Clean up values further
- Open the downloaded Excel --> Copy the content of the first tab to a new tab and name the new tab as "InReady" --> Save as a new file in OneDrive-University of Florida> Documents>General-RDA Metadata Unit>03_UFDCIngestionReady, tag the new filename with "InReady"
- Verify the data for the following field couples, check the completeness, dedupe, correct cases, and then delete the original; Delete all columns with no values or meaningless values.
|001 vs OCLC
||008$28 1 vs GENRE
||PUBLISHER vs PUBLISHER 1
||600$a vs other 600$a
|610$a vs other 610$a
||611$a vs other 611$a
||630$a vs other 630$a
||650$a vs other 650$a
|651$a vs other 651$a
||653$a vs other 653$a
||655$a vs other 655$a
700 vs other 700
Check the values in 000$06 and 000$07, if "Delete marctarget term" appears in one row, delete the value in 008$22 of the same row and then delete both 000$06 and 000$07 columns
- Delete dot at the end of YEAR and change YEAR to be text field.
- Rename the headers as follows
|001 1 |
|All remaining 050
|222$a, 246$a, 247$a
||NOTE (reserved for Thesis and Dissertations, if no values, delete the columns)(The values now can only be pasted from the MarcEdit produced txt files. OpenRefine scripts need to be revamped to keep the value in tact.)
|310$a, 362$a, 490$a, 505, 507, 515$a, 520$a, 540$a, 546, 580
All remaining 600$a, 610$a, 611$a, 630$a, 650$a, 651$a and 653$a
||CREATOR, split with $e to add CREATOR ROLE if any data come with $e;
|All 710,711, 720
- If any data missing, for instance, CREATOR ROLE has value, but CREATOR is missing
- CREATOR ROLE: Add "author" to CREATOR ROLE 1 if missing
- SUBJECT KEYWORD:
- Check if values in all terms in SUBJECT KEYWORD are in PROPER case, that is, the first letter of all words of a term are in capital; Use the excel function "=PROPER()" can simplify the process: Insert a new column, use the PROPER() on the first row and then apply the function to the whole column via dragging down the function, copy the value, paste to the original column use "paste value only" and delete the inserted column (a short video?)
- Check if any numbers included as a part of SUBJECT KEYWORD miss dot/decimal
- Dedupe/Replace, for instance, when "Madison (fla)" and "Madison County (fla)" appear for one title, replace them with the latest FAST term "Florida--Madison County" and add a new "SUBJECT AUTHORITY" Column to capture the source of this newly introduced term, that is "fast".
- TITLE: For Titles in Foreign Languages, to facilitate the search in UFDC
- if any accents in TITLE, copy the title to a new column named as "ALTERNATE TITLE" and in the original TITLE, replace the letter with the ones without accents
- PUBLICATION LOCATION : It's repeatable, make sure one value, one cell.
- if any accents in PUBLISHER, copy the name to a new column named as "PUBLISHER" and then in the original PUBLISHER replace the letter with accents with the ones without accents
- It's repeatable, make sure one value, one cell.
- YEAR: make sure the spreadsheet field type is "text"; if no value, add "unknown".
- GENRE: dedupe based on the following table; sobek produces duplicates when seeing the following values in Genre
|Terms in MARC 655
|Periodicals, Journal(Publications), Magazines, Periodical publications
periodical as GENRE;
marcgt as GENRE AUTHORITY
poetry as GENRE
marcgt as GENRE AUTHORITY
8. Add Additional Information
- Update the InReady spreadsheet to include the following fields and then work on them: MATERIAL TYPE, AGGREGATION CODE, RIGHTS, SOURCE INSTITUTION STATEMENT, HOLDING LOCATION STATEMENT, SOURCE INSTITUTION CODE, HOLDING LOCATION CODE, NOTES
- MATERIAL TYPE
- One sheet allows multiple Material Type values. Material Type is a system field where historically was not consciously maintained and was populated by automated ingestion process.
- The following values are allowed for Material Type.
- For Thesis and Dissertation as well as related materials and projects, we usually use "Mixed Material".
- Sobek adds Genre terms in the following instances, to avoid creating duplicates, remove the same Genre before the ingestion
||Sobek Adds to GENRE After Ingestion
||Remove from MARC Genre 655
||serial ( sobekcm )
||newspaper ( sobekcm )
||Newspaper (happen most often)
- AGGREGATION CODE, RIGHTS, SOURCE INSTITUTION STATEMENT, HOLDING LOCATION STATEMENT, SOURCE INSTITUTION CODE, HOLDING LOCATION CODE, NOTES ( like funding notes, if any)
- The information should be available from the Item Information Sheet.
9. Save the file in Metadata Unit Team Drive: University of Florida > Documents > General --RDA Metadata Unit >03_UFDCIngestionReady, end the filename with "_InReady".
UFDC Manual Updates
If seeing the following three headers in the ingestion spreadsheet, open up the newly created bibids in UFDC to add values.
- NONE- SERIAL RELATED ITEM TYP: PRECEDING ENTRY
- NONE- SERIAL RELATED ITEM TYPE: SUCCEEDING ENTRY
For these two fields, the updates should take place on Record Information tab under Related Items. Click to open Edit Related Items. Choose the right Relation Type from the drop-down list. For "Preceding Entry", use "Preceding" and for "Succeeding Entry", use "Preceding".
Copy the Title from the spreadsheet to Title in the Edit Related Items.
If OCLC number is available, please search in Worldcat with provided OCLC numbers as shown by the screenshots and copy the sharable URLs to UFDC, so live links can be created for UFDC records.