
Baldwin Publisher Cleanup: Home

Updated March 2024 by Armaan Kalkat



The purpose of this project is to ensure that publisher information on items in the Baldwin collection is reflected in the correct place in the metadata and is not duplicated elsewhere on the record, especially in the creator field. The reason publishers often show up in the creator field on UFDC has to do with the way the cataloging was done on these items and how Sobek ingests OCLC records to populate the UFDC metadata. Publishers are often added as additional corporate bodies under a 710 MARC field as well as the traditional 260 or 264 MARC fields on the MARC record. UFDC maps the entries in the 710 field to creators and the 260 or 264 to publisher, meaning the publisher can be listed twice on the record. This contributes to overloaded records and can make it more difficult to tell who was involved in what aspects of an item’s production. This LibGuide is primarily intended as an internal document for metadata staff working with the UFDC but is also available publicly to increase transparency of our workflows and activities.


  • Use UFDC List of Baldwin creators and publishers to identify publishers listed as creators
  • Find authorized name authority records for these publishers
  • Edit UFDC records to remove publishers from creator field and ensure all publisher names are authorized

For more detailed information, click "Steps" in the above menu.


  • Use legacy UFDC’s browse by creator and browse by publisher to make an Excel list of creators and publishers in the Baldwin collection. I do this one letter of the alphabet at a time. Since each of these creators is a facet value in UFDC, this will create a clickable list of links that will pull up the records associated with each creator.
    • The browse by creator list encodes the ampersand as the HTML entity "&amp;", while the publisher list uses a plain "&". Since we will be comparing the two lists, use Excel's find and replace to replace "&amp;" with "&". This also improves readability.
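If the lists are exported from Excel as plain text, the same ampersand cleanup can be done in a script. The following is a minimal Python sketch, not part of the original workflow; the function name and the sample names are made up for illustration.

```python
import html


def normalize_name(name: str) -> str:
    """Decode HTML entities such as '&amp;' and collapse repeated whitespace."""
    return " ".join(html.unescape(name).split())


# Hypothetical creator entries, as they might appear in the exported list.
creators = ["Example &amp; Sons", "Doe,  Jane"]
cleaned = [normalize_name(c) for c in creators]
print(cleaned)
```

Using `html.unescape` rather than a single find and replace also decodes any other entities that may appear in the export.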

  • Use an Excel formula (see "Additional Resources" for code) to find matches between the two lists; a match indicates that a given creator might be a publisher. Check each match on UFDC and determine whether it is a publisher. I use color coding to track the likelihood that a creator is actually a publisher: green indicates high likelihood; yellow means “maybe”; blue indicates that the creator has a different role, such as engraver, lithographer, or printer, which I then note in a comment; and black indicates that there are no records associated with that creator.
    • Legacy UFDC treats entries in the manufacturer field as publishers for the sake of browsing by publisher, which is why these other roles often show up in the list.
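The list comparison can also be sketched outside Excel. The Python below is an illustrative equivalent of the matching step, assuming both lists have been loaded as lists of strings; `find_matches` and the sample names are hypothetical. Like Excel's COUNTIF, it ignores letter case, but it only finds exact matches, so every hit still needs to be checked on UFDC.

```python
def find_matches(creators: list[str], publishers: list[str]) -> list[str]:
    """Return the creators that also appear (ignoring case) in the publisher list."""
    publisher_keys = {p.strip().casefold() for p in publishers}
    return [c for c in creators if c.strip().casefold() in publisher_keys]


# Hypothetical sample data standing in for the two browse lists.
creators = ["Example & Sons", "Doe, Jane"]
publishers = ["example & sons", "Sample Press"]
print(find_matches(creators, publishers))
```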

  • Some publishers listed as creators may not match exactly in spelling with the ones in the publisher list or may not be on that list at all, so there are a couple of other methods for finding likely publishers in the creator list. I use Ctrl+F to search for common elements in publisher names, such as “&” “and” “son” “sons” “brother” “brothers” “firm” “publish” “co.” “company” “press” and “ltd.” Searching for these and checking UFDC to see if those entries are listed as publishers generally captures a majority of the publishers.
  • There may be additional publishers who are listed under personal names rather than corporate names. These are more difficult to find because many of the creators in UFDC are written surname first, while publishers are written in direct order, first name first. It is possible but time-consuming to look through the publisher list manually and search for names that match the creators. I usually do a quick scan for obvious personal names in the publisher list and search for them in the creator list using Ctrl+F. It is important to remember, though, that the previous steps have already captured the vast majority of records that need to be edited, and that it is very difficult to make a project like this completely comprehensive. The point is to make the situation better than it is currently, not to solve it entirely.
  • Often there will be different variants of a publisher name (for example, one using “&” where the other spells out “and,” or one spelling out “company” where the other abbreviates it to “co.”). These generally show up alongside each other due to alphabetization, but this is not always the case. When I notice different name variants, I highlight those cells in orange to mark that they are variants of the same name.
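The keyword search and the surname-first problem described above can be approximated in a short script. This is a rough Python sketch under stated assumptions: the term list is copied from the step above, the function names are hypothetical, and the matching is a plain substring test like Ctrl+F, so a term such as "and" will also flag names like "Anderson". The name inversion is deliberately naive and will mangle corporate names that contain commas.

```python
# Search terms quoted in the step above.
TERMS = ["&", "and", "son", "sons", "brother", "brothers", "firm",
         "publish", "co.", "company", "press", "ltd."]


def publisher_hints(name: str) -> list[str]:
    """Return the search terms found in a creator name (case-insensitive substring test)."""
    lowered = name.casefold()
    return [term for term in TERMS if term in lowered]


def to_direct_order(name: str) -> str:
    """Turn 'Surname, Forename[, dates]' into 'Forename Surname'; dates are dropped."""
    parts = [p.strip() for p in name.split(",")]
    if len(parts) >= 2 and parts[1]:
        return f"{parts[1]} {parts[0]}"
    return name.strip()


# Hypothetical creator entries.
for creator in ["Example & Sons", "Doe, Jane, 1800-1870"]:
    print(creator, publisher_hints(creator), to_direct_order(creator))
```

Any creator with a non-empty hint list is only a candidate; it still has to be checked against UFDC as described above.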

  • Once I have all my publishers color-coded, I filter the sheet by color to only show confirmed publishers. I then use a TinyTask macro to go down this list and search each entry in the Library of Congress name authority headings (make sure you are searching name authorities and not subject). This is to ensure that the names I am using for publishers when I make my edits are the controlled and authorized versions, which helps make facet search more effective by pooling records together rather than having them split between slight variations in name. It also helps to resolve which name variant to use in cases like those from the previous step. If I cannot find a name authority for a publisher in our system, I work with cataloging staff to see if they can find one or create one if necessary.
  • In a new Excel sheet to track my edits, I copy the list of authorized publisher names, using the adjacent columns in a row to add name variants if necessary.
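As an alternative to replaying a search macro, lookup links for each publisher can be generated in bulk. The Python sketch below only builds URLs and does not contact any server. It assumes the label lookup service of the Library of Congress Linked Data Service (id.loc.gov), which is a different interface from the search page described above; the base URL should be verified against that service's documentation before use. The function name and sample names are hypothetical.

```python
from urllib.parse import quote

# Assumption: base URL of the id.loc.gov label lookup for name authorities.
LABEL_LOOKUP_BASE = "https://id.loc.gov/authorities/names/label/"


def authority_lookup_url(name: str) -> str:
    """Build a lookup URL for one publisher name, percent-encoding spaces and '&'."""
    return LABEL_LOOKUP_BASE + quote(name.strip())


# Hypothetical publisher names from the filtered sheet.
for publisher in ["Example & Sons", "Sample Press"]:
    print(authority_lookup_url(publisher))
```

A label lookup only succeeds on an exact heading, so names that return nothing still need a manual search or a conversation with cataloging staff.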


  • I go down this list, clicking each name to pull up the records that require edits. Using Ctrl+Click, I open each of these records one by one, click on edit metadata on the top banner, and look at the creator and publisher fields.


  • Most of the time, the publisher will be present both as a creator and as a publisher, but sometimes it will be missing from the publisher field or there will be different name variants in each field. In the screenshot above, you can see that the publisher is present as both a creator and a publisher with different name variants. The goal here is to make sure the publisher entry is the authorized name and that it is not repeated in the creator field unless there is a valid reason, such as the publisher also being listed as a copyright holder (a copyright holder could be included as a creator on UFDC as there is no other field for it).
    • Oftentimes, it is simple to tell which role each creator had based on the information in the parentheses (see screenshot above), but sometimes this information is missing. In cases like this, it is helpful to know that printers, stereotypers, electrotypers, etc. are often included in legacy UFDC under “manufacturer,” although this field doesn’t display on new UFDC. If the piece is digitized, you can also look at the title page where the publisher will often be listed with the publication location at the bottom. You can also use the OCLC number if available to pull up the OCLC record to see how the piece was originally cataloged.
  • If you believe you have found an error or otherwise have a question, you can contact cataloging staff to get more information or request updates to OCLC records if necessary.
  • Because I personally prefer the idea of making all necessary modifications to a record while it is open rather than returning to it over and over, I will also search other publishers on any given UFDC record I am working with in the LOC name authority file to make sure they are also the authorized version and check that they are not duplicated on the record. This will also mean less work to do further down the alphabetized list.
  • As publishers are removed from the creator field, the number of search results from your original query will diminish. Once there are no more records with the publisher in question listed as a creator, it is time to move on and repeat this process with each entry in the tracking sheet on Excel. Make sure to track how many records you edited for reporting purposes. Any notes or additional comments on changes made can also be included in this sheet.
  • Since this method only captures publishers who are also listed as creators, further merging of publisher name variants can be done directly from the list of publishers on Baldwin. My method is to highlight in yellow any names that are close to each other on the list and appear to be variants of each other, then edit the records associated with each variant other than the LC name authority so that they are also under the LC name authority. This makes the list of publishers in Baldwin shorter and more uniform, and prevents records from being split between slightly different spellings of a publisher’s name. Once I am done with a particular group of name variants, I change the highlight color to green to mark my progress. This can be done concurrently with the Publisher/Creator cleanup or as a separate side project. One thing to remember is that some of the entries on the publisher list will not actually be publishers, so it can help to open a few records to check whether this is the case.
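Candidate variant groups can also be surfaced automatically before the manual review. The Python sketch below is one possible approach, not the method used in this project: it reduces each name to a comparison key by lowercasing, treating "&" as "and", expanding "co." to "company", and dropping punctuation, then groups names that share a key. The abbreviation table, function names, and sample names are assumptions for illustration, and each group still has to be confirmed by opening the records.

```python
import re
from collections import defaultdict

# Assumption: a small, hand-maintained table of abbreviations to expand.
ABBREVIATIONS = {"co": "company"}


def variant_key(name: str) -> str:
    """Reduce a publisher name to a key that ignores case, punctuation, '&' vs 'and', and 'co.'."""
    text = name.casefold().replace("&", " and ")
    tokens = re.findall(r"[^\W_]+", text)  # runs of letters and digits
    return " ".join(ABBREVIATIONS.get(token, token) for token in tokens)


def group_variants(names: list[str]) -> dict[str, list[str]]:
    """Return only the keys shared by more than one spelling."""
    groups = defaultdict(list)
    for name in names:
        groups[variant_key(name)].append(name)
    return {key: members for key, members in groups.items() if len(members) > 1}


# Hypothetical publisher list entries.
publishers = ["Example & Co.", "Example and Company", "Sample Press"]
print(group_variants(publishers))
```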

The following resources are often helpful when researching Baldwin publishers:

Excel Formula for finding matches between two columns:

=IF(COUNTIF($[publisherColumn]:$[publisherColumn], $[creatorColumn]2)=0, "", "MATCH")

Replace [publisherColumn] and [creatorColumn] with the respective column letters in your spreadsheet, e.g. ($I:$I, $B2), and drag the formula down the column to apply it to all rows. This assumes your top row holds the column names and your list starts on row 2.


Metadata Assistant


Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.