Work needs to address the overall data workflow pipeline. Some tools are supported in the MAW (URLs start with testmaw.uflib.ufl.edu).
Production, Work in Planning
Items for roadmap/planning for overall data workflow pipeline work, Q1-Q4, 2022-2023.
- DONE 12/3: Report needed of all items with a blank publication date
- Rights metadata: make mods:accessCondition repeatable, with fields for URI, rightsholder, ethicalstatement, ethicalURI; see https://docs.google.com/document/d/1FQoQeA0wYlzp3NyAWF40HdBTtr4FmfHNsjPCmJ808IU/edit#heading=h.sf5vmopzkyiu
- Updating builder to properly encode text files in METS (proper order in structmap)
- Updating tesseract processor to properly encode text files in METS (proper order in structmap)
- Report needed for all items with any instances of http://digital.uflib.ufl.edu/metadata/ (should indicate a non-upgraded/updated file where the METS needs edit+save to auto-upgrade the XSD reference, MainThumbnail, Aggregations, etc.)
- UFAR needs reporting
- Need a record of what was sent added to work history; if in Sobek, developer support documentation includes:
- http://sobekrepository.org/sobekcm/architecture/database/tracking
- http://sobekrepository.org/sobekcm/tracking
- Reporting on how much sent per month
- MAW shows a report for each job, so need a way to include this data within Sobek for use in reports; or, could this be a separate MAW report that is joined with the standard Sobek report? Or, make this a first task for new prod (or does Steward already support this in some way?)
- Speaks to larger questions of data workflow pipeline, and how preservation should be automated (and all should be in UFAR, with Tivoli decommissioned, or with explicit, stated needs documented for why to continue with Tivoli)
- Need for data on how milestones are and are not used; what of this works and does not work for patron, DSS, etc.
- Ingest of external system records:
- OAI - tools and process in place, refinement in Feb. 2022
- APIs:
- Once new PCMI database is in place, Software Team will need to process batch deletes for items
- Sobek partner/collection aggregations:
- Determine if the restrictive aspects in Sobek (for only partner as holding/source) can/should be removed. The partner/collection distinction is not valid in how the ~460 aggregations are set, so is there any reason to retain this restriction in Sobek? Can the restriction be removed? Any concerns?
- Need reporting to enable correcting of aggregation codes
- Review active aggregation codes
- Determine which ones are subcollections, or which are connected (all dLOC partners should have partner aggregations and DLOC1)
- Generate report on all items without the parent collections or without DLOC1
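
The aggregation report above could be sketched roughly as follows. This is only an illustration under stated assumptions: the item-to-aggregation mapping would have to come from a Sobek export or query that is not described in these notes, and the function name, the item IDs, and the `PARTNERA` code are hypothetical placeholders rather than real aggregation codes.

```python
from typing import Dict, Iterable, List


def missing_parent_aggregations(
    items: Dict[str, Iterable[str]],
    child_to_parents: Dict[str, Iterable[str]],
) -> Dict[str, List[str]]:
    """Return {item_id: [missing parent codes]} for items lacking a required parent.

    items            maps an item ID to the aggregation codes on that item.
    child_to_parents maps an aggregation code (upper case) to the parent codes
                     every item in it should also carry, e.g. a dLOC partner
                     aggregation mapped to DLOC1. Both inputs are assumed to be
                     built from a Sobek export.
    """
    report: Dict[str, List[str]] = {}
    for item_id, codes in items.items():
        have = {code.upper() for code in codes}
        required = set()
        for code in have:
            required.update(p.upper() for p in child_to_parents.get(code, ()))
        missing = sorted(required - have)
        if missing:
            report[item_id] = missing
    return report


if __name__ == "__main__":
    # Hypothetical sample data, for illustration only.
    sample_items = {
        "ITEM0001": ["partnera", "DLOC1"],
        "ITEM0002": ["PARTNERA"],
    }
    sample_rules = {"PARTNERA": ["DLOC1"]}
    for item_id, missing in missing_parent_aggregations(sample_items, sample_rules).items():
        print(item_id, "missing:", ", ".join(missing))
```

Keeping the parent rules in a separate mapping means the same check could cover subcollections once the review of active aggregation codes settles which codes are connected.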
- Tesseract
- Add process to validate completion: the number of TXT files and the number of pages in the PDF should both match the number of JPG images
- And create a log for success/failure based on this
- Currently no info when it fails
- Running smaller batches, so generally does not fail for that
- Generally does fail for items over 700 pages
- Want error reporting when failing, or at least alert with an email “Starting an item that is longer than 500 pages” so that DSS is prompted to check it
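
A minimal sketch of the completion check and long-item alert described above, assuming each item lives in its own folder of JPG and TXT files. The function names are hypothetical, the PDF page count is passed in by the caller (how it is obtained is not covered here), and the 500-page threshold is taken from the alert wording in the notes.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional

LONG_ITEM_THRESHOLD = 500  # pages; from the alert wording in the notes


@dataclass
class OcrCheck:
    jpg_count: int
    txt_count: int
    pdf_pages: Optional[int]
    ok: bool
    message: str


def count_by_suffix(item_dir: Path, suffix: str) -> int:
    """Count files directly inside item_dir with the given extension (case-insensitive)."""
    return sum(
        1 for p in item_dir.iterdir()
        if p.is_file() and p.suffix.lower() == suffix
    )


def check_counts(jpg_count: int, txt_count: int,
                 pdf_pages: Optional[int] = None) -> OcrCheck:
    """Compare TXT file count and PDF page count against the JPG image count."""
    problems = []
    if txt_count != jpg_count:
        problems.append(f"{txt_count} TXT files for {jpg_count} JPG images")
    if pdf_pages is not None and pdf_pages != jpg_count:
        problems.append(f"{pdf_pages} PDF pages for {jpg_count} JPG images")
    ok = not problems
    return OcrCheck(jpg_count, txt_count, pdf_pages, ok,
                    "OK" if ok else "; ".join(problems))


def check_item_dir(item_dir: Path, pdf_pages: Optional[int] = None) -> OcrCheck:
    """Run the count check on one item folder."""
    return check_counts(count_by_suffix(item_dir, ".jpg"),
                        count_by_suffix(item_dir, ".txt"),
                        pdf_pages)


def long_item_warning(jpg_count: int,
                      threshold: int = LONG_ITEM_THRESHOLD) -> Optional[str]:
    """Return alert text for items over the threshold, else None."""
    if jpg_count > threshold:
        return (f"Starting an item that is longer than {threshold} pages "
                f"({jpg_count} images)")
    return None
```

The `message` field could be written to the success/failure log, and the text from `long_item_warning` could be used as the body of the email alert to DSS.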
- Report on items with old digital xsd file, for knowing bad batches
- Report on items with incomplete mods type of resource - necessary for options for "Title Sets" display and function:
- Should have 2 lines, as shown on http://sobekrepository.org/help/type
- <mods:typeOfResource>text</mods:typeOfResource>
  <SobekCM:Type>Book</SobekCM:Type>
- <mods:typeOfResource>text</mods:typeOfResource>
  <SobekCM:Type>Newspaper</SobekCM:Type>
- <mods:typeOfResource>still image</mods:typeOfResource>
  <SobekCM:Type>Aerial</SobekCM:Type>
- Yet, some only have <mods:typeOfResource>text</mods:typeOfResource>
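
One way the type-of-resource report might detect incomplete records is sketched below. It assumes each item's METS is available as an XML string; the function names are hypothetical. The MODS namespace is the standard `http://www.loc.gov/mods/v3`, but the SobekCM namespace URI is not stated in these notes, so the sketch matches any `Type` element whose namespace contains "sobekcm", which is an assumption to verify against real files.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Tuple

MODS_NS = "http://www.loc.gov/mods/v3"


def split_tag(tag: str) -> Tuple[str, str]:
    """Split an ElementTree tag of the form '{namespace}local' into (namespace, local)."""
    if tag.startswith("{"):
        ns, _, local = tag[1:].partition("}")
        return ns, local
    return "", tag


def resource_type_status(mets_xml: str) -> Dict[str, object]:
    """Collect mods:typeOfResource and SobekCM:Type values from one METS document."""
    root = ET.fromstring(mets_xml)
    mods_types, sobek_types = [], []
    for el in root.iter():
        ns, local = split_tag(el.tag)
        text = (el.text or "").strip()
        if ns == MODS_NS and local == "typeOfResource":
            mods_types.append(text)
        # Assumption: the SobekCM namespace URI contains "sobekcm".
        elif local == "Type" and "sobekcm" in ns.lower():
            sobek_types.append(text)
    return {
        "mods": mods_types,
        "sobekcm": sobek_types,
        # Complete only when both lines are present and non-empty.
        "complete": any(mods_types) and any(sobek_types),
    }
```

Items where `complete` is false but `mods` is non-empty would correspond to the case noted above, where only the `mods:typeOfResource` line is present.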
- PDF items without thumbnails: for titles like Bohemia, auto-generate and add thumbnails to each item, or is this best done by DSS? https://patron-stage.uflib.ufl.edu/results?title=bohemia
- New College ingest processes need planning and dev; may parallel how dLOC partners contribute
- NCF wants to be able to import from spreadsheet
- User from Miami-Dade Public Library System wants a list of all issues for a specific title in FDNL, to identify gaps and be able to send issues in for inclusion in FDNL; does existing reporting meet this need? Or should LTS support a standard report?
Other Production Notes