Notes from ALA Midwinter Meetings Impacting MARC and Other Encoding Standards
Submitted by Jim Soe Nyun, Chair, Encoding Standards Subcommittee
March 5, 2018
MARC Advisory Committee (MAC)
Two sessions: Saturday, February 10 8:30 AM – 10:00 AM / Sunday, February 11 3:00 PM – 5:30 PM
Formal agenda: https://www.loc.gov/marc/mac/mw2018_age.html
Because of the governmental funding instability around the time of the conference, very few staff from the Library of Congress attended ALA. Sally McCallum was the sole person from LC at this meeting. John Zagas, who handles much of the preparatory work for the MAC meetings, was not able to attend.
Proposal No. 2018-01: Coding 007 Field Positions for Digital Cartographic Materials in the MARC 21 Bibliographic and Holdings Formats (https://www.loc.gov/marc/mac/2018/2018-01.html)
ACTION: MAC agreed with the need to provide ways to more clearly describe digital materials in the 007 field for cartographic materials. The techniques outlined in the proposal made sense and the committee approved the proposal.
Discussion Paper No. 2018-DP01: Defining New Subfield $i in Fields 600-630 of the MARC 21 Bibliographic Format (https://www.loc.gov/marc/mac/2018/2018-dp01.html)
ACTION: MLA supported the general goals of this paper, but the overall sense was that access point systems already in place in the 6XX block did some of what was being proposed in this paper. It was felt that implementing relationship designators, including those for work-to-work relationships from Appendix M, would muddy the waters and lead to confusion and redundancy. The Library of Congress was clear that it did not support the paper. There will still need to be a way for catalogers to apply the Appendix M designators somewhere in the record, but the long MAC discussion made it clear that there was little support for the methods discussed in this paper. The parties responsible for the paper will regroup to see if there is an alternate method they would like to propose in a future discussion paper.
Discussion Paper No. 2018-DP02: Subfield Coding in Field 041 for Accessibility in the MARC 21 Bibliographic Format (https://www.loc.gov/marc/mac/2018/2018-dp02.html)
ACTION: There was much support for the goals and methods outlined in this paper. A MAC member proposed that it might be useful to also accommodate indicating the language of intertitles supplied for enhanced accessibility. The paper will be finalized and return at the next or a future MAC meeting, likely as a proposal. This paper did not go nearly as far as the formal proposals that were rejected last year, and limiting the accessibility discussion to the languages involved would require a person or system to deduce from the 041 that accessibility features are provided. There may be a complementary reworking of the rejected proposal at some point in the future, a paper that would give a way to directly code for what accessibility features are present.
Discussion Paper No. 2018-DP03: Inventory of Newer 3XX Fields that Lack Subfield $3 in the MARC 21 Bibliographic Format (https://www.loc.gov/marc/mac/2018/2018-dp03.html)
ACTION: This was a discussion paper put forward by MLA. MAC agreed that it would be good to have subfield $3 defined in the MARC fields pointed out in the paper. MAC liked that many of the examples used terms that were descriptive of the content they referred to, e.g. a term like “Preface” or an abbreviated title, rather than relying on positional phrases like “1st work,” which would cease to make sense in linked data once the subfield is broken apart from the context of the record. It was also pointed out that the form of the work access points in section 2.4.3 was incorrect by current RDA standards, and that the LRM-based revisions might treat these works differently. The paper was advanced to a formal proposal on the spot and approved by MAC, with provisos attached that the examples would include $3’s that were more descriptive and not tied to the carrier, and that the choreographic work example would be replaced. MLA Encoding Standards will work on revising examples in the discussion paper, and other communities may also come forward with their own suggestions for examples that will be published in the final MARC documentation. (Historically some of the examples provided in proposals have been eliminated from the final published documentation. Some examples that are useful for discussion at MAC may not end up being essential to illustrate how fields and subfields are to be used.)
Discussion Paper No. 2018-DP04: Multiscript Records Using Codes from ISO 15924 in the Five MARC 21 Formats (https://www.loc.gov/marc/mac/2018/2018-dp04.html)
ACTION: This quickly turned into a rather technical discussion, and there was some confusion about the intention of this paper. Discussion returned several times to the background information that systems need to be told when MARC8 characters are present to decode them properly, but that Unicode scripts are self-describing, including the ISO 10646 standard currently used in MARC and the ISO 15924 codes being put forward in this paper as an alternative. The self-describing nature of Unicode led to comments that the use of $6 described in the paper was likely not needed. The authors will consider this, and also think more about whether a new subfield in field 066 could accomplish what they would like to define. A subsequent discussion paper or proposal may emerge from this discussion.
Discussion Paper No. 2018-DP05: Adding Institution Level Information to Subject Headings in the MARC 21 Bibliographic Format (https://www.loc.gov/marc/mac/2018/2018-dp05.html)
ACTION: Attendees imagined several scenarios where metadata provenance could be both useful and problematic. Some envisioned a scenario with multiple competing terms, and institutions could possibly end up appending $5’s for everyone who might be using a term. In the end these concerns seemed like they could be dealt with via best practices. This will probably come back as a more defined paper or proposal.
Discussion Paper No. 2018-DP06: Versions of Resources in the MARC 21 Bibliographic Format (https://www.loc.gov/marc/mac/2018/2018-dp06.html)
ACTION: Good support for this paper’s aims, with consensus that there are situations when recording a version of a resource is important. There was discussion of the various options offered, with a final determination that the recording method chosen should be designed to accommodate controlled vocabularies for states and versions of resources.
OCLC Linked Data Roundtable
Saturday, February 10
10:30 AM – 11:30 AM
Roy Tennant leading panel on OCLC linked data projects
#oclcldr
Sarah Newell, OCLC
Linked Data Prototypes
Last fall collaborated with UCD, Cornell and the Montana State Library about LD use cases, building on previous LD prototype projects. Developed a Reconciler, a Minter, and a Relator to connect legacy data to LD entities, mint new entities, and create relationships between them.
Working toward a centralized hub, part of an Entity Ecosystem. Developed a database with a reconciler on top; the next step is something to allow editing of entities. Used Wikibase, an out-of-the-box tool, for display/search in the project. Lessons: good communication was needed to develop the project; it takes more than just converting records, you need to be able to edit; and scale matters, in that enough data is needed.
Next: creating and editing entities; going from feedback to enhancing functionality; expanding project out from initial partners. Looking for partners: contact newells@oclc.org.
Sally McCallum
Talking about MARC, converting 19M MARC records into BIBFRAME
Looking at work records: in MARC authority or bib description? Currently in bib to permit the “uniform title” transformation. They have a continuous flow of incoming MARC.
Wondering about what to add to authority format that is currently in the bib format.
Needs: 100 author/title strings that match standard strings
Place to record notes about formulation of the string, e.g. 670
Place to record title variations (4xx)
Place to record all versions of the title, e.g. common titles in law.
Adding 400s; 381s for e.g. translator; 377s for language; also 500s for related agents, e.g. translator.
Looked at how they are used, by catalogers, by end users.
What’s missing from the bib format: statement that a bib is for a work, series control fields, variant title fields, related title fields, related agent names.
The above MARC tags don’t map directly from one format to another; authority 5xx may end up in 7xx.
Bib content missing from authorities: things like 520s, topics, etc.; how much is really needed?
LC doesn’t make work records for everything. For all? For when there are 240s? A system should do this work, extracting from existing metadata.
MJ Han (Myung-Ja), Illinois
Linked data for user services
Why are we doing this?
As a producer, to make local resources discoverable on the web.
As a consumer, provide web content to local users.
Emblematica Online project, 2015
Portal for Renaissance texts and images; digitized resources from ca. 1400 books and 28K emblems. Uses VIAF to provide additional information, e.g., gender. Generates a “More info” button from VIAF data in the Emblematica interface.
Uses Iconclass for user services; it has 4 language options (supporting international collaboration on the project) and tiered vocabularies at different levels.
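(As an illustration of the kind of VIAF lookup that could drive a “More info” feature like the one above, a minimal sketch in Python; this is not the Emblematica implementation, and the JSON paths shown are assumptions about VIAF's cluster record output.)

```python
# Minimal sketch: fetch a VIAF cluster record and pull a few fields that could
# populate a "More info" card. Not the Emblematica implementation; the JSON
# paths below are assumptions and may differ from VIAF's actual output.
import requests

def viaf_summary(viaf_id: str) -> dict:
    url = f"https://viaf.org/viaf/{viaf_id}/viaf.json"  # assumed public endpoint
    data = requests.get(url, headers={"Accept": "application/json"}, timeout=30).json()
    return {
        "viaf_id": viaf_id,
        "gender": data.get("fixed", {}).get("gender"),  # assumed path
        "birth": data.get("birthDate"),                  # assumed path
        "death": data.get("deathDate"),                  # assumed path
    }

if __name__ == "__main__":
    print(viaf_summary("12345"))  # placeholder VIAF ID
```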
Other project: LOD for Digitized Special Collections
18 mo exploratory study funded by Mellon Foundation
Special collections are silos; the project aimed to map schemas to LOD to expose resources.
Theatre collections: generating knowledge cards for entities, enabling browsing by related resources, with Wiki links. On the item page, developed a sidebar of related resources. Can generate link clouds that users can annotate.
Lessons learned: Contextual information can help users via cards that leverage LOD sources; this really is resource-intensive. Other things to do with LOD are still to be explored.
Questions: How is an entity ecosystem not a triple store? The ecosystem is a format-agnostic database with more functionality than a triple store.
In LC’s conversion, are they dropping data in the conversion? The conversion specs include some not-applicable data and other things that they chose not to encompass.
“Work?” Lots of definitions, but Sally was talking about a BF work.
Is there proof that all this LD work is helping users? The project includes assessment tools, and they will be continuing that work next month. They are using Google Analytics. Results of the analysis will be published soon. // Sarah Newell mentioned they have nothing to add. // Sally acknowledged these things are important.
Are there stats on LD projects at Illinois? No direct numbers … yet.
In addition to improving the user experience, are there things being done for machine discoverability? Using schema.org markup for SEO; both Illinois projects used schema.org.
Sally: How much of the LC work is manual versus automated? The projects presented are actually talking points, not work conducted.
Cards, etc., point outwards, with the potential of losing lots of users. Confidence that users found the pages and that they’d be back.
How big is the ecosystem? >1M items. A small subset; it could easily exceed 1B.
Are LC’s MARC discussions likely to lead to MARC changes? These are just investigations. They hope to repurpose what’s extant. Exploratory. Changes to make LOD easier.
What to do with subject string conversion when there aren’t exact subject matches? LC converts them now. They use labels when there aren’t matches.
American/British items published w/o UTs, how to handle? Just weighing the pros and cons of working the bib file.
If their project is successful, what will they have gained from it? MJ will publish scripts and everything else used in their project (GitHub page). // LC [aside: 15 years of a hybrid environment] may have some discussions about how to enhance description. // Sarah: a better understanding of what problems they’re solving.
MARC Formats Transition Interest Group
Saturday, February 10
3:00 PM – 4:00 PM
Program on ALA Connect: http://connect.ala.org/node/248329
Deborah Shapiro, moderator
Theme: Local Tools, what’s being done locally with linked data
Barbara Bushman, NLM
Preparing for the Future: National Library of Medicine’s Project to Add MeSH RDF URIs to its Bibliographic and Authority Records
Adding URIs doesn’t hurt the current MARC implementation and prepares the catalog for the future. The eventual focus is on names and corporate bodies, but they started with MeSH, a product they own and understand. Each string must be controlled, unlike the open structure of LCSH, which allows for far more (almost unlimited) string combinations. They have 850K MeSH records and add 350 bib records with MeSH. Would like to add $0, and have dispensation to add them to BIBCO records even though it’s not yet PCC practice. Started with the MarcEdit MARCNext tool and had 100% matching. Voyager didn’t let the records be updated without taking them out and reinserting them, so they made their own program to add RDF URIs for MeSH, first to authorities and then to bibs. The process took 36 hours for their catalog. They now do this for all current cataloging. Worked on community impacts, notifying users of their records, including about the giant update that covered their whole database. The CIP display is ugly, so they suppress the $0. OCLC didn’t want the $0s since they had figured out adding MeSH themselves, though not in actionable form. Realized they have a very clean database, and then further cleaned up 4,100 MeSH errors in their catalog.
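(A rough sense of what such a $0-adding pass could look like: a minimal sketch using pymarc, assuming a prebuilt MeSH-string-to-URI lookup table; NLM's actual in-house program and the MarcEdit workflow are not shown here.)

```python
# Minimal sketch: add MeSH RDF URIs in $0 to 650 fields with second indicator 2.
# Assumes a prebuilt dict mapping MeSH heading strings to id.nlm.nih.gov URIs;
# illustrative only, not NLM's production program.
from pymarc import MARCReader, MARCWriter

MESH_URIS = {
    "Neoplasms": "https://id.nlm.nih.gov/mesh/D009369",  # example entry
}

with open("bibs.mrc", "rb") as infile, open("bibs-with-uris.mrc", "wb") as outfile:
    writer = MARCWriter(outfile)
    for record in MARCReader(infile):
        if record is None:
            continue  # skip unreadable records
        for field in record.get_fields("650"):
            if field.indicators[1] != "2" or field.get_subfields("0"):
                continue  # not MeSH, or already has a $0
            heading = (field.get_subfields("a") or [""])[0].rstrip(". ")
            uri = MESH_URIS.get(heading)
            if uri:
                field.add_subfield("0", uri)
        writer.write(record)
    writer.close()
```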
Next steps: Collect feedback. So far only one library has inquired what these $0s are good for. Work with MarcEdit, the Authority Toolkit, and maybe other tools. Work with Casalini to publish some NLM data as linked data.
Sally McCallum, NDMSO, LC
Transforming a Catalog—Converting MARC Records to BIBFRAME
Jodi Williamson could not come to give this report, Sally presented her slides.
Transform components: BIBFRAME model revised to 2.0; revised all specs, including conversion programs; also infrastructure improvements (more servers, MarkLogic). Item information and Events were added to the model. New specs include retaining extinct LCCNs and the ability for Duration to repeat. Specs also got URIs AND labels from code lists. (These can also feed into the editor and keying tools.) 18M MARC records converted to BF Works, Instances and Items; 1.2M uniform title works converted.
Developed Comparison Tool
LCCN entered into it will show how a MARC record is transformed into RDF.
Next step, Match and Merge
MATCH: Discover all the Instances (manifestations) for a Work; discover if a Work description is already in the file (from MARC title authority records); if not, create the Work. MERGE: Merge subjects and other info into the record.
Title AF => Work record (BF work)
Instance subjects merged into single Work record.
Instance records link up to Work and down to Item.
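(For readers less familiar with the target structure, a minimal sketch of the Work–Instance–Item linking just described, built with rdflib; the URIs are invented placeholders, and LC's converter of course emits far richer descriptions.)

```python
# Minimal sketch of the BIBFRAME Work / Instance / Item linking pattern.
# URIs are invented placeholders, not LC data.
from rdflib import Graph, Namespace, URIRef
from rdflib.namespace import RDF

BF = Namespace("http://id.loc.gov/ontologies/bibframe/")
g = Graph()
g.bind("bf", BF)

work = URIRef("http://example.org/works/w1")
instance = URIRef("http://example.org/instances/i1")
item = URIRef("http://example.org/items/it1")

g.add((work, RDF.type, BF.Work))
g.add((instance, RDF.type, BF.Instance))
g.add((item, RDF.type, BF.Item))

# Instance records "link up to Work and down to Item"
g.add((instance, BF.instanceOf, work))
g.add((work, BF.hasInstance, instance))
g.add((instance, BF.hasItem, item))
g.add((item, BF.itemOf, instance))

print(g.serialize(format="turtle"))
```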
LC BIBFRAME Editor: “Not flawless, does a pretty decent job.” Catalogers appreciate the type-ahead features. Some drop-downs. RDA rules hotlinked to element labels.
Entire workflow: Has a process to double-catalog things, MARC and BF. 200+ non-pilot catalogers, 60+ pilot catalogers; 19.2M BF Works, 23.7M BF Instances. RDF file of >4 billion triples.
Close to offering LC’s BF file for others to explore; mapping from BF to MARC to simplify their workflow. Do Works records in the bib file. Looking at more records with transliteration.
Brian Rennick, BYU
Enriching Author Records with Linked Data
Tyler Ashcroft wrote the code for this project
Talked about enrichments that live OUTSIDE the catalog; they have a federated search, Symphony from SirsiDynix
Inputs include MARC XML, XML from Wikidata (using SPARQL)
Processed with Apache Jena in Java, cached in a Mongo database; author landing pages are linked from the federated search. These components feed the author landing pages.
They tried out Zepheira’s Identifier Services, but found LC data to be easier to work with
Wikidata enrichments: birth/death place/date with geo coordinates, birth names, other facets; also pulls in IDs from other sources including IMDb.
Followed Wikidata and Blazegraph requirements to process Wikidata files
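(A minimal sketch of the kind of Wikidata SPARQL pull that could supply the birth/death enrichments above; illustrative only, not BYU's Jena/Mongo pipeline, and the QID is a placeholder.)

```python
# Minimal sketch: pull birth/death dates and birthplace coordinates for one
# Wikidata entity. Illustrative only; not the BYU Jena-based pipeline.
import requests

ENDPOINT = "https://query.wikidata.org/sparql"

QUERY = """
SELECT ?birthDate ?deathDate ?birthPlaceLabel ?coord WHERE {
  VALUES ?person { wd:Q42 }                    # placeholder QID
  OPTIONAL { ?person wdt:P569 ?birthDate. }    # date of birth
  OPTIONAL { ?person wdt:P570 ?deathDate. }    # date of death
  OPTIONAL {
    ?person wdt:P19 ?birthPlace.               # place of birth
    OPTIONAL { ?birthPlace wdt:P625 ?coord. }  # coordinates
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
"""

resp = requests.get(ENDPOINT, params={"query": QUERY, "format": "json"},
                    headers={"User-Agent": "enrichment-sketch/0.1"}, timeout=60)
for row in resp.json()["results"]["bindings"]:
    print({k: v["value"] for k, v in row.items()})
```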
What’s next? This has been an experiment. Want to publicly share landing pages for authors. Next, subjects and genres?
Questions (several asked, selections below):
Does NLM add component links to compound strings? No, not in MARC bibs, but they are in the RDF, which includes allowable qualifiers.
Faceted Subject Analysis Interest Group
Saturday, February 10
4:30 PM – 5:30 PM
Notes on ALA Connect: http://connect.ala.org/node/273985
Kelly McGrath, University of Oregon
Using MARC Facets for Music with Primo: Strategies and Challenges (ALA Connect: http://connect.ala.org/files/Using%20MARC%20Facets%20for%20Music%20with%20Primo%20final.pptx)
LCSH is difficult to use unless keywording for browsing, and then you would need a system that does stemming (saxophone/saxophones).
MARC 048 was never fully developed
LCMPT is a better implementation for medium of performance, and its uptake could be enhanced by the new Strawn/MLA MOP Toolkit.
Developed at Ball State, with Sue Weiland, a search for chamber music that used the 048. They generated 048s using the richer IAML instrument vocabulary. Also added local $z for Form, e.g. sn (Sonata) or chbr (Chamber music).
Oregon has Alma to Primo workflow, and Primo has ways to transform MARC in interesting ways. They’ve developed something similar to the Ball State project that uses MOP statements
They’ve developed a medium of performance statement facet, though it’s not perfect.
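(A minimal sketch of how medium-of-performance facet values might be pulled from 382 fields; illustrative only, since the Oregon implementation is done with Primo normalization rules rather than scripts.)

```python
# Minimal sketch: derive medium-of-performance facet values from MARC 382 fields.
# Illustrative only; the Oregon work uses Primo normalization rules instead.
from pymarc import MARCReader

def mop_facets(record):
    """Collect facet values from 382 $a (medium of performance) and $b (soloist)."""
    facets = []
    for field in record.get_fields("382"):
        for term in field.get_subfields("a", "b"):
            facets.append(term.strip(" .;"))
    return facets

with open("scores.mrc", "rb") as fh:
    for record in MARCReader(fh):
        if record is not None:
            print(mop_facets(record))
```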
They’ve also developed something for numbering in the 383. Leads to messy displays when there are multiple $a’s or $b’s.
They also mine 700 strings for title elements.
Also a facet built from subject strings, which they find easier than LCDGT, partly because of its prevalence in the data.
Retrospective conversion might be needed to make these facets more useful because bib records are not consistently coded with facetable data.
Another problem is that there aren’t ways to navigate within hierarchies.
Questions about whether to record this information in authority or bib records. There are more contributors to bib records than to authorities; authorities have the promise of less work/maintenance.
Messes can happen with compilations, collaborative works.
Diane Vizine-Goetz, OCLC
Update on FAST
They’ve developed a tool for proposing FAST headings, for personal names, corporate names, and topicals. Short turnaround. Personal and corporate names aren’t added automatically, only when they appear as subjects, because most names aren’t ever used as subjects.
Users must use existing LCSH strings. Can assemble components into a longer string. Doesn’t prevent bad string construction. But that headings can be created at the point of cataloging is a big plus.
New to FAST: event headings, moving from 111 fields into new 147 fields. The structure of the heading is subfielded and differs from the 111 and LCSH forms.
There was a FAST survey from the end of last year. Results have been tabulated and will be distributed soon. Preview of results: need for tools to support ongoing maintenance; a production tool for FAST heading look-up; seamless addition of FAST to records at the point of cataloging.
FAST does include a de-duping process, e.g. Biography and Biographies, but they haven’t done it in a while.
Does FAST record which LCSH record it was based on? Sometimes, when there are straightforward correlations, but not when the FAST heading came about via rules.
Question about whether geographics will be harvestable from the NAF. Not just yet; it’s more complicated to derive FAST geographics. They are still strongly linked to what’s in LCSH.
Comment that non-English materials are not well served by LCSH. A future development opportunity.
Metadata Interest Group
Sunday, February 11
8:30 AM – 10:00 AM
Official IG minutes for those with access to ALA Connect: http://connect.ala.org/files/ALCTS-MIG-2018MW_minutes.pdf
Chew Chiat Naun
Head, Metadata Creation
Harvard University
Title: National Strategy for Shareable Local Name Authorities
Abstract: Libraries create local authorities to serve a variety of purposes, usually within an institutional context. It is becoming increasingly evident, however, that identities have much greater potential value if they can be shared. The IMLS Shareable Authorities forum brought together representatives from a wide range of stakeholders to explore themes including minimum viable specifications, data provider obligations, and reconciliation as a service. The objective of the forum is to identify services and practices that will be needed, and assumptions that will have to be made – or changed – to allow authorities to work across domains and at scale.
NOTES:
IMLS Shareable Authorities Forum
2 meetings held at Cornell and LC in 2016-17. IMLS grant to Cornell University, partnered with LC, OCLC, PCC, ORCID, CNI, SNAC, BIBFLOW, Stanford, Harvard
Deliverables: Report and reference model in progress, a few more months before final report
Why local authorities? Interest in linked data from supply end, ability to scale
Why a forum? IMLS has a grant program for forums, less stringent than other grant models; Cornell was already working on these issues. Opportunity to share knowledge, even if a solution to problems won’t be a product of the grant
“Linked data changes the game.” Even with increased exposure and sharing, an authority file is needed that works cross-platform.
Related projects (Inter Alia)
IMLS Western Name Authority File Project
PCC
OCLC organizational Identities in ISNI report
Others…
40 people involved in forum
What they learned: Discussions focus a lot on use cases or institutional mandates. Data models can evolve from competing organizational needs. Persistence is an issue; much of this is heavily in the experimental realm.
Workflows
Technical needs
Social/organizational goals
Attendees at forum come from different places:
BL interested in developing workflows for expanded collections
OCLC, Getty interested in requirements for aggregation and LOD publishing
Publishers want to use identities, don’t want to manage
Self-registration (ORCID)
Modeling issues:
Provenance (important in archives community)
Preferred labels (big in library world, but can impede sharing LOD; ISNI ignores labels)
Granularity
Social, organizational issues:
Licensing, business modeling (solutions for now aren’t always maintained)
Privacy, confidentiality
Sustainability
Governance
Outcomes, directions
Algorithms for matching are important. Maybe everyone should share these. Will this really happen?
Develop a “Minimal viable product”
Reconciliation service
How to provide data in a way that’s suitable for aggregation
Emphasis moving to corroborating data
Best practices may be domain specific
Reconciliation as a service
A stack of software and data would:
Harvest authorities
Work for many use cases
Deal with wide range of quality
Work with degrees of confidence
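(A toy sketch of the “degrees of confidence” idea just listed: match an incoming name string against candidate authority labels and return a score. Real reconciliation services corroborate with dates, roles, and other evidence; this is only string similarity.)

```python
# Toy sketch of reconciliation with a confidence score. Real services use far
# richer matching (dates, roles, corroborating data); this is string similarity only.
from difflib import SequenceMatcher

def reconcile(name, candidates, threshold=0.85):
    """Return (best_id, confidence), or (None, best_confidence) below threshold."""
    best_id, best_score = None, 0.0
    for cand_id, label in candidates.items():
        score = SequenceMatcher(None, name.lower(), label.lower()).ratio()
        if score > best_score:
            best_id, best_score = cand_id, score
    return (best_id, best_score) if best_score >= threshold else (None, best_score)

candidates = {
    "id/123": "Smith, John, 1950-",   # invented candidate labels
    "id/456": "Smith, Jon A.",
}
print(reconcile("Smith, John 1950-", candidates))
```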
Responsibilities of Data Providers:
Provide provenance
Avoid redundancy
Code disambiguation data in machine actionable forms
Allow iteration based on reports
Responsibilities of Aggregators:
Err on the side of duplication rather than mis-conflation
Provide unique persistent cluster identifiers
Record provenance of individual data elements
Use but respect confidential proprietary data
Ayla Stein
Metadata Librarian
University of Illinois at Urbana-Champaign
Title: Developing A Framework for Measuring Reuse of Digital Objects: Project Update at the Metadata Interest Group, ALA Midwinter 2018
Abstract: Content reuse, or how often and in what ways digital library materials are utilized and repurposed, is a key indicator of the impact and value of a digital collection. Traditional library analytics focus almost entirely on simple access statistics, which do not show how users utilize or transform unique materials digital collections. This lack of distinction, combined with a lack of standardized assessment approaches, makes it difficult to develop user-responsive collections or highlight the value of these materials. The grant project, Developing a Framework for Measuring Reuse of Digital Objects, an IMLS-funded project (LG-73-17-0002-17) by the Digital Library Federation Assessment Interest Group, is working to address this critical area. This presentation will illustrate the variety of ways digital library objects (including metadata) are being reused; share the results of the grant team’s work, including preliminary findings from the initial survey results as well as in-person and virtual focus groups sessions. The presentation will conclude with the team’s early findings and will engage the audience to contribute their feedback on the project and deliverables.
NOTES:
Full team includes 6 organizations
Started out with white paper: Surveying the Landscape: Use and Usability Assessment of Digital Libraries (2015); 2017 National Leadership Grant from IMLS
Goals: ID sustainable and vetted assessment techniques that can support many kinds of digital collections / Support cultural heritage organizations … / 3rd goal
Data collection:
Initial survey
Focus groups: in-person, also virtual
Final survey
Deliverable:
Web site so far
Use vs. Reuse: Difference
No community definition
Developed working definitions
Use: Discovering and browsing objects in a digital library, often described as clicks or downloads, without knowing the specific context for this use.
Reuse: how often and in what ways digital library materials are utilized and repurposed. In this definition, we do know the context of the use.
How does metadata fit into this?
Metadata = Data
Collecting all types of digital data
So far:
Initial survey complete
In-person focus groups, 1 round done
Virtual focus groups, 1 round done
Final survey, informed by focus group data, not yet designed
First survey concentrated on: how do cultural heritage organizations assess digital library reuse? What resources would help with reuse?
Some survey numbers, 409 responses to 19 questions
80% collect use statistics
Only 40% collect reuse statistics, citing lack of methods and lack of staff
Highlighted ideas
Assessment metrics must demonstrate impact
Need training on how to gather stats, how to advocate for resources via metrics
How to deal with patron privacy
People are interested in metrics, would like best practices for what methods to use
Takeaways & next steps
Use survey results as benchmark for community progress
NO FREE TEXT in survey design (audience suggestion: test your survey before conducting it)
Collect and analyze focus group data
Slides on ALA Connect, linked to Metadata IG Blog
BUSINESS MEETING
Welcome
Officer Reports:
Program for ALA Annual, keynote (Philip Schreur) on the topic of LOD; have a call out for additional speakers
Blog for the IG: ALCTS does backups for the blog; 144 posts back to 2007; 30 legit comments out of much spam
Elections at ALA Annual, encouragement to run for office
Review of minutes:
Liaison reports
MLA (brief) report:
2 members from MLA were part of PMO project (component of the larger LD4P Mellon Grants) which issued
several draft papers this year about features of the Performed Music Ontology (“An extension of the BIBFRAME ontology for describing performed music, both for mainstream and archival performed music collections”) including a detailed, complex one on medium of performance. Available on BioPortal
(https://bioportal.bioontology.org/ontologies/PMO) and Github (https://github.com/LD4P/PerformedMusicOntology).
MLA’s Linked Data Working Group (LDWG) about to start up again after brief hiatus, main task to examine
Performed Music Ontology, including an effort to compare its products against use cases developed by LDWG.
MARC changes:
Joint MLA-OLAC task force to look at what to do with the 33x and 34x MARC fields.
33x fields will be left in their current compact form, following what the Library of Congress stated would
be its policy.
34x fields will be exploded out, one subfield per line, with source vocabulary for that subfield cited;
opportunity for the identifier for the term to be included in the field.
Discussion Paper on adding $3 to 5 MARC fields
Fast-track changes to Field 384, Key, made repeatable, added $3, Materials Specified
CC:DA report: Kathy Glennan incoming Chair; RSC NARDAC documentation; RDA Toolkit redesign previews issued; June release will be thin, with a 1-year duration for keeping the old Toolkit alive; another meeting coming up tomorrow, with a program on new directions for CC:DA
Metadata Standards Committee: Mike Bolam, evaluated NISO standards about evaluating standards; looked at DPLA metadata application profile; they realized that there are high barriers for the committee to try to make good evaluations of metadata standards; looking to redefine direction, using it as a forum for how vendors and entities are dealing with metadata, including issues of diversity and inclusiveness. Meeting Monday 1-2:30. At annual will have a program: Metadata Experts Panel, addressing current challenges in metadata issues.
Discussion:
Job openings at NYU, University of Utah, 2 positions at Smithsonian Institution
Library of Congress BIBFRAME Update Forum
Sunday, February 11
10:30 AM – 11:30 AM
Library of Congress Pilot 2, Sally McCallum NDMSO, LC
Beecher Wiggins not present, only 15 of the original 200 staffers from LC got to come
Pilot at LC is in full swing, after starting out in September
September saw a European BIBFRAME conference; they’re still exploring, just like in the US
BF Pilot, June 2017 start, 1 year period for evaluation
60 catalogers involved, books, serials, maps, music, moving image, rare, sound
Worked on training for Pilot participants in the new input tools; included some RDF training to double-check RDF that was generated from their cataloging.
Base file: converted 18 million MARC bibliographic records to BF Works, Instances and Items; 1.2 M uniform title authority records converted to BF Works, merged records into 19.2M Works, 23.7 M Instances
Continuous review and frequent reload of base file based on errors in ingest when detected
New input tool
Editor has efficiency features like dropdowns and lists, hot links to RDA rules; ways to view resulting RDF
Challenges with synchronizing Editor RDF with RDF of converted records; profile elements and MARC derived elements; complexity of RDF.
Want to explore: improving the Editor; validating BIBFRAME from MARC; download of the BF file; import of BF RDF from an external source; experimenting with extension profiles; conversion from BF to MARC; Works in the MARC Bibliographic file rather than the Authority file?; reduced transliteration in descriptions
Community Explorations: www.loc.gov/bibframe has components: BF vocabulary, MARC to BF conversion.
Folio and BIBFRAME, Sebastian Hammer, President, Indexdata
BIBFRAME IN FOLIO : Thinking about bibliographic metadata in a new LSP
FOLIO a collaboration that began within EBSCO, worked towards developing a community library platform.
FOLIO manifesto: 1) a true platform is like an operating system that can evolve and grow over time; 2) an open platform is common property, and all have rights to use and improve it; 3) a true and open platform has the potential to gather the library community more broadly than any closed system
FOLIO consists of a UI toolkit (Stripes) and API gateway (Okapi)
FOLIO can include other components, e.g. different UIs or different API gateways
All being tested now. To work it needs to deal with a broad spectrum of metadata input. No native BF triple storage yet. Apps looking at things like metadata and resource management and circulation
Modeling includes a Codex that can access metadata in multiple ways
Came up with a data model that uses much of the BF model; it uses work, instance, and item/holding levels in the Codex, an abstraction. This would allow use of MARC, BF, or just the 40 elements in the base FOLIO model—or some other TBD standard
Not much for libraries yet, but maybe 8–12 months from an ILS
Alma, Linked data, and BIBFRAME, Amy Pemble, Product Manager, ExLibris
Linked Data Implementation at Ex Libris, Amy Pemble
Has worked with Linked Open Data Working Group (2011)
2017 worked with Harvard in a BF implementation. Now all Alma institutions can publish their collections in BF as of last December.
Can publish their collections in RDA/RDF as well
Linked data APIs provide endpoints for JSON-LD, RDA/RDF, and BF.
Includes linked data views of RDF, and can display MARC and BF side by side.
Primo can take URIs and provide discovery opportunity from external resources
Alma will continue with MARC, but will search linked data endpoints from within the editor, as well as catalog in BF. Can export data with schema.org elements.
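(A hedged sketch of what consuming such a linked data endpoint might look like; the URL below is a made-up placeholder, not Ex Libris’s documented API path, and the point is only the content negotiation for JSON-LD.)

```python
# Sketch of consuming a linked-data view of a bib record via content negotiation.
# The URL is a placeholder, not an actual Ex Libris endpoint.
import json
import requests

RECORD_URI = "https://example.org/alma/bibs/9912345678901"  # placeholder

resp = requests.get(RECORD_URI, headers={"Accept": "application/ld+json"}, timeout=30)
doc = resp.json()
print(json.dumps(doc, indent=2)[:2000])  # inspect the JSON-LD description
```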
Summon will be linked data aware later this year
Alma can export linkable data, still looking at options for reconciliation, a process that still requires much manual work
Achievements of 2016/2018 LD4P Project, Michelle Futornick, Program Manager, LD4P
Recap of LD4P grant
Focus on the metadata creation portion of the LD lifecycle
Recap of the component projects looking at non-text formats
Recap of data modeling efforts, partnering with domain expert communities, including Stanford developing the Performed Music Ontology
Example: in the ArtFrame/RareMat groups, a use case was to find all the resources connected to a specific award across various domains.
Links to models at LD4P.org
Tools development: Developing “editors” that actually are for content creation; comparing BF Editor, VitroLib, CEDAR to come up with desirable features for an editor
Looked at the Stanford Tracer Bullet project, which looks at vendor-supplied copy; showed an implementation that includes MARC and BF metadata; developed a “reactive pipeline” which allows for constant updating of content as source material evolves
Community meetings in April 2017, will have another in May 2018.
“We are in our infancy as far as linked data tools.”
Stay tuned.
Developed Biblioportal (bibliographic.ontoportal.org), a catalog of ontologies developed for libraries, an outgrowth of BioPortal.
LD4P2 “Pathway to implementation” the next outgrowth of this project
Developing an LD editor sandbox in collaboration with PCC. Expand on workflows to go from MARC to BF. Want to build out to develop a self-sustaining model to maintain tools. Identifier management. Enhancements to Blacklight to show off LD components.
BIBFRAME and OCLC, John Chapman, OCLC
OCLC’s Work on Works
Why: works are critical for clustering; there are common use cases; interest arising from ongoing cooperation between OCLC and LC; and there’s community interest.
PCC SCS/LDAC Task Group on the Work Entity, published in Fall
Cornell leveraging OCLC Work IDs in Blacklight
Efforts led to new MARC field to record the work identifier
Looking to test Work models. Some models may be less good than others, could be bad data, could be bad modeling
Generated entities feed into an “Entity Ecosystem”
These efforts can add data to improve clustering, using “synthetic authority records”
Concepts of Expression, like language for translations, are really important.
These synthetic records, xR records, now exist for 807,450 works and 1,151,773 expressions; <1,200 were manually created records
A manually built xR record can help create a cluster of related expressions, e.g., translations.
Whole/part and series issues can be partially dealt with using these records.
LINKED DATA WORKFLOWS centered on entity data
Reconciler: Connect legacy data with LD entities
Minter: Create and edit entity records
Relator: Create and edit relationships
2018 Linked Data Prototypes, as mentioned earlier: it’s a “triple store plus,” a system that permits interaction with a triple store (searching, for one thing)
Future work: develop entity editing; define and create more relationships; expand MARC-based data mining, series and whole-part modeling
ALCTS/LITA Metadata Standards Committee
ALCTS CAMMS Heads of Cataloging Departments Interest Group
Investigated OCLC’s implementation of FAST in WorldCat in order to understand OCLC’s reasons for developing FAST, articulate value and usefulness related to FAST, understand OCLC’s business model, identify obstacles to continuous development
Started by documenting use cases
Did conference calls with Columbia/Cornell folks plus OCLC people; learned that FAST wasn’t originally envisioned for large libraries, though its faceted nature is better for linked data than topical strings
Developed understanding
Generated a survey to look at usage/uptake/barriers/enhancements to the user experience
Initial results of the 14-question survey: 586 individual responses
More use of FAST than anticipated
Drawbacks to use may be changing, e.g. topic strings are used more and systems may not support FAST
Perceived benefits mirror the “FAST 5” use cases
Some critical features ID’ed: ongoing maintenance, easy FAST lookup, easier integration into cataloging workflows
Next steps: OCLC says Version 1 in production this summer; developments will hinge on uptake / Look at more use cases (OCLC with large libraries and the British Library)
More analysis ahead, but fairly detailed results are forthcoming this week
Embedding OCLC Work Identifiers: an Example Workflow
Jackie Shieh, George Washington Libraries
GW Works Project, includes:
BIBFRAME experimentation
URI in $0
Authority control: authority control had taken a back seat because of keyword searching, but adding links brought back the value of authority control
Looked to integrate OCLC work identifiers. You can reveal OCLC works in WorldCat
At first there wasn’t a place in MARC to hold a WorldCat work identifier, so they added it manually in 787 $a OCLC work ID $o [#]
Terry Reese at first did screen crawls to harvest the identifiers.
MARC Proposal 2017-09 defined field 758, Resource Identifier
OCLC Work IDs: Highly experimental; frozen since 12/2015, with algorithm refinement continuing; IDs may change; if concepts change OCLC may do redirects from one ID to a more modern one; can be harvested via cURL or even a web query
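(A hedged sketch of harvesting a work identifier “via cURL or even a web query” and rendering it as a 758 field. The worldcat.org URL pattern and the “exampleOfWork” key reflect WorldCat’s historical schema.org markup and may no longer resolve; the 758 subfield choices are my assumption, not documented GW practice.)

```python
# Hedged sketch: harvest an OCLC Work ID from a WorldCat record's JSON-LD and
# render it as a 758 field. The URL pattern, the "exampleOfWork" key, and the
# 758 subfield choices are all assumptions for illustration.
import requests

def fetch_work_uri(oclc_number: str):
    url = f"https://www.worldcat.org/oclc/{oclc_number}.jsonld"  # assumed pattern
    data = requests.get(url, headers={"Accept": "application/ld+json"},
                        timeout=30).json()
    for node in data.get("@graph", []):
        work = node.get("exampleOfWork")
        if work:
            return work if isinstance(work, str) else work.get("@id")
    return None

def as_758(work_uri: str) -> str:
    # Plain-text rendering of a possible 758 field (label in $a, RWO URI in $1).
    return f"758 __ $a OCLC work ID $1 {work_uri}"

if __name__ == "__main__":
    uri = fetch_work_uri("1234567")  # placeholder OCLC number
    if uri:
        print(as_758(uri))
```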
GW has 1,010,047 bib items, 1,010,047 work IDs, 823,895 FAST headings / can’t add more IDs because OCLC has frozen the project
GW would really like OCLC to keep going
VIAF pulls identifiers out of MusicBrainz
Bibliography on slide with good resources to get started
SHARE-VDE and BIBFRAME —Linked Data in Real Life
Philip Schreur, Eric Mitchell, Michele Casalini
2011 meeting at Stanford on linked data started many discussions, but still using MARC 7 years later….
Some discussion about MARC, how it started in the 1960s, MARC does things well but it’s limited
Reasons to leave MARC: It started out to re-present catalog card data, little semantic meaning in the structure. Problems with internal linkages in compilations.
Why linked data: We can join our patrons who are going to the web for information, linked data’s semantic structure can interact better with the web, can use tools like Blacklight to better synthesize and present data, take advantage of international library data, we can make the move to the web
SHARE-VDE: focused on cataloging workflows based on linked data
Casalini developed SHARE-VDE Project, key element with shared community data
Berkeley and the SHARE-VDE project
Working on Phase 3 of the project, ca 10 libraries are involved
Open community-scale, conversion, aggregation, access, working at the computation scale, not the human scale, ending barriers to adoption
Phase 1 started with small piles of records as a proof of concept
Phase 2 was doing the above at scale, included computer aggregation
Phase 3 required technical developments, developed use cases:
Publish all data sets on the platform, edit them, and have some tools to manage the dataset / SHARE-VDE should support batch or automated updating / support dissemination of data to contributing libraries (automated, batch) / Libraries should be able to employ the data in SHARE-VDE / SHARE-VDE should allow us to move to network-level editing (complex; metadata provenance issues?) / Support library needs, somewhat an undefined point / support the ability to do cataloging activities using 3rd-party tools
Michele Casalini:
ALIADA predecessor
Phase 1 of SHARE-VDE included data created under different rules
Phase 2 included enrichment of MARC records with URIs
Heading towards embedding multiple ontologies, not necessarily BIBFRAME
Displayed workflow of the components of the processes involved in conversions, including RDF conversion
Identity identification: defining the resource and the associated entities, pulling from VIAF; automated reconciliation is important for conversion, manual for original description / Authify, a service for “relator term detection”
Manual workflows: Includes URI management system, with local library profiles, looking at issues of provenance
RDF conversion includes ALIADA tool
Some questions from the session:
$0s are keyword indexed in Voyager; $0 in Sierra is not indexed
Is SHARE-VDE in a state where data can be extracted? Phase 1 results, yes; Phase 2, no, more difficult to achieve.
Issues with metadata provenance and priority
Look at sharevde.org
Philip: LD4P2 starting later this year; hope to include SHARE-VDE work in this phase
How far are we from 3rd-party tools that can be used for workflows? Not there yet; the LC editor and CEDAR are 1st-gen tools that maybe someone will improve before too long; the PCC Sandbox could be a nice way to develop editors
ALCTS Forum: The Case for Making Video Content Accessible
Monday, February 12
10:30 AM – 11:30 AM
Villains Iglesias, GVPI, Director Business Development
Speaker is with a tech firm that has the product ElementsPlay, a streaming media component to help media be more accessible online.
Started with a survey of institutions as far as what they wanted, accessibility was a big item. GVPI trying to get publishers to supply more accessible content. They are also working on how to make this work more discoverable.
Accessibility has many components: closed captions (the absolute minimum); an accessible player (must support many needs such as screen reading and keyboard shortcuts); transcripts; audio descriptions
Reasons to make content accessible: legal mandates; a wide audience that includes the 5% of the population with hearing loss; user experience (e.g. closed captions help screen out background noise as well as helping people understand on-screen language); increased user engagement
Open web discovery (closed-captioned items rank higher in Google search results); full text will be searchable; content is reusable if marked up semantically
Hurdles: heavy on tech
Upside: far-reaching benefits, ROI
ElementsPlay: video solution for publishers
Has video with/without closed captions, transcripts, ways to move the content around on screen.
Danielle Whren Johnson, Loyola University
Accessibility and Video Captioning at Loyola Notre Dame Library
Discussed campus decision-making to go towards making things more accessible, both stick (legal requirements) and carrot (benefits to the whole person; without accessibility you are not truly educating the student)
They have hosted videos, expect that 3rd party content has accessible features
Issues with library-hosted content; looking to see if creators have things like scripts
Library-created content: working for that to be accessible
Campus-created content: often need to add accessibility features
Captioning tool: Amara
Original version is open access, so the community can view and contribute content
Hosted content that integrates Ensemble Video with Amara. They pay for captioning services. $1/hour for transcripts from YouTube automatic captioning.
Preservation and Accessibility in Library Media Collections
Stefan Elnabli, UC San Diego
We have an ethical mandate to make content accessible; how does it overlap with media collections?
Talking about a project to reformat media titles and the accessibility concerns in a digital repository. Landscape: Office for Students with Disabilities, Library Digital User Services Program; has media in multiple formats; showed the Kanopy resource, which has accessibility features; digital media reserves use the JW Player for the workflow, which has a way to accept uploaded files for captioning. Digital collections: currently can deliver complex objects (e.g. a file plus a transcript) but without syncing one resource to the other.
VHS Migration Project: a way to reformat VHS-carrier materials. Replaces VHS with DVD when purchasable (90%), replaces with similar versions of the same resource (5%), with in-house digitizing of the remainder (5%). Section 108(c) of the Copyright Act describes situations when we can make copies.
The VHS Migration Project included VHS with closed captions that they would like to port over. The conversion process produces an uncompressed master that needs to be transformed into CC on DVDs. Working with a vendor who will deliver digitized content with a caption file.
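(A minimal sketch of turning timed transcript text into a WebVTT caption file; illustrative only, since the vendor deliverable format and UCSD's actual tooling were not specified.)

```python
# Minimal sketch: write a WebVTT caption file from (start, end, text) cues.
# Illustrative only; not UCSD's actual workflow or vendor deliverable format.
def to_timestamp(seconds: float) -> str:
    h, rem = divmod(seconds, 3600)
    m, s = divmod(rem, 60)
    return f"{int(h):02d}:{int(m):02d}:{s:06.3f}"

def write_webvtt(cues, path):
    with open(path, "w", encoding="utf-8") as out:
        out.write("WEBVTT\n\n")
        for start, end, text in cues:
            out.write(f"{to_timestamp(start)} --> {to_timestamp(end)}\n{text}\n\n")

write_webvtt([(1.0, 4.5, "Welcome to the library."),
              (5.0, 8.0, "Today we discuss VHS migration.")], "captions.vtt")
```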
QUESTIONS: How do we get buy-in? Library works with accessibility office. It’s important to get IT on board.
Does anyone have experience generating caption files from existing transcripts? There are ways to add time syncing into a text.
RDA Linked Data Forum
Monday, February 12
1:00 PM – 2:30 PM
(This session will be recorded and made available online)
non-RDA is the new non-MARC: Bibliotek-o and RDA
Steven Folsom, Cornell
Bibliotek-o is a way to look at BIBFRAME functionality and look at alternatives
Bibliotek-o will hopefully port into BIBFRAME
BF is limited in work-to-work relationships / Bibliotek-o reuses RDAu to fill in that gap; look at the Relations Pattern document for more detail
B:Activity: roles as activity? An established ontology design pattern / loose adherence to RDA
Content and Carrier subclasses of bf:Work / bf:Instance
As we move out to non-RDA data sources, we need to consider which benefits of RDA we should watch out for
Rob Sanderson: LOUD (Linked Open Usable Data), looking at the range of tools versus richness and usability.
We’re colored by expressing RDA within MARC. We can look at areas that still need to be enriched in RDA when we try to express things.
No model is perfect. Hope to bridge MARC and non-MARC with RDA.
RDA and Linked Data
Kathy Glennan, ALA Representative to NARDAC, University of Maryland
Background: RDA is a bundle of standards and use guides, not just a content standard
RDA and linked data go back to 2007, when the DCMI/RDA Task Group was formed
First RDA vocabularies in the OMR in 2011
RDA Registry launched 2014
RDA moving towards a data dictionary
Different pieces for different types of users: RDA Reference, Toolkit, etc.
RDA Reference is stored in the OMR in RDF linked data format; it is the primary source of RDA Toolkit content and includes translations
OMR Example — Form of Work
RDA Vocabularies is the RDA Reference export to a GitHub repository
The RDA Registry provides links to download the individual element sets in the current release of the RDA Vocabularies
RDA Vocabulary Server
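(A hedged sketch of loading one of those element sets for programmatic use; the download URL and serialization below are assumptions, so check the RDA Registry and its GitHub repository for the actual links.)

```python
# Hedged sketch: load an RDA element set and list element labels.
# The URL and format are assumed; consult the RDA Registry / GitHub repository
# for the actual download locations.
from rdflib import Graph
from rdflib.namespace import RDFS

g = Graph()
g.parse("http://rdaregistry.info/Elements/w/", format="turtle")  # assumed URL/format

for subject, label in g.subject_objects(RDFS.label):
    print(subject, "->", label)
```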
RDA Toolkit bundle of RDA
RIMMF: a visualization, cataloging, and prototyping tool to show what is possible
RDA Implementation Scenarios
Flat file/card use to linked authorized access points
James Hennelly, RDA Toolkit, for Kate James, RDA Examples Editor
Examples in RDA Toolkit will allow displays:
Basic set (as now)
Recording methods
View in Context
View as Relationship, a linked data visualization
The linked data visualization is a graphic representation of relationships
DITA stores the information that is visualized
Questions:
Kathy talked a bit more about RIMMF capabilities: a great way to train on RDA but limited for other possible purposes