Some Meetings of Interest on MARC and Other Encoding Standards
Notes from ALA Midwinter 2016
Submitted by Jim Soe Nyun, Chair, Encoding Standards Subcommittee, February 1, 2016
Most if not all of these sessions will issue formal notes. Below are notes from me as a participant or observer, jotting down many of the highlights.
MARC Advisory Committee (MAC)
These sessions are advisory to the MARC Steering Group. If MAC’s recommendations are approved by the Steering Group they may take slightly different forms from what was discussed at MAC.
The two sessions of the committee meeting examined two formal MARC proposals and sixteen discussion papers. MLA had one of the proposals and three of the discussion papers:
Proposal No. 2016-02: Defining Subfield $r and Subfield $t, and Redefining Subfield $e in Field 382 of the MARC 21 Bibliographic and Authority Formats (http://www.loc.gov/marc/mac/2016/2016-02.html):
Discussion Paper No. 2016-DP01: Defining Subfields $3 and $5 in Field 382 of the MARC 21 Bibliographic Format (http://www.loc.gov/marc/mac/2016/2016-dp01.html):
Most discussion centered on typical North American uses of $3, and revealed two distinct uses: as a sort of eye-readable label (descriptive metadata), and as a way to associate two or more MARC fields (structural metadata). Subfield $8 is defined in MARC 21 for structural uses, but it has not caught on in North America, nor is it validated for use in many OCLC fields. MLA was not asking for any uses of $5 different from what would be encountered in other MARC fields. In the end MAC recommended that the paper be developed into a formal proposal. Aside: Jay Weitz revealed that later this year OCLC plans to validate $8 in all the fields where the MARC format defines it for use.
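To illustrate the structural use mentioned above: in MARC 21, subfield $8 carries a link number, optionally followed by a sequence number and a field link type, and fields that share a link number are treated as associated. The Python sketch below groups fields by that number. The tuple representation and all field contents are invented for illustration and are not taken from the discussion paper.

```python
from collections import defaultdict

# Each field is (tag, list of (subfield code, value)); sample data is invented.
fields = [
    ("382", [("8", "1"), ("a", "piano")]),
    ("384", [("8", "1"), ("a", "C major")]),
    ("382", [("8", "2"), ("a", "violin")]),
    ("500", [("a", "Unlinked note")]),
]

def link_number(value):
    """Return the link number from a $8 value such as '1', '1.2' or '1.2\\x'."""
    head = value.split("\\", 1)[0]      # drop the optional field link type
    return int(head.split(".", 1)[0])   # drop the optional sequence number

def group_by_link(fields):
    """Map each $8 link number to the tags of the fields that carry it."""
    groups = defaultdict(list)
    for tag, subfields in fields:
        for code, value in subfields:
            if code == "8":
                groups[link_number(value)].append(tag)
    return dict(groups)

linked = group_by_link(fields)
```

Here the first 382 and the 384 end up in the same group because both carry link number 1, while the 500 field, which has no $8, is left out of every group.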
Discussion Paper No. 2016-DP02: Clarifying Code Values in Field 008/20 (Format of Music) in the MARC 21 Bibliographic Format (http://www.loc.gov/marc/mac/2016/2016-dp02.html):
Three commenters suggested that we define a further code, maybe “p,” for piano scores instead of lumping it into “z” for “other.” As this was a topic of discussion in the working group that wrote the paper, this should be something we can accommodate easily in a formal proposal. Approved by MAC to go ahead as a proposal.
Discussion Paper No. 2016-DP03: Recording Distributor Number for Music and Moving Image Materials in the MARC 21 Bibliographic Format (http://www.loc.gov/marc/mac/2016/2016-dp03.html):
MLA secured OLAC’s co-sponsorship of this paper. At the meeting OLAC noted that this would lead to some extra work on the part of video catalogers, but they were willing to support this paper. Recommended by MAC to develop into a proposal.
Some other things of direct MLA interest:
2016-01: Coding 007 Field Positions for Digital Reproductions of Sound Recordings in the MARC 21 Bibliographic Format (http://www.loc.gov/marc/mac/2016/2016-01.html):
This Canadian proposal was approved with a minor change to the definition of “sound recording.” MLA and the British Library proposed alternative definitions ahead of the meeting, and the CMC representative preferred the BL definition. MAC agreed to use the BL wording.
2016-DP06: Define Subfield $2 and Subfield $0 in Field 753 of the MARC 21 Bibliographic Format (http://www.loc.gov/marc/mac/2016/2016-dp06.html):
This OLAC/GAMECIP discussion paper was promoted in the room directly to proposal status and approved. It described needs for subfields $0 and $2 in MARC field 753 to record URI and source for computer system access details. It was proposed with the gaming community in mind but will have wider applicability.
2016-DP07: Broaden Usage of Field 257 to Include Autonomous Regions in the MARC 21 Bibliographic Format (http://www.loc.gov/marc/mac/2016/2016-dp07.html):
This OLAC-sponsored paper asked to broaden the use of field 257 for certain entities like Hong Kong and Palestine. It was well-supported and will return as a proposal.
2016-DP08: Remove Restriction on the Use of Dates in Field 046 $k of the MARC 21 Bibliographic Format (http://www.loc.gov/marc/mac/2016/2016-dp08.html):
This OLAC-sponsored paper noted confusion as to why the current format forbids the use of this subfield when the date appears in other fields. MAC approved bringing this back as a proposal.
2016-DP10: Defining Field 347 (Digital File Characteristics) in the MARC 21 Holdings Format (http://www.loc.gov/marc/mac/2016/2016-dp10.html):
CONSER identified a need to record differences in digital file characteristics for materials cataloged according to the provider-neutral guidelines. MAC agreed that this should return as a proposal.
Discussion Paper No. 2016-DP15: Media Type and Carrier Type in the MARC 21 Authority Format (http://www.loc.gov/marc/mac/2016/2016-dp15.html):
This German proposal was not embraced by many in the room but raised some interesting points. There could be some cases where a work may have an authority record but manifestations may no longer exist, for instance a historically-important holograph that has been lost over time. Lacking a physical manifestation the authority record could record aspects of the original carrier, when known. (A participant noted that the FRBR-LRM work is developing a branch to the FRBR tree for unique resources like manuscripts, so that they would no longer need to artificially conflate manifestation and item. Though not exactly the same thing the discussion paper echoes that development in the FRBR model.) Based on the ambivalent response, the DNB may or may not opt to bring this back as a formal proposal.
MAC will issue formal notes for the meeting that will cover the remaining topics.
OCLC linked data round table: Stories from the front
GND and URIs: slides up at: http://www.slideshare.net/sollbruchstelle/gnd-and-uris-integration-and-identification
Kirk Hess, LC update
Spoke about technical issues related to LC’s BF work. They have a BF editor in 0.3 preview mode; it has REST hooks and looks different from 0.2. He works on both the editor and the converter. LC will be supplying a SPARQL endpoint for the BF test of the 1.0 vocabularies (ca. 900 items cataloged). The work is creating “lots of blank nodes with fake URIs.”
OCLC Person Entity Lookup Pilot (I believe the presenter was Stephan Schindehette)
Began October 22. Seven partners: Cornell University, Harvard University, the Library of Congress, the National Library of Medicine, the National Library of Poland, Stanford University and the University of California, Davis. In Phase One, works with inputs of identifiers and returns piles of sameAs information. Phase Two in December allows a search on name string. Future will include possibilities for fuzziness. At Cornell, working on how to use external links, e.g., for disambiguation; uses Blacklight to do things such as faceting. Uses OCLC work IDs to cluster in current sprint. Uses real-world-object data from record in display on name info. Entity pilot populates disambiguation display with information from linked records, including images associated with an entity (e.g. a portrait). Entity pilot API. Entity maintenance still limited. Wish list: editing based on querying, human interface to data.
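The Phase One behavior noted above (one identifier in, a set of sameAs identifiers out) can be pictured as a lookup over clusters of equivalent identifiers. The sketch below is a local Python illustration only: it does not call the OCLC pilot API, the function names are hypothetical, and every identifier in it is a made-up placeholder.

```python
def build_clusters(same_as_pairs):
    """Group identifiers connected by sameAs assertions (simple union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in same_as_pairs:
        parent[find(a)] = find(b)

    clusters = {}
    for ident in list(parent):
        clusters.setdefault(find(ident), set()).add(ident)
    return clusters

def lookup(identifier, clusters):
    """Return the other identifiers asserted to be the same entity."""
    for members in clusters.values():
        if identifier in members:
            return members - {identifier}
    return set()

# Placeholder identifiers, not real authority numbers.
pairs = [
    ("viaf:EXAMPLE-1", "lccn:EXAMPLE-1"),
    ("lccn:EXAMPLE-1", "isni:EXAMPLE-1"),
    ("viaf:EXAMPLE-2", "lccn:EXAMPLE-2"),
]
clusters = build_clusters(pairs)
matches = lookup("viaf:EXAMPLE-1", clusters)
```

Because sameAs is treated as transitive here, the ISNI placeholder is returned for the VIAF placeholder even though the two were never paired directly. That same transitivity is why a plain sameAs assertion is a blunt tool for cases like pseudonyms.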
Jeff Mixter, OCLC research engineer
Presented on work related to OCLC’s EntityJS project. Working with concepts similar to Google’s Knowledge Vault, which draws information from myriad sources and gives each piece a confidence rating, adapting the model to bib and authority data or user-contributed data like wiki data. They are working with WorldCat, VIAF and FAST, no priors, going directly to confidence-scored triples. You can harvest text, not just linked data, with this method. Generates enhanced MARC records with URIs and moves to knowledge vault of triples. The project began with ArchiveGrid data. Has knowledge card with piles of associated information from many LD sources. Has links out integrated into user interface. Some resources don’t have links. Users can pick from link suggestions from Wikipedia. Needs to push into data verification, but user contributions are currently cached. Look out for a pilot about to be released.
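As a rough picture of what “confidence-scored triples” could look like, the Python sketch below stores subject-predicate-object statements with a score and merges repeated assertions from several sources. The noisy-OR merge rule, the class and function names, the source names, and the sample values are all assumptions chosen for illustration; these notes do not record how EntityJS actually computes confidence.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Assertion:
    subject: str
    predicate: str
    obj: str
    source: str
    confidence: float  # between 0 and 1

def merge(assertions):
    """Combine scores for identical triples with a noisy-OR rule (an assumption)."""
    remaining_doubt = {}
    for a in assertions:
        key = (a.subject, a.predicate, a.obj)
        # Each independent source shrinks the remaining doubt about the triple.
        remaining_doubt[key] = remaining_doubt.get(key, 1.0) * (1.0 - a.confidence)
    return {key: 1.0 - doubt for key, doubt in remaining_doubt.items()}

# Invented sample data: two sources agree on one value, a third disagrees.
assertions = [
    Assertion("person:1", "birthYear", "1900", "source-a", 0.5),
    Assertion("person:1", "birthYear", "1900", "source-b", 0.5),
    Assertion("person:1", "birthYear", "1901", "source-c", 0.2),
]
scores = merge(assertions)
```

Under this rule, agreement between sources raises the score of a triple, while a conflicting value from a single low-confidence source stays low and can be kept for later verification rather than discarded.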
Questions: GND has developed a new ontology, seemingly not MADS. Language issues with resolving opaque things like URIs.
Question on sameAs and pseudonym relationships. Hard; need to develop ways to assert relationships more subtly than sameAs.
ALCTS Metadata Interest Group
Began with a business meeting, which included a report from the MLA liaison (me) about various metadata goings-on in MLA, with a focus on work within the Encoding Subcommittee and the BIBFRAME Task Force.
Some points from the business meeting:
- There will be a meeting at Annual and a program, “Diverse and inclusive metadata: Developing cultural competencies in descriptive practices.” Topics could include: Strategies for evaluating inclusivity or exclusivity of metadata; Tools and educational resources for developing inclusive metadata; Strategies for working with diverse communities.
- There will be a virtual preconference on metadata automation.
- The group has a listserv that most didn’t know about.
- At ALA Annual the Interest Group will have three officer openings.
The above was followed by two presentations
Presentation 1: Kevin Clair and Jennifer Liss on the draft Principles for Evaluating Metadata Standards (http://metaware.buzz/2015/10/27/draft-principles-for-evaluating-metadata-standards/)
- A draft checklist for evaluating metadata standards was developed a year ago and has morphed into this document
- The drafters mean for the document to apply to all manner of metadata, including things like RDA
- Will look at these to help them develop a framework for looking at standards that might be evaluated by the Metadata Standards Committee. Includes looking at vocabularies.
- Comments include privacy, cultural bias, questions about serializations (ignored intentionally), how to communicate this to developers
- Thinking about evaluating VRA Core in RDF using this template (at the later meeting of the Metadata Standards Committee, thoughts were that VRA Core would be too large and complex to test the standards on)
Presentation 2: Emily Gore, Digital Public Library of America, on standardized rights statements
- Rightsstatements.org in development, end of February for release
- …background on DPLA…many states and regions being added
- A requirement for DPLA resources is a rights statement
- Current statements are all over the place and aren’t necessarily statements stating usage
- Only 5% open access, plus 3 Creative Commons, plus others that are open, but unclear
- Want to get things labeled, but also those that are correct representations of legal rights
- Looking at Europeana Rights statements and their processes for rights, ca. half of their resources are in public domain or able to be reused
- Working to map DPLA model with international rights standards
- Some white papers already up on site
- The rights statements will be in broad categories: In copyright, no copyright, unknown. (Europeana also addresses orphan works in response to EU laws)
- Several categories of statements in each category. NOTE: These are not licenses!
- Includes both statements and statement codes
- Rights statements plus CC licenses can be used in conjunction to identify rights
- Will roll out training soon
- See current versions: Bit.ly/rights-data-model, available in SKOS/Turtle
- Will provide guidance on how to employ these statements
- DPLAfest in DC in April
- Lawyers have said DPLA should NOT take on transforming rights statements
- Accuracy is the hardest part…it’s hard where campuses don’t have lawyers…mention of some tools developed to assess risk
- No work yet to incorporate changes in copyright status. Future versions may include automatic status changes, e.g. when copyrights will expire, something lighter than W3C model
Library of Congress BIBFRAME Update
Presentation slides: http://www.loc.gov/bibframe/news/bibframe-update-mw2016.html
Sally McCallum introduced program and presenters.
Beacher Wiggins on LC BIBFRAME pilot
- BIBFRAME pilot is launched, a little bumpy to start, problems with converting ten million records for acquisitions to search against, also with editor.
- Materials from their test are available online: http://www.loc.gov/bibframe/tools/
- Key element is testing of BF editor.
- Paul Frank worked on applying labels according to content standard.
- Testers are working in both encodings, BIBFRAME and MARC, partly because there’s no good BF to MARC conversion.
- Assessing what training needs to look like, including depth of RDF knowledge necessary.
- Metadata for about 900 records done so far.
- This first test will go on for 1-2 months more
- They will work next on further developments with assessing model and vocabulary.
- Likely 1-2 years longer for the pilot
Sally McCallum on BF vocabulary development
- Current LC BF test is working with BF vocabulary 1.0
- Papers with proposals for vocabulary modifications sent out for comment
- Also looked to the BIBFRAME AV Modeling Study discussion paper for directions
- Received expert reviews of the BF 1.0 vocabulary from Carol Jean Godby from OCLC and Rob Sanderson
- Version 2.0 includes many changes. Many things have moved to classes from properties.
- They are working on modeling events. Holdings annotation moving to item level in model.
- Some issues: BF “Authority” is unpopular and needs changing; there are RDF issues, including distinguishing datatype and object properties; distinguishing type by class; moving towards incorporating external vocabularies or enabling their future adoption; separating administrative from descriptive metadata; looking at adapting RDA rules about naming properties where they don’t make sense
- Ahead: BIBFRAME 2.0 pilot, tool reengineering, participating in upcoming LD4P grant, adding AV media (following up on the AV Preserve report, which revealed that many things in MARC aren’t useful to the user), looking at PREMIS
Tiziana Possemato, Casalini Libri, vendor approach to BF data
- Discussed using the ALIADA (AUTOMATIC PUBLICATION UNDER LINKED DATA PARADIGM OF LIBRARY DATA) framework (http://www.atcult.it/en/progetti/aliada/) to provide linked data elements in vendor records
- They can now include identifiers in heading strings
- Discussed conversion of data to RDF via ALIADA framework, a collaboration between
- They can use many ontologies, including BIBFRAME
- Can create FRBR/BF layer from Bib and authority records, work cluster and person cluster with links to instance titles
- Personal cluster includes variant names and associated works
- BIBFRAME-UP three layer architecture: person works layer, instance layer, item layer.
Overviews of three projects that will be included in the upcoming Linked Data for Production (LD4P) grant
- This development out of the Linked Data for Libraries (LD4L) pilot has been submitted to the Mellon Foundation, and a positive response is anticipated. The following are components of a much larger grant effort. One projected component will involve MLA working with PCC, ARSC to develop the BF ontology for use for performed music, but also to develop a community model for
Jennifer Baxmeyer, Princeton, working on BF annotations, “De-framing Derrida”
- Jacques Derrida’s 16,000-title library went to Princeton. He annotated many of his books, and many have inscriptions
- This LD4P project would work to encode annotations and make them available to scholars, adapting and extending the BF holdings model
- Will deliver RDF dataset for Derrida annotations
Melanie Wacker, Columbia, on Art and Museums
- Many libraries own art, which may be described in MARC.
- Many sites have both libraries and museums and have their own descriptive methods
- Looking to see if BF can bridge the gap.
- Source is spreadsheet with art-focused description and controlled vocabulary.
- Will test work on set of 112 art objects in various formats.
- Various art issues in art description.
- Did lit review which will be in final report.
- Focus on data modeling, tools, developing use cases, developing art parts of BF 2.0
Chiat Naun Chew, Cornell, rare books, hip hop LPs
- LD4L moving to LD4Labs and LD4P projects.
- Has worked already with VIVO, special collections in hip hop.
- Has interest in extending BF, looking at alternatives to authorities, creating RDF natively.
- One branch of the project would work on the Afrika Bambaataa LP collection. Original cataloging of LPs may include annotations.
- Work on linking hip hop flyers to discs in RDF.
John Chapman, OCLC update
Discussed many OCLC activities, including:
- OCLC is working on BF modeling, developing production services, visualization, working with LC.
- Completion of Common Ground paper contrasting BF and Schema.org.
- Talk of BF moving to work, manifestation, item model.
- WorldCat person entity pilot: API sends one ID and gets back sameAs information; users can enter text; information returned also includes data like dates.
- They have demo web apps that show what APIs can do.
- Mentioned EntityJS showing identities management based on an ArchiveGrid dataset; will include ways for users to edit.
Eric Miller, Zepheira update
- Presented ways to expose current metadata to potential future patrons.
- Works with BF framework to move to web exposure with unitary intersection of multiple vocabularies.
- Libhub initiative to publish much content, working on exposure, SEO.
- This program is an option for those who may not have infrastructure to carry this out.
LITA/ALCTS Metadata Standards Committee
Good, full notes up at: http://connect.ala.org/node/249183. A digested version is below, including for those who can’t access the link.
Discussion of Principles for Evaluating Metadata Standards (http://metaware.buzz/2015/10/27/draft-principles-for-evaluating-metadata-standards/)
The group acknowledged that the document started out life closer to a “manifesto.” A stated purpose of this committee is to provide feedback on metadata standards and vocabularies, and this document has morphed into a tool that can be referenced when carrying out evaluations. There was initially some confusion about the purpose of the Principles. An introduction will aim to clarify their use.
Several early comments have been posted to the Metaware site and include a thorough analysis and comments by Diane Hillman (linking to her blog at http://managemetadata.com/blog/?p=470).
One early commenter emphasized issues of personal privacy, as did a commenter in the room, referencing how traditional authority work guidelines limit how much information is recorded about a living person, similar to the French National Library’s practice of hiding personal information. The committee agreed that privacy was important and will consider how to incorporate it into the Principles.
Openness came up in several comments. Openness can include freely licensing or sharing metadata, but it can also pertain to whether documentation for a standard is clear and enables its use, and even to the language of the documentation itself, which might exclude those who do not know the language (a case of both openness and possible bias). The committee admitted that it needs to clarify exactly what it would like to stress.
Another commenter pointed out the term “network” as a term whose meaning the authors took for granted.
Diversity issues surfaced at the morning’s ALCTS Metadata Interest Group and they also emerged here, where they received a fair amount of attention. The program at the upcoming ALA Annual would provide a chance to hear more about diversity issues, but it would come too late for the projected development cycle of the Principles. Perhaps one or more of the speakers could be pulled into looking at the draft?
An aim for these guidelines emerged in the discussion, that they would point out ways standards and vocabularies could be better. They wouldn’t use a pass/fail model.
The Principles would strive to re-purpose good work already out there rather than try to reinvent it. They could use Tim Berners-Lee’s 5 Star Linked Data Principles. And instead of developing a glossary of concepts used in the Principles they could point to existing definitions.
Do the Principles apply only to linked data standards and vocabularies? Consensus was that they could be used broadly, not just for metadata on the web.
Where to go next? Develop a checklist of principles? Consensus was that this was too much to take on immediately. However, reviewing a metadata standard against the Principles might lead organically towards something approaching a checklist, something that could be developed in the future.
The formal comment period will remain open for a short time longer. The plan is to prepare formal responses to the comments as well as incorporating ideas into the Principles. This work should be completed by ALA Annual.
Testing the Draft Principles: applying the draft document to a standard: The VRA Core 4.0 RDF Ontology was mentioned in the earlier Metadata Interest Group meeting, but the committee began to think it might be too complex. The DPLA rights statements might be a more manageable—and still meaningful—target. It’s a vocabulary and not a schema, but would still be useful. Definitely staying away from BIBFRAME for now.
Timetable: Aiming for a final Principles document by ALA Annual. Also a test of at least one metadata standard or vocabulary using the Principles.