Encoding Standards: ALA Annual Report 2016

Notes on meetings at ALA Annual, 2016 in Orlando related to MARC and encoding standards

Prepared by Jim Soe Nyun, Chair, Encoding Standards Subcommittee of the Music Library Association’s Metadata and Cataloging Committee

Submitted July 31, 2016

Below are notes for the LC BIBFRAME Update Forum and the sessions of the MARC Advisory Committee, as well as several meetings of more general interest to those working with metadata encoding issues.

LC BIBFRAME Update Forum
Sunday, June 26, 10:30 a.m.-noon

BIBFRAME Pilot / Beacher Wiggins

  • Report on BF 1.0 pilot
  • Not tested: end user impact, holdings, acquisitions, description distribution
  • Did not address the impact of BF on production
  • Converted 13.8 million records from MARC to BF 1.0, goal of 18 M
  • Now have over 2K examples in sandbox of records created
  • No workflow assessments made, maybe in the next phase
  • Will discard sandbox data with 1.0 vocab once the 2.0 pilot is underway
  • 2.0 Pilot no sooner than October, as late as January 2017
  • A fuller assessment is available on COIN and other LC sites.

BIBFRAME Vocabulary 2.0 / Sally McCallum

  • Froze 1.0 vocabulary in 2014 to prepare for pilot and collect comments
  • Issued proposals for how to change the BF model
  • 2.0 is similar but has Work/Instance/Item (Item added as a core class)
  • Annotations got complicated for item-level descriptions, hence Item class.
  • They have a commitment to RDF and its principles: supporting the distinction between datatype and object (resource, URI) properties, and enabling recording of a URI, a label, or both
  • Holdings annotation has migrated into Item core class
  • Authority has migrated to Agents and concepts
  • Better accommodation for URIs and labels
  • Talked about MARC issues, different approaches for different formats, very complex, much duplication
  • Working on converter, wanting to provide conversion service and share it with the cataloging community
  • Need to upgrade the MarkLogic platform with a 4store triple store
  • They need to migrate the platform, which gets 2M hits a day, without taking it down; it's also an integral part of how the Editor works.
  • Their office will consider bringing ongoing MARC work into the BF store (would require a crosswalk)

Linked Data for Production: Performed Music Ontology / Nancy Lorimer

  • Report on LD4PM project, its participants, its relationships
  • Discussed project makeup of representatives of MLA, ARSC, LC and PCC
  • Music presents many difficulties in MARC; hopefully BF will serve us better
  • Goals: evaluate the BF ontology for performed music, including use cases; work up a new ontology.

Linked Data for Library Cartographic Materials / Scott Wicks, Harvard

  • Project included in Harvard Library ITS Multi-year Business Plan
  • Reprioritizing includes experimenting with metadata technologies e.g. LOD
  • Harvard is involved with both Linked Data for Labs and Linked Data for Production; LD4PM-CM (Cartographic Materials) aims to develop best practices for maps and cartographic resources
  • At Harvard they’ll have catalogers test the processing of maps and geospatial resources
  • Harvard has a parallel project for Moving Image materials
  • Marc McGee, geospatial contact
  • Christine Eslao, moving image contact

The Library.Link Network / Eric Miller

  • Weaving libraries into the web, pushing libraries out onto the web. Essentially this project is an outgrowth/expansion of the earlier LibHub.
  • In 2014 libraries and vendors wanted to get out on the web; Eric Miller devised BF Lite, branching out from DC.
  • “Libraries are more than a catalog.”
  • Linking first, not focused on language/vocabularies, deemphasizes standards like BF.
  • “Beyond SEO” to working relationships.
  • Work on Creative Commons and develop attributions to sources of metadata discovered on the web?

OCLC Collaborations / Jean Godby

  • Repurposing legacy data: Developing ways to describe entities and relationships.
  • Looked at what to do with MARC right now: use UTs, supply roles, use indicators, use 041 for translations.
  • What not to do: avoid using free text; if you must, use standard vocab.
  • PCC-URI task group looking at how to add URIs to MARC records to make them transform more easily into linked data.
  • Extending the scope of authority control: e.g. register researchers to assign them identifiers or add them to authority files; develop a national strategy for shareable local authorities.
  • Defining New Models of Creative Works: UTs are a subset of works that exist; working on identifiers for more works.
  • OCLC will provide expert BF feedback; look at Place in a bibliographic description; reconcile Work identifiers.

MARC Advisory Committee (MAC)
Session 1: Saturday, June 25, 8:30-10:00 a.m.
Session 2: Sunday, June 26, 3:00-5:30 p.m.
View LC’s notes from the meeting at:

The 25 discussion papers and proposals on the agenda made this the heaviest agenda for MAC in recent memory. A number emanated from the German National Library as developments out of reconciling data for the German Integrated Authority File, the GND.

One general–and very welcome–change in the workings of the committee is that straightforward requests for changes to the MARC formats may be accommodated as part of a new MARC Fast-Track process. These would be handled internally by the MARC Steering Committee and would not need to go through the formal discussion paper/proposal process.

MLA presented three formal proposals for changes to the MARC format. The third was also co-sponsored by OLAC.

OLAC’s representative asked about the absence of $5 in Proposal 2016-07, and thought it might have been useful. MLA responded that there had been some resistance to it at the first MARC review stage at NDMSO, and that use cases ultimately were far less compelling for it versus the $3.

One reviewer commented on the third proposal that none of the examples showed notes generated from the second indicator “2.” This was not an issue for others in the room, though it might be an opportunity for a slight change before the change goes to press as part of the MARC format (TBD, if the MARC Steering Committee concurs).

In the end all three papers were approved unanimously by MAC.

Proposal 2016-03, Clarify the Definition of Subfield $k and Expand the Scope of Field 046 in the MARC 21 Bibliographic Format

Approved, with two changes to the proposal text: substituting “B.C.E.” for “B.C.” in the discussion of early dates in section 4.2, and simplifying the proposed $k definition.

Proposal 2016-04, Broaden Usage of Field 257 to Include Autonomous Regions in the MARC 21 Bibliographic Format

This proved one of the most controversial topics on the agenda. The British Library presented the strongest objections to broadening the field’s definition to include “autonomous regions,” citing a library’s need to remain sensitive to geopolitical developments, to stay as neutral as possible, and not to declare that a contested region could be called “autonomous.” A suggestion from the floor was to permit use of the 257 for general regions, not just those defined as autonomous. Some supported this idea, but OLAC did not, stating that their original intent was to broaden the field’s use only slightly, to accommodate a very limited number of contested regions with significant filmmaking traditions (e.g., Hong Kong and Palestine), and that they worried catalogers might suddenly begin supplying “California” or “Hollywood.” OLAC will reconsider the objections and possibilities and return with a revised proposal for Midwinter.

Proposal 2016-05, Defining New X47 Fields for Named Events in the MARC 21 Authority and Bibliographic Formats

Robert Bremer of OCLC introduced the topic and described how it grew out of dissatisfaction, during FAST conversions, with lumping named events into field 611, which otherwise is used for meetings and other time-based events with responsible agents. VRA suggested also adding $n for dates associated with an event, though there was no consensus. Ultimately the proposal was approved as submitted.

Proposal 2016-06, Defining Field 347 (Digital File Characteristics) in the MARC 21 Holdings Format

One comment suggested that $8 in the proposal might not have a use case, though there was no consensus to change it. Passed as written.

Proposal 2016-10, Punctuation in the MARC 21 Authority Format

It was clarified that this proposal addresses terminal punctuation only. Passed as written.

Proposal 2016-11, Designating Matching Information in the MARC 21 Bibliographic and Authority Formats

Discussion led to clarified definitions for $a, $c, and $d, and the addition of new subfields $x for non-public note and $z for public note. (Well-described in LC’s notes.) Vote was not unanimous: 10 for, 1 against, 7 abstentions.

Proposal 2016-12, Designation of a Definition in the MARC 21 Authority Format

Accepted British Library’s rewording for $a and added $u; field definition also to be adjusted. Approved with several abstentions.

Proposal 2016-13, Designation of the Type of Entity in the MARC 21 Authority Format

Discussion led to making $a and $b not repeatable. Approved.

Discussion Paper No. 2016-DP17, Redefining Subfield $4 to Encompass URIs for Relationships in the MARC 21 Authority and Bibliographic Formats

The British Library and the PCC Task Group on URIs in MARC submitted this discussion paper, which led to much discussion and no real resolution. There are valid needs to convey relator information (agent-to-resource) and relationship information (resource-to-resource), and our current practice of using $4 and $0 in various fields is inconsistent. The paper’s authors were asked to develop their ideas further and, hopefully, return with another discussion paper providing a more comprehensive treatment of relationship identifiers across the MARC formats.

Discussion Paper No. 2016-DP18, Redefining Subfield $0 to Remove the Use of Parenthetical Prefix “(uri)” in the MARC 21 Authority, Bibliographic, and Holdings Formats

In discussing whether to stop requiring the “(uri)” prefix on URIs, the committee balanced the internal inconsistency that would be created within MARC against the difficulty of recording URIs in a form that requires text manipulation to make them machine-actionable. Opinion weighed heavily towards removing the requirement to add the “(uri)” label, and the discussion paper was advanced to a proposal and approved. Some minor rewording of the $0 definition will be made.
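As a rough illustration of the text manipulation the parenthetical prefix forces on consumers (the function name and sample values are hypothetical, not from the proposal), stripping the label might look like:

```python
def normalize_zero(value):
    """Strip a legacy '(uri)' prefix from a $0 value so the URI is
    machine-actionable; leave plain control numbers untouched."""
    prefix = '(uri)'
    if value.startswith(prefix):
        return value[len(prefix):].strip()
    return value

# A labeled URI becomes directly dereferenceable:
print(normalize_zero('(uri)http://id.loc.gov/authorities/names/n79021800'))
# A bare control number passes through unchanged:
print(normalize_zero('n79021800'))
```

Dropping the prefix removes exactly this kind of preprocessing step from every downstream consumer.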

Discussion Paper No. 2016-DP19, Adding Subfield $0 to Fields 257 and 377 in the MARC 21 Bibliographic Format and Field 377 in the MARC 21 Authority Format

Advanced to a formal proposal and approved.

Discussion Paper No. 2016-DP20, Recording Temporary Sublocation and Temporary Shelving Location in the MARC 21 Holdings Format

Will return as a proposal, with the likely additions of subfields to parallel the structure of the existing 852 in the holdings format.

Discussion Paper No. 2016-DP21, Defining Subfields $e and $4 in Field 752 of the MARC 21 Bibliographic Format

Advanced to a proposal and approved, with a refinement of the definition of $2, essentially replacing “term” with “geographic name.”

Discussion Paper No. 2016-DP22, Defining a New Subfield in Field 340 to Record Color Content in the MARC 21 Bibliographic Format

Approved to return as a proposal, with further-developed subfields and with the suggestion that externally developed and maintained vocabularies be used. NDMSO will work with the RSC on this.

Discussion Paper No. 2016-DP23, Adding Subfields $b and $2 to Field 567 in the MARC 21 Bibliographic Format

Generally positively received; it will return as a proposal that incorporates and develops several suggestions.

Discussion Paper No. 2016-DP24, Define a Code to Indicate the Omission of Non-ISBD Punctuation in the MARC 21 Bibliographic Format

Two changes to this paper were made on the floor, and it was advanced to a proposal and approved. To accommodate legacy data, the definition of the [blank] value will be left alone; only the new value “n” will be added, allowing a user to indicate that non-ISBD punctuation has been used.

Discussion Paper No. 2016-DP25, Extending the Encoding Level in the MARC 21 Authority Format

The aims of this paper can be realized by defining new codes in the MARC authentication list. No changes to MARC are required.

Discussion Paper No. 2016-DP26, Designating a Norm or Standard Used for Romanization in the MARC 21 Bibliographic Format

The issues were sufficiently clear and compelling that the DNB and LC were asked to develop a proposal for MAC.

Discussion Paper No. 2016-DP27, General Field Linking with Subfield $8 in the Five MARC 21 Formats

The committee preferred Option 2 presented in the paper. Approved as a proposal.

Discussion Paper No. 2016-DP28, Using a Classification Record Control Number as a Link in the MARC 21 Bibliographic Format

Will return as a proposal. There was some suggestion that the authors also examine incorporating field 053.

Discussion Paper No. 2016-DP29, Defining New Subfields $i, $3, and $4 in Field 370 of the MARC 21 Bibliographic and Authority Formats

Some comments were offered, and this will return as a proposal. Time was tight due to the length of the agenda; otherwise this might have been advanced to a proposal after talking through the suggested changes further.

Discussion Paper No. 2016-DP30, Defining New Subfields $i and $4 in Field 386 of the MARC 21 Bibliographic and Authority Formats

There was support for this paper, and the committee responded to the paper’s questions that these subfields are not appropriate for field 385. As with the preceding paper, this might have advanced to an approved proposal had there been more time for discussion.

Cataloging Norms Interest Group
Saturday, June 25, 10:30-11:30 a.m.

Fearless Transformation: Applying OpenRefine to Digital Collections / Kara Long, Baylor University

The Spencer Collection of Popular American Sheet Music consists of 30K items with extensive notes, including subject terms from the owner’s custom vocabulary.

  • Grant to digitize 1K items in 1999.
  • Worked with Flourish to provide content in MARC, which they transformed to load into their ContentDM instance.
  • PAST: Used XSLT to convert MARC to MARCXML, then Perl scripts to transform into tabbed format.
  • NOW: MARC ==> OpenRefine ==> ContentDM.

Moving from MARC to the DC used by ContentDM raises issues, including significant loss of granularity.

She gave a quick intro to OpenRefine, which can use GREL or Jython for scripting.
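To give a flavor of the approach (this is a generic sketch, not Baylor's actual script; the field contents are invented), an OpenRefine Jython cell transform for cleaning a MARC-derived subject string might look like the function body below, wrapped here so it runs standalone:

```python
# Sketch of an OpenRefine "Python / Jython" cell transform: in OpenRefine
# you would paste just the function body into the expression editor, where
# 'value' is the current cell.
def transform(value):
    """Split a MARC-derived subject string on '--' subdivisions and
    strip trailing punctuation from each part."""
    if value is None:
        return None
    parts = [p.strip().rstrip('.,;:') for p in value.split('--')]
    # join with a separator, as OpenRefine does for multi-valued cells
    return '|'.join(p for p in parts if p)

print(transform('Piano music -- 20th century.'))  # Piano music|20th century
```

The same logic could be written in GREL; Jython is the choice when you need real string handling or lookups.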

BIBFRAME Pilot Training
Judith Cannan, Paul Frank, Library of Congress

They spoke about the BF test that began last fall and the testing that generated the test data. Once 2.0 is done they’ll remove the test records for 1.0. Paul Frank developed the profiles to where they are today. Some issues with the test: no ability to revise records; it created duplicate records, MARC and BF.

Paul: Before the test they convinced catalogers they were not doing anything conceptually different, cataloging-wise. The hardest issue is the difference between the BF and RDA models. How do you deal with RDA expressions in BF? Various lookups were added as things progressed, e.g. the RDA Registry.

Best practices and lessons learned: Need to understand RDA well. People need to think outside of MARC and ISBD; catalogers must be convinced that these are irrelevant. Those running the pilot were surprised that people were interested in the triples. Lesson learned: group like resources to learn workflows for that kind of item. The catalogers shifted to cataloging things first in BF, then MARC, rather than the other way around, which is the way they started.

Parallel languages are hard, e.g. parallel statements for publishers.

Judith talked about the transition to the linked data world. We need to do it; she mentioned Kodak becoming irrelevant because it failed to adapt to changes in technology. During the BF Pilot they made incremental changes in response to user input. Staff wanted to continue working in BF one day a week to keep their chops up.

BF 2.0: When? They need to contract out converting the LC database into BF 2.0; more likely the end of this year or the start of next year. LC will not be the central triple store for the world. BF 2.0 will be more RDA-friendly and will be structured to make extensions easier to incorporate. NACO “records” in BF would be nice, but it’s not clear how much will be in place.

MARC Formats Transition Interest Group
Saturday, June 25, 3-4 p.m.
Debra Shapiro incoming co-chair, looking for 2nd co-chair and 1 vice-chair

RDA: Alive, well, and still speaking MARC / Diane Hillman

  • Talked about RDA beyond the rules: the RDA Registry for RDA elements; the Jane-a-thon and related RIMMF tool; RIMMF R-balls hold past Jane-a-thon results.
  • Multi-lingual RDA translations in the works, includes instructions, vocabularies, data.
  • Linked Data, FRBR and RDF: RDA is based on FRBR, except for the “unconstrained” version, which is the FRBR-less version; gave an RDF intro, covering how it is designed for machine sharing.
  • RDA in RDF: value vocabularies, e.g. the carrier type vocabulary; entities, attributes and relationships represented as RDF element sets.
  • The RDA Registry has ToolkitLabel in multilingual forms, based on the Open Metadata Registry.
  • Uses opaque URIs–adamant that they are being as neutral as possible, in conflict with some practices asking for understandable URIs, “natural keys,” which predominantly would be in a language such as English, exhibiting linguistic bias.

RIMMF: Freely downloadable. Demonstrated in English and French versions. Working towards RIMMF as a cataloging tool “in the next year or so.”

Presentation up on SlideShare

Library.Link at Multnomah Community Library / Erica Findley

  • Large public library in Portland.
  • Part of LibHub, which became Library.Link, a project that grew from 10 libraries to ~20 libraries. The project focuses on just publishing data out on the web, without focusing on BIBFRAME. She gave background on what linked data is and how it evolved.
  • In September 2014 they signed with Zepheira, then met with other public library participants, including 6 weeks of linked data training.
  • They sent forward 100K records for quick analysis, then loaded 1.8M records. They’re forming an assessment group to see how well Library.Link is doing. One thing they’ve developed is an author page, similar to an author browse but with more relationships.
  • They’re doing a lot of work to register user interactions with search engines, hoping for good SEO results. First Analytics display showed about 30% from beyond the catalog so far. Showed 29 transaction conversions to holds on books, though they’re still figuring out what all this means.
  • Most work is independent of cataloging. They’re thinking of including URIs for the subject and author pages they’ve developed.
  • Metadata is geotagged to make the results more meaningful to searchers.

Question session

  • There was a fairly passionate exchange about how the RDA Registry formulates URIs without the [English] label in the string, returning to the “natural keys” discussion mentioned earlier.
  • Eric Miller described some of what’s going on in the Library.Link project. Part of the question was “what flavor of BIBFRAME” is being used. EM spoke of developing a stack of elements to use to expose materials on the web, and mentioned using BIBFRAME Lite. He considers Lite part of the BIBFRAME cluster: “It’s all BIBFRAME” [editorial note: possibly a defensive statement answering criticisms elsewhere (for one, by Rob Sanderson) that BIBFRAME Lite is not truly BIBFRAME].

Faceted Subject Access Interest Group #FSAIG
Saturday, June 25, 4:30-5:30 p.m.

Magda el-Sherbini will be new co-chair.

FAST at the British Library / Thurstan Young (British Library), on behalf of the FAST Review Group

There has been a dramatic rise in e-resources, while other materials are only slightly down. They are looking at increased reuse of third parties’ metadata, and looking to add FAST as a more contemporary means of subject access that would better match current keyword searches. Tests include adding FAST on non-deposit items and some backlogs, and using abbreviated subsets of the vocabulary.

Benefits: more efficient; FAST is quicker to apply; fewer barriers to keyword searching. The discovery advantages help its use in their catalog. FAST is free and downloadable in its entirety, versus LCSH, which requires paid Toolkit subscriptions. Disadvantages: facets get dissociated and important context gets lost. FAST is an OCLC research project, therefore proprietary and not a dedicated service.

The British Library ran a subject survey earlier this year with three possible proposals: adopt FAST selectively; implement FAST instead of LCSH; implement abridged Dewey. They received 60 responses to questions about these proposals. There were some very negative responses to the first proposal, but many were neutral. Responses were much more negative for option 2. Option 3 drew lots of neutral responses, and some negative to very negative. The BL will publish a response to the survey, including future needs and plans: there are issues with support of FAST; they want to take up an active role and conduct a time-and-motion study to see how efficient it is; no decisions until fall of 2016.

OCLC Fast / Eric Childress

OCLC has been doing a lot of housekeeping on FAST. Now in progress but not yet ready to release: a tool for users to add FAST headings from other domains. 1.7 million headings–not changed much. A “Facetvoc” listserv is being established for all things faceted access, not just FAST; expect it soon after ALA.

Group Discussion: “Jewish Men Librarians” is not in LCSH / led by Netanel Ganin
(The former was the published promo title; the final presentation title was: The People Were Divided: A Problem with Faceting LCSH)

  • Talked about problems with the current SACO process, especially the fact that it takes so long.
  • LCDGT is not intended to be used for subjects.
  • Possible solution: create strings on the fly if the components are already established.

Audience: many ideas, including using $8 to link facets. Create a string with multiple rotations? The DNB has a similar local implementation to deal with facets contained in a single string.

Metadata Interest Group
Sunday, June 26, 8:30 a.m.

Diverse and Inclusive Metadata: Developing Cultural Competencies in Descriptive Practices

Digital Library North / Sharon Parnell, PhD student in library science, University of Alberta
  • Topic: Engaging with communities to develop culturally appropriate and aware metadata
  • A 4-year research project with the Inuvialuit Cultural Resource Centre to work with the culture and language of a region of extreme northern Canada (much of it north of Alaska)
  • Three dialects of the local language, considered at risk. Different communities, each with 135-3,300 residents.
  • A focus on creating metadata for the project, with a major part focused on the process of developing metadata.
  • Seed collections of things in the cultural Centre: Audio video, image, language materials, oral histories, genealogies, etc.
  • Lessons from the literature: digital libraries are key elements in preserving language; metadata content and labels; the role of the dominant language; flexible search and browse; local terms for people, places and activities / tensions of local needs versus standards and interoperability; acknowledging the cultural biases of many standards; working with communities to modify standards or make up new ones / the role of community in creating content and designing the interface / issues of rights, use and access; the need to respect understandings of rights and use
  • Community events, surveys, and conversations with institute staff and community about what was important: metadata should capture dialect/language and whether something is a language resource, and acknowledge both traditional and westernized names associated with resources; there is interest in using existing vocabularies where appropriate. They want community members to have a review role to confirm that proper attributions and descriptions have been made, and clear statements of use rights and restrictions so all will use the materials in the most appropriate and sensitive way.
  • An OLMECA prototype was displayed, a search demo for language resources. Language and dialect are highlighted in displays, and the role of contributors will be highlighted. Some mapping within OLMECA invokes Google pinning based on metadata place names. They have developed names of people, places, and objects in different languages and dialects.
  • All content is community-firewalled for now until the community has decided what to do. Will redesign and enhance metadata in the summer and fall working towards further work in the winter with the community.
  • Want to work at long-term sustainability of the resource to keep it alive into the future, including community editing and hosting, to keep it from withering away.
  • Sharing is an issue, needs discussion. Most interest in connecting within the community. Sharing with the rest of the world is less of a focus.
Creating Inclusive and Discoverable Metadata for the IR / Tiewei (Lucy) Liu (Cal State Fresno)
  • The IR is diverse and reflective of a diverse school demographic. It runs on DSpace with DC metadata, but a migration is planned. 4,800 items in the IR so far: archives, special collections, dissertations; no datasets yet, but open to them.
  • Using the most specific elements possible. Working for something that will convert well into MODS.
  • Has IR committee about 8 people, also marketing committee support. Faculty also help with metadata planning and rights.
  • In a 30 minute interview they ask faculty to verify identities and variants, and discuss metadata needs. Submissions in-person, email.
  • Batch processes employ spreadsheet input.
  • Multi-lingual content, may develop multi-language search interfaces. Interest in crowd-sourcing and tagging.
  • Local issues: authority control, technical glitches, copyright clearance, others.
  • Faculty help with metadata design and rights clearance. Only 1 out of 8 faculty so far have authority records established.
  • Keyword and faceted search by faceted descriptors.
  • How to encourage faculty participation? Lunch, faculty events to promote the program, pushing through liaisons, dean level encouragement, no ORCID push currently, maybe use a NACO funnel to create records.


Approved minutes on ALA Connect
CC:DA report from Jessica Hayden
MLA report from Jim Soe Nyun on things going on at MLA. No LITA liaison, no report.
Elections were held for: Vice-chair:
Program Co-chair: Blog coordinator:

Ideas for future conferences? Submit online.

LITA ALCTS RUSA Metadata Standards Committee
Sunday, June 26, 1-2:30 p.m.

Many new liaisons have been named, though there is some confusion and not all liaison appointments are complete.

ALCTS celebrating its 60th anniversary in 2017.
ALCTS will review committees every 5 years; the committee is up for review now, and the process is about to start.
Mike Nelson, “Enabling Information in the Age of the Cloud,” is tomorrow’s President’s Program speaker.

Principles for Evaluating Metadata Standards–reviewed comments on draft document: X1xsue1wyKoSStGB_SkGg5SyXTgJVPxezw/mobilebasic
The document has gone around various constituencies, including ethnic caucuses, to make it more inclusive. The above link now includes the latest-edited version of the document.

A clean copy of the above will be posted in ALA Connect and on Metaware.

An agenda item addressed the committee’s task to respond to standards, requests for draft reviews, etc. What are ways the group can be more effective? Identify standards the group could be commenting on? Could the group liaise with groups that produce standards? Defining what is out of scope: CC:DA? MAC? Maybe get a handle on the workload involved with commenting, perhaps by looking at the number of requests for comment and the length of timetables.

Metaware discussion: analysis tools are limited. There are lots of commerce-friendly metrics, such as total hits and time of day, but nothing showing how articles are used; it’s hard to gauge what content is of interest. There was a suggestion to withdraw the site, at least as configured. Maybe move to ALA Connect? One issue is that it distracts from the committee’s work; in its defense, it is more outward-facing than Connect would be. So for the time being it will probably remain alive, fairly quiet, with some content from Mike Bolam’s library school students.

August 15 is a deadline for program suggestions through ALCTS.

Jennifer Liss outgoing chair, Kevin Clair incoming.

ALCTS CAMMS Heads of Cataloging Interest Group:
Towards Semantic Metadata Aggregation for DPLA and Beyond
Monday, June 27, 8:30-10 a.m.

Josh Hadro NYPL Labs

  • Moving to web-scale delivery of metadata from different domains
  • Content-hub side of the equation
  • Labs is a 29-person program, librarians, developers, designers, photographers, others…
  • Working with this problem: many metadata creators, systems, practices; multiple metadata migrations; sparse documentation.

(What does it mean to be a content hub? 680K items, 1.1M “pages”/items in digital collections, heavy on special collections, including bits of Percy Bysshe Shelley’s skull.)

They are trying to make use of DPLA’s renderings of their content, and used the DPLA MAP to crosswalk their MODS data. They released to the public domain high-resolution TIFFs of everything they have that they know to be in the public domain, and include public domain as a facet to limit by. NYPL also has an API, and published a whole-collection CSV and JSON release of the public domain components, with a data dump on GitHub. They have a “remix residency” program for artists and others to work with their content and have received numerous expressions of interest; residents will be announced soon.

Rights metadata: Moving to map to expose rights for all their objects.

Metadata work and remediation: multiple systems, with some objects described in several of their four systems. They developed a liaison model with their 26 research divisions and worked to develop minimum viable metadata requirements.

Metadata audit: looked for the presence of 6 fields to gauge a minimal level of completeness. Is the data in machine- and/or human-readable form? Elements in the audit: title, type of resource, genre, date, identifier (external), and location in an NYPL division. The audit looks at what’s missing and which division is involved. They tied quality of metadata to use of collections.
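A minimal sketch of such a presence check (the dictionary keys are hypothetical stand-ins for the six audit elements, not NYPL's actual field names):

```python
# Hypothetical key names mirroring the six audit elements above.
REQUIRED = ['title', 'type_of_resource', 'genre', 'date',
            'identifier', 'division']

def audit(record):
    """Return the required elements that are absent or empty in one
    record, represented here as a plain dict."""
    return [f for f in REQUIRED if not record.get(f)]

sample = {'title': 'Map of Harlem', 'date': '1921', 'division': 'Maps'}
print(audit(sample))  # ['type_of_resource', 'genre', 'identifier']
```

Run over a whole export and grouped by division, a check like this yields exactly the kind of gap report the audit describes.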

Staff skill sets are moving from working with single records, to Excel, to OpenRefine, to Python, etc. They’re not there yet, but each step moves towards mass record manipulation.

Jason Roy, Minnesota Digital Library and DPLA
What does it mean for Minnesota to be a service hub for DPLA?

The Minnesota Digital Library deals with small institutions that don’t have the capacity to handle all their digitization needs: 180 institutions working with a central repository, 225,000 resources. It is funded by the state and interested in showing impact.

They get metadata, map it using the DPLA application profile, and then send it to DPLA. They work with Minnesota Reflections, a data hub for smaller institutions; larger institutions don’t need that level of support.

They turn OAI-generated XML into JSON. Catalogers are involved in establishing rules for the transformations; they develop rules for lookup tables used in the transformations, helping with vocabulary. The UMN Libraries have the transform tools on GitHub.
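As a toy illustration of that XML-to-JSON step (the record and element names are invented, and this is far simpler than the actual transform tools), harvested Dublin Core can be bucketed into JSON like this:

```python
import json
import xml.etree.ElementTree as ET

DC = '{http://purl.org/dc/elements/1.1/}'

# A tiny, invented OAI-harvested Dublin Core record for illustration.
record_xml = """
<record xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Lake Minnetonka, 1905</dc:title>
  <dc:subject>Lakes</dc:subject>
  <dc:subject>Minnesota</dc:subject>
</record>
"""

def dc_to_json(xml_text):
    """Collect repeated DC elements into lists keyed by element name."""
    root = ET.fromstring(xml_text)
    out = {}
    for el in root:
        out.setdefault(el.tag.replace(DC, ''), []).append(el.text)
    return json.dumps(out, sort_keys=True)

print(dc_to_json(record_xml))
```

The real pipeline adds the cataloger-maintained lookup tables at the point where element values are normalized.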

Applies CC0 waiver to their metadata. Said that rights are harder than metadata work.

Next step, Umbra: they work to see what’s where and also what is missing, looking at underrepresented collections. It consumes DPLA’s API, which makes up 60% of the content. They’re finding Related Content to be a popular feature. The main problem is undescribed materials, much of it in EADs. They’re working with student help to see whether harvested content is not appropriately described, and looking at algorithmic ways to do this work. There was a question about how to flow added-value work back to the source institution; they’re thinking about and working on it.


Some obvious errors can be fixed relatively reliably, e.g. dates. Exposing collections can result in corrections and enhancements, informally “crowd-sourcing” the improvements, with several comments arriving each day.

Funding issues: MDL has a generous baseline with state support, though they still work on grants. NYPL gets city funding. DPLA work was taken on without extra funding; they decided it should be a priority.

Pre-coordinate vs Post-coordinate Subject Access: Pros and Cons and a Real Life Experience…
Monday, June 27, 1-2:30 p.m.

Peter Fletcher, UCLA; Diane L. Boehr, NLM

Peter Fletcher

Pros and Cons:
PRE and POST: examples include the old PRÉCIS from the British Library, which consists of a syntax and a thesaurus. No longer used there, in favor of LCSH.

PRE: LCSH greatest use

LC Report (2007)
Strong bias towards retaining LCSH.

LCSH: not a true thesaurus. Component parts can’t be manipulated. Inconsistent syntax. LCSH into BLISS? Current strings can map to multiple BLISS categories.

Browser (user) research: Left-anchored searches are not intuitive. Faceted: More machine actionable, better for linked data

FH Ayres, Time for a Change: A New Approach to Cataloguing Concepts, argued for a move to faceted access, considering it user-driven.

Slides online, includes bibliography

A Real Life Experience / Diane Boehr

NLM had faceted headings in their catalog but distributed them as pre-coordinated strings.

(MESH never used temporal facets)
Before 1999 they used pre-coordinated strings. Then they moved to Voyager and to MARC.

The NLM Gateway–a now-dead search capability–explored federated book and article searching, with mainly pre-coordinated headings and some faceted strings. Because subscribers objected, they didn’t send out faceted metadata; instead they machine-coordinated their facets for distribution. This didn’t always work (Eskimos $z Hawaii).

In 2005 another survey from NLM received 35 responses: 51/49% in favor of unstringing MESH.

Strings make searching easier in catalogs without Boolean capabilities, and hyperlinks make more sense. Breaking apart strings is a one-way street: the complexity can never be recovered through automation.

New survey in 2015: 100% agreed no need for combining strings.
Background: more discovery layers with facets; MESH issued as linked data; OCLC removed coordinated strings from old records.

If using MESH, use NLM guidelines when creating national-level catalogs.

Questions/Issues: How to deal with see/see-also in faceted systems?

RDF MESH in multiple languages: a test showed it worked.

Issues about how much to break up strings? FAST still has some strings that don’t conform to the one-string/one-topic concept.

Northwestern maps MESH to LCSH annually.