
Metadata: Archiving challenges for small publishers series

Part of a short series of posts regarding some of the challenges that exist for small publishers of open access monographs, this post considers the challenges of metadata in archiving.

Published on Sep 20, 2024

This post is part of a short series of posts regarding some of the challenges that exist for small publishers of open access monographs. Other posts discuss the challenge of link rot, and the ways in which existing pathways to preservation may exclude small publishers because of various platform requirements.

Metadata

Metadata is data that provides important identifying information about a specific set of data. For monographs and books, it is typically created by the publisher so that the book can be discovered. This metadata usually includes author information, title, publication date, etc., but also DOIs, ISBNs, and more fine-grained information such as rights, funders, language, and audience. A good overview of metadata for open access books can be found in OAPEN’s OA Books Toolkit. There are well-known issues around inconsistencies in metadata standards and practice. Here we detail the specific challenges we have noted, which may well apply beyond small publishers.

Open access status (often non-existent or missing)

One of the challenges for all presses, aggregators, and preservation platforms seems to be metadata, especially metadata relating to open access (OA) status. To the best of our knowledge, there is no specifically defined field for OA status that aggregators or metadata schemas use. In some instances it appears to be a “rights” field, whereas in others aggregators etc. look in a “licence” field. This lack of consistency is not ideal and does not help publishers, users (whether human or machine), or preservation platforms. How can preservation platforms consistently identify whether a work is open access if the metadata itself is not consistent?

As such, our recommendation for all presses, but particularly open access presses, is to make sure that the open access licence for the work is included in the licence field. At the very least, this can then be used as a proxy for a work being open access. For example, if a work is published under a CC BY (Creative Commons Attribution) licence, then preservation platforms, users, etc. can be fairly sure that the work is open access.
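To illustrate how a licence field can act as a proxy, here is a minimal Python sketch, assuming the licence is stored either as a Creative Commons URL or as a short identifier such as "CC BY 4.0". The function name `looks_open_access` and its patterns are hypothetical and deliberately simple: any recognisable Creative Commons licence (including NC/ND variants and CC0) counts as a signal of open access, while an empty field returns False, meaning only that the metadata does not say so.

```python
import re
from typing import Optional

# Hypothetical patterns: a Creative Commons URL, or a short identifier
# such as "CC BY 4.0", "CC-BY-NC-ND" or "CC0 1.0".
_CC_URL = re.compile(r"creativecommons\.org/(licenses|publicdomain)/", re.IGNORECASE)
_CC_SHORT = re.compile(r"^\s*CC[\s-]?(BY|0)\b", re.IGNORECASE)


def looks_open_access(licence: Optional[str]) -> bool:
    """Treat any recognisable Creative Commons licence as a proxy for OA.

    A missing or empty licence returns False: the work *may* still be open
    access, but the metadata alone does not tell us so.
    """
    if not licence:
        return False
    return bool(_CC_URL.search(licence) or _CC_SHORT.search(licence))
```

A preservation platform could run such a check over harvested records and route the ones that return False for manual review rather than assuming they are closed.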

Inconsistencies 

 A wider challenge for users of works (whether researchers, aggregators, humans or machines) is inconsistency in the metadata. Inaccurate or inconsistent metadata could hinder the re-use, discoverability, archivability, and general dissemination of a work. In contrast, consistent and accurate metadata enables all of these attributes. 

For archiving, consistent metadata is essential. Not only can it convey the open access status and open access licence (see above), but it also helps establish the provenance of the associated work. Additionally, preservation platforms can use consistent metadata to archive or preserve a work correctly. Metadata that is inconsistent across works from the same publisher can mean a work is archived incorrectly, or not archived at all.
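One simple way for a press to catch inconsistencies before its metadata reaches a preservation platform is to check every record against the same list of required fields. The sketch below assumes records are plain Python dictionaries with hypothetical field names (`title`, `authors`, `licence`, and so on); real schemas such as ONIX or Dublin Core name and structure these fields differently.

```python
from typing import Dict, List

# Hypothetical field names chosen for illustration only.
REQUIRED_FIELDS = ("title", "authors", "publication_date", "licence", "doi", "isbn")


def missing_fields(records: List[Dict[str, object]]) -> Dict[str, List[str]]:
    """Map each incomplete record to the required fields it leaves empty."""
    report: Dict[str, List[str]] = {}
    for index, record in enumerate(records):
        # Label records by title, falling back to list position if untitled.
        key = str(record.get("title") or f"record {index}")
        # Absent keys, empty strings and empty lists all count as missing.
        gaps = [field for field in REQUIRED_FIELDS if not record.get(field)]
        if gaps:
            report[key] = gaps
    return report
```

Running this over a publisher's whole catalogue, rather than one book at a time, is what surfaces the cross-work inconsistencies described above.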

One particular issue we have observed is the inconsistent inclusion of both the DOI and the ISBN for a digital, open access monograph. This can hinder consistent identification of a work and can also result in the same work being duplicated within a platform. A DOI is a persistent unique identifier that ensures the precise version is being referenced; it allows usage of the monograph to be tracked and citations of the work to remain accurate. Having both the DOI and the ISBN present allows for a complete and accurate record of the monograph on all platforms.
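The duplication problem can be sketched in a few lines of Python. Assuming each record may carry a DOI, an ISBN, or both, the hypothetical `group_duplicates` function below links records that share either identifier, using a small union-find structure. A record with only a DOI and another with only an ISBN are linked only if some third record carries both, which is one reason to include both identifiers consistently. The normalisation rules are assumptions: DOIs are compared case-insensitively with common resolver prefixes stripped, ISBNs have hyphens and spaces removed, and ISBN-10 to ISBN-13 conversion is not handled.

```python
from typing import Dict, List, Optional


def normalise_doi(doi: Optional[str]) -> Optional[str]:
    """DOI names are case-insensitive: lower-case and drop resolver prefixes."""
    if not doi:
        return None
    doi = doi.strip().lower()
    for prefix in ("https://doi.org/", "http://doi.org/", "https://dx.doi.org/", "doi:"):
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
    return doi or None


def normalise_isbn(isbn: Optional[str]) -> Optional[str]:
    """Drop hyphens and spaces so '978-1-...' and '9781...' compare equal."""
    if not isbn:
        return None
    digits = isbn.replace("-", "").replace(" ", "").upper()
    return digits or None


def group_duplicates(records: List[Dict[str, str]]) -> List[List[int]]:
    """Return groups of record indices that share a DOI or an ISBN."""
    parent = list(range(len(records)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    seen: Dict[str, int] = {}  # identifier token -> first record that had it
    for index, record in enumerate(records):
        tokens = [
            ("doi", normalise_doi(record.get("doi"))),
            ("isbn", normalise_isbn(record.get("isbn"))),
        ]
        for kind, value in tokens:
            if value is None:
                continue
            token = f"{kind}:{value}"
            if token in seen:
                parent[find(index)] = find(seen[token])  # merge the two groups
            else:
                seen[token] = index

    groups: Dict[int, List[int]] = {}
    for index in range(len(records)):
        groups.setdefault(find(index), []).append(index)
    return [group for group in groups.values() if len(group) > 1]
```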

 Proprietary metadata: closed vs. open

Metadata is often proprietary and therefore held behind a paywall. As the Thoth team discuss in their 2020 Pub ‘Open Metadata in Thoth’, “the current closed and compartmentalized book metadata ecosystem” needlessly duplicates the work invested in metadata sets.

 Publishers can openly licence their metadata to allow for reuse.

Effect on dissemination

As noted above for archiving, inconsistent or inaccurate metadata can also affect the dissemination of books. Many aggregators require metadata in a certain schema or standard in order to disseminate a book. Creating all of the required schemas can be a complex and time-consuming activity for presses, particularly smaller presses. One of the benefits of the open metadata system Thoth is that it can create these different schemas for publishers, which helps disseminate the works to a number of platforms, such as OAPEN and JSTOR.

Simply disseminating a work to a number of platforms helps with its archiving, and the simpler this dissemination can be, the greater the chance that a scholarly work is both read and archived. When a work is disseminated through another platform such as OAPEN, the book is then included in that aggregator’s archiving and preservation policies. For example, OAPEN has a relationship with Portico. Good, high-quality metadata helps with this dissemination process.
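As a rough illustration of why a single, well-maintained record is valuable, the Python sketch below renders one hypothetical record dictionary into two output formats: a simple Dublin Core (`oai_dc`) XML fragment and a CSV row. This is not how Thoth works internally, and real aggregator feeds (for example ONIX) are far richer; the field names are assumptions, and the DOI is assumed to be stored without a resolver prefix.

```python
import csv
import io
import xml.etree.ElementTree as ET
from typing import Any, Dict

DC_NS = "http://purl.org/dc/elements/1.1/"
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
ET.register_namespace("dc", DC_NS)
ET.register_namespace("oai_dc", OAI_DC_NS)


def to_dublin_core(record: Dict[str, Any]) -> str:
    """Render a hypothetical record as a minimal oai_dc XML fragment."""
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")

    def add(name: str, value: Any) -> None:
        # Skip empty values rather than emitting empty elements.
        if value:
            ET.SubElement(root, f"{{{DC_NS}}}{name}").text = str(value)

    add("title", record.get("title"))
    for author in record.get("authors") or []:
        add("creator", author)
    add("date", record.get("publication_date"))
    doi = record.get("doi")
    if doi:
        add("identifier", f"https://doi.org/{doi}")
    add("identifier", record.get("isbn"))
    add("rights", record.get("licence"))  # the licence doubles as an OA signal
    return ET.tostring(root, encoding="unicode")


def to_csv_row(record: Dict[str, Any]) -> str:
    """Render the same record as one CSV line (title, authors, DOI, ISBN, licence)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow([
        record.get("title", ""),
        "; ".join(record.get("authors") or []),
        record.get("doi", ""),
        record.get("isbn", ""),
        record.get("licence", ""),
    ])
    return buffer.getvalue()
```

Because every output is derived from the same source record, a licence or identifier corrected once is corrected in every feed.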


Photo by Tobias Fischer on Unsplash
