
Open Metadata in Thoth

Published on Nov 24, 2020

Following the recommendations set out in the WP5 Scoping Report, development of the open metadata management and dissemination system Thoth has been going ahead, and two scholar-led open-access publishers, punctum books and Open Book Publishers, now use Thoth for their day-to-day metadata management. The next step will be to create a variety of metadata output formats, such as different flavors of ONIX, MARCXML, and BibTeX, which will allow users of Thoth to easily expose and transfer their metadata to other platforms, such as digital repositories, libraries, and vendors.

When metadata imported, edited, and stored inside Thoth leave the system, we encounter the question of how the exported book and chapter metadata are licensed. More often than not, publication metadata records do not come with an explicit license, which creates an undesirable uncertainty as to the precise terms under which metadata can be (re)used and altered. This is only one of the many challenges faced by the scholarly communications community with regard to the processing of metadata (Gregg et al. 2019).
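One way to remove that uncertainty is for every exported record to state the license of the metadata itself. The following is a minimal sketch only, not Thoth's implementation: the record layout, the `to_bibtex` function, and the `metadatalicense` field name are all hypothetical, chosen for illustration. BibTeX has no standard license field, and standard bibliography styles ignore fields they do not recognize, so a non-standard field is used here.

```python
# Illustrative sketch: the record keys and the "metadatalicense" field are
# assumptions made for this example, not part of Thoth or of BibTeX.
CC0_URL = "https://creativecommons.org/publicdomain/zero/1.0/"


def to_bibtex(record: dict, metadata_license: str = CC0_URL) -> str:
    """Render a book record as a BibTeX @book entry with an explicit metadata license."""
    fields = {
        "title": record["title"],
        "author": " and ".join(record["authors"]),
        "publisher": record["publisher"],
        "year": str(record["year"]),
        # License of the metadata record, which may differ from the book's license.
        "metadatalicense": metadata_license,
    }
    body = ",\n".join(f"  {key} = {{{value}}}" for key, value in fields.items())
    return f"@book{{{record['key']},\n{body}\n}}"


# A made-up record, used only to exercise the function.
example = {
    "key": "example2020",
    "title": "An Example Monograph",
    "authors": ["Doe, Jane", "Roe, Richard"],
    "publisher": "Example Press",
    "year": 2020,
}

print(to_bibtex(example))
```

A consumer of such a file can read the license statement directly from the record instead of having to guess the terms of reuse.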

The Open Knowledge Foundation’s Principles on Open Bibliographic Data recommend “the use of the Public Domain Dedication and Licence or Creative Commons Zero Waiver” (Open Bibliographic Data Working Group of the Open Knowledge Foundation, 2010); the Open Metadata Principles (Discovery 2011), which have been signed by COPIM partners Lancaster University, Jisc, and the British Library, recommend the use of the Open Data Commons Public Domain Dedication & Licence (ODC-PDDL), the Creative Commons CC0 licence, or the UK Open Government Licence (OGL); Plan S mandates the release of “high-quality […] metadata in standard interoperable non-proprietary format, under a CC0 public domain dedication” (cOAlition S n.d.); and the Data Exchange Agreement of Europeana, an EU-funded project to open up the digital collections of European cultural institutions, requires CC0-licensed metadata (Europeana Foundation 2015). The Metadata 2020 project, a collaboration of commercial publishers, service providers, and libraries, has so far issued no guidance with regard to metadata licensing, but states as one of its principles that “metadata must be as open, interoperable, parsable, machine actionable, human readable as possible.”

These proposals to release metadata in the public domain have, understandably, encountered resistance from commercial vendors who benefit from a closed metadata ecosystem. For example, OCLC adopted the more restrictive Open Data Commons Attribution License (ODC-By) for its metadata records, stating that the obligatory attribution “represents a significant investment of time and resources both from OCLC and from each of its member institutions that contribute records” (OCLC 2012).

There is no doubt that many of the stakeholders in the scholarly communications chain make significant investments in creating metadata records, and it is perhaps a feature of the current closed and compartmentalized book metadata ecosystem that many of these investments are actually unnecessary duplicated work. Implementing an ODC-By or CC-BY license for metadata would imply the creation of a potentially sizeable parallel administration of meta-metadata, recording the different authors, editors, and contributors to individual metadata records, who may license their contributions in different ways, thus eventually creating the need to license those meta-records as well. This phenomenon is known as “attribution stacking” (Korn and Oppenheim 2011, 5).
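The bookkeeping burden described above can be made concrete with a small sketch. It is a toy model under stated assumptions, not a description of any real system: the `Record` class, its fields, and the license labels are hypothetical. Each record lists its contributors and the records it was derived from, and `required_attributions` walks that chain to collect everyone who must be credited on reuse.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """Toy metadata record; all field names are assumptions for illustration."""
    name: str
    license: str  # "CC0", or an attribution license such as "CC-BY" or "ODC-By"
    contributors: list[str] = field(default_factory=list)
    sources: list["Record"] = field(default_factory=list)


def required_attributions(record: Record) -> set[str]:
    """Collect every contributor who must be credited when reusing `record`."""
    credits: set[str] = set()
    if record.license != "CC0":
        credits.update(record.contributors)
    # Obligations attached to upstream records carry over to derived ones.
    for source in record.sources:
        credits |= required_attributions(source)
    return credits


# Made-up chain: a publisher record, enriched by a library, then by an aggregator.
publisher = Record("publisher record", "CC-BY", ["Press A"])
library = Record("library record", "CC-BY", ["Library B"], [publisher])
aggregator = Record("aggregator record", "ODC-By", ["Vendor C"], [library])

print(sorted(required_attributions(aggregator)))
```

In this toy chain the list of required credits grows with every enrichment step, and the list itself becomes a record that has to be stored and maintained. If every record in the chain is marked `"CC0"`, the function returns an empty set.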

The only way to halt such undesirable recursion is for metadata records to be fundamentally open, that is, released under one of the licenses suggested by the Open Metadata Principles. As all member presses of ScholarLed (Barnes 2018, Deville et al. 2019) already release their books under one of the available Creative Commons licenses, it makes sense to stay within the same conceptual framework and license Thoth’s dataset as CC0. This implies that the present and future users of Thoth, which is freely available as open source software, commit to making their metadata open as well.

Flynn (2013) envisions an “Open Collaborative Catalog (OCC),” which “would allow libraries to combine their cataloging efforts and achieve maximum benefit with OA metadata housed in the cloud, and then pulled into local OPACs [Online Public Access Catalogs].” These “would house the cataloging records released as OA metadata […]. It would be ideal for the OCC to be editable as a wiki in order to maintain the best records and share them with ease, so libraries could download exactly the ones they need” (30).

There is no reason to restrict this vision solely to libraries. The pioneering usage of Thoth within the COPIM project and the ScholarLed consortium represents only one, rather limited use case: the management of a relatively small set of open book metadata. Connected to multiple metadata feeds, such as those listed on’s Bulk Bibliographic Metadata page, Thoth has the potential to become the functional core of the Open Collaborative Catalog that Flynn envisions, providing a collaborative platform for the creation, mashup (Stephens 2011), management, and export of high-quality open metadata without restrictions.


Barnes, Lucy. 2018. “ScholarLed Collaboration: A Powerful Engine to Grow Open Access Publishing.” Impact of Social Sciences (blog). October 26, 2018.

cOAlition S. n.d. “Accelerating the Transition to Full and Immediate Open Access to Scientific Publications.” Accessed November 22, 2020.

Deville, Joe, Jeroen Sondervan, Graham Stone, and Sofie Wennström. 2019. “Rebels with a Cause? Supporting Library and Academic-Led Open Access Publishing.” LIBER Quarterly 29 (1): 1.

Discovery. 2011. “Open Metadata Principles.”

Europeana Foundation. 2015. “Europeana Data Exchange Agreement.”

Flynn, Emily Alinder. 2013. “Open Access Metadata, Catalogers, and Vendors: The Future of Cataloging Records.” The Journal of Academic Librarianship 39 (1): 29–31.

Gregg, Will James, Christopher Erdmann, Laura A. D. Paglione, Juliane Schneider, and Clare Dean. 2019. “A Literature Review of Scholarly Communications Metadata.” Research Ideas and Outcomes 5: e38698.

Korn, Naomi, and Charles Oppenheim. 2011. “Licensing Open Data: A Practical Guide.”

OCLC. 2012. “WorldCat Data Licensing.”

Open Bibliographic Data Working Group of the Open Knowledge Foundation. 2010. “Principles for Open Bibliographic Data.” Open Bibliography and Open Bibliographic Data (blog). October 15, 2010.

Stephens, Owen. 2011. “Mashups and Open Data in Libraries.” Serials 24 (3): 245–50.

David Shotton:

This is an excellent and informed scholarly discussion of licensing issues, and Thoth has come to the correct conclusion that it is essential to use the Creative Commons CC0 Public Domain Waiver as the license that will give maximum freedom for preservation and re-use of their metadata. Well done!

David Shotton, Director, OpenCitations (