WP5 Scoping Report: Building an Open Dissemination System

1.0 Introduction

COPIM is an international partnership of researchers, universities (Coventry University; Birkbeck, University of London; Lancaster University; and Trinity College, Cambridge), established open access publishers (the ScholarLed consortium, which includes Mattering Press, meson press, Open Humanities Press, Open Book Publishers and punctum books), libraries (UCSB Library and Loughborough University Library) and infrastructure providers (the Directory of Open Access Books and Jisc).

In addition to the consortium members, COPIM is collaborating closely with institutions such as the British Library and the Digital Preservation Coalition, and with the OPERAS-P project and the Next Generation Library Publishing project, as well as a broad spectrum of academics, publishers, librarians, software developers, funders and more as part of the working groups, events and projects that COPIM is setting up and running. COPIM is funded by the Research England Development Fund and Arcadia, a charitable fund of Lisbet Rausing and Peter Baldwin.

The project focuses on an underlying philosophy of ‘scaling small’, the idea that publishing Open Access (OA) books should be something that a wide range of publishers, of differing sizes and with a variety of business models, can accomplish at manageable cost through collaborative effort and effective network-building. This way we keep the diversity, autonomy and independence of these presses intact while allowing them to benefit from the relationships fostered through setting up horizontal and vertical alliances with other stakeholders (Adema & Moore, 2018). COPIM aims to collectively develop a significantly enriched not-for-profit and open source, community-governed ecosystem for OA book publishing, to support and sustain a diversity of publishing initiatives and models, particularly within the Humanities and Social Sciences (HSS), in the UK and internationally. More specifically, the project aims to:

Remove hurdles preventing new and existing open access book initiatives from adopting open access workflows by 1) building open-source, community-based infrastructures that support the publication of open access books, and 2) establishing and consolidating partnerships between HE institutions and open access book publishers
Develop consortial, institutional, and other funding systems (building upon the partners’ existing network of 240+ libraries internationally) that will 1) serve as important hybrid community-led revenue models for open access book publishers, 2) support the establishment of more community-owned and governed infrastructures, and 3) promote publisher-librarian partnerships around open access book publishing
Showcase alternative (non-BPC) business models that incorporate infrastructural innovations and/or cost-reductions through streamlined operating processes, production workflows and economic efficiencies—which would benefit all scales of publishing initiatives
Support the creation of, interaction with, and reuse of open access books in all their variety and complexity (including emergent and experimental genres), most importantly by ensuring that these complex digital research publications can be archived effectively
Achieve knowledge transfer to stakeholders through various pilots that will 1) enable COPIM’s technical, organisational, financial and relational innovations to scale both horizontally (to other presses) and vertically (to other partners, including universities, libraries, and funders) and 2) inform and support (future) funder requirements for open access books.

The project consolidates existing relationships among the partners into a major strategic collaboration that will enhance the impact of research internationally. At the same time, COPIM promotes a community-based approach for the collaboration of academic institutions and industry stakeholders. Finally, it develops innovative approaches for knowledge exchange activities to facilitate sustainable and workable transitions to an open publication ecosystem for monographs and to ensure a diverse ecology of publishers.

COPIM will benefit the general public and the economy by maximising the dissemination and impact of research. The adoption of COPIM’s infrastructures, business models, preservation structures, and governance procedures will enable economic resilience and enhanced capacities, at smaller and larger scales, for open access books. It will offer Higher Education institutions and Arts, Humanities and Social Sciences researchers publishing models they control, increased publishing options, and cost-reductions to build a more horizontal and co-operative knowledge sharing community.

As part of seven connected Work Packages, COPIM is working on

integrated capacity-building amongst presses;
access to and development of consortial, institutional, and other funding channels;
development and piloting of appropriate business models;
cost reductions achieved by economies of scale;
mutually supportive governance models;
integration into library, repository, and digital learning environments;
the re-use of and experimentation with OA books;
the effective and robust archiving of OA content; and
knowledge transfer to stakeholders through various pilots.

This scoping report concentrates on the metadata required by the key stakeholders in the scholarly communications supply chain, in order to define best practice for the metadata types and formats that open access publishers need if they are to interact meaningfully with that supply chain.

On 16 March 2020 a COPIM publishers workshop took place, which helped to inform parts of the present report. The authors would like to acknowledge the contributions of the participants to this workshop, including Alessandra Tosi, Bethan Ruddock, Charles Watkinson, Eelco Ferwerda, Emily Farrell, Fulvio Guatelli, Janneke Adema, John Scherer, Lorenzo Armando, Martin Eve, Natalie Williams, Sharla Lair, Simon Forde, and Sofie Wennström.

1.1 Building an Open Dissemination System

Work package 5 (WP5) is developing technical protocols and infrastructure to better integrate OA books into institutional library, digital learning, and repository systems. This will support wider discovery and dissemination of OA books. Existing print and ebook distribution channels are difficult for new or OA publishers to engage with: they require metadata to be submitted in multiple formats (e.g. MARC, ONIX, KBART), and many platforms require multiple separate metadata submissions. In addition, existing distribution channels are not well suited to OA content, while entirely new discovery and dissemination platforms are emerging (e.g. Google Books/Scholar).

Guided by the perspective of new and emerging not-for-profit OA presses that have not yet been sufficiently integrated into existing discovery systems, knowledge bases, and supply routes, the aim of WP5 is to develop methods and systems to better integrate the catalogues of OA publishers into curated research records. The implementation of “best practices” workflows for OA book publishers will allow their catalogues to be better integrated into the scholarly record (discoverability, reach, persistence), increasing the impact of OA books.

WP5 will build an Open Dissemination System (ODS) for OA books and a shared “best practices” digital catalogue. The ODS will be built as a decentralised system, using open source code, open protocols and standards and distributed databases, all under collective control. Doing so will ensure the system cannot be operated for the benefit of a single entity (either commercial or not). The ODS is currently under development under the project name Thoth. It consists of a metadata management system and a suite of export functions that allow publication metadata to be exported to all main metadata formats and transferred to all relevant major platforms in the library and bookselling supply chain.
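The idea of holding one metadata record and exporting it to several formats can be illustrated with a minimal Python sketch. This does not reproduce Thoth's actual data model or API: the `Publication` class, the `EXPORTERS` registry, the field names and the placeholder identifiers are all hypothetical, and the KBART-style output uses only a simplified subset of columns.

```python
import csv
import io
import json
from dataclasses import dataclass, asdict


@dataclass
class Publication:
    """Hypothetical minimal record; not Thoth's actual schema."""
    title: str
    isbn: str
    doi: str
    publisher: str
    licence_url: str
    landing_page: str


def to_json(pub: Publication) -> str:
    """Serialise the record as JSON for generic data exchange."""
    return json.dumps(asdict(pub), ensure_ascii=False, sort_keys=True)


def to_kbart_like_tsv(pub: Publication) -> str:
    """Render a header and one row using a simplified subset of KBART-style columns."""
    columns = {
        "publication_title": pub.title,
        "online_identifier": pub.isbn,
        "title_url": pub.landing_page,
        "publisher_name": pub.publisher,
        "publication_type": "monograph",
        "access_type": "F",  # assumption: "F" marks free-to-read content
    }
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(columns.keys())
    writer.writerow(columns.values())
    return buffer.getvalue()


# Registry of exporters: supporting a new format means adding one function here.
EXPORTERS = {
    "json": to_json,
    "kbart": to_kbart_like_tsv,
}


def export(pub: Publication, fmt: str) -> str:
    """Export one record to the requested format, or fail loudly."""
    try:
        return EXPORTERS[fmt](pub)
    except KeyError:
        raise ValueError(f"Unsupported export format: {fmt}") from None


# Example usage with placeholder values (not a real book).
book = Publication(
    title="An Example Open Access Monograph",
    isbn="9780000000002",
    doi="10.0000/example",
    publisher="Example Press",
    licence_url="https://creativecommons.org/licenses/by/4.0/",
    landing_page="https://example.org/books/example",
)
print(export(book, "kbart"))
```

Keeping the record separate from the exporters means a press would enter its metadata once, and each additional target format becomes an extra function rather than an extra manual submission.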

2.0 About this scoping report

This scoping report is a key deliverable of WP5 and supports the creation of the ODS. The report discusses the distribution of books via the traditional library supply chain and new forms of digital dissemination before looking at metadata in depth. Metadata creation and types will be investigated in order to form a number of key recommendations for WP5. These recommendations are noted throughout the report before being grouped and discussed further in the recommendation section (see 9.0).

Rather than publishing this report at the outset of the work package, it was decided to publish a time-stamped version, while simultaneously continuing to develop the report as the project progressed over time, and to encourage comment from the community. Version 1.0 of this report is available here.

2.1 Background

A 2016 study by Nielsen noted that “the ease with which books can be discovered and the ease with which they can be traded, rely heavily on the provision of appropriate, accurate and timely metadata” (Walter, 2016). It follows that this is also true for the discovery of OA monographs. Studies such as the OAPEN-NL project (Ferwerda, Snijder & Adema, 2013) have shown that OA has a positive impact on the usage and discovery of monographs. However, a recurring theme for open access monograph publishers is that of discoverability, dissemination and metadata.

In 2017, a Jisc landscape study (Adema & Stone, 2017a) suggested that, because the quality of metadata at New University Presses (NUP) and Academic-led publishing (ALP) initiatives was at various levels of maturity, best practices in metadata needed to be drawn up. This was later confirmed at a European level by the Knowledge Unlatched Research report to OPERAS on the visibility of metadata (Neylon et al., 2018), which stated that “[t]he metadata held and managed by OPERAS partners is inconsistent and variable in quality. Collecting and aggregating data from multiple OPERAS partners was a challenge due to inconsistency in bibliographic metadata processes and formats”.

New University Presses, Academic-Led Publishers, and open access presses in general, have difficulty in accessing the library supply chain. As part of the ALP interviews for the Jisc landscape study, Rupert Gatti of Open Book Publishers noted that it would be helpful to have a service that “looks at how to bring academic content into the catalogues and the digital learning environments of the universities and to allow universities to also relate back to the publisher, so that there is a flow of information going back both ways” (Adema & Stone, 2017a).

This is often compounded by the low staffing levels at many presses. Many presses are staffed by 1–1.5 FTE (Adema & Stone, 2017b), and staff are often trying to cover a number of roles. In some cases, all ‘press’ staff may actually be freelancers. Therefore, the opportunity to improve and automate the quality of metadata in the publication workflows for these presses is of paramount importance in order for their voices to be heard.

Finally, it is vital to understand the needs of open access monograph publishers that operate outside of Northern Europe and North America. Van Schalkwyk and Luescher (2017) observe that the move to digital content is an opportunity for “a new institutional logic in higher education publishing” and that this is particularly important for African university presses.

2.2 Metadata and discovery

Gregg et al. (2019) describe the challenges, opportunities and gaps for metadata for various stakeholders in the scholarly communications supply chain such as publishers, service providers, platforms and tools, researchers, funders, librarians, data curators and repositories. They provide an excellent overview of the issues in their literature review, which forms a very useful background to this report. Deville et al. (2019) describe metadata as the “final piece of this ever-changing jigsaw”. However, they also note that metadata issues are “also an issue for many traditional publishers and is not necessarily an open access issue”.

The 2019 Science Europe briefing paper on open access to academic books notes that to ensure discovery, “the technical quality of academic books and the use of common standards for their metadata, are essential” (Science Europe, 2019). It observes that conventional metadata includes “bibliographic information, ISBNs, classification codes, keywords, abstracts; digital metadata: DOI, ORCID, and increasingly chapter level metadata; and specific Open Access metadata: license information, funder information, links to Open Access editions, and, in the case of green Open Access: embargo information, version information and link to the version of record”. The European Commission’s Future of scholarly publishing and scholarly communication (2019) also calls for standardised metadata to maximise usability.

Various authors (Gregg et al., 2019; Adema & Stone, 2017b; Deville et al., 2019; Adema, 2019) recommend that best practices and a minimum metadata requirement are established to assist all stakeholders in the supply chain. Some work has already been carried out in this area, such as the Jisc/OAPEN metadata model for open access books, which was “[d]eveloped in consultation with research funders, academics and institutional staff and OA monograph publishers, the model recommends a provisional list of metadata for OA book publishers and other stakeholders” (Snijder, 2016).

At a 2018 workshop (Stone, 2018), participants representing key stakeholders in the scholarly communications chain supported this view and called for a minimum metadata requirement, which could then be used in all metadata in the library supply chain, such as ONIX, MARC, KBART etc.

A second ‘enhanced’ set of metadata requirements may also be required to aid discovery and dissemination of OA book content. A recent report from Nielsen (Walter, 2016) suggests that, in addition to a basic set of metadata requirements, this set of enhanced metadata requirements would increase sales. For the purposes of this report, we equate sales to increased discovery of OA material. However, it should also be remembered that many OA monograph publishers also rely on sales as an additional revenue stream.

Given the size of many OA monograph publishers “... enriching existing metadata records can be difficult and time consuming” (Kemp, 2018, p. 208). In the UK, for example, BDS offer a chargeable service for both libraries and publishers (where they can buy back their own metadata) (BDS, 2020). Therefore, enriched metadata is in scope for this report in order for COPIM to provide an open source service for OA publishers.

This report will scope out the various forms of metadata required by the key stakeholders in the scholarly communications supply chain in order to inform a minimum metadata requirement. We believe that this would go a long way to plug some of the gaps that Gregg et al. (2019) identify in publisher and author-supplied data.

Recommendation 1. COPIM should consider developing two metadata requirements for OA monographs: a minimum set of metadata requirements and an enriched set. Any technical report used to build the ODS should consider both.
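A two-tier requirement of this kind could be checked automatically. The following Python sketch is illustrative only: the report does not define the contents of either set at this point, so the field names in `MINIMUM_FIELDS` and `ENRICHED_FIELDS` are assumptions loosely based on the metadata types listed by Science Europe (2019), and `check_record` is a hypothetical helper.

```python
# Field names are illustrative assumptions, not an agreed standard.
MINIMUM_FIELDS = ("title", "contributors", "isbn", "doi", "licence", "oa_url")
ENRICHED_FIELDS = (
    "abstract",
    "keywords",
    "classification_codes",
    "orcids",
    "funder",
    "chapters",
)


def is_present(record: dict, field: str) -> bool:
    """Treat absent keys, None, empty strings and empty lists as missing."""
    return bool(record.get(field))


def check_record(record: dict) -> dict:
    """Report whether a record meets the minimum set and which fields are missing."""
    missing_minimum = [f for f in MINIMUM_FIELDS if not is_present(record, f)]
    missing_enriched = [f for f in ENRICHED_FIELDS if not is_present(record, f)]
    return {
        "meets_minimum": not missing_minimum,
        "missing_minimum": missing_minimum,
        "missing_enriched": missing_enriched,
    }


# Example usage with placeholder values (not a real book).
record = {
    "title": "An Example Open Access Monograph",
    "contributors": ["A. N. Author"],
    "isbn": "9780000000002",
    "doi": "10.0000/example",
    "licence": "https://creativecommons.org/licenses/by/4.0/",
    "oa_url": "https://example.org/books/example",
    "abstract": "A short description of the book.",
    "keywords": [],
}
report = check_record(record)
print(report["meets_minimum"])
print(report["missing_enriched"])
```

Separating the two checks would let a small press publish as soon as the minimum set is complete, while still seeing which enriched fields remain to be added.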

2.3 Overlap with other initiatives

As part of the scoping report, it is important to identify any overlap between the COPIM project and other projects/initiatives in order to prevent ‘reinventing the wheel’.

2.3.1 COPIM, OPERAS and Jisc’s Library Hub

In 2019, a call between COPIM WP5, OPERAS and Jisc’s Library Hub and National Bibliographic Knowledgebase (NBK) team took place to identify any overlaps between the initiatives, which will benefit the development of the ODS. For example, OPERAS will be surveying researchers in order to understand their needs and this could have an impact on scoping the new forms of dissemination section.

One of the work packages in OPERAS-P is the redevelopment and migration of DOAB to a new open source platform. The redevelopment will be based on DSpace, the open source repository system. The redevelopment also involves the OAPEN Library, which launched on the new platform in April (Ferwerda, 2020). DOAB will follow in October 2020.

Jisc’s NBK is now overlaid by the Library Hub services. NBK is the data lake while Library Hub is positioned as a national infrastructure and includes Discover, Compare and Cataloguing services. Contributors include cultural heritage institutions. There is a clear overlap between NBK and COPIM WP5 and liaison will continue between the two initiatives. Following on from this, it was announced in May 2020 that a project had been commissioned in order “to design new agreements between suppliers and libraries that allow the latter to more freely share and reuse bibliographic data” (Grindley, 2020). This service could complement the work of COPIM and a two-way exchange of information and data needs to be developed.

All three parties share commonalities of interest on standardisation of data - data quality/fit for purpose conversations as well as business models.

Recommendation 2. COPIM WP5 should consider developing a set of formal links with OPERAS and NBK/Library Hub in order for a two-way exchange of information and metadata. This should include key deliverables

2.3.2 Metadata 2020

“Metadata 2020 is a collaboration that advocates richer, connected, and reusable, open metadata for all research outputs, which will advance scholarly pursuits for the benefit of society” (Metadata 2020, n.d.a.). The project is particularly relevant to this scoping report as it looks at best practice for metadata and has communities based on researchers, publishers, librarians, platforms and funders. There is an overlap between some of the aims of the project, to create workflows and best practice, with that of the COPIM project.

The various groups have established “defined metadata challenges, barriers, and opportunities” (Schneider & Steinle, 2019) and the project has defined its metadata guiding principles (Kaiser et al., 2020). However, it appears that Metadata 2020 is looking at all metadata and emphasis appears to be on journal metadata (see Schneider & Steinle, 2019). For example, one of the community groups is looking at defining “core metadata glossary of terms e.g. ‘concept’, ‘schema’, ‘title’”, but it is unclear whether monograph material is being looked at in the first instance. However, there is clear crossover between the Metadata 2020 initiative and COPIM.

Another community is looking at Metadata Recommendations and Element Mappings, which, if monographs are in scope, would have a direct influence on COPIM. The challenges identified by this group certainly resonate with COPIM:

Many communities and publishers develop recommendations for metadata in their disciplines or for data submitted with scientific papers. These recommendations typically include two elements: conceptual descriptions of metadata needs and representations of those concepts in community dialects (XML, CSV, JSON, RDF, …). Mapping between the recommended concepts is an important step towards converging towards a recommendation that is consistent across communities
There are many different ways that metadata is created, vetted, used and distributed; and the complexity of this makes finding new efficiencies and systems implementation difficult
Most groups face interoperability challenges with systems and processes
There are silos within organizations themselves, making communications challenging

(Metadata 2020, n.d.b)

Recommendation 3. COPIM should keep a watching brief on the Metadata 2020 project and make findings available if appropriate

3.0 Distribution channels

In their 2019 literature review, Gregg et al. (2019) group service providers, platforms and tools together. This includes traditional library systems vendors (e.g. Ex Libris, EBSCO and OCLC) and e-retailers, such as Amazon. In this scoping report we have decided to separate these out between traditional and new forms of dissemination.

Gregg et al. (2019) observe that for these vendors, “initiatives for open metadata and usage data, transparent pricing and contracts, allowance for community input, and use of well-established international standards to promote interoperability would bring positive change of the sort modelled in other industries” (p. 7). However, initiatives for open metadata alone will not solve these inconsistencies unless best practice and a minimum requirement are agreed. A further issue is ‘how’ open metadata will find its way into suppliers’ knowledge bases. For example, “the business models that have historically compensated partners in the supply chain for their work are challenged by “free stuff” and the partners are also not incentivized to funnel usage and engagement data, the currency for sustaining open access approaches, back through the system” (Grimme et al., 2019, p. 8). Amazon, for example, does not have a zero-price point.

Deville et al. (2019) observe that “[t]here is a perception that, because they are both digital and readily shareable, open access texts are inherently discoverable. In some respects, however, discovery and dissemination remains the most significant ongoing challenge for open access book publishing. The challenge has three main overlapping components: (1) dissemination via digital platforms and book sellers; (2) dissemination via libraries; (3) metadata”. It is important to understand the difference between dissemination (pushing content out) and discovery (readers finding the content). For example, non-OA publishers spend a large amount of time on promotion. This may not be the case for small OA presses and although there might not be a paywall, there is still a gap between OA content being discoverable and readers actually knowing about it. However, print and OA can co-exist for many OA publishers. One of the key issues for OA presses is how to ensure that the metadata reflects that both formats are available. This could allow OA discovery to lead to print sales, as there is evidence that researchers still prefer print in many disciplines (CUP & OUP, 2019). It is also important that readers discovering the print metadata know that they can access the OA version if they wish.
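One way to make metadata reflect that both formats are available is to describe every edition of a work together, so that each edition's record points to the others. The Python sketch below is a minimal illustration under that assumption; the `Work` and `Edition` classes, the field names and the placeholder identifiers are hypothetical and do not correspond to any particular metadata standard.

```python
from dataclasses import dataclass, field


@dataclass
class Edition:
    format: str          # e.g. "paperback" or "pdf"
    identifier: str      # ISBN or other identifier (placeholders below)
    open_access: bool
    url: str = ""


@dataclass
class Work:
    title: str
    editions: list = field(default_factory=list)


def describe_editions(work: Work) -> list:
    """Return one record per edition, each listing the other formats and any OA links."""
    oa_urls = [e.url for e in work.editions if e.open_access and e.url]
    records = []
    for edition in work.editions:
        others = [e.format for e in work.editions if e is not edition]
        records.append({
            "title": work.title,
            "format": edition.format,
            "identifier": edition.identifier,
            "open_access": edition.open_access,
            "other_formats": others,
            "oa_urls": oa_urls,
        })
    return records


# Example usage with placeholder values (not a real book).
work = Work(
    title="An Example Open Access Monograph",
    editions=[
        Edition("paperback", "9780000000002", open_access=False),
        Edition("pdf", "9780000000019", open_access=True,
                url="https://example.org/books/example.pdf"),
    ],
)
for rec in describe_editions(work):
    print(rec["format"], rec["other_formats"], rec["oa_urls"])
```

In this sketch a reader who finds the paperback record can see the link to the OA edition, and a reader of the OA edition can see that a print copy exists.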

The main challenge for OA publishers is the collation of data from all of the different platforms used, OA or print distributors. Each platform tends to have a slightly nuanced version of the metadata. Often this work has to be outsourced at a cost to the publisher.

Another issue for OA publishers is understanding where researchers go to discover their content. In 2018, a large-scale survey looking at how 10,977 readers discovered content identified that Google is the most important method of discovery for government, corporate and charity sectors (Figure 1). However, “[p]eople in the Academic sector think their library website is the most important resource for book discovery, followed very closely by Google” (Inger & Gardner, 2018, p. 28). This view is supported by a smaller scale US-based study (Oh & Colón-Aguirre, 2019).

Figure 1. Book search by sector, 2018. (Inger & Gardner, 2018) Published under a Creative Commons Attribution-NonCommercial licence: https://creativecommons.org/licenses/by-nc/4.0/

This was also split by job role in the academic sector, which shows a difference for teachers, researchers and educators. Library web pages appear to be the most important for researchers (Figure 2).

Figure 2. Book search, academic sector, job role, 2018. (Inger & Gardner, 2018) Published under a Creative Commons Attribution-NonCommercial licence: https://creativecommons.org/licenses/by-nc/4.0/

This data shows that the importance of the ‘traditional’ methods of discovery should not be ignored alongside emerging forms. McCollough (2017, p.191) notes that the two most important factors in making OA material available are depositing records in DOAB and libraries systematically including OA monograph records in their catalogues.

Although this may not answer the question of how researchers discover OA content, it does point to the forms of discovery that this scoping report should cover.

Recommendation 4. Further work is required to find out how researchers discover OA content

3.1 Traditional library supply chain

The ‘traditional library supply chain’ describes the workflow that results in discovery via a university library, either digitally or in print. This supply chain can effectively be split into two streams, which also overlap.

The first stream includes traditional library book suppliers. Suppliers in the UK include Askews and Holts, EBSCO and ProQuest, who are the approved suppliers under the UK Books, E-books, Standing Orders and Related Material - Inter-regional Agreement (SUPC, 2020). This agreement is in place until 31 July 2021. Other suppliers are also included for e-textbooks and individual sales. Both print and e-book material are covered under this agreement, which includes the provision of MARC records for library catalogues (at a cost). In this part of the supply chain there is little opportunity for OA books to be listed by suppliers, even if a paid print/e-version is available for purchase. Other suppliers in this arena include Lightning Source for POD, and distribution in the USA via Chicago University Press.

The second stream includes library discovery systems such as Primo, Summon etc. These discovery systems have been on the market for over 10 years and exist as one-stop shops to provide library users with a large selection of subscription content, library holdings and other selected content (Stone, 2009). It is also possible for library MARC records to be uploaded into these discovery systems.

This scoping report must consider both means of e-book discovery.

3.1.1 Library suppliers

Deville et al. (2019) observe that the “contemporary academic book industry is … largely built not on direct engagement with readers but on distribution via libraries. Commercial publishers have well-established routes for getting their books into libraries, both in terms of the packages they offer and the access they have to the infrastructures and supply chains used by libraries to purchase content. Libraries tend to use approved library suppliers, which do not usually list open access monographs. If they do, they often favour the copy available for purchase and not the free version”.

As stated above, the UK is part of a framework agreement, negotiated by the Southern Universities Purchasing Consortium, which is governed by European procurement law. This agreement commits universities in the agreement to make a certain percentage of spend through selected suppliers from the agreement. This, in part, is the reason many library acquisition teams are not aware of open access monographs. If they are not listed in the database of their chosen suppliers, they are unlikely to look elsewhere.

Other countries and regions are likely to have similar agreements in place, which typically lock down library purchasing for 3–5 years.

McCollough (2017, p. 185) evidences the issue with OA monographs and the supply chain with research that shows that of 192 library catalogue searches for an OA version of a sample book, 66% of results had no indication of the OA version, 13% had an indirect indication and only 21% had a clear indication of the OA version.

In the UK, the issue of OA monographs and the library supply chain was discussed in a 2018 Jisc workshop by stakeholders from NUPs, ALPs, book suppliers and distributors, metadata suppliers, libraries and other experts in OA publishing who were invited to consider the problem statement:

OA publishers have difficulty accessing the channels that library acquisition departments use to buy print and e-book content.

Metadata was one of the areas discussed (Stone, 2018). The complexity of the acquisition process has proved an issue for these presses. Low staffing numbers (Adema & Stone, 2017b) often mean that specialist knowledge of library systems and the supply chain is lacking. Understanding the metadata requirements for the supply chain, such as MARC, ONIX and KBART, is a problem for smaller OA presses. A degree of standardisation would help presses to cope with metadata channels. Participants noted that there were also issues around chapter level metadata as there was not always a means to capture it.

Stakeholders at the workshop also recommended that there should be a discussion with the British Library around possible inclusion in the Cataloguing-in-Publication (CIP) Programme, which “provides records of new and forthcoming books in advance of publication in the United Kingdom and Ireland” (British Library, n.d.). Workshop participants suggested that, due to the importance of Library of Congress categorisations and metadata, which publishers do not tend to engage with, this should also be taken to the Library of Congress committee for discussion. Although these suggestions were made in the Jisc workshop, it seems sensible to include them in this scoping report for action.

In addition to the traditional library suppliers, many of which are mentioned above, participants at the WP5 ‘Cambridge workshop’ (Barnes, 2020) also highlighted the importance of suppliers such as ACCUCOMS and Burgundy in this respect. Therefore, once the work package is further developed, it would seem prudent to provide a briefing note to these suppliers about it in order to encourage engagement and potential business models for suppliers who still remain important to library acquisition processes.

Recommendation 5. COPIM to liaise with the British Library regarding its metadata services and the Cataloguing-in-Publication (CIP) Programme

Recommendation 6. COPIM to make contact with the Library of Congress committee to initiate a discussion about OA books

Recommendation 7. Release a briefing paper aimed at library suppliers to increase engagement

3.1.2 Cultural change

There are important cultural change implications with respect to the library supply chain, which need to be considered alongside the scoping report. Firstly, there is cultural change in the supply chain. Grimme et al. (2019, p.11) comment that there is little incentive for commercial intermediaries to make a free version of a book available to libraries who may buy the print copy. However, adoption of a more service orientated business model may apply in this instance, where suppliers make the data available to the library at a cost. The COPIM model would not rule this out. Indeed, intermediaries already provide access to OA journals as part of full text journal ‘database’ offerings where libraries pay for the service (e.g. the database of titles provided by the intermediary) and not the titles themselves. This area would appear to fall into the remit of work package 2 of the COPIM project.

Recommendation 8. COPIM WP2 may wish to consider the traditional library supply chain in its modelling

Secondly, any minimum metadata standard would have to ensure that there was a way to enable library acquisitions teams to see that there is an OA version of the monograph alongside the print copy via the supplier. However, this is also an area that needs further change. Currently, very few, if any, libraries have Collection Management and Development policies that include OA monographs. Even if they are included in the library supply chain, further work is needed in getting library acquisitions teams to consider open alongside print and other digital formats. Thus, there is need for action in a number of key areas:

Redesign the Library supply chain to support open content
Rethink how to demonstrate value for money for resources invested in open
Support academic staff to select Open Educational Resources and Open Textbooks for teaching
Include open content in Library collection management and development policies so open content is selected and acquired in the same way as purchased or subscribed, or even that the discovery of open material is prioritised over purchased.

(Ball, Stone, & Thompson, 2019)

Recommendation 9. To keep a watching brief on cultural change in order for the work package to be successfully adopted by the supply chain and libraries

3.1.3 Print supply chain

The print supply chain is worthy of separate mention. Despite being born digital, it is also important to get OA monographs into the print supply chain, and not only for the reasons discussed above. Both traditional methods of library supply and systems and newer forms of distribution (Amazon, which is often the number one discovery platform for print) are print-centric and both have a problem dealing with zero cost. This can create an issue for OA metadata. For example, ONIX feeds reject a zero-price point (see section 5.1). Distribution systems such as CoreSource also struggle with a zero-price point.

Having to maintain two forms of distribution for OA (digital and print) is difficult for many OA publishers. It also does very little to link the two together. Furthermore, the Cambridge workshop suggested that researchers do not see the differentiation between print and digital. Their search often starts digitally, followed by an inflection point where researchers move to the print format, and this needs to be seamless. Therefore, a takeaway point for COPIM is to see things from the researcher’s perspective. Print and digital run concurrently for scholars (see Recommendation 4).

However, print suppliers’ business models are based on taking a commission calculated as a percentage of the sale price. Because OA is often paid for up front, on a per-title basis or through an annual membership fee, there is little financial incentive for print suppliers to include metadata referring to the open access version.

3.1.4 Library systems

Library systems, particularly online catalogues, continue to play an important function as a broker for information and should not be forgotten as they can help build trust in metadata. However, there are issues. Cataloguers often use the print book to create or enrich records. Participants at the Cambridge workshop suggested cataloguers may actually trust print over digital and OA based on a perception of value. This revisits the cultural change issue described above. Other libraries may buy in metadata from suppliers, which comes with known issues about OA material, such as metadata fields being stripped out by suppliers wishing to focus on the purchased content.

In addition, Mudditt notes that some libraries see their library catalogue “as pertaining only to their local collections” (Mudditt in McCollough, 2017, p.190). In this case, they might be unlikely to add OA versions to what is ostensibly a catalogue of print material. However, this would not stop these records being loaded into a web-scale discovery system (see 3.1.7). This puts small OA publishers at a disadvantage. Therefore, COPIM needs to develop channels through which to address large distributors and to get (open) metadata taken seriously.

The workshop also suggested that COPIM tap into some of the outputs of recent OAPEN workshops, which looked at library services such as add-on services, feeds on books and automated reports.

Recommendation 10. COPIM to develop channels to address distributors and library systems vendors

Recommendation 11. COPIM to discuss outputs of the OAPEN workshops in order to develop this area

3.1.5 Publisher platforms

A number of publishers at the Cambridge workshop had their own platforms for digital delivery (e.g. MIT Press), which has led to a more direct relationship with libraries. This has made them aware of the amount of work that aggregators do on publishers’ behalf. Other publishers use an eCommerce site. However, it was noted that for each platform the metadata needs to be uploaded again. More efficient ways of doing this are required to reduce the staff time needed by smaller publishers. In addition, some platforms require metadata at a chapter level, while others do not.

Therefore, the issue is not whether the book is open or behind a paywall; the issue is the difference between platforms. Larger publishers have the resources to do this; smaller publishers do not. A consortial way of working would work for these publishers. It should also be noted that publishers have platforms for more reasons than discoverability.

In the US, Library Simplified and SimplyE, from LYRASIS and the Digital Public Library of America, provide a middleware solution with a user app for patrons. It functions like an RSS feed. This is something that COPIM needs to investigate further. It might be possible to pull from publisher platforms as a form of blog feeder, which could encourage better metadata.

Recommendation 12. Liaise further with LYRASIS to better understand the approach of Library Simplified and SimplyE

3.1.6 Vendor platforms

Vendor platforms make the digital version available to the reader, usually via a library subscription. This large-scale dissemination via larger networks and search tools, enabling cross-searching of content, makes book discovery seamless. OA presses attach a high value to these platforms, which include Project MUSE (2020), JSTOR (ITHAKA, 2020), Fulcrum (n.d.), OAPEN (2020) and Open Research Library (2020).

For one OA publisher at the Cambridge workshop, JSTOR represents over half of the download figures. Another press has compared usage data for its OA titles on JSTOR with comparable non-OA titles, finding approximately seven times more usage. According to a survey of the initial usage, most readers were only downloading one or two chapters, so the higher usage cannot necessarily be attributed to readers downloading all chapters in a book individually. JSTOR also point out that by making books available at chapter level, the content is much more discoverable. One press reported that 40% of JSTOR searches originate within the platform, meaning that JSTOR is essentially its own ecosystem (see also Montgomery et al., 2017). The impact of usage at chapter level needs to be considered by COPIM when designing the ODS.

Recommendation 13. Some OA publishers see JSTOR as an essential part of their dissemination. Therefore, COPIM should engage with JSTOR

Recommendation 14. COPIM should consider the inclusion of chapter level metadata as within scope for the minimum metadata requirements

3.1.7 Discovery systems

A library must select the appropriate provider in its discovery system knowledgebase, e.g. Directory of Open Access Books (DOAB) (DOAB Foundation, 2020), from available resources in order to load OA books into the discovery system at the local level. This relies on a library collection management and development policy that includes OA book selection in its relevant criteria. This in itself can be a cultural change issue in libraries (see Ball, Stone & Thompson, 2019).

DOAB currently makes its metadata available through the OAI harvesting protocol (OAI-PMH) (Open Archives Initiative, n.d.). However, as explained in 2.3.1, DOAB is migrating to a new platform based on DSpace, which will result in the addition of daily updated files, ready for download. These files will be based on the following formats and standards: ONIX (3.0), MARC, MARCXML, CSV, RIS. Service providers and libraries can then either use the protocol to harvest the metadata of the records, or download the metadata in an appropriate format, for inclusion in their collections and catalogues.
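As a rough illustration of the harvesting route described above, the following Python sketch builds an OAI-PMH ListRecords request and reads Dublin Core titles and the resumption token from a response. It is a minimal sketch under stated assumptions: the base URL is a placeholder rather than DOAB's actual endpoint, the sample response is invented for the example, and the function names are hypothetical.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# XML namespaces defined by OAI-PMH 2.0 and by Dublin Core
OAI_NS = "http://www.openarchives.org/OAI/2.0/"
DC_NS = "http://purl.org/dc/elements/1.1/"


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request URL.

    In OAI-PMH, resumptionToken is an exclusive argument: when it is
    present, metadataPrefix must not be sent again.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (titles, resumption_token) from a ListRecords response."""
    root = ET.fromstring(xml_text)
    titles = [el.text for el in root.iter("{%s}title" % DC_NS)]
    token_el = root.find(".//{%s}resumptionToken" % OAI_NS)
    # An empty or missing token means the list is complete
    token = token_el.text if token_el is not None and token_el.text else None
    return titles, token


# Invented sample response, for illustration only
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An Example Open Access Monograph</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    # https://example.org/oai is a placeholder, not a real endpoint
    print(list_records_url("https://example.org/oai"))
    print(parse_list_records(SAMPLE_RESPONSE))
```

A harvester would repeat the request with each returned resumption token until none is returned. Fetching over the network is deliberately left out of the sketch.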

Over the past 10 years, library discovery systems have grown in importance for libraries and their users. Libraries have long been able to add OA resources, such as the DOAB and HathiTrust (n.d.) to their knowledgebases, which should help to level the playing field for OA publishers, as long as the metadata is of good enough quality. McCollough (2017) notes that discovery systems that harvest and display DOAB records may solve the problem of OA discovery through these systems.

However, this does not appear to be the case and it is still unclear how some of these workflows operate. Gregg et al. (2019) observe that “depending on the nature of the services offered, one service provider’s use of metadata will vary significantly from another’s” and that these inconsistencies create problems for discovery (Wiersma & Tovstiadi, 2017).

The Mapping the free ebook supply chain report (Watkinson et al., 2017) concurs with the point made above (section 3.1.3) regarding users of free ebooks tending to move to print to read the whole volume. This implies a potential disconnect in the system if metadata is not linked between OA versions and a print copy held in the library. If the local MARC holdings, which may include a copy of the print version, are uploaded into the discovery system, FRBR (Functional Requirements for Bibliographic Records) (IFLA, 2020) may link the two records. However, in order to do this, consistent metadata is required, e.g. a DOI and an OpenURL (see section 4.0 below). This information may not be available in the MARC record due to poor metadata.
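To make the dependency on consistent identifiers concrete, the following Python sketch groups records for the same work by a normalised DOI. It is an illustration only, not a description of how any discovery system or FRBR implementation actually links records; the record structure, field names and DOIs are assumptions invented for the example.

```python
from collections import defaultdict

# Common ways a DOI is written in metadata records
DOI_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalise_doi(value):
    """Reduce a DOI to a bare lower-case form (DOIs are case-insensitive)."""
    doi = value.strip().lower()
    for prefix in DOI_PREFIXES:
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
            break
    return doi


def group_by_doi(records):
    """Group records sharing a DOI; records without one cannot be linked."""
    groups = defaultdict(list)
    unlinked = []
    for record in records:
        doi = record.get("doi")
        if doi:
            groups[normalise_doi(doi)].append(record)
        else:
            unlinked.append(record)
    return dict(groups), unlinked


if __name__ == "__main__":
    # Hypothetical records; 10.5555 is a prefix commonly used for examples
    records = [
        {"format": "OA PDF", "doi": "https://doi.org/10.5555/EXAMPLE.1"},
        {"format": "print", "doi": "doi:10.5555/example.1"},
        {"format": "print", "doi": None},
    ]
    print(group_by_doi(records))
```

In this sketch the print record with no DOI ends up in the unlinked list, which is the disconnect described above: the record exists, but nothing ties it to the OA version.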

Informal conversations with university presses suggest that this is often the case. Even if the metadata is of sufficient quality in DOAB, one press in particular found that their own publications had very poor metadata when retrieved from the library discovery system. This suggests that the quality of metadata was an issue due to workflows. However, good quality metadata is not a guarantee of discoverability. In their research, Wiersma & Tovstiadi (2017) found that (at the time of the research) many discovery platforms did not distinguish between authors and editors and often omitted metadata such as the subtitle of the book. The research also shows that further complications arise with search functionality, such as phrase and Boolean searches, stop words and relevancy ranking algorithms. Therefore, further work is required with vendors to understand this area further (see also section 5.3).

Recommendation 15. To conduct a number of interviews with key library discovery vendors to better understand their use of metadata in relation to that agreed by COPIM

Recommendation 16. Consider re-running the research of Wiersma & Tovstiadi for a selection of OA books once the ODS is released

Another issue reported by OA presses is that they often get caught in the backlog, meaning that their metadata tends to be loaded after what are considered to be important or problem publishers. While this needs to be substantiated it should be noted that libraries have significant purchasing power with the discovery vendors and this needs to be leveraged. This issue should be raised as part of the work package 2 library workshops. Understanding the power distribution in this workflow is important for COPIM as a project.

Recommendation 17. Raise awareness with libraries of their purchasing power as part of the work package 2 workshops

One solution might be to discuss with library discovery system vendors the possibility of adopting the SPARC and COAR best practice principles for data repositories (COAR, SPARC, 2019). Gregg et al. (2019) believe that this could be applied to vendors.

Recommendation 18. Discuss with SPARC and COAR the adoption of best practice principles for data repositories for OA monograph metadata

Presses are also unclear as to how the workflow between these vendors works with regards to their roles as library suppliers, vendor platforms and discovery systems. Further work needs to be undertaken to understand these often-complex workflows.

Recommendation 19. COPIM should engage with the large vendor platforms that do not ingest OA data in order to understand their workflows

It was also suggested that COPIM needs to understand more about researcher discovery. This step needs to happen after discovery in all channels is improved and monitored. That will allow more focus on the most popular channels. This may mean that the scoping report needs to be developed as the project continues (e.g. after each scoping, development, outreach phase).

Recommendation 20. Review scoping report and survey researchers once metadata and discovery channels are in place

3.1.8 Indexing systems

The Cambridge workshop mentioned the importance of indexing systems to OA monographs publishers, including Dimensions, Scopus, Book Citation Index from Clarivate and OpenCitations – these are business model agnostic so as long as metadata and files are clean, they do not discriminate against OA.

Recommendation 21. Further work is required to understand how COPIM can communicate with indexing systems, e.g. is there a peer review level to pass through?

3.1.9 Digital learning environments and reading lists

Another area of dissemination is through digital learning environments or reading lists. Although the majority of OA publishers are publishing ‘research monographs’, there is crossover to the textbook market. Comments at the Jisc workshop regarding the embedding of OA material suggested that content was typically presented in a very informal way rather than being formally organised as part of reading list software. There is a link here to the library supply chain, where nearly all textbook content is acquired.

There are tools in place to bring in OA content, such as open educational resources into learning environments. For example, by using platforms such as Kortext (2020) and Talis Aspire (2020) in the UK and the Unizin Consortium (Unizin, 2019) in the US, who are interested in more open, community-owned infrastructure for the management of learning systems. However, open textbooks are not very well established in the UK (Collins & Stone, 2019).

This is a large, complex and developing area and it would seem prudent to leave it out of scope until other distribution channels are better served.

Recommendation 22. Digital learning environments and reading lists are out of scope at present, but the project may wish to come back to them at a later date

3.2 New forms of dissemination

New forms of dissemination are as important as the traditional supply chain, if not more important for some (Inger & Gardner, 2016, Grimme et al., 2019, Watkinson et al., 2017, McCollough, 2017). Although Google is often seen as the most important, Amazon is also key as well as emerging social media channels, such as Twitter, Facebook, and LinkedIn.

It should be noted that there is also a link between the more traditional discovery platforms and new forms of dissemination. For example, many academics might use Google and Amazon to aid discovery before moving to the library for the print version.

3.2.1 Dissemination via online book sellers

Van Schalkwyk & Luescher (2017) highlight the importance of Amazon for university presses to reach foreign markets, citing both John Sherer at University of North Carolina and the University of Minnesota Press, who conduct 31% of print sales via Amazon. This makes online book retailers vital to OA university presses, but also challenging as they are trade platforms, which do not link to the OA version and cannot cope with zero pricing or print on demand, which is often registered as not in stock, out of print or even ‘used’. Deville et al. (2019) also note that “[s]maller publishers may also not have access to the expertise necessary to ensure search engine optimization (SEO) for their books”. In addition, comments in the Cambridge workshop revealed that presses were unsure of how Amazon’s catalogue procedure works and whether it abides by any standards.

Commercial metadata suppliers, such as Nielsen and Title Management Systems (see 4.1), offer integration into Amazon. However, more information is required on how to ensure that OA publishers’ metadata is optimized in order to improve discovery. Metadata such as overview, description, author biography, genre and sub-genre, and ‘hidden’ keywords need to be included in the recommended set of enriched metadata.

Recommendation 23. COPIM needs to understand more about the Amazon Seller Account’s backend section in order to optimize discovery

3.2.2 Dissemination via Digital Platforms

Google Books has been the dominant online portal for book readers for over 10 years (Nunberg, 2009). In the Cambridge workshop publishers reported that although they assign DOIs to books, they also need to do this for book chapters in order to improve discovery in Google Scholar.

OAPEN (Snijder, 2020) have been working closely with Google Scholar and have the following solution for the book versus book chapter issue. Firstly, they separate monographs from edited volumes (all books with at least one editor are identified as an edited volume) by omitting the ISBN from edited volumes in the “GS metadata”. This allows single author works to be seen as one publication, while edited volumes (works with more than one author) are treated as a collection of (separate) chapters. However, it should be noted that this treatment does not take into account truly multi-authored work. OAPEN also include chapters, which are clearly marked in the “GS metadata”. OAPEN do

Open Book Publishers (OBP) have noted that when they started to release annotated PDFs (adding book metadata to the PDF file properties: title, creator, subject and description), Google Scholar picked them up. OBP observed that the two requirements seem to be having a standalone URL where one can access the PDF directly and adding metadata to the PDF file. Otherwise, adding Dublin Core tags in the HTML head seems to work in many cases, but not always. The main issue is that Google Scholar appears to take deduplication seriously, treating the first link found as canonical. If a distributing platform gets harvested before the publisher, then Google Scholar appears to link to the distributor.
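The approach of adding Dublin Core tags to the HTML head can be sketched as follows. This Python example generates meta elements using the "DC." naming convention from the DCMI guidance on expressing Dublin Core in HTML. It is a minimal sketch, not OBP's actual implementation: the function name and the sample values are hypothetical, and, as noted above, the presence of such tags does not guarantee that Google Scholar will index a page.

```python
from html import escape


def dublin_core_meta_tags(metadata):
    """Render a dict of Dublin Core elements as HTML <meta> tags.

    Keys are Dublin Core element names (e.g. "title", "creator");
    values are either a single string or a list of strings.
    """
    # Declares the namespace that the "DC." prefix refers to
    lines = ['<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/">']
    for name, values in metadata.items():
        if isinstance(values, str):
            values = [values]
        for value in values:
            lines.append(
                '<meta name="DC.%s" content="%s">'
                % (escape(name, quote=True), escape(value, quote=True))
            )
    return "\n".join(lines)


if __name__ == "__main__":
    # Invented values, for illustration only
    print(dublin_core_meta_tags({
        "title": "Books & Readers",
        "creator": ["A. Author", "B. Editor"],
        "identifier": "https://doi.org/10.5555/example.1",
    }))
```

Repeating the element for each creator, rather than joining names into one string, keeps individual contributors distinguishable to any harvester that reads the tags.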

Recommendation 24. DOIs at a book chapter level should be considered for inclusion in the minimum set of requirements for enriched metadata

Recommendation 25. In addition to understanding Amazon’s algorithm, COPIM should try to understand more about SEO in Google and Google Books

3.2.3 Dissemination via new forms of online Public Libraries

Mars, Zarroug & Medak (2015) state that physical public libraries have become ‘an endangered institution, doomed to extinction’, and see in the internet the potential for a new form of public library:

The public library does not need the sort of creative crisis management that wants to propose what the library should be transformed into once our society, obsessed with market logic, has made it impossible for the library to perform its main mission […]: universal access to knowledge for each member of our society. (p. 81–82)

Examples of these new forms of public library are the Internet Archive (2020), Library Genesis (2020), Memory of the World (2020), aaaaarg.fail (2020), Monoskop (2020), and UbuWeb (2020). Although some of these projects may be considered controversial because of claims of perceived copyright infringement (cf. e.g. Albanese (2020)), there is nothing prohibiting open access publishers from using them as additional means of distributing their books, in particular those platforms that aim at a general audience.

Perhaps the best known of these services is the Internet Archive. At the time of writing, the Internet Archive offers over 25 million freely downloadable books and texts, with a further 1.3 million eBook titles that may be accessed with an archive.org account (Internet Archive, 2020a). It is possible to upload books to the Internet Archive. Unfortunately, a high-resolution PDF is the recommended format. MARC records are also accepted and as much metadata as possible is encouraged (Internet Archive, 2020b). PDF as a recommended format has an implication for WP7 regarding preservation formats.

Recommendation 26. COPIM should open lines of communication with online public libraries to understand how open access books can be included in their collections

Recommendation 27. Liaise with the Internet Archive about other file formats in addition to PDF and also about bulk uploading

3.2.4 Current Research Information Systems and Institutional Repositories

COPIM needs to consider the deposit of complete OA monographs or chapters of edited works in university or subject repositories or Current Research Information Systems (CRIS). For example, Wennström et al. (2019) showed that edited works had higher downloads and attributed this to multiple authors and editors uploading their work to repositories and personal profiles. The Cambridge workshop also identified this form of dissemination as an increasingly important group. However, this is not a new phenomenon as book data has been counted as part of repository usage by IRUS-UK since 2012 (Needham & Stone, 2012).

Digital Science have been working to make sure ONIX for OA monographs can be imported into these systems, e.g. Symplectic. OCLC are also active in this area. However, publishers have reported that there is still a problem in getting the data to flow and to show up reliably in these systems. This is not a new issue. For example, lack of DOIs in repositories was reported as an issue as part of the PIRUS2 project (Shepherd & Needham, 2010).

Recommendation 28. COPIM should liaise with Dimensions and OCLC to understand what work is being carried out in this area, how COPIM can contribute, how this affects the minimum requirement for metadata, and whether full text deposit can also be achieved

3.2.5 Reference management software

There are a large number of reference management software products available. Many of these systems are free, while others come at a considerable cost. However, in order to fit with the ethics of the COPIM project, it is appropriate to concentrate on free and open source software. This reduces the list to only a few (Wikipedia, 2020). Of these, Zotero is probably one of the best known.

Zotero (n.d.) is a project of the Corporation for Digital Scholarship (2019), which was set up in 2009 to ensure the long-term sustainability and independence of Zotero and other software. The “Corporation is organized exclusively for charitable educational purposes” under the Virginia Nonstock Corporation Act (Commonwealth of Virginia, 2009).

Zotero can import bibliographic data from a variety of formats, including MARC and BibTeX (Zotero, 2018). Zotero also provides extensive documentation and active forums.
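As an illustration of the kind of export that a reference manager could import, the following Python sketch serialises a book record as a BibTeX entry. It is a hedged sketch rather than a description of the ODS: the dictionary keys, the citation key and the sample values are assumptions made for the example, and special characters in field values are not escaped.

```python
def to_bibtex(key, book):
    """Serialise a book record (a plain dict) as a BibTeX @book entry.

    Required keys in this sketch: title, authors (list), publisher, year.
    Optional keys: isbn, doi, url.
    """
    fields = [
        ("title", book["title"]),
        # BibTeX separates multiple authors with the word "and"
        ("author", " and ".join(book["authors"])),
        ("publisher", book["publisher"]),
        ("year", str(book["year"])),
    ]
    for optional in ("isbn", "doi", "url"):
        if book.get(optional):
            fields.append((optional, book[optional]))
    body = ",\n".join("  %s = {%s}" % (name, value) for name, value in fields)
    return "@book{%s,\n%s\n}" % (key, body)


if __name__ == "__main__":
    # Invented record, for illustration only
    print(to_bibtex("author2020example", {
        "title": "An Example Monograph",
        "authors": ["Author, Ann", "Writer, Will"],
        "publisher": "Example Press",
        "year": 2020,
        "doi": "10.5555/example.1",
    }))
```

Including the DOI and URL fields where available matters for OA titles in particular, since these are what lead a reader from the saved reference back to the open version.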

Recommendation 29. COPIM should test the ODS with Zotero and liaise regarding any unforeseen issues

In their report on African university presses, Van Schalkwyk and Luescher (2017) note that these presses need to work with their authors to maximise their use of social media in order to reach foreign academics, bloggers and reviewers. However, as Wennström et al. (2019) show, this is still a developing area, with authors rating metrics from Twitter, LinkedIn etc. lower than usage and sales as measures of impact. The authors conclude that “authors are rather leaning towards using established bibliometrics such as citations, possibly number of downloads but remain sceptical about other measures of impact or attempts to understand the readership”.

However, services such as Kudos have shown that dissemination via social media channels does lead to increased readership. Kudos describes itself as “the only platform dedicated to dissemination across the multiple networks and channels available to researchers for sharing information about their work” (Kudos, 2020). The service is free to researchers and allows them to claim their work via ORCID or through citation indexes such as Scopus to pull data into an author’s profile. The author is then encouraged to create their own set of metadata, such as a plain English version of their abstract. Once complete, Kudos then allows the author to push their publication profile to LinkedIn, Twitter and Facebook, and to share it via a shareable link, which allows tracking of the impact via Altmetrics.

A 2017 study showed that “actions performed in Kudos, such as sharing via social media and other channels, are associated with 23.1% more full text downloads” (Erdt et al., 2017). A drawback is that authors have to supply the additional data themselves. They also need an ORCID, and the work must have a DOI to be claimed. Therefore, a book chapter without a DOI would not register.

Publishers and institutions pay for services, which allows KUDOS to remain free for researchers. However, this may be a barrier to smaller OA presses with limited budgets.

Recommendation 30. COPIM should review the questionnaire run by Wennström et al. to understand the view of authors with respect to social media

Recommendation 31. COPIM to liaise with KUDOS to discuss potential flow of data

3.3 Communication between scholars

It is also important to acknowledge that scholars communicate between themselves. In the traditional print monograph space, where sales can often be in the region of 200 copies (Willinsky, 2009), scholars may well know the majority of their readers.

In a digital OA scenario, scholars need a suite of services that will allow them to do this. KUDOS is one such example; reference management software is another. However, Wennström et al. (2019) observed, in their preliminary analysis of OA monograph authors’ use of social media, that this was not widely acknowledged by the authors themselves.

There is also potential for research infrastructure such as OPERAS to play a role in communication by building networks for HSS scholars in Europe.

Scholars also need feedback from publishers on the performance/impact of their publications. This is more difficult in an OA world, where the monographs could end up being hosted in any and all of the forms of dissemination discussed above. The link between dissemination and metrics is made below, where it is suggested that COPIM look to build a suite of services to allow scholars to communicate. This may include liaison with other funded projects, research infrastructure and commercial services.

3.3.1 H-Net book channel

Communication between scholars can also be by more formal means. For example, “H-Net is an international interdisciplinary organization of scholars and teachers dedicated to developing the enormous educational potential of the Internet and the World Wide Web” (H-Net, 2019a). H-Net is based at Michigan State University, but contributing scholars come from all over the world.

H-Net Book Channel is one of the many networks and services available. “The Book Channel is a book announcement service that helps readers stay informed about recently published titles in their fields”. It imports new publisher lists via the Edelweiss service (see 4.6) and sorts them into ‘field lists’ (H-Net, 2019b).

H-Net also provides a book review service, H-Net Reviews (H-Net, 2019c). Although COPIM cannot necessarily influence what gets reviewed, it is important to monitor reviews of titles held within the ODS and for a link to be made.

Recommendation 32. COPIM should contact H-Net to understand if metadata held within the ODS can be supplied to the network

4.0 Metadata creation

This section of the scoping report looks at the various methods of metadata creation. Although this is by no means an exhaustive list, it gives a picture of the different solutions available and some of the issues for OA monograph publishers that accompany metadata creation for monographs.

In order to build a picture of the processes involved, publishers at the Cambridge workshop were asked how they approached metadata creation. A number of presses create the metadata themselves before adding this information to their chosen title management system (see 4.1), which then creates ONIX files and spreadsheets. Some library presses use their library technical services departments to create MARC data. Other presses use their publishing systems, e.g. Ubiquity Press, to create the metadata. Finally, a number of presses generate everything themselves, which can be a very time-consuming and cumbersome process given the many minor variations in requirements from platforms and suppliers.

A worked example of how one press creates metadata was provided at the workshop. Metadata originates from the author or editor and is then refined and enriched by the press. It was noted that author-generated metadata can vary in quality and that metadata can be quite abstract to talk about. This metadata is then taken by providers to fit the requirements of different systems, for example through book suppliers or via title management systems (see 4.1). These records are then converted to ONIX, XML etc. Some of these services are free; the business model is to charge end users for the metadata files (such as MARC files for libraries). Others are part of a modular system, which the publisher purchases. However, this metadata is owned by the suppliers. If a press wants to use or manipulate the metadata, it needs to buy it back from the supplier at cost and with a licence.

The National Bibliographic Knowledgebase run by Jisc faces similar issues with its open API. MARC records cannot be retrieved due to ownership. Therefore, it is important to understand what metadata is open, what is not and who is being denied access, e.g. publishers being denied access to metadata created about their publications. COPIM is trying to give power back to the publisher by providing open metadata, rather than metadata created and owned by external suppliers. It wants to create a master record with an open licence. There was general consensus at the Cambridge workshop that licences for metadata should be made open, ideally CC0, although many publishers at the workshop had never seriously considered the licensing and ownership of their metadata. However, one publisher does add a licence covering both content and metadata to the book in order to encourage metadata re-use. There are also further complications and uncertainty about where details about metadata licences would be put in an ONIX record.

It was also noted at the Cambridge workshop that libraries tend to create their own MARC records at an individual level. For a print monograph that may sell 200 copies, this seems to be a massive duplication of effort. Libraries may also buy in MARC records from library suppliers. The potential for CC0 metadata could translate to a significant cost saving for libraries. Openness and transparency regarding metadata licences was something that it was thought COPIM should look at in order to break the system of reselling metadata.

In 2019, Jisc launched Plan M to help to address “the need to implement a more efficient bibliographic metadata supply model for UK academic & specialist libraries using the Jisc NBK/Library Hub as core infrastructure” (Jisc, 2019). This project is looking at a commercial solution to bibliographic metadata in the supply chain, whereas COPIM is looking at an open solution. However, it is important that data from COPIM feeds into the data in this project to ensure that data from OA presses is included.

Recommendation 33. COPIM needs to be able to produce MARC records for OA monographs that have an open licence to promote re-use

4.1 Title management systems

Title management systems were discussed at the Cambridge workshop. One such system is operated by Firebrand Technologies, whose “Title Management Enterprise Software tracks titles from pre-acquisition through post-production, while our Eloquence on Demand service is the most accurate and cost-effective way to implement ONIX and to distribute metadata and digital content to more than 500 trading partners” (Firebrand Technologies, 2020a).

These systems manage and process the whole publication workflow. Essentially, they are project management systems combining workflow management with metadata feeds. The systems provide all information on an output, not just metadata, which enables a publisher to simply push data out. The systems also flag incorrect data, so quality checking of that data is included.

Some participants at the workshop thought that there was a tipping point, at around 25 publications a year, at which a publisher requires these systems. Below that point the cost may be prohibitive for smaller publishers, although they may also benefit from the systems.

The workshop noted that it is important for COPIM to understand how they fit into the supply chain and to collaborate with the systems in order to benefit the OA publishers already using the whole workflow. However, the workshop strongly advised COPIM not to try to recreate the whole workflow, as every publisher is likely to do things differently and this would not be the best use of COPIM funding. COPIM was encouraged to look at the modular approach of these systems, potentially creating a module similar to Firebrand’s Eloquence on Demand to create and package metadata. Eloquence claims to provide “richly formatted bibliographic metadata, cover images, and ebook and audio digital assets” to over 500 trading partners, “such as Amazon, Audible, Apple, Baker & Taylor, Barnes & Noble, Google, Kobo, Ingram, Nielsen, Gardners, Booktopia, and TitlePage” (Firebrand Technologies, 2020b), and metadata in ONIX formats. This is very close to the aims of COPIM WP5.

Regarding smaller publishers, it was also noted at the workshop that there were services available for smaller publishers, such as BooksoniX in the UK (2020), which also offers a modular metadata creation service.

However, it was observed in the workshop that these systems may not have the fields to add ONIX for OA monographs (see section 5.1) or metadata at chapter level, which has been identified in this report as important. There is certainly no mention of OA on either website; both Firebrand and BooksoniX are very much set up for sales.

Recommendation 34. COPIM should not try to recreate title management systems and this should be out of scope for the project

Recommendation 35. COPIM should liaise with title management systems in order to understand how COPIM data for OA monographs could be ingested

4.2 BDS (UK)

BDS is a UK metadata provider (BDS, n.d.a.). Metadata may be sent via ONIX feeds or an online form. BDS emphasise the importance of quality metadata from publishers. BDS send metadata to library retailers free of charge to publishers. However, they charge for a feed of the metadata back to publisher online catalogues etc. BDS also supplies records to the British Library and some UK library suppliers. Many UK libraries will purchase MARC records from BDS at the same time as acquiring shelf ready print or e-book stock via library suppliers.

BDS is also part of the CIP programme. As such, data is supplied to the British National Bibliography (BNB) (British Library, n.d.).

BDS were present at the Jisc workshop (Stone, 2018) and observed that the more fields that could be supplied in the metadata the better, such as content, summaries, abstracts. BDS do offer enhanced metadata to publishers, but this is at a cost. They also offer an RDA enhancement service (BDS, n.d.b.).

Recommendation 36. As the main supplier of data to the British Library CIP programme and to UK academic libraries, COPIM should liaise with BDS in order to understand workflows and business models

4.3 British Library free metadata distribution services

Although BDS supplies metadata to the British Library’s CIP programme, the British Library also offers free metadata distribution services (MARC 21 records via Z39.50) for non-commercial use. However, it is unclear whether use by an OA press would count as non-commercial.

Recommendation 37. COPIM should liaise with the British Library regarding metadata supply and free distribution of OA monograph metadata on a CC0 licence

4.4 BiblioVault

Developed in 2001 by the University of Chicago Press and backed by the Andrew W. Mellon Foundation, “BiblioVault is a virtual warehouse for academic books that serves more than 90 scholarly publishers in the U.S. and Europe” (Wikipedia, 2020; BiblioVault, 2020a). BiblioVault members are able to submit and retrieve files and edit metadata. In addition, publishers can upload content to BiblioVault and send the files to vendors and aggregators, arrange digital printing and send material to reviewers and booksellers (BiblioVault, 2020a).

Recommendation 38. COPIM should conduct further work to understand whether it wants to offer the full service that BiblioVault offers, or just a metadata service

4.5 CoreSource

CoreSource from Ingram “allows book publishers to archive, manage and streamline digital asset management and distribution through a single, powerful platform” (Ingram, 2020). Publishers upload e-content and metadata to Ingram’s secure platform. In 2017, Ingram reported that data can then be released to over 430 business partners (Ingram, 2017). Pricing and currency conversion are included, but it is unclear whether OA is an option.

Ingram provide metadata standardization and distribute to retailers and libraries as well as new distribution channels such as: Amazon Kindle, Kobo, Apple iBook. Audit trails and reporting are also included. Managed file conversion services are also available via a ‘community of partners.’

In addition, CoreSource has expanded to other business partners including Bookshare, British Library, Royal Dutch Library and National Library of Germany, CLOCKSS, Edelweiss, MarkMonitor and RoyaltyShare (Ingram, 2016a).

CoreSource provide information on their fees for partners, the standard rate being a $5,000 platform integration fee, a yearly Title Management Fee per title group and a monthly distribution fee of 9.5% of net sales (although publishers are not charged the distribution fee if they use Ingram for distribution (Ingram, 2016b)). They also provide a list of distribution partners (Ingram, 2015a, 2015b).

4.6 Edelweiss+

“Edelweiss+ is a digital catalog platform that allows publishers to share catalogs and review copies with booksellers, librarians, reviewers, and other book professionals.” (Above the Treeline, 2020)

4.7 National Bibliographic Knowledgebase and Library Hub Create

Library Hub Create is a planned service to allow contributors who do not have their data in a Library Management System (or other exportable database/format) to easily create records for submission to the NBK. As well as these records being exposed through the Library hub services, the contributor would be able to download these records in a variety of formats, most likely MARC21, MODS, ONIX, and text. They will be free to use these records as they wish.

The intention is to create a simple form which captures the key bibliographic information required by discovery services, while not requiring any metadata creation expertise or knowledge of bibliographic metadata standards.

The NBK team would be pleased to collaborate with COPIM on creating a service that works for the Open Access publishing community, as well as more traditional library contributors.

4.8 FOLIO

“FOLIO is a collaboration of libraries, developers and vendors building an open source library services platform. It supports traditional resource management functionality and can be extended into other institutional areas” (Folio, 2020a). Folio is an interesting development, as it is a completely open source system that encourages participation from libraries, developers and vendors. For example, the source code is available (Folio, 2020b).

Folio also runs a wiki, which includes a number of special interest groups (SIGs). Of particular interest to COPIM is the Metadata Management SIG (Folio, 2020c), which works with developers to create, edit and refine bibliographic data. The SIG also “[c]onsiders metadata storage and harmonization between traditional library materials, knowledge bases, and other forms of information managed by libraries”, which might go some way to resolving some of the issues discussed in the sections on library systems and discovery systems above.

The SIG also advises developers on interactions required between libraries and vendors to allow for creation and loading of bibliographic data. Again, this could also help with some of the issues described above.

Folio is an open system. EBSCO have invested heavily in its development, but do not claim any ownership. EBSCO also own book suppliers, discovery systems and vendor platforms. This creates an opportunity for COPIM if it becomes involved in discussions with both Folio and EBSCO. This could ensure that OA monographs are considered and could potentially enter the library supply chain via EBSCO’s products.

Recommendation 39. COPIM should contribute to the FOLIO developers’ community

Recommendation 40. COPIM should meet with EBSCO’s discovery team to discuss the potential of supplying OA monograph metadata into the library supply chain

5.0 Metadata types

This section introduces the metadata types that have been drawn together as part of this scoping report. In order to make sure all distribution channels are provided with the correct data in the correct format, a ‘superset’ of data is needed. Bull and Quimby use the term ‘standards jungle’ to describe “the sheer scale and complexity” of metadata (2016, p.147).

Publishers at the Cambridge workshop reported that metadata schemes often do not talk to each other and that this often means that basic information such as author, affiliation or table of contents does not translate between the schemes. Furthermore, it was observed that although ONIX is a baseline solution to avoid duplication, vendors use the data quite differently from one another.

It was suggested that, in addition to the work in this report to identify a superset of data, COPIM conduct a piece of research to find where data gets lost in order to find the pain points. It might be possible to ask publishers interacting with COPIM to provide case studies to show how data gets distorted or lost in the system. These case studies could then be taken to system vendors, and maybe more importantly libraries who are paying for the service.
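As an illustration of what such a case study might record, the following minimal Python sketch compares the fields a publisher supplied with the fields a downstream system actually displays. The field names, the example records and the `lost_fields` helper are all hypothetical and are not taken from any vendor system or from the workshop.

```python
def lost_fields(supplied: dict, received: dict) -> dict:
    """Classify each supplied field by what happened to it downstream."""
    report = {"missing": [], "changed": [], "intact": []}
    for name, value in supplied.items():
        if name not in received or received[name] in ("", None):
            # The field was dropped, or arrived empty.
            report["missing"].append(name)
        elif received[name] != value:
            # The field arrived, but its content was altered.
            report["changed"].append(name)
        else:
            report["intact"].append(name)
    return report


# Hypothetical record as supplied by a publisher.
publisher_record = {
    "title": "An Example Monograph",
    "author": "A. Author",
    "affiliation": "Example University",
    "licence": "CC BY 4.0",
    "table_of_contents": "1. Introduction; 2. Methods",
}

# Hypothetical record as displayed by a vendor platform.
vendor_record = {
    "title": "An Example Monograph",
    "author": "Author, A.",
    "affiliation": "Example University",
    "licence": "",
}

report = lost_fields(publisher_record, vendor_record)
for outcome, fields in report.items():
    print(outcome, fields)
```

Running the same comparison at each hand-off (publisher to aggregator, aggregator to vendor, vendor to library system) would show at which stage a given field disappears.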

Recommendation 41. COPIM should add a piece of research to investigate the pain points in the workflows to try to identify where metadata gets lost in translation between the different stakeholders

Publishers at the Cambridge workshop confirmed that ONIX, MARC, CSV, XML and OAI-PMH were the main forms of metadata created. There was also discussion around why an ONIX record was, by itself, not sufficient: libraries cannot ingest ONIX and instead require MARC records and KBART files. It was also noted that translating ONIX into these formats is not easy because MARC and KBART are not XML based. It appears that other formats, such as BIBFRAME, have more promise. These and other metadata types are discussed below.

5.1 ONIX for books

Book suppliers at the Jisc workshop (Stone, 2018) noted that for the supply chain, metadata needs to be supplied in ONIX and that this is even better if it is part of a BDS/Nielsen Bookdata feed. Suppliers can take data direct from publishers, but that would mean that every publisher would need to supply all book suppliers, a task that would be too big for any small-scale publisher.

ONIX for books is an XML-based open standard, developed and maintained by EDItEUR (2009). Book Industry Communication (BIC) define ONIX as

“a framework encompassing standardised terminology, operating methods and a data file format that’s used to communicate information about books, audiobooks, e-books and other related products. The data format is widely used across the global book publishing industry to pass data between publishers, data aggregators, wholesalers and distributors, and High Street and web-based retailers”. (BIC, 2019)

Although predominantly a framework for driving high street sales, ONIX is also important for OA monographs and the library supply chain as it has been widely adopted around the world and is used by the major metadata aggregators, such as Nielsen, Bowker and BDS. BIC (2019) note that ONIX needs to develop to meet both commercial challenges and the supply chain as a whole. They also acknowledge that ONIX is a complex data standard, but that it is possible for “even the smallest publisher” to implement ONIX.

BIC also manages an accreditation scheme to allow publishers and data providers to demonstrate the quality of their metadata (BIC, 2020).

ONIX does cater for open access. Bell (2014) observes that metadata does not differ for books that are open access for ISBNs, formats, titles, contributors, edition, language, extent, subject, and publishing details, such as imprint and publication date. The EDItEUR FAQ (Bell, 2014) notes that fields such as funding bodies (see section 6.3), OA licence and an open access statement are specific to OA monographs. Examples given in the FAQ have been incorporated into the metadata requirements section below (see section 8.0). However, this would mean that vendors would need to accept ONIX for books.

Recommendation 42. COPIM should follow up on ONIX for books in any vendor discussion

Publishers in the Cambridge workshop highlighted some of the issues they face with ONIX. When OA metadata is fed in the ONIX standard to commercial vendors (except Muse and JSTOR), they find that some crucial fields (e.g., license type) are not mapped, so data in those fields does not display. Google Scholar is also frustrating because even if the book is exposed as OA, it does not get indexed if it isn’t displayed chapter-by-chapter with abstracts for each chapter. Essentially Google Scholar only indexes books if they are presented as articles.

This highlights that even if COPIM can create high quality ONIX feeds, the data may not translate in various systems. It also shows that COPIM must be able to expose chapter level metadata.

Recommendation 43. COPIM needs to be able to provide high quality ONIX for books feeds at both title level and chapter level as part of its requirements

Recommendation 44. COPIM should investigate the BIC accreditation scheme to demonstrate the quality of their metadata

5.2 MARC 21

Full details of the MARC 21 schema for library catalogues are available from the Library of Congress (2000). Although MARC records are far from perfect for any digital format and despite the introduction of library discovery systems and related metadata types, MARC still dominates the library sector. As Bull and Quimby note, the “MARC format is not geared well to delivering up-to-date, dynamic and durable metadata relating to web location” (2016, p.148).

Regarding library discovery in the supply chain, it has been observed that one of the biggest issues is how to work with library systems as there is nowhere to add ‘free’ to MARC records (Stone, 2018). OA presses wish to know how they can work with libraries to help make OA content discoverable as MARC records tend to favour paywall formats.

In 2018, OCLC and the German National Library proposed an improvement to the MARC format in order for open and restricted access works to be identified. The proposal (Library of Congress, 2018) was approved in 2019 with recommendations to alter MARC 21 fields 506 (Restrictions on Access Note), 540 (Terms Governing Use and Reproduction Note), and 856 (Electronic Location and Access) in order to display open access versions.

The proposal paper (Library of Congress, 2018) does not include any examples of open access monograph catalogue records. However, this change needs to be considered in the COPIM metadata requirements and must be noted for the technical report, e.g. the ability to populate 506, 540 and 856 fields.
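As a rough illustration of what populating these three fields for an OA monograph might look like, the Python sketch below builds them as plain data and prints them in a conventional display form. The indicator and subfield choices (506 $a/$f/$2, 540 $a/$u, 856 $u) are assumptions that should be checked against the current Library of Congress documentation; the sketch does not attempt to reproduce the exact changes approved in the proposal. The function names, licence and URL are hypothetical.

```python
def oa_marc_fields(licence_name: str, licence_url: str, access_url: str) -> list:
    """Return 506, 540 and 856 fields as (tag, indicators, subfields) tuples."""
    return [
        # 506: first indicator 0 is assumed to mean "no restrictions".
        ("506", "0 ", [("a", "Open access"),
                       ("f", "Unrestricted online access"),
                       ("2", "star")]),
        # 540: licence statement plus a link to the licence text.
        ("540", "  ", [("a", licence_name), ("u", licence_url)]),
        # 856: indicators 4 and 0 are assumed to mean HTTP access to the resource itself.
        ("856", "40", [("u", access_url)]),
    ]


def to_text(field: tuple) -> str:
    """Render one field for display, showing blank indicators as '#'."""
    tag, indicators, subfields = field
    shown = indicators.replace(" ", "#")
    body = " ".join(f"${code} {value}" for code, value in subfields)
    return f"{tag} {shown} {body}"


lines = [
    to_text(field)
    for field in oa_marc_fields(
        "Creative Commons Attribution 4.0 International",
        "https://creativecommons.org/licenses/by/4.0/",
        "https://example.org/books/example-monograph",  # hypothetical URL
    )
]
print("\n".join(lines))
```

Keeping the fields as plain data in this way means the same values could later be passed to whichever MARC library the technical work selects.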

The proposal notes that there is uncertainty as to whether additions to the BIBFRAME vocabulary (see section 5.3) will be required. However, 506 and 540 fields both have corresponding properties in BIBFRAME.

Integration between MARC and other standards, e.g. BIBFRAME and KBART, is nevertheless necessary. It should also be noted that libraries are finally starting to move away from MARC records in favour of using metadata from their research discovery systems, e.g. BIBFRAME and potentially CODEX from FOLIO.

Recommendation 45. COPIM must include the new versions of MARC 21 506, 540 and 856 fields

5.3 BIBFRAME

BIBFRAME was originally introduced as a potential replacement for MARC. However, this has not proved to be the case as libraries continue to use MARC in many library systems. The emergence of web scale discovery systems has seen the adoption of BIBFRAME, which “utilizes the Linked Data paradigm” (Tharani, 2015). Regarding library discovery, it would be interesting to investigate whether the problems described in section 3.1.7 around the loss of certain attributes of metadata are related to the translation between ONIX, MARC and BIBFRAME. Therefore, a direct export of BIBFRAME into these systems should be in scope.

In addition, FOLIO (2020a) has used BIBFRAME in building its CODEX metadata model (see 5.4).

Recommendation 46. A BIBFRAME export is in scope for COPIM due to the use by Library Discovery systems

Recommendation 47. As part of the work on tracking the metadata workflow, COPIM should investigate whether a direct BIBFRAME export would help to resolve the issues encountered by OA presses

5.4 CODEX

FOLIO appears to be gaining traction, with a list of early adopters that includes: Chalmers University of Technology, Cornell University, Five College Consortium (Amherst College, Hampshire College, Mount Holyoke, Smith College, UMASS Amherst), Lehigh University, Simmons University, The State and University Library Bremen, Texas A&M University, The University of Alabama, The University of Chicago, Wentworth Institute of Technology, and ZBW – Leibniz Information Centre for Economics (FOLIO, 2020d).

FOLIO uses a metadata model, CODEX, which is “inspired by both the BIBFRAME 2 conceptual model and the Dublin Core (DC) Elements” (FOLIO, 2020d). As such it is not based on BIBFRAME but takes some elements.

The FOLIO wiki (2020e) gives further detail on the FOLIO app and the CODEX metadata model. Both need to be included in the technical report. CODEX will be considered in the metadata requirements below.

Recommendation 48: COPIM should take CODEX into account in the metadata requirements

5.5 KBART

KBART (Knowledge Bases and Related Tools) is a NISO Recommended Practice (NISO, 2020a), which facilitates the transfer of metadata from content providers, such as the ODS, to the knowledge bases of library discovery systems in order to support link resolvers. Therefore, while it might be appropriate to provide MARC21 metadata to library management systems, it is KBART data that should be supplied to discovery systems.

By ingesting KBART data directly from the ODS into a discovery system’s knowledge base, libraries would be able to select ODS as a package in order to be able to link to OA content. In theory, FRBRisation within the discovery system would allow titles in ODS that had a print counterpart to be linked in the discovery system if the library already held a print copy.
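To make the shape of such a file concrete, the following Python sketch writes a one-title, tab-delimited KBART file. The column names follow the KBART Phase II recommended practice as understood here and should be verified against the NISO documentation before use; the title, identifiers, URL and publisher are hypothetical, and `kbart_file` is an illustrative helper rather than part of any existing tool.

```python
import csv
import io

# KBART Phase II column names (an assumption to verify against NISO RP-9-2014).
KBART_FIELDS = [
    "publication_title", "print_identifier", "online_identifier",
    "date_first_issue_online", "num_first_vol_online", "num_first_issue_online",
    "date_last_issue_online", "num_last_vol_online", "num_last_issue_online",
    "title_url", "first_author", "title_id", "embargo_info", "coverage_depth",
    "notes", "publisher_name", "publication_type",
    "date_monograph_published_print", "date_monograph_published_online",
    "monograph_volume", "monograph_edition", "first_editor",
    "parent_publication_title_id", "preceding_publication_title_id",
    "access_type",
]


def kbart_file(titles: list) -> str:
    """Serialise title records as tab-delimited text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=KBART_FIELDS, delimiter="\t",
                            restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(titles)  # fields not supplied are left empty
    return buffer.getvalue()


# Hypothetical OA monograph; journal-only columns stay blank.
example_title = {
    "publication_title": "An Example Monograph",
    "print_identifier": "9780000000002",
    "online_identifier": "9780000000019",
    "title_url": "https://example.org/books/example-monograph",
    "first_author": "Author",
    "title_id": "example-monograph",
    "coverage_depth": "fulltext",
    "publisher_name": "Example Open Press",
    "publication_type": "monograph",
    "date_monograph_published_online": "2020-01-01",
    "access_type": "F",  # F = free to read, as opposed to P = paid
}

output = kbart_file([example_title])
print(output)
```

The `access_type` column is the one that signals to a knowledge base that the title is freely available.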

NISO maintains a set of resources for content suppliers and also offers endorsement by the KBART Standing Committee, which ensures that files “are reviewed by and meet the needs of content providers, knowledge base/discovery suppliers, and librarians” (NISO, 2020b).

Recommendation 49. COPIM should review the resources made available on KBART and seek endorsement of its files

Recommendation 50. Once KBART files are available, COPIM should test with partner institutions that they are available as a package to ‘subscribe’ to in library discovery systems

5.6 Subject codes/headings

Subject codes (BIC (EDItEUR, 2010a), BISAC (BISG, 2019), Thema (EDItEUR, 2020b), LOC (2020), etc.) were discussed at the Cambridge workshop. Publishers reported that, in general, subject codes are difficult to work with and are very conservative, being based on trade books. In addition, different platforms have different categorizations. The general consensus was that they were a ‘nice to have’, rather than an essential feature of the metadata requirements, and should be out of scope for the minimum set of metadata for COPIM.

Author generated keywords are an alternative. It was also noted that some systems, such as Library Hub cover the full record for their keyword searches. Therefore, table of contents details are far more helpful than subject headings or author generated keywords. However, this does depend on the platform.

An alternative view is given in the Nielsen report, which observed that titles that hold keywords saw 34% more sales than those that did not (Walter, 2016).

Recommendation 51. Subject codes/headings and author generated keywords are out of scope for the minimum set of metadata

Recommendation 52. It should be possible to give the option to enter subject codes/headings and author generated keywords in an enriched set of data on the ODS. This should also include table of contents

6.0 Persistent Identifiers (PIDs)

“A persistent identifier is a long-lasting reference to a digital resource” (ORCID, 2019a). Bertino et al. note that in order to strengthen reuse, cross-linking and discovery, the following three metadata types are required:

All documents published should be identified by Crossref Digital Object Identifiers (DOI) to enable “usable, interoperable, and persistent identification of digital objects”
Authors should have an ORCID ID (Open Researcher and Contributor ID) to address “the problem that the contributions of certain authors can be difficult to recognize since most names of persons are not unique, could change (e.g. in the case of marriages), have cultural differences in the presentation order of names, may contain inconsistent use of first name abbreviations and or utilize different writing systems”
Funder Registry (formerly FundRef) enables the identification of a funding institution and the research project behind a specific publication. “A taxonomy of standardized names of the funding agencies is offered by the Open Funder Registry, and associated funding data is then made available via Crossref search interfaces and APIs for sponsors and other interested parties”.
(Bertino et al., 2019, p.7)

While it is important for COPIM to be able to provide various metadata types, e.g. MARC, ONIX, BIBFRAME etc., it is also important to make sure that the use of PIDs is encouraged. PIDs are a growing trend: some exist and are in widescale use, others are only emerging, and some do not exist yet. One particular PID, the DOI, is in widescale use for journal articles, but not necessarily for OA monographs.

PIDs are important because they are unique for the entity they describe and can be used as a reference. They may not be compatible with certain metadata types at present, but it is important that they feature as part of COPIM’s minimum metadata requirements (or enriched data requirements).

For a comprehensive report on the need for PIDs and their adoption and integration in the UK, see the report by Brown (2020), which was prepared in response to Prof. Adam Tickell’s recommendations to UK Government (2018). The PIDs discussed below are drawn from that report.

6.1 Digital Object Identifiers (DOI)

Grimme et al. (2019, p.8) note that the book publishing industry has been slow to adopt this particular PID, preferring to rely on ISBNs instead. The authors consider that publishers who do not offer book or chapter level DOIs “are doing their authors a disservice” (Grimme et al., 2019, p.9). This view was also supported by a report from Universities UK (2019).

However, Grimme et al. go on to express concerns that lack of governance of DOIs will lead to the system becoming as chaotic as the ISBN system (2019, p.10). This issue may be resolved by Crossref’s co-access service, part of Crossref’s Content Registration Service, which “allows multiple Crossref members to register content and create DOIs for the same book content; both whole titles or individual chapters” (Crossref, 2017). Therefore, any system created by COPIM would need to take this into account so that the publisher prefixes could be registered.

The Cambridge workshop reported that many smaller publishers do not have the ability to mint DOIs, which can limit the discovery of outputs in library discovery systems. The former HIRMEOS project attempted to assist with this (Bertino, 2017) and it was suggested that COPIM should look at the experience.

It is worth noting ISBNs at this point. They are certainly not PIDs, as multiple ISBNs can exist for the same work (print, digital, vendor editions etc.), but as noted above, publishers continue to rely on them instead of the DOI. Grimme et al. (2019, p.9) comment “that ISBNs were designed as retail identifiers” and this creates an issue for OA as there is no incentive for a retailer to distribute the OA version because there is no sales commission on the OA version. This is despite evidence to suggest that there is no impact on sales, and that in some cases sales may increase (Ferwerda et al., 2013).

DOIs can also be used to link to other PIDs, such as ORCID, which can then feed into institutional profiles (Grimme et al., 2019; Jisc, n.d.).

Recommendation 53. DOIs at title and chapter level are in scope for minimum metadata requirements

Recommendation 54. COPIM should investigate how it can support smaller OA presses to implement DOIs and to refer to experience of the HIRMEOS project

6.2 Open Researcher and Contributor Identifier (ORCID)

“ORCID is an international, interdisciplinary, open, non-proprietary, and not-for-profit organization created by the research community for the benefit of all stakeholders, including you and the organizations that support the research ecosystem.” (ORCID, 2019b)

One of the main advantages of ORCID is author disambiguation. Authors can register for an ORCID and then claim their work via Crossref. A further advantage for institutions is the ability to track and manage research via consortium membership (Jisc, n.d.).

However, many publishers do not include an ORCID for authors in their metadata. Of those that do, only the lead author’s ORCID is included. In 2015 ORCID published an open letter to publishers, which was a call for publishers to implement a minimum standard (ORCID, 2015). Best practice guidelines for publishers are also available to assist in the automating of processes (ORCID, n.d.).
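One small step in automating the collection of ORCID iDs for every contributor is checking that an iD is well formed before it enters the metadata. The Python sketch below applies the ISO 7064 MOD 11-2 check digit calculation that ORCID documents for its identifiers. The function names are illustrative, and the check only confirms that an iD is internally consistent, not that it is registered or belongs to the named contributor.

```python
def orcid_check_digit(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Check the format and check digit of an iD such as 0000-0002-1825-0097."""
    compact = orcid.replace("-", "").upper()
    if len(compact) != 16 or not compact[:15].isdigit():
        return False
    return orcid_check_digit(compact[:15]) == compact[15]


# Sample iDs used in ORCID's own documentation, plus one with a wrong check digit.
for candidate in ("0000-0002-1825-0097", "0000-0002-9079-593X", "0000-0002-1825-0098"):
    print(candidate, is_valid_orcid(candidate))
```

A check of this kind could be run on every author, contributor and editor record, so that mistyped iDs are caught at the point of entry rather than after distribution.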

Recommendation 55. COPIM should fully implement and automate ORCID for all authors, contributors and editors on behalf of OA monograph presses, although there may be a cost implication, which needs further investigation

6.3 Funder Registry

As can be seen from the example of ORCID above, not all PIDs are fully implemented. However, there is great potential in implementing them, or at least being prepared.

Funder Registry from Crossref is one such example. It is a freely downloadable “taxonomy of grant-giving organizations” with a CC0 licence (Meddings, 2017) and is available on Github (Meddings, 2020).

In 2019, Funder Registry held 19,000 records. However, it has not yet been widely adopted by funders themselves and there is not an overlap between Funder Registry and institutional identifiers, such as Ringgold and GRID (see 6.4). Nevertheless, a review by DataSalon concluded that “the Funder Registry has a useful role to play in linking up content and funders. And, because of its open licensing terms, it can be built into other manuscript tracking and analytical tools – if these can also address some of its quality issues and enhance the data set by linking it up with Ringgold or GRID, then it has the potential to become a powerful resource” (Margolis, 2019).

Recommendation 56. COPIM should keep a watching brief on Funder Registry and build in the possibility of adding three additional pieces of metadata - funder name, funder id, and grant id

6.4 Institutional Identifiers

There are a number of institutional identifiers currently available including Ringgold and GRID. However, it appears that there are two frontrunners.

Research Organisation Registry (ROR) is a registry of research organizations, defined as “any organization that conducts, produces, manages, or touches research” (ROR, n.d.). Launched in January 2019 and seeded from Digital Science’s Global Research Identifier Database (GRID) (Digital Science, 2018), ROR makes its IDs and metadata available on a CC0 licence.

The International Standard Name Identifier (ISNI) is an ISO standard (ISO 27729) that identifies public identities of individuals and organizations. It is “designed to act as a ‘bridge identifier’ to link systems where comprehensive information is held, such as Ringgold’s Identify Database” (ISNI, n.d.; Ringgold, n.d.). Proprietary data from Ringgold is not held as part of the dataset.

Therefore, there are two different institutional identifiers. However, crosswalks with ROR and other identifiers (GRID, ISNI, Crossref Funder Registry, Wikidata) are also available (ROR, n.d.).

Recommendation 57. COPIM should keep a watching brief on ROR and ISNI and include institutional identifiers when and as needed or adopted

6.5 Research Activity Identifier (RAiD)

RAiD is a PID that is currently used in Australia and claims to connect “researchers, institutions, outputs and tools together to give oversight across the whole research activity and make reporting and data provenance clear and easy” (RAiD, 2019). Use of RAiD is free, although it does not currently appear to be used outside of Australia and New Zealand.

Recommendation 58. COPIM should keep a watching brief on RAiD as required

7.0 Metrics and metadata

Before outlining the metadata requirements for COPIM, it is important to highlight the link between quality metadata and metrics, such as usage. Essentially, the better the metadata and the understanding of the workflows involved, the better the metrics that can be retrieved in order to evaluate impact.

In a paywall model, feedback on sales can be relayed back to editors. However, metrics for OA monographs are still in the early stages of development. Metrics for monographs include COUNTER compliant usage (from various platforms), weblog data and Google Analytics. Data collection is currently very mechanical with no underlying management system. A key question is how to build a narrative from these different methods of usage collection. In Europe, this is an area that OPERAS are currently investigating (OPERAS, n.d.). In the United States, Mellon funded projects (O’Leary & Hawkins, 2019) are also investigating this area. COPIM must keep lines of communication open with these initiatives in order to agree appropriate metadata requirements.

There is also the question of repository data, which services such as Jisc’s IRUS-UK (Institutional Repository Usage Statistics UK) could potentially assist with (Jisc, 2020).

Engagement must also be tracked. This includes Altmetric data, but also community engagement (not just around sales and citations). There are no solutions to this yet, but good quality metadata is key. Reviews and awards/prizes are important to authors and these need to be tracked too.

Getting information on metrics back to editors and authors is a major problem. However, studies such as Wennström et al. (2019) have shown that authors do value this feedback. Therefore, COPIM needs to engage with various projects looking into metrics and must assist in gathering as much data as possible (within the limits of ethics and privacy) to enable a minimum viable dataset approach for metrics.

Recommendation 59. COPIM should work with OPERAS, Mellon funded projects and IRUS-UK to understand metrics requirements and how good quality metadata can feed into this process

8.0 Metadata requirements

This section discusses further considerations and complexities that COPIM needs to take into account before issuing a set of metadata requirements.

For example, BIC (2019) note that “the ‘minimum set of metadata fields’ for a publisher or retailer is dependent on the nature of their business and requires thorough analysis”. Nielsen, for instance, have grouped together what they see as basic data elements:

ISBN
Title
Format/Binding
Publication Date
BISAC subject code
Retail Price
Sales Rights
Cover image
Contributor

In their 2016 report they “see the positive correlation between the completeness of this basic set of metadata and sales, with titles meeting this level of completeness seeing sales 75% higher than those that don’t” (Walter, 2016).

Cover image could be seen as enriched data. However, Nielsen also state that titles holding a cover image have sales 51% higher than those that do not. This is true for both the US (2016) and UK (2012) studies carried out by Nielsen. Nielsen’s preferred format is .jpg, 650 pixels high at a resolution of 100 dpi (Nielsen, n.d.).

Nielsen then go on to describe what they call descriptive data, or enriched data for the purposes of this report. They demonstrate in their report that titles carrying three further descriptive elements show increased discoverability, with sales 72% higher than those of titles with no descriptive data. Nielsen describe descriptive data as follows:

Title description
Author biography
Review

With the exception of review, there is no reason why this descriptive data could not be supplied alongside basic metadata. Therefore, COPIM needs to define minimum requirements for basic and enriched data, with the option to add additional data, such as review.
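The split between basic, enriched and optional data described above could be checked mechanically when a press submits a record. The following is a minimal sketch only: the field names are hypothetical labels for the Nielsen elements listed in this section, not an agreed COPIM schema, and the function names are likewise assumptions.

```python
# Hypothetical field names mirroring the Nielsen basic and descriptive elements.
BASIC_FIELDS = [
    "isbn", "title", "format", "publication_date", "bisac_subject_code",
    "retail_price", "sales_rights", "cover_image", "contributor",
]
ENRICHED_FIELDS = ["title_description", "author_biography"]
OPTIONAL_FIELDS = ["review"]


def missing_fields(record, required):
    """Return the required field names that are absent or empty in the record."""
    # A price of 0 is a legitimate value for an OA title, so only None and
    # the empty string count as missing.
    return [name for name in required if record.get(name) in (None, "")]


def completeness_report(record):
    """Summarise which basic and enriched fields a record still lacks."""
    return {
        "basic_missing": missing_fields(record, BASIC_FIELDS),
        "enriched_missing": missing_fields(record, ENRICHED_FIELDS),
        "optional_present": [n for n in OPTIONAL_FIELDS if record.get(n)],
    }


# Illustrative record with invented values; the cover image is deliberately absent.
sample = {
    "isbn": "9780000000002",
    "title": "An Example Monograph",
    "format": "PDF",
    "publication_date": "2020-06-01",
    "bisac_subject_code": "LAN027000",
    "retail_price": 0,
    "sales_rights": "WORLD",
    "contributor": "A. N. Author",
    "title_description": "A short description of the book.",
}
report = completeness_report(sample)
```

A report of this kind would let a press see at a glance whether a title meets the minimum set before it is passed on to any dissemination channel.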

Other metadata types can be added to either minimum requirements or an optional set in order to give a more complete set of requirements. For example, CHORUS advises on the use of Funder Registry, author affiliation and licensing (Owens, 2018).

Regarding licensing, EDItEUR (Bell, 2014) offer some examples of how to provide OA licensing, including the ability to add statements, and these should be built into the requirements. EDItEUR also give advice on ‘free of charge’ products, which require the use of the <UnpricedItemType> element:

<UnpricedItemType>01</UnpricedItemType>

Bell (2014, p.5) also notes that it is not possible to be explicit about green and gold open access and that “there is no way to specify a post-publication delay after which a previously non-OA monograph becomes open. Each of these could be addressed in a future update of ONIX 3.0, should the requirement arise, but at present, no such need is apparent.”

It is also noted that openness of a publication applies to the content of the book and not the book itself, so an ONIX record could have an OA licence AND a price point, if a print copy was available to purchase.
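To illustrate the point that one title can carry both a ‘free of charge’ digital edition and a priced print edition, the sketch below builds two simplified supply fragments. It is an assumption-laden illustration rather than a schema-valid ONIX 3.0 record: the function name is hypothetical, the fragment omits the other elements a real supply block requires, and the price shown is invented.

```python
import xml.etree.ElementTree as ET


def supply_detail(price_amount=None, currency="GBP"):
    """Build a simplified SupplyDetail fragment (illustrative, not schema-valid)."""
    detail = ET.Element("SupplyDetail")
    if price_amount is None:
        # Code 01 marks the product as free of charge, as in the example above.
        ET.SubElement(detail, "UnpricedItemType").text = "01"
    else:
        price = ET.SubElement(detail, "Price")
        ET.SubElement(price, "PriceAmount").text = f"{price_amount:.2f}"
        ET.SubElement(price, "CurrencyCode").text = currency
    return detail


# The OA digital edition is unpriced; the print edition of the same title has a price.
oa_ebook = ET.tostring(supply_detail(), encoding="unicode")
print_copy = ET.tostring(supply_detail(25.0), encoding="unicode")
```

In this sketch the licence statement would sit elsewhere in the record; only the pricing side of the distinction is shown.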

Recommendation 60. COPIM should include a pricing element that reflects ‘free of charge’

The Nielsen report also notes that, in addition to providing good quality metadata, the timeliness of the metadata being made available is also important. In the UK BIC and ONIX compliance standards require data to be supplied 16 weeks ahead of publication (Walter, 2016). The Nielsen report was able to show that if this was the case it equated to higher sales. From an open access viewpoint, this might also encourage greater discovery.

There is also a requirement to look at linguistic diversity and metadata. The English language (unsurprisingly) dominates. Many software packages assume English is the first language both as language of the software interface, and as input language. However, many OA monograph publishers exist to publish in other languages. OPERAS is looking into translation software and this could be extended to metadata (Leão et al., 2018). However, there may be problems with characters outside the standard Latin character set, as well as RTL scripts (Hebrew, Arabic), TTB scripts (Mongolian), context-sensitive scripts (Arabic), and abugidas (most South Asian scripts, Ethiopian, Inuit).

Linguistic diversity was discussed at the Cambridge workshop. However, while the above argument is true, it was also noted by a non-English press that metadata in English is not necessarily a problem. Although metadata might be in English, content can be in any language. Communication and marketing tend to be written in English and the English language as lingua franca is accepted as the norm because researchers have to collaborate with international colleagues. Nevertheless, even if metadata are in English, non-English names and titles will need to be properly transcribed; for this, Unicode compliance is necessary. It was suggested that COPIM should involve LIBER or IFLA and other transnational library organizations to investigate linguistic variety.
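As an illustration of what Unicode compliance might involve at the point of metadata input, the sketch below normalises a text field to a single canonical form and flags right-to-left content so that a display layer can handle it. The function names are hypothetical and the choice of NFC normalisation is an assumption, not a COPIM decision.

```python
import unicodedata


def normalise_field(value: str) -> str:
    """Trim a metadata value and convert it to Unicode NFC form."""
    # NFC composes a base letter and its combining marks into one code point
    # where possible, so that equivalent spellings of a name compare as equal.
    return unicodedata.normalize("NFC", value.strip())


def contains_rtl(value: str) -> bool:
    """Report whether any character belongs to a right-to-left script."""
    # "R" covers Hebrew letters and "AL" covers Arabic letters in the
    # Unicode bidirectional classes.
    return any(unicodedata.bidirectional(ch) in ("R", "AL") for ch in value)


# "a" followed by a combining tilde becomes the single character "ã".
decomposed_name = "Lea\u0303o"
composed_name = normalise_field(decomposed_name)

hebrew_title_is_rtl = contains_rtl("\u05e9\u05dc\u05d5\u05dd")
latin_title_is_rtl = contains_rtl(composed_name)
```

Normalising on input in this way would avoid the same author name being stored in two visually identical but differently encoded forms.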

Recommendation 61. COPIM should support as far as possible non-standard Latin script forms as metadata input, and should discuss this further with IFLA and LIBER

Rather than reinventing the wheel, COPIM’s metadata requirements will build on the outcomes of the Jisc/OAPEN investigating OA monographs services project, which created a metadata model for OA monographs using ONIX, LCC, Dublin Core and CrossRef. This metadata model was then adopted for the OAPEN library. In 2016 the joint Jisc/OAPEN metadata for open access monographs project produced two documents, which detailed a metadata model for open access monographs (Jisc/OAPEN, 2016a). The model was “[d]eveloped in consultation with research funders, academics and institutional staff and OA monograph publishers”, and it “recommends a provisional list of metadata for OA book publishers and other stakeholders” (Jisc/OAPEN, 2016).

The main parts of this model are:

Book – a description of the monograph or chapter
Creator – the person(s) responsible for the content of the book
Funder – the organisation(s) supporting the research
Format – a description of the digital format(s) that have been made available
Collection – a description of the collection(s) the book is part of

(Jisc/OAPEN, 2016a)
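The five parts of the model listed above could be represented as linked record types. The sketch below is an illustration only: the attribute names are hypothetical, are not taken from the Jisc/OAPEN documents, and show just a few plausible properties per part.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Creator:
    """A person responsible for the content of the book."""
    name: str
    role: str = "author"
    orcid: Optional[str] = None


@dataclass
class Funder:
    """An organisation supporting the research."""
    name: str
    funder_id: Optional[str] = None
    grant_id: Optional[str] = None


@dataclass
class Format:
    """A digital format that has been made available."""
    file_type: str
    url: Optional[str] = None


@dataclass
class Collection:
    """A collection the book is part of."""
    title: str


@dataclass
class Book:
    """A monograph or chapter, linked to the other four parts of the model."""
    title: str
    doi: Optional[str] = None
    creators: List[Creator] = field(default_factory=list)
    funders: List[Funder] = field(default_factory=list)
    formats: List[Format] = field(default_factory=list)
    collections: List[Collection] = field(default_factory=list)


# Invented example values, for illustration only.
example = Book(
    title="An Example Monograph",
    creators=[Creator(name="A. N. Author")],
    formats=[Format(file_type="PDF")],
)
```

Keeping funders, formats and collections as separate lists means that a book can be linked to several of each without changing the structure of the book record itself.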

This model has been used as the basis for the NBK metadata model. These additions will also be added to the requirements together with the recommendations in the report for additional data.

9.0 Recommendations and further work

The 61 recommendations in this report have been grouped together into a number of themes below. The recommendations were gathered during the compilation of the scoping report as part of the narrative. These recommendations are suggestions for further investigation, rather than a list of deliverables for the work package or the wider COPIM project. The project team will review these recommendations before deciding on those that will go forward as part of the project plan.

9.1 Recommendations

The recommendations have been loosely themed in order to add context. However, it could be argued that some of the recommendations could fit more than one theme. It is hoped that a discussion of these themes and their relevance to the project deliverables will follow this scoping report in due course.

9.1.1 Metadata requirements

This group of recommendations relates directly to the metadata requirements section of the report. Unsurprisingly, they form one of the largest groups of recommendations. While Thoth has the potential to hold all metadata recommendations, the presses submitting the information may be restricted in what they can feasibly contribute due to capacity issues. Some of these recommendations overlap with the metadata output section below. In addition to a decision on basic and enriched metadata requirements, chapter level metrics are perhaps the most important area to investigate.

Recommendation 1. COPIM should consider developing two metadata requirements for OA monographs, a minimum set of metadata requirements and an enriched set. Any technical report used to build the ODS should consider both

Recommendation 14. COPIM should consider the inclusion of chapter level metadata as within scope for the minimum metadata requirements

Recommendation 24. DOIs at a book chapter level should be considered for inclusion in the minimum set of requirements for enriched metadata

Recommendation 45. COPIM must include the new versions of MARC 21 506, 540 and 856 fields

Recommendation 48. COPIM should take CODEX into account in the metadata requirements

Recommendation 52. It should be possible to give the option to enter subject codes/headings and author generated keywords in an enriched set of data on the ODS. This should also include table of contents

Recommendation 53. DOIs at title and chapter level are in scope for minimum metadata requirements

Recommendation 55. COPIM should fully implement and automate ORCID for all authors, contributors and editors on behalf of OA monograph presses, although there may be a cost implication, which needs further investigation

Recommendation 56. COPIM should keep a watching brief on Funder Registry and build in the possibility of adding three additional pieces of metadata - funder name, funder id, and grant id

Recommendation 57. COPIM should keep a watching brief on ROR and ISNI and include institutional identifiers when and as needed or adopted

Recommendation 58. COPIM should keep a watching brief on RAiD as required

Recommendation 60. COPIM should include a pricing element that reflects ‘free of charge’

Recommendation 61. COPIM should support as far as possible non-standard Latin script forms as metadata input, and should discuss this further with IFLA and LIBER

9.1.2 Endorsement

Related to the metadata requirements are two endorsements that it would be beneficial for COPIM to seek in order to demonstrate quality of metadata.

Recommendation 44. COPIM should investigate the BIC accreditation scheme to demonstrate the quality of their metadata

Recommendation 49. COPIM should review the resources made available on KBART and seek endorsement of its files

9.1.3 Metadata output

In addition to creating a high standard of metadata, WP5 needs to understand where and how this metadata will be delivered. This group of recommendations considers output of the ODS metadata. Ultimately, this would require the ODS to hold (and keep up to date) a number of output templates. Potentially, publishers could select the methods of dissemination and discovery that they felt were relevant to their output using these templates. That would then inform them regarding the minimum set of metadata required for those routes. This would cover traditional and emerging discovery channels. An issue here would be the currency of the templates, which would need to be sustainable.

Recommendation 23. COPIM needs to understand more about the Amazon Seller Account’s backend section in order to optimize discovery

Recommendation 25. In addition to understanding Amazon’s algorithm, COPIM should try to understand more about SEO in Google and Google Books

Recommendation 29. COPIM should test the ODS with Zotero and liaise regarding any unforeseen issues

Recommendation 32. COPIM should contact H-Net to understand if metadata held within the ODS can be supplied to the network

Recommendation 33. COPIM needs to be able to produce MARC records for OA monographs that have an open licence to promote re-use

Recommendation 37. COPIM should liaise with the British Library regarding metadata supply and free distribution of OA monograph metadata on a CC0 licence

Recommendation 43. COPIM needs to be able to provide high quality ONIX for books feeds at both title level and chapter level as part of its requirements

Recommendation 46. A BIBFRAME export is in scope for COPIM due to the use by Library Discovery systems

Recommendation 50. Once KBART files are available, COPIM should test with partner institutions that it is available as a package to ‘subscribe’ to in library discovery systems

9.1.4 Metrics

The scoping report makes a direct link between metadata for discovery and dissemination and the metadata required to evaluate open access books. This recommendation covers the major players in OA book metrics.

Recommendation 59. COPIM should work with OPERAS, Mellon funded projects and IRUS-UK to understand metrics requirements and how good quality metadata can feed into this process

9.1.5 Outreach and engagement

Outreach and engagement can be categorised by priority. Some contacts could be categorised as ‘nice to have’, while others are high priority if WP5 is to effectively disseminate its metadata to a range of key vendors, which libraries and researchers see as essential. Further work is required in order to prioritise the recommendations further.

Recommendation 3. COPIM should keep a watching brief on the Metadata 2020 project and make findings available if appropriate

Recommendation 5. COPIM to liaise with the British Library regarding its metadata services and the Cataloguing-in-Publication (CIP) Programme

Recommendation 6. COPIM to make contact with the Library of Congress committee to initiate a discussion about OA books

Recommendation 11. COPIM to discuss outputs of the OAPEN workshops in order to develop this area

Recommendation 12. Liaise further with LYRASIS to better understand the approach of Library Simplified and SimplyE

Recommendation 13. Some OA publishers see JSTOR as an essential part of their dissemination. Therefore, COPIM should engage with JSTOR

Recommendation 18. Discuss with SPARC and COAR the adoption of best practice principles for data repositories for OA monograph metadata

Recommendation 21. Further work is required to understand how COPIM can communicate with indexing systems, e.g. is there a peer review level to pass through?

Recommendation 26. COPIM should open lines of communication with online public libraries to understand how open access books can be included in their collections

Recommendation 27. Liaise with the Internet Archive about other file formats in addition to PDF and also about bulk uploading

Recommendation 28. COPIM should liaise with Dimensions and OCLC to understand what work is being carried out in this area, how COPIM can contribute, how this affects the minimum requirement for metadata, and whether full text deposit can also be achieved

Recommendation 35. COPIM should liaise with title management systems in order to understand how COPIM data for OA monographs could be ingested

Recommendation 54. COPIM should investigate how it can support smaller OA presses to implement DOIs and should refer to the experience of the HIRMEOS project

9.1.6 Collaboration

Related to outreach and engagement are a number of direct recommendations, which suggest formal collaboration between WP5 and other infrastructure services.

Recommendation 2. COPIM WP5 should consider developing a set of formal links with OPERAS and NBK/Library Hub in order for a two-way exchange of information and metadata. This should include key deliverables.

Recommendation 39. COPIM should contribute to the FOLIO developers’ community

9.1.7 Workflow

Directly related to both metadata output and engagement are a number of recommendations around understanding workflow. There is also potential overlap between some of these recommendations and the research recommendations below. Others are related directly to metadata output. Section 9.2 develops some of these workflow ideas further.

Recommendation 19. COPIM should engage with the large vendor platforms that do not ingest OA data in order to understand their workflows

Recommendation 31. COPIM to liaise with KUDOS to discuss potential flow of data

Recommendation 36. As BDS is the main supplier of data to the British Library CIP programme and to UK academic libraries, COPIM should liaise with BDS in order to understand workflows and business models

Recommendation 47. As part of the work on tracking the metadata workflow, COPIM should investigate whether a direct BIBFRAME export would help to resolve the issues encountered by OA presses

9.1.8 Research

During the preparation of the scoping report it soon became apparent that a number of further research projects could be undertaken to better understand certain areas of discovery and dissemination. Suggested areas for further research are detailed below. WP5 needs to consider these recommendations to assess whether they are in scope for the project during the funding period.

Recommendation 4. Further work is required to find out how researchers discover OA content

Recommendation 9. To keep a watching brief on cultural change in order for the work package to be successfully adopted by the supply chain and libraries

Recommendation 16. Consider re-running the research of Wiersma & Tovstiadi for a selection of OA books once the ODS is released

Recommendation 20. Review scoping report and survey researchers once metadata and discovery channels are in place

Recommendation 30. COPIM should review the questionnaire run by Wennström et al. to understand the view of authors with respect to social media

Recommendation 38. COPIM should conduct further work to understand whether it wants to offer the full service that BiblioVault offers, or just a metadata service

Recommendation 41. COPIM should add a piece of research to investigate the pain points in the workflows to try to identify where metadata gets lost in translation between the different stakeholders

9.1.9 Further workshops

The work package has facilitated one workshop during the project. The COVID-19 pandemic made a second workshop difficult to organise. However, during the preparation of the scoping report it became clear that a second workshop, or more likely a series of structured interviews with library discovery system vendors, would be highly beneficial to the project. This set of recommendations centres on this area.

Recommendation 7. Release a briefing paper aimed at library suppliers to increase engagement

Recommendation 10. COPIM to develop channels to address distributors and library systems vendors

Recommendation 15. To conduct a number of interviews with key library discovery vendors to better understand their use of metadata in relation to that agreed by COPIM

Recommendation 40. COPIM should meet with EBSCO’s discovery team to discuss the potential of supplying OA monograph metadata into the library supply chain

Recommendation 42. COPIM should follow up on ONIX for books in any vendor discussion

9.1.10 Out of scope

The Cambridge workshop was largely responsible for this set of recommendations, which centre on topics that the delegates felt should be out of scope for the project, at least in the short term.

Recommendation 22. Digital learning environments and reading lists are out of scope at present, but the project may wish to come back to them at a later date

Recommendation 34. COPIM should not try to recreate title management systems and this should be out of scope for the project

Recommendation 51. Subject codes/headings and author generated keywords are out of scope for the minimum set of metadata

9.1.11 Recommendations for COPIM work package 2

There is a large amount of overlap between all work packages in COPIM. In particular, this scoping report has identified two recommendations for work package 2.

Recommendation 8. COPIM WP2 may wish to consider the traditional library supply chain in its modelling

Recommendation 17. Raise awareness with libraries of their purchasing power as part of the work package 2 workshops

9.2 Workflows

In 2018, participants at the Jisc workshop suggested that there needed to be further discussion with open access publishers in order to comprehensively map out the library supply chain and discovery workflow in order to understand the flow of metadata between the different stakeholders. It was hoped that this would be of benefit for new OA monograph publishers as well as to understand the various costs in the systems and to highlight poor quality metadata (Stone, 2018).

Initially we had hoped to do this as part of the scoping report. However, research into this report has shown that the workflow was even more complicated than we imagined. In their literature review, Gregg et al. note that there have been a number of attempts to map the scholarly communication lifecycle and that this provides a useful understanding, but “[n]one of the diagrams attempt to detail the responsible parties behind the creation of metadata or how and by whom it is used. A possible explanation for this gap is the granularity of metadata and the expertise required to understand its uses in a particular context” (Gregg et al., 2019).

Unless there is a change in the scholarly communications workflow, good quality OA metadata will not reach the point of need to guide scholars to open access content. Bull and Quimby (2016) call for “a clear business case for engagement with metadata that speaks to all individual stakeholders – around discovery, usage, best value, impact, and relationship management”. Based on the work of Grimme et al. (2019, p.9) (Figure 3), this is something that could be produced by COPIM in order to understand the flow of minimum and enriched metadata in all parts of the supply chain.

Figure 3: A typical book supply chain showing metadata and usage data flows, and impediments to their smooth operation. (Grimme et al., 2019) Published under the Creative Commons Attribution 4.0 International (CC BY 4.0) licence https://creativecommons.org/licenses/by/4.0/

Therefore, COPIM will map these workflows accordingly alongside other projects such as the Library Publishing Coalition (2019), a two-year project that will look into journal workflows.

10.0 Conclusion

COPIM WP5 will use this scoping report to inform the minimum requirements for basic and enhanced metadata. In addition, a set of additional ‘nice to have’ or emerging metadata will be included.

The intention is to ask for community feedback on the requirements and to develop them iteratively.

Furthermore, we will produce a paper, or later version of this report, which will include best practice guidelines for publishers alongside a set of case studies from early adopters.

11.0 References

Above the Treeline. (2020). Edelweiss+. Edelweiss UK. http://www.abovethetreeline.com/uk

Adema, J. (2019). Towards a Roadmap for Open Access Monographs. https://repository.jisc.ac.uk/7413/

Adema, J., & Moore, S. A. (2018). Collectivity and collaboration: Imagining new forms of communality to create resilience in scholar-led publishing. Insights: the UKSG Journal, 31. https://doi.org/10.1629/uksg.399

Adema, J., & Stone, G. (2017a). Changing publishing ecologies: A landscape study of new university presses and academic-led publishing. JISC. http://repository.jisc.ac.uk/6666/1/Changing-publishing-ecologies-report.pdf

Adema, J., & Stone, G. (2017b). The Surge in New University Presses and Academic-Led Publishing: An Overview of a Changing Publishing Ecology in the UK. LIBER Quarterly, 27(1), 97–126. https://doi.org/10.18352/lq.10210

Albanese, A. (n.d.). Internet Archive to End ‘National Emergency Library’ Initiative. PublishersWeekly.Com. https://www.publishersweekly.com/pw/by-topic/digital/copyright/article/83584-internet-archive-to-end-national-emergency-library-initiative.html

Ball, J., Stone, G., & Thompson, S. (2019). Opening up the Library: Transforming our Structures, Policies and Practices. http://repository.jisc.ac.uk/id/eprint/7422

Barnes, C., Welzenbach, R., & Folger, K. (2017). Surveying the Scalability of Open Access Monograph Initiatives: Final Report. Michigan Publishing. https://deepblue.lib.umich.edu/handle/2027.42/139888

Barnes, L. (2020). COPIM Publishers Workshop – March 2020 – Report. COPIM. https://doi.org/10.21428/785a6451.8e138355

Bascones, M., & Staniforth, A. (2018). What is all this fuss about? Is wrong metadata really bad for libraries and their end-users? Insights: the UKSG Journal, 31, 41. https://doi.org/10.1629/uksg.441

BDS. (n.d.). About Us. BDSLive. https://www.bdslive.com/about-us/

BDS. (2019). BDS Offers RDA Boost for Catalogues. BDSLive. https://www.bdslive.com/bds-offers-rda-boost-for-catalogues/

BDS. (2020). Why Work with BDS. BDSLive. https://www.bdslive.com/publishers/why-work-with-bds/

Bell, G. (2014). Open Access monographs in ONIX for Books: EDITEUR FAQ on Open Access in ONIX 2.1 and ONIX 3.0. https://www.editeur.org/files/ONIX%203/20140722%20Open%20Access%20e-books%20in%20ONIX%20FAQ.pdf

Bertino, A. (2017, October 10). Identification Services: Standards Implemented! – Hirmeos Project. https://www.hirmeos.eu/2017/10/10/identification-services/

Bertino, A., Foppiano, L., Romary, L., & Mounier, P. (2019). Leveraging Concepts in Open Access Publications. https://hal.inria.fr/hal-01981922

BiblioVault. (2020). A Digital Repository for Scholarly Books. https://www.bibliovault.org/BV.about.epl

BIC. (2009). ONIX for Books. https://www.bic.org.uk/files/pdfs/090721%20intro%20to%20onix%20rev_revised%202019.01.31.pdf

BIC. (2020). Product Data Excellence Awards. https://bic.org.uk/90/Product-Data-Excellence-Awards/

BISG. (2019). Complete BISAC Subject Headings List, 2019 Edition. Book Industry Study Group. https://bisg.org/page/bisacedition

BooksoniX. (n.d.). BooksoniX. http://www.booksonix.info/

British Library. (n.d.). Metadata services. The British Library. https://www.bl.uk/collection-metadata/metadata-services

Brown, J. (2020). Developing a Persistent Identifier Roadmap for Open Access to UK Research. Jisc. https://repository.jisc.ac.uk/7840/2/PID_roadmap_for_open_access_to_UK_research.pdf

Bull, S., & Quimby, A. (2016). A renaissance in library metadata? The importance of community collaboration in a digital world. Insights: the UKSG Journal, 29(2), 146–153. https://doi.org/10.1629/uksg.302

Collins, E., & Stone, G. (2019). Motivations for textbook and learning resource publishing: Do academics want to publish OA textbooks? LIBER Quarterly, 29(1), 1-19. https://doi.org/10.18352/lq.10266

Commonwealth of Virginia. (2009). Articles of incorporation of Corporation for Digital Scholarship organized pursuant to the Virginia non-stock corporation act. https://digitalscholar.org/assets/downloads/Corporation%20for%20Digital%20Scholarship%20%E2%80%93%20Articles%20of%20Incorporation.pdf

Confederation for Open Access Repositories (COAR), & Scholarly Publishing and Academic Resources Coalition (SPARC). (2019). Good Practice Principles for Scholarly Communication Services. https://sparcopen.org/our-work/good-practice-principles-for-scholarly-communication-services/

Corporation for Digital Scholarship. (2019). Digital Scholar. https://digitalscholar.org/

Crossref. (2017). Co-access. Support Center. http://support.crossref.org/hc/en-us/articles/115003688983

Deville, J., Sondervan, J., Stone, G., & Wennström, S. (2019). Rebels with a Cause? Supporting Library and Academic-led Open Access Publishing. LIBER Quarterly, 29(1), 1-28. https://doi.org/10.18352/lq.10277

Digital Science. (2018). GRID - Global Research Identifier Database. https://grid.ac/

Directorate-General for Research and Innovation (European Commission). (2019). Future of scholarly publishing and scholarly communication: Report of the Expert Group to the European Commission. (Website & PDF KI-05-18-070-EN-N). Publications Office of the European Union. https://doi.org/10.2777/836532

EDItEUR. (2009). ONIX for books: Overview. https://www.editeur.org/83/Overview/

EDItEUR. (2010, November). BIC Subject Categories. https://ns.editeur.org/bic_categories/

EDItEUR. (2020, April). Thema Subject Categories 1.4. https://ns.editeur.org/thema/en

Edmunds, J., & Enriquez, A. (2020). Increasing Visibility of Open Access Materials in a Library Catalog: Case Study at a Large Academic Research Library [Preprint]. LIS Scholarship Archive. https://doi.org/10.31229/osf.io/e5dvm

Enis, M. (2014, May 2). EBSCO Opens Metadata to Third-Party Discovery Services. Library Journal. https://www.libraryjournal.com?detailStory=ebsco-opens-metadata-to-third-party-discovery-services

Erdt, M., Aung, H. H., Aw, A. S., Rapple, C., & Theng, Y.-L. (2017). Analysing researchers’ outreach efforts and the association with publication metrics: A case study of Kudos. PLOS ONE, 12(8), e0183217. https://doi.org/10.1371/journal.pone.0183217

Ferwerda, E. (2020, April 15). OAPEN launches on a new platform, based on DSpace 6. OAPEN Newsletter. https://web.archive.org/web/20200415161809/https://us4.campaign-archive.com/?u=314fa411ba5eaaee7244c95e1&id=e9bc5d6a0c

Ferwerda, E., Snijder, R., & Adema, J. (2013). OAPEN-NL - A project exploring Open Access monograph publishing in the Netherlands, Final Report. https://www.oerknowledgecloud.org/archive/OAPEN%20Rapport_%20A%20project%20exploring%20Open%20Access%20monograph%20publishing%20in%20the%20Netherlands_22102013.pdf

Firebrand Technologies. (2020a). About Firebrand. https://firebrandtech.com/community/about-firebrand/

Firebrand Technologies. (2020b). Eloquence on Demand. Firebrand Technologies. https://firebrandtech.com/solutions/eloquence/

FOLIO. (2019). The Codex Metadata Model. https://wiki.folio.org/download/attachments/1415393/Codex%20Metadata%20Model%202017-07-07.png?version=1&modificationDate=1503009830000&api=v2

FOLIO. (2020a). The future of libraries is open. https://www.folio.org/

FOLIO. (2020b). FOLIO source code overview. https://dev.folio.org/source-code/

FOLIO. (2020c). Metadata Management SIG. https://wiki.folio.org/display/MM/Metadata+Management+SIG

FOLIO. (2020d). Supporting Partners & Contributors. https://www.folio.org/community/support/

FOLIO. (2020e). Welcome to the FOLIO Wiki. https://wiki.folio.org/

Fulcrum. (n.d.). Fulcrum Publishing. http://www.fulcrum-books.com/

Godby, C. J., Smith, D., & Childress, E. (2008). Toward element-level interoperability in bibliographic metadata. The Code4Lib Journal, 2. https://journal.code4lib.org/articles/54

Godby, J., Smith-Yoshimura, K., Washburn, B., Davis, K. K., Detling, K., Eslao, C. F., Folsom, S., Li, X., McGee, M., Miller, K., Moody, H., Thomas, C., & Tomren, H. (2019). Creating Library Linked Data with Wikibase: Lessons Learned from Project Passage. OCLC Research. https://www.oclc.org/research/publications/2019/oclcresearch-creating-library-linked-data-with-wikibase-project-passage.html

Gregg, W., Erdmann, C., Paglione, L., Schneider, J., & Dean, C. (2019). A literature review of scholarly communications metadata. Research Ideas and Outcomes, 5, e38698. https://doi.org/10.3897/rio.5.e38698

Grimme, S., Taylor, M., Elliott, M. A., Holland, C., Potter, P., & Watkinson, C. (2019). The State of Open Monographs. Digital Science. https://doi.org/10.6084/m9.figshare.8197625.v4

HathiTrust. (n.d.). HathiTrust Digital Library. https://www.hathitrust.org/

H-Net. (2019a). About H-Net. https://networks.h-net.org/node/513/pages/1301/about

H-Net. (2019b). H-Net Book Channel. https://networks.h-net.org/h-net-book-channel

H-Net. (2019c). H-Net Reviews. https://networks.h-net.org/reviews

IFLA. (2020). Functional Requirements for Bibliographic Records. https://www.ifla.org/publications/functional-requirements-for-bibliographic-records

Inger, S., & Gardner, T. (2016). How readers discover content in scholarly publications. Information Services & Use, 36(1–2), 81–97. https://doi.org/10.3233/ISU-160800

Ingram. (2015a). CoreSource Plus Business Partners. https://help.lightningsource.com/hc/en-us/article_attachments/360037654431/CORESOURCEPLUS_PARTNERS_12.10.15.pdf

Ingram. (2015b). CoreSource Business Partners. https://help.lightningsource.com/hc/en-us/article_attachments/360037663012/CORESOURCE_PARTNERS_12.10.15.pdf

Ingram. (2016a, August 10). CoreSource Distribution: It’s Not Just About the Retailers. IngramContent. http://www.ingramcontent.com/blog/coresource-distribution

Ingram. (2016b). CoreSource: EBook Compensation and Fees. Lightning Source. http://help.lightningsource.com/hc/en-us/articles/360021391691

Ingram. (2017, January 18). Publishing with CoreSource in the 21st Century | Ingram Content Group. IngramContent. http://www.ingramcontent.com/blog/publishing-in-21st-century

Ingram. (2020). CoreSource. https://coresource.ingramcontent.com/CoreSourceUI/login.action

Internet Archive. (2020c). Books and Texts—A Basic Guide. Internet Archive Help Center. http://help.archive.org/hc/en-us/articles/360016405152

Internet Archive. (2020a). EBooks and Texts. https://archive.org/details/texts

Internet Archive. (2020b). Internet Archive: Digital Library of Free & Borrowable Books, Movies, Music & Wayback Machine. https://archive.org/

ISNI. (n.d.). How ISNI Works. https://isni.org/page/how-isni-works

ITHAKA. (2020). JSTOR. https://www.jstor.org/

Jisc. (n.d.). UK ORCID consortium membership. Jisc. https://www.jisc.ac.uk/orcid

Jisc. (2020). IRUS-UK. https://irus.jisc.ac.uk/

Jisc/OAPEN & Snijder, R. (2016). Metadata for open access monographs. Jisc/OAPEN. https://www.jisc-collections.ac.uk/Global/Projects/IOAMS/Guides/Guide%20on%20OA%20books%20metadata%20Feb%202016.pdf

Kaiser, K., Kemp, J., Paglione, L., Ratner, H., Schott, D., & Williams, H. (2020). Methods & Proposal for Metadata Guiding Principles for Scholarly Communications. Research Ideas and Outcomes, 6. https://doi.org/10.3897/rio.6.e53916

Kemp, J., Dean, C., & Chodacki, J. (2018). Can Richer Metadata Rescue Research? The Serials Librarian, 74(1–4), 207–211. https://doi.org/10.1080/0361526X.2018.1428483

Knowledge Unlatched. (2020). Open Research Library – Knowledge Unlatched. https://www.knowledgeunlatched.org/openresearchlibrary/

Kortext. (2020). Kortext. https://www.kortext.com/

Kudos. (2020a). About. https://www.growkudos.com/about

Kudos. (2020b). Kudos—Take control of your impact. https://info.growkudos.com

Lammey, R., Mitchell, D., & Counsell, F. (2018). Metadata 2020: A collaborative effort to improve metadata quality in scholarly communications. Septentrio Conference Series, 1. https://doi.org/10.7557/5.4471

Leão, D., Angelaki, M., Bertino, A., Dumouchel, S., & Vidal, F. (2018). OPERAS Multilingualism White Paper. https://doi.org/10.5281/zenodo.1324026

Library Genesis. (2020). Library Genesis. http://gen.lib.rus.ec/

Library of Congress. (2018). MARC proposal no. 2019-01. https://www.loc.gov/marc/mac/2019/2019-01.html

Library of Congress. (2020a [2000]). MARC 21 Format for Bibliographic Data: Table of Contents (Network Development and MARC Standards Office, Library of Congress). https://www.loc.gov/marc/bibliographic/

Library of Congress. (2020b). LC Linked Data Service: Authorities and Vocabularies. https://id.loc.gov/authorities/subjects.html

Margolis, R. (2019, February 18). Funder Registry: A review. The DataSalon Blog. https://blog.datasalon.com/2019/02/18/funder-registry-a-review/

Mars, M., Zarroug, M., & Medak, T. (2015). Public Library (Essay). In T. Medak & M. Mars (Eds.), Javna knjižnica, 27/5 –13/06, 2015, Galerija Nova, Zagreb | Public Library exhibition, 27/5 –13/06 2015, Gallery Nova, Zagreb (pp. 121–137). What, How and for Whom/WHW Multimedia Institute. https://www.whw.hr/download/books/medak_mars_whw_public_library_javna_knjiznica.pdf

McCollough, A. (2017). Does It Make a Sound: Are Open Access Monographs Discoverable in Library Catalogs? Portal: Libraries and the Academy, 17(1), 179–194. https://doi.org/10.1353/pla.2017.0010

Meddings, K. (2017). Funder Registry. Crossref. https://www.crossref.org/services/funder-registry/

Meddings, K. (2020). Crossref / Funder Registry. GitLab. https://gitlab.com/crossref/open_funder_registry

Memory of the World. (2020). Memory of the World Library. https://library.memoryoftheworld.org/#/books/

Metadata2020. (n.d.a). Metadata2020. http://www.metadata2020.org/

Metadata2020. (n.d.b). Metadata Recommendations and Element Mappings. http://www.metadata2020.org/projects/mappings/

Monoskop. (2020). Monoskop. https://monoskop.org/

Montgomery, L., Saunders, N., Pinter, F., & Ozaygen, A. (2017). Exploring the Uses of OA Books via the JSTOR Platform. KU Research. https://espace.curtin.edu.au/handle/20.500.11937/69069

Needham, P., & Stone, G. (2012). IRUS-UK: Making scholarly statistics count in UK repositories. Insights: The UKSG Journal, 25(3), 262–266. https://doi.org/10.1629/2048-7754.25.3.262

Neylon, C., Montgomery, L., Ozaygen, A., Saunders, N., & Pinter, F. (2018). The Visibility of Open Access Monographs in a European Context: Full Report. Zenodo. https://doi.org/10.5281/zenodo.1230342

Nielsen. (n.d.). Notifying changes about your publications. Important Information for Publishers and self-published authors. https://www.nielsenisbnstore.com/documents/Important_Notes_For_Publishers.pdf

NISO. (n.d.). KBART for Content Providers. https://www.niso.org/standards-committees/kbart/kbart-content-providers

NISO. (2020). Knowledge Bases And Related Tools (KBART). https://www.niso.org/standards-committees/kbart

Nunberg, G. (2009, August 31). Google’s Book Search: A Disaster for Scholars. The Chronicle of Higher Education. https://www.chronicle.com/article/Googles-Book-Search-A/48245

OAPEN. (2020). Online library and publication platform. https://www.oapen.org/

Oh, K., & Colón-Aguirre, M. (2019). A Comparative Study of Perceptions and Use of Google Scholar and Academic Library Discovery Systems. College & Research Libraries, 80(6), 876–891. https://doi.org/10.5860/crl.80.6.876

O’Leary, B., & Hawkins, K. (2019). Exploring Open Access Ebook Usage. New York, NY: Book Industry Study Group. https://doi.org/10.17613/8rty-5628

OPERAS. (n.d.). OPERAS Metrics Portal. https://metrics.operas-eu.org

ORCID. (2015). ORCID Open Letter—Publishers. https://orcid.org/content/requiring-orcid-publication-workflows-open-letter

ORCID. (2019a). What are persistent identifiers (PIDs)? ORCID. http://support.orcid.org/hc/en-us/articles/360006971013

ORCID. (2019b). What is ORCID? ORCID. http://support.orcid.org/hc/en-us/articles/360006973993

ORCID, & Haak, L. (2015, December 30). Best practices for publishing organizations. https://orcid.org/organizations/publishers/best-practices

Owens, E. (2018). CHORUS Publisher Implementation Guide v.2.2. https://www.chorusaccess.org/wp-content/uploads/CHORUS-Publisher-Implementation-Guide-2018-v2.2.pdf

Oxford University Press, & Cambridge University Press. (2019). Researchers’ perspectives on the purpose and value of the monograph: Survey results 2019. https://global.oup.com/academic/news/OUP-and-CUP-scholarly-monograph-survey

Project MUSE. (2020). Project MUSE. https://muse.jhu.edu/

Pyne, R., Emery, C., Lucraft, M., & Pinck, A. S. (2019). The future of open access books: Findings from a global survey of academic book authors. Springer Nature Open Access Books. https://doi.org/10.6084/m9.figshare.8166599

RAiD. (2019). Research Activity Identifier (RAiD). RAiD. https://www.raid.org.au

Ringgold. (n.d.). ISNI Registration Agency. https://www.ringgold.com/isni/

ROR. (n.d.). Facts. https://ror.org/facts/

Schneider, J., & Steinle, K. (2019). The Heart of the Cycle: How Can Metadata 2020 Improve Serials Metadata for Scholarly Communications and Research? The Serials Librarian, 76(1–4), 156–158. https://doi.org/10.1080/0361526X.2019.1585169

Science Europe. (2019). Briefing Paper on Open Access to Academic Books (D/2019/13.324/2). Science Europe. https://www.scienceeurope.org/media/qk2b1cq4/se_bp_oa_books_092019.pdf

Shepherd, P., & Needham, P. (2010). PIRUS2: Creating a Common Standard for Measuring Online Usage of Individual Articles. Against the Grain, 22(4). https://doi.org/10.7771/2380-176X.5598

Snijder, R. (2020, May 15). Google Scholar and the OAPEN Library. OAPEN - Open Access for Books. https://oapen.hypotheses.org/79

Southern University Purchasing Consortium. (2020). UK Books, E-books, Standing Orders and Related Material. Inter-regional Agreement. https://www.supc.ac.uk/agreements-suppliers/agreements/agreement/577/

Stone, G. (2009). Resource discovery. In H. M. Woodward & L. Estelle (Eds.), Digital information: Order or anarchy? (p. 24). http://eprints.hud.ac.uk/id/eprint/5882

Stone, G. (2018, October 25). OA monographs discovery in the library supply chain: Draft report and recommendations – Jisc scholarly communications. https://scholarlycommunications.jiscinvolve.org/wp/2018/10/25/oa-monographs-discovery-in-the-library-supply-chain-draft-report-and-recommendations/

Talis. (2020). Talis Aspire. https://talis.com/talis-aspire/

Tallerås, K., Dahl, J. H. B., & Pharo, N. (2018). User conceptualizations of derivative relationships in the bibliographic universe. Journal of Documentation, 74(4), 894–916. https://doi.org/10.1108/JD-10-2017-0139

Textz.com. (2020). Aaaaarg.fail. https://textz.com/

Tharani, K. (2015). Linked Data in Libraries: A Case Study of Harvesting and Sharing Bibliographic Metadata with BIBFRAME. Information Technology and Libraries, 34(1), 5–19. https://doi.org/10.6017/ital.v34i1.5664

Tickell, A. (2018, June). Open access to research: Independent advice - 2018. GOV.UK. https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/774956/Open-access-to-research-publications-2018.pdf

UbuWeb. (2020). UbuWeb. http://www.ubu.com/

Universities UK Open Access Monographs Group. (2019). Open Access and Monographs—Engagement with academic and publisher stakeholders. https://www.universitiesuk.ac.uk/policy-and-analysis/reports/Documents/2019/open-access-and-monographs.pdf

Unizin. (2019). Unizin – Empowering Universities. Retrieved 3 June 2020, from https://unizin.org/

Van Schalkwyk, F., & Luescher, T. (2017). The African University Press. https://doi.org/10.5281/zenodo.889744

Walter, D. (2016). Nielsen Book US Study: The Importance of Metadata for Discoverability and Sales. https://ia601402.us.archive.org/19/items/nielsen-book-us-study-the-importance-of-metadata-for-discoverability-and-sales/Nielsen-book-us-study-the-importance-of-metadata-for-discoverability-and-sales.pdf

Watkinson, C., Welzenbach, R., Hellmann, E., Gatti, R., & Sonnenberg, K. (2017). Mapping the Free Ebook Supply Chain: Final Report to the Andrew W. Mellon Foundation. http://hdl.handle.net/2027.42/137638

Wennström, S., Schubert, G., Stone, G., & Sondervan, J. (2019, June 5). The significant difference in impact: An exploratory study about the meaning and value of metrics for open access monographs. ELPUB 2019 23rd International Conference on Electronic Publishing, June 2019, Marseille, France. https://doi.org/10.4000/proceedings.elpub.2019.9

Wiersma, G., & Tovstiadi, E. (2017). Inconsistencies between Academic E-book Platforms: A Comparison of Metadata and Search Results. Portal: Libraries and the Academy, 17(3), 617–648. https://doi.org/10.1353/pla.2017.0037

Wikipedia. (2020a). BiblioVault. In Wikipedia. https://en.wikipedia.org/w/index.php?title=BiblioVault&oldid=933596530

Wikipedia. (2020b). Comparison of reference management software. In Wikipedia. https://en.wikipedia.org/w/index.php?title=Comparison_of_reference_management_software&oldid=963383518

Willinsky, J. (2009). Toward the Design of an Open Monograph Press. The Journal of Electronic Publishing, 12(1). https://doi.org/10.3998/3336451.0012.103

Zotero. (n.d.). Zotero. https://www.zotero.org/

Zotero. (2018, November 4). How do I import BibTeX or other standardized formats? https://www.zotero.org/support/kb/importing_standardized_formats

Cover Photo by Drew Graham on Unsplash

Open Zotero library for this report available at: ScholarLed / COPIM | Zotero
