COPIM Publishers Workshop: Discovery & dissemination — the present

Lucy Barnes

doi:10.21428/785a6451.3cc46720

In preparation for the workshop, we sent our participants a questionnaire to fill in as a basis for our discussions on the day.

Questions 1 and 2 centred on the traditional library supply chain, either via book suppliers or through library discovery systems.

Question 1: The traditional/legacy supply chain

First, we asked about the channels that participants are using to enable discovery and dissemination of Open Access titles. A few themes were highlighted before the workshop, such as:

• A need for a metadata black-box approach: a system that does it all.

• Different forms of dissemination:

◦ Publication platforms (e.g. OAPEN, UCL Discovery, Fulcrum).

◦ Library discovery systems (e.g. EBSCO, ProQuest).

◦ Other platforms (e.g. MUSE, JSTOR, Ingenta Open).

• Crossover between dissemination and metadata and also with archiving.

• The importance of good metadata, but also enriched metadata such as chapter-level metadata.

• The traditional supply chain is still very different from others. For book suppliers there is no incentive to improve things.

There was a sobering observation in the workshop: the COPIM project needs to acknowledge that challenges in the supply chain are real and that poor distribution of metadata for open access books will be used as an excuse by legacy organisations. If it goes wrong, the community will be blamed.

Discussion of this question varied widely across the four groups. The major themes are captured below.

Title management systems

One area for discussion, so far only touched upon in our scoping study, is the group of systems that publishers use to dispatch metadata to various platforms (e.g. the Klopotek, Firebrand and BooksoniX title management tools).

These systems manage and process data, and help transform feeds into ONIX. However, it was observed that these systems may not have the fields that would be required to add ONIX for OA monographs, particularly with chapter-level metadata. Essentially, these tools are project management systems, combining feeds with a workflow management system. These tools provide all available information in one output, not just metadata, which enables a publisher to push data out. The systems flag incorrect data, so quality checking of that data is included. It was noted that it is important for COPIM to understand how these systems fit into the supply chain.

One group of participants noted that there was a tipping point at which a publisher needs these kinds of systems (at around 25 books per year). It was thought that this might be less of an issue for smaller publishers, although there could still be a benefit. The possibility of COPIM collaborating with these systems was raised in order to benefit these publishers.

This idea was further explored by discussing the modular approach that these integrated title management systems take. For example, Firebrand has a module called Eloquence, which captures metadata and bundles it. A publisher of a certain size needs the whole workflow tool. However, COPIM could look to develop a module such as Eloquence for smaller publishers. The group strongly advised COPIM not to try to recreate the whole workflow, as every publisher likes to do things differently.

It was also noted that these systems (such as Firebrand, Klopotek, Consonance, and VirtuSales) are expensive. However, BooksoniX seems to be an affordable solution for at least one publisher at the workshop, which debunks the idea that only larger publishers might use these systems. It was suggested that COPIM should look into this further.

Print supply chain

It was seen as very important to get open access monographs into the print supply chain. There is an obvious blurring between traditional methods (library supply chain) and other forms of distribution (Amazon, which is often the number one discovery platform for print). Both are print-centric and both have a problem dealing with zero cost. This creates a problem for ONIX feeds as a zero-price point is rejected. For example, CoreSource struggles with a zero-price point.
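
The zero-price problem can be made concrete with a minimal sketch. It assumes ONIX 3.0-style element names and assumes that code 01 in ONIX code list 57 means "free of charge"; the function name supply_detail_price is hypothetical, and the fragment omits elements that a complete ONIX record would require. It only illustrates the difference between sending a price of zero and flagging a title as unpriced, and says nothing about what any particular recipient accepts.

```python
import xml.etree.ElementTree as ET


def supply_detail_price(price_amount, currency="GBP"):
    """Build a simplified, ONIX 3.0-style <SupplyDetail> fragment.

    Hypothetical helper for illustration only: a zero price is expressed
    as an unpriced, free-of-charge item rather than as a <Price> of 0.00.
    """
    supply = ET.Element("SupplyDetail")
    if price_amount == 0:
        # Assumption: code list 57, value 01 = "Free of charge".
        ET.SubElement(supply, "UnpricedItemType").text = "01"
    else:
        price = ET.SubElement(supply, "Price")
        ET.SubElement(price, "PriceAmount").text = f"{price_amount:.2f}"
        ET.SubElement(price, "CurrencyCode").text = currency
    return supply


# An open access edition and a priced print edition of the same title.
oa_edition = ET.tostring(supply_detail_price(0), encoding="unicode")
print_edition = ET.tostring(supply_detail_price(25), encoding="unicode")
```

Keeping the two cases in one function mirrors the point made in the workshop: the open access and print editions travel through the same feed, so the feed has to be able to express both.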

Should publishers maintain two forms of distribution? Possibly not, as this does little to link open access to print. It was suggested that there is evidence that researchers do not distinguish between a digital and a print book. The search starts digitally, but there is an inflexion point where researchers move to print, and they want this to be seamless.

A takeaway point was the need to think about things from a researcher’s perspective. For scholars, digital and print are living concurrently. But there is 16 times more engagement with open access content than paywalled content.

Discovery vs. dissemination

COPIM needs to understand the difference between pushing content and discoverability. For non-OA publishers, a lot of time is spent on promotion. For OA, there might not be a paywall, but there is a gap between being discoverable and people actually knowing about it. This is a relevant issue for the library supply chain and the supply of metadata from COPIM. However, it is important to note that print is not at risk; both digital OA and print can co-exist. The issue is how to get the metadata to express this, so that OA can convert to print sales (if that is a model used) when researchers prefer to read in print (which we know many do).

The role of ACCUCOMS and Burgundy is worth considering in this respect.

A takeaway was to consider how we can get scholars to engage with OA content – to cut through the noise. What does promotion in an OA context actually mean?

Publisher platforms

A number of publishers have their own platforms for digital delivery (e.g. MIT Press) and now have a more direct relationship with libraries. This has made them realise the amount of work that aggregators do on publishers’ behalf. Other publishers use an eCommerce site. However, it was noted that the metadata needs to be uploaded again for each platform. More efficient ways of doing this are required to reduce the staff time needed by smaller publishers. In addition, some platforms require metadata at chapter level; others do not.

Therefore, the issue is not whether the book is open or behind a paywall, the issue is the difference between platforms. Larger publishers have resources to do this, smaller publishers do not. A consortial way of working would work for these publishers.

Discovery systems

The prioritising of publishers by discovery services was discussed. Often those publishers that are seen as the ‘most important’, or those about which the vendors have received complaints, are seen as higher priority. This can often lead to smaller OA presses being caught in the backlog. It was suggested that system vendors often blame this on the publishers rather than on vendor prioritisation of workload. The power is with the libraries as customers. However, this is not always realised and the strong voice that libraries have is often forgotten. Understanding this power distribution is really important for COPIM as a project.

Library Simplified and SimplyE from LYRASIS and the Digital Public Library of America is a middleware solution with a user app for patrons. It functions like an RSS feed. This is something that COPIM needs to investigate further. Publishers have platforms for more reasons than discoverability. However, it might be possible to pull from publisher platforms as a form of blog feeder. It could encourage better metadata.

It was suggested that COPIM needs to understand more about researcher discovery. This step needs to happen after discovery in all channels is improved and monitored. That will allow more focus on the most popular channels. This may mean that the scoping report needs to be developed as the project continues (e.g. after each scoping, development, outreach phase).

Library systems

COPIM was reminded that libraries and library systems still play a very important function as a broker for information and should not be forgotten. They help build trust in the metadata. However, many cataloguers still need a print book and then add everything else on to this record. It was suggested that librarians might trust the printed form more than the digital, and both more than OA. This is based on a perception of value – the more you pay, the better the ‘quality’ must be. This works for metadata too, and many libraries will buy metadata alongside shelf-ready books.

It was felt that small publishers can be at a disadvantage. Therefore, can COPIM develop channels that are large enough to speak to the large distributors and to get the (open) metadata taken seriously? What kind of alliances are needed?

It was also suggested that COPIM tap into some of the outputs of recent OAPEN workshops, which looked at library services. For example, add-on services, feeds on books and automated reports.

Link between metadata and usage

There is an important link between good metadata and metrics, such as usage. In a paywall model, feedback on sales can be relayed back to editors. For OA, however, getting information on usage back to editors and authors is a huge problem. Therefore, COPIM needs to engage with various projects looking into metrics, e.g. Mellon-funded projects and OPERAS, to ensure that metadata supports both discovery and evaluation of use.

DOIs

Many smaller publishers do not have the ability to mint DOIs, which can limit the discovery of outputs in library discovery systems, such as Primo Central. The former HIRMEOS project attempted to assist with this and it was suggested that COPIM should look at the experience.

However, DOIs are heavily misused, often being assigned to every format of the book. In addition, many publishing platforms have their preferred version of the book, e.g. PDF.

Question 2: How do you presently supply metadata to these channels?

Question 2 asked what metadata was created, who creates it and what licence (if any) is attached.

What metadata is created?

Discussion in questions 3 and 4 touched upon metadata creation and its appropriate use. This discussion fits with question 2, so is noted here. It was not surprising to hear that ONIX, MARC, CSV, XML and OAI-PMH were the main forms of metadata created.

There was discussion around why an ONIX record was not sufficient. Libraries cannot ingest ONIX, so MARC records and KBART files are the parallel formats that libraries prefer. However, translation between ONIX and these formats is not easy because they are not XML based. It appears that other formats, such as BIBFRAME, have more promise. See ‘Lost in translation’ below.

Who provides the metadata?

A worked example of how one press creates metadata was provided: metadata originates from the author or editor and is then refined and enriched by the press. It was noted that author-generated metadata can vary in quality. It can also be quite abstract to talk about. This metadata is then taken by providers to fit the requirements of different systems.

Subject codes/headings and ‘enriched data’

Subject codes were also mentioned. It was decided that these codes are very hard to work with and are very conservative, being based on trade books. Subject headings are not seen as necessary and should probably be out of scope for COPIM.

Author-generated keywords are an alternative. It was noted that some keyword searches cover the full record. Therefore, table-of-contents details are far more helpful than subject classification. This does depend on the platform, however.

Beyond any minimum requirement that COPIM might produce for metadata, it would seem to be useful to include a minimum requirement for enriched data too.

Lost in translation

It was noted that metadata schemes often do not talk to each other and that this often means that basic information such as author, affiliation or table of contents does not translate between the schemes. The groups taking part in the discussion on questions 3 and 4 also noted that although ONIX is a baseline solution to avoid duplication, vendors use the data quite differently from one another.

It was suggested that COPIM conduct a piece of research to find where data gets lost in order to find the pain points. It might be possible to ask publishers interacting with COPIM to provide case studies to show how data gets distorted or lost in the system. These case studies could then be taken to system vendors and, perhaps more importantly, to the libraries that are paying for the service.
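
As an illustration of what such a case study might record, the sketch below runs one publisher record through a hypothetical crosswalk and lists the fields that have no target. The target column names follow KBART conventions, but the source field names, the CROSSWALK mapping and the translate function are all assumptions made for this example; they do not describe any vendor's actual process.

```python
# Hypothetical crosswalk: source field -> KBART-style column.
# None marks a source field that is assumed to have no target column.
CROSSWALK = {
    "title": "publication_title",
    "isbn_print": "print_identifier",
    "isbn_online": "online_identifier",
    "first_author": "first_author",
    "publisher": "publisher_name",
    "url": "title_url",
    "affiliation": None,
    "table_of_contents": None,
}


def translate(record):
    """Return (target_row, lost_fields) for one source record."""
    row, lost = {}, []
    for field, value in record.items():
        target = CROSSWALK.get(field)
        if target is None:
            # Unmapped or unknown fields are reported, not silently dropped.
            lost.append(field)
        else:
            row[target] = value
    return row, sorted(lost)


# Invented example record, not real publisher data.
record = {
    "title": "An Example Monograph",
    "first_author": "Doe",
    "affiliation": "Example University",
    "table_of_contents": ["Introduction", "Chapter 1"],
    "publisher": "Example Press",
}
row, lost = translate(record)
print("Lost in translation:", lost)
```

Reporting the dropped fields alongside the translated row is the design choice that matters here: it turns an invisible loss into evidence that can be shown to vendors and libraries.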

Licences

COPIM is trying to give power back to the publisher by providing open metadata, rather than metadata created and owned by external suppliers. It wants to create a master record with an open licence. There was general consensus that licences for metadata should be made open, ideally CC0.

One publisher adds the licence of both the content and the metadata to the book in order to encourage metadata to be re-used as widely as possible. However, not all publishers have considered licences for metadata. This is further complicated by uncertainty about where details about metadata licences would be put in an ONIX record.

It was also noted that in addition to vendors, libraries tend to create their own (MARC) records at an individual level. For a print monograph that may sell 200 copies, this seems to be a massive duplication of effort. Libraries may also buy in MARC records from suppliers. The potential for CC0 metadata could translate to a significant cost saving for libraries.

It was thought that COPIM should look at openness and transparency regarding metadata licences in order to break the system of reselling metadata.

Final thoughts

This first workshop was primarily for publishers; further workshops and meetings need to take place with the whole community (e.g. libraries and vendors) to ensure that the final outcome of the project is seen as a collaborative, community-driven effort by all stakeholders—for example, a solution that libraries would feel sympathetic towards and support. There is crossover here with the other work packages in the project.

The other reports from the Publishers Workshop can be accessed here.

Header: photo by Noble Mitchell on Unsplash