In preparation for the workshop, we sent our participants a questionnaire to fill in as a basis for our discussions on the day.
Questions 3 and 4 looked at new routes of dissemination beyond the traditional library supply chain, for example Amazon and Google.
The groups discussing questions 1 and 2 inevitably touched upon new and emerging routes to discovery too. This discussion has been included in the following sections.
We can get a little obsessed with library systems, but we need to remember that many people go to Google first. It was also noted that Amazon is very influential, but it uses its own way of cataloguing and we are unsure whether it adheres to any standards. However, the biggest problem for OA monographs is that platforms such as Amazon are trade platforms: they want to sell the print (or digital) book, so there is no link to the OA editions. It appears that they strip that part of the metadata. In fact, for some publishers who sell print on demand in addition to the OA version, the Amazon metadata will say 'not in stock' or 'used'.
There is also a link to the more traditional discovery platforms. Many academics might use Google and Amazon to aid discovery; however, in the early days of the JSTOR OA platform it was observed that a considerable number of users came from outside the institution. The books therefore reached beyond the academy.
The H-Net book announcement service was mentioned as something for COPIM to investigate further. Similarly, we will need to look much more closely at platforms such as EBSCO and GOBI to better understand how they receive, process, and output metadata, and how that influences monograph discoverability.
Conversation around both questions took various directions, and although new and emerging forms of discovery were not particularly touched upon, we covered a number of useful areas around discovery.
There is a need to look at the linguistic diversity of both metadata and the systems in which these metadata are stored. The English language (unsurprisingly) seems to dominate. Many software packages assume English is the first language. However, many OA monograph publishers exist to publish in their native language(s). OPERAS is looking into translation software, and this could be extended to metadata. However, there may be problems with handling larger (non-Latin) character sets and diacritics, as most software assumes left-to-right (LTR) orthography.
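One concrete way the diacritics problem surfaces is Unicode normalization: the same accented name can be encoded in more than one byte sequence, so systems that compare metadata strings naively will fail to match records. A minimal Python sketch (the name is purely illustrative):

```python
import unicodedata

# Two encodings of the same author name "Brontë":
# one with a precomposed "ë" (NFC form), one with a plain "e"
# followed by a combining diaeresis (NFD form).
nfc_name = "Bront\u00eb"    # precomposed character
nfd_name = "Bronte\u0308"   # base letter + combining mark

# Byte-for-byte the strings differ, so a naive metadata match fails...
assert nfc_name != nfd_name

# ...but normalising both to a common form (here NFC) makes them equal.
assert unicodedata.normalize("NFC", nfc_name) == unicodedata.normalize("NFC", nfd_name)
```

A metadata system that normalizes all incoming strings to one form avoids this whole class of silent mismatch; it is a small design choice with a large effect on discoverability across languages.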
However, is metadata in English a problem? Although the metadata might be in English, the content can be in any language. Communications tend to be written in English, and English as lingua franca is accepted as the norm because researchers have to collaborate with international colleagues. It was suggested that COPIM should involve LIBER, IFLA, and other transnational library organizations to investigate linguistic variety.
There is no broad standard yet for the inclusion of accessibility metadata.
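While no broad standard has settled, one emerging vocabulary is the set of schema.org accessibility properties (accessMode, accessibilityFeature, and so on) adopted by the EPUB Accessibility specification. A hedged sketch of what such metadata might look like as a JSON-LD record; all values here are hypothetical:

```python
import json

# Illustrative book record carrying schema.org accessibility
# properties; the title and feature values are made up.
record = {
    "@context": "https://schema.org",
    "@type": "Book",
    "name": "An Example Open Access Monograph",
    "accessMode": ["textual", "visual"],
    "accessModeSufficient": ["textual"],
    "accessibilityFeature": ["alternativeText", "tableOfContents"],
    "accessibilityHazard": ["none"],
    "accessibilitySummary": "Images carry alternative text; no flashing content.",
}

print(json.dumps(record, indent=2))
```

Whether such a record travels intact through the supply chain is, of course, exactly the kind of question raised above about Amazon and the aggregators.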
The availability of a FLOSS solution would be valuable, especially to smaller and start-up presses, as it would offer the possibility of developing a system that is fully ebook-friendly and incorporates the latest standards (such as ORCID identifiers, Thema classifications, chapter-level metadata, etc.).
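To make the wish-list concrete, a minimal sketch of the kind of record such a system might hold, with ORCID iDs for contributors, Thema subject codes, and chapter-level entries with their own DOIs. Every identifier below is a placeholder, not a real DOI or ORCID:

```python
# Hypothetical minimal record for a FLOSS metadata system.
book = {
    "title": "An Example Open Access Monograph",
    "doi": "10.XXXX/example",  # placeholder DOI
    "contributors": [
        # ORCID iDs disambiguate authors across systems.
        {"name": "A. Author", "orcid": "https://orcid.org/0000-0000-0000-0000"},
    ],
    "thema_codes": ["JN"],  # Thema subject code (illustrative)
    "chapters": [
        # Chapter-level metadata allows chapter-level discovery and metrics.
        {"number": 1, "title": "Introduction", "doi": "10.XXXX/example.1"},
    ],
}
```

Keeping chapters as first-class records, rather than flattening everything to book level, is what later makes chapter-level usage reporting possible.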
Discussion in questions 1 and 2 looked at the perspective of the scholar, and it seems prudent to consider this under question 4 too. It was observed that scholars want to talk to each other; therefore, they need a suite of services that allows them to do this. This could be the next logical step for the publishing industry, and it was argued that this is what Elsevier are pursuing. Do we need to build services for these groups of scholars so that they can access OA monographs, not for profit, but to sustain research? This links to the work of OPERAS in this area, but also to open licences for metadata.
However, publishers might add their books to DOAB, but they won't always know how the books then get into other indexes, repositories, etc. While this onward dissemination is a good thing, it means that feedback on the publication is often hard to get. It would be good to understand these workflows and processes further, and also the link to metrics.
The link between metrics and metadata was made in the discussion around questions 1 and 2. Question 4 allowed for further discussion in this area.
All participants agreed that metrics for OA monographs are still in the early stages of development. It was suggested that we need to gather as much data as possible, within the limits of ethics and privacy. This will then enable us to move forward with a minimum viable dataset approach.
Metrics for monographs include COUNTER-compliant usage (from various platforms), weblog data, and Google Analytics. Data collection was described as very mechanical, with no underlying management system. A key question is how to build an overarching narrative from these different methods of usage collection. This is an area that OPERAS is currently looking at in Europe, and that Mellon-funded projects are looking at in the United States. COPIM must keep lines of communication open with these initiatives. What about repository data?
Engagement must also be tracked. This includes Altmetric data, but also community engagement (not just around sales and citations). There are no solutions to this yet, but it is understood that high-quality metadata is key.
Reviews and awards/prizes are important to authors and these need to be tracked too.
We also need to understand how to feed this back to authors. This is often facilitated by hand-crafted reports, which might work for smaller presses, but it needs to be automated for larger publishers.
Regarding metadata, there are no current standards to support altmetrics or other metrics in metadata. However, this would be an area that could be followed up.
A bigger-picture question was also raised: what is the boundary between data and metadata? Now that there are discussions about including references in article metadata (or including the referenced DOIs as metadata), and now that in an EPUB the format of the text is essentially the same as the format of the metadata record (both are XML-based), what is the distinction? How do we sensibly enable both human- and machine-readability of both data and metadata?