A summary of the findings from our Thoth Archiving Network workshop, including an examination of challenges and barriers for institutional repositories wishing to join.
Our Thoth Archiving Network workshop was held virtually on Tuesday, 2nd November 2022. Around 30 participants attended, and we thank all of you who participated and provided feedback. The video of the first half of the workshop (the presentation portion) can be found here, with many thanks to the DPC for hosting: https://www.youtube.com/watch?v=tHgq1KWzgL4
Work Package 7 Lead Gareth Cole began the workshop with a presentation updating attendees on the activities of the COPIM Project, including Opening the Future (Work Package 3), the Open Book Collective (Work Package 4), the Thoth metadata management system (Work Package 5), Experimental Publishing (Work Package 6), and, of course, Archiving & Preservation (Work Package 7).
Gareth explained the overall values and goals of the COPIM Project and introduced the core objectives and activities of each work package. This led into the important discussion of the proposed Thoth Archiving Network, a collaboration between Work Packages 5 and 7 to create a simple dissemination system that lets small publishers archive their monographs in a network of participating institutional repositories. A proof-of-concept has been developed and tested, and several universities have already agreed to take part.
Small and scholar-led presses make up much of the “long tail” of publishers without an active preservation policy in place, putting their significant contributions to the scholarly record at risk. While large-scale publishers have existing agreements with digital preservation archives, such as CLOCKSS and Portico, the small press often languishes without financial or institutional support, alongside challenges in technical expertise and staff resource. The Thoth Archiving Network would not solve every issue, but it would be an initial step towards essential community infrastructure, allowing for presses to use a push-button deposit option to archive their publications in multiple repository locations. This would create an opportunity to safeguard against the complete loss of their catalogue should they cease to operate.
For the second half of the workshop session, the attendees and COPIM colleagues were divided into three breakout rooms. The same two questions were posed for each group: ‘Would you be interested in joining the Thoth Archiving Network?’ and ‘What are the potential barriers for you joining the Thoth Archiving Network?’.
Earlier, during the presentation, Gareth posed a set of acknowledged challenges, and these were carried forward for discussion within the breakout groups, alongside the two main questions. These challenges were:
How many repositories have a preservation policy?
Is there metadata consistency across the repositories? If not, how do we ensure that the fullest metadata record is preserved alongside the content?
Different versions of software and how this might impact archiving
At the forefront of the discussions was the typical Content Policy at universities and institutions. For many, the policy is to only include content or research created by the institution's own academics and researchers. This potential barrier had already been identified by WP7, but discussion among the attendees provided a useful confirmation. While this type of content policy is indeed widespread, there is growing momentum within some institutions towards supporting open access infrastructure and initiatives as part of the university library’s investment in research. As the role of institutions, their libraries, and research support bodies evolves within a changing open research landscape, there is little reason why the role of institutional repositories should not also evolve.
The practice of archiving with institutions is not unprecedented: CLOCKSS employs 12 mirror repositories within academic institutions across the world, which are called “archive nodes.” Though the structure and purpose are different, the premise is similar. Content policies at interested institutions will certainly be a primary barrier to the Thoth Archiving Network, but we do not see this as insurmountable. One significant shift emerging is the recognition within institutions of the need for their committed support as the academic publishing model evolves. As mentioned within the workshop by one of our participants, when comparing the cost of BPCs (and APCs) against potential investment in evolving the open research infrastructure, the balance is largely tipped toward the future of truly open scholarship: a sustainable move in the right direction, rather than perpetuating a failing business model.
One interesting suggestion considered the packaging of publishers or collections of open access monographs on a particular theme or area of research. These could relate to research specialisms at institutions participating in the Thoth Archiving Network, which could provide an angle for institutional investment, as funding choices and external involvement require justification. An institution specialising in the Arts and Humanities, for instance, could support one or a number of small and scholar-led humanities publishers by creating a “special collection” of archived monographs in their repository as part of the Thoth Archiving Network. The use and promotion of materials archived within the repositories was also raised, highlighting the importance of knowledge contribution to university research culture.
Another point raised was the level of expected involvement between the institutional repositories and the publishers, for instance, whether regular communication would be expected. Based on what is envisioned, this additional communication is unlikely to be required, and the main contact would be between Thoth/COPIM and the repositories. As the Thoth Archiving Network is intended to be an easy, quick solution for under-resourced small and scholar-led publishers, the anticipated interaction would be either minimal or non-existent. The aim would not be to add significantly to the workloads of repository managers, though there would need to be some initial onboarding support at the institution.
The importance of engaging the correct decision-makers at institutions was also raised, as there could be a different combination of roles involved depending on the university. The key individuals could be the Head of the Library or Head of Research Office, or decisions may be made by representative groups, such as the Open Research Group or Research Committee. This ties into consideration of governance within the involved or potentially involved organisations, and how this will impact each step of the process as the network is implemented.
Further discussion around the capabilities of different repository software centred around the content of monographs, their formats, and potential additional requirements. For instance, what would happen if the monographs in question were complex monographs with additional content and audio/visual files, or experimental monographs in an unconventional format? Would certain repositories be unable to accommodate the archiving of this material? Our response to this insightful question was that the Thoth Archiving Network would have participating repositories of various types (EPrints, DSpace, Figshare, HAPLO, etc.), and would have a push-button functionality allowing deposit in appropriately selected repositories. There would be user guidelines indicating the appropriate archiving location for a spectrum of open access monograph types, assuring the content would be deposited in a repository that could effectively contain the monograph content.
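To make the idea of depositing in "appropriately selected repositories" a little more concrete, here is a minimal sketch in Python of how such a selection might work. It is purely illustrative and does not describe how Thoth is implemented. The `Repository` and `Monograph` classes, the `select_repositories` function, the repository names, and the format lists are all hypothetical, and the sketch assumes each participating repository would declare which file formats it can hold.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Repository:
    """A participating repository and the file formats it can hold (hypothetical)."""
    name: str
    accepted_formats: frozenset


@dataclass(frozen=True)
class Monograph:
    """A monograph and the file formats that make up its content (hypothetical)."""
    title: str
    formats: frozenset


def select_repositories(monograph, repositories):
    """Return the repositories that accept every format in the monograph."""
    return [r for r in repositories if monograph.formats <= r.accepted_formats]


# Hypothetical capability profiles; these are not statements about any real platform.
REPOSITORIES = [
    Repository("Repository A", frozenset({"pdf", "epub"})),
    Repository("Repository B", frozenset({"pdf", "epub", "xml"})),
    Repository("Repository C", frozenset({"pdf", "epub", "mp3", "mp4"})),
]

text_only = Monograph("A conventional monograph", frozenset({"pdf", "epub"}))
with_media = Monograph("A monograph with audio/visual files", frozenset({"pdf", "mp4"}))

# A conventional monograph can go to every repository in this example.
print([r.name for r in select_repositories(text_only, REPOSITORIES)])
# A monograph with video is only offered to the repository that accepts video.
print([r.name for r in select_repositories(with_media, REPOSITORIES)])
```

In this sketch a monograph is only offered to repositories that can hold all of its files, which is one simple way the user guidelines described above could be reflected in a push-button deposit.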
The landscape of open access monographs in this context is only beginning to emerge [1]. Not all institutional repositories have a preservation layer, which is something that COPIM’s WP7 colleagues recognise, and this was a point of conversation in most of the breakout rooms during the workshop.
The Thoth Archiving Network is so named because the initially envisioned solution is to ensure that there is at least one, if not several, additional online locations where the open access monographs would continue to exist if the publisher ceased to operate and disappeared: they are archived. This does not always guarantee “preservation”, whether bit preservation or active preservation, as a fair few institutional repositories do not have a preservation layer. While many larger publishers, as well as some small-to-medium sized publishers, do pay for their publications to be preserved in a digital preservation archive or have them preserved via a third party (OAPEN, etc.), there are still many small and scholar-led publishers that do not have any relationship with a preservation service. We do hope to involve some repositories that offer preservation as part of their archiving; however, this is not envisioned as a prerequisite. While it would be ideal to have a “perfect” solution, such as a central, national repository for all open access monographs for publishers of any size [2], this is not yet possible. In the meantime, the importance of protecting the “long tail” of small publishers without sufficient resource to engage preservation players means that creative, community thinking is required.
University IT systems, and in particular, layers of network security, could pose a potential barrier to institutions participating in the Thoth Archiving Network. For most institutional systems, access is dependent on user identity, which could therefore complicate automated deposit via API depending on the credentials needed (or allowed). One workshop participant raised a particular query about Symplectic Elements and their authentication process.
The potential cost of hosting material on the institutional repository, particularly for those who have a preservation layer and hence additional operating costs, was raised as a key point. The small presses the Thoth Archiving Network would largely serve are unlikely to be prolific publishers; some publish as few as 5-10 monographs per year. The storage burden is therefore unlikely to be sizeable if a repository only wished to support one or two publishers. However, this is an important consideration, one which is tied to the possible future business models for the Thoth Archiving Network and which is already under consideration within the team.
Additional discussion and questions considered what might happen to existing deposited monograph records if the institution migrated to a new platform (and how this scenario might be handled by the Thoth Archiving Network); library workflows and how content will be managed in the repositories; and questions around necessary rights and copyright that would impact the participating institution. Work Package 7 of COPIM is currently arranging a workshop around copyright, and will report the findings at a later date.
Those of us in Work Package 7 were already aware of most of the potential barriers to implementing the Thoth Archiving Network, and having these confirmed by the workshop participants was an important milestone. The participants also raised some useful and unexpected questions and challenges, which are deeply helpful. These will benefit the next steps in development now that we have nearly completed the proof-of-concept stages.
In the end, while there were understandable reservations and questions about the Network, the workshop participants were generally quite supportive of the concept presented, and some were interested in knowing more about the Network and keen to be involved. We are hopeful that current development work will continue to progress the functionality, and we look forward to pilot testing with the universities already beginning to participate.