This post is part of a short series regarding some of the challenges that exist for small publishers of open access monographs; here we consider the phenomena of link rot and reference rot. Other posts discuss metadata challenges and where existing pathways to preservation may exclude the small publisher due to various platform requirements.
Link rot is an issue which affects all areas of the web. However, it is particularly important for the scholarly record and the ability to build on the work of others, verify results, and identify the provenance of ideas, metadata, and research. Approximately 60-70% of links fail to resolve after ten years.1 A small sample of Crossref DOIs was recently analysed and returned a hit rate of 97%; however, this still meant that 3% of DOIs did not resolve.2 All of these findings have implications for the archiving of books and their long-term accessibility and discoverability.
DOIs are a form of Persistent Identifier (PID). However, even if the failure rate is only 3%, what does that mean for the wider scholarly record? For works which are only available via a publisher website (or a personal page for self-published material) the implications are even grimmer. Ultimately, a DOI needs somewhere to resolve to. When a publisher ceases operating and its website is no longer accessible, what happens to the DOIs which were pointing to those pages? Unless another organisation or person takes over responsibility for resolving those DOIs, they will either lead to a 404 page or, perhaps more seriously, could point at something entirely unrelated. As such, PIDs in the form of DOIs are only actually persistent if there is an organisation or person who has responsibility for them. The ultimate manifestation of this came in 2015, when CNRI staff failed to renew the doi.org domain name, causing a global outage across a large proportion of the DOI network.3
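Because a DOI is ultimately a redirect managed through the Handle System, whether it still resolves can be probed programmatically via the public Handle proxy REST API at doi.org. The sketch below is illustrative only (the helper names are my own, and real use would need timeouts and rate limiting):

```python
import json
from urllib.request import urlopen

HANDLE_API = "https://doi.org/api/handles/"  # public Handle proxy REST API

def doi_api_url(doi):
    """URL that returns the handle record for a DOI as JSON."""
    return HANDLE_API + doi

def interpret_response(payload):
    """Map the Handle proxy's responseCode to a readable status.

    Per the Handle proxy documentation: 1 = success, 100 = handle not
    found. Anything else is flagged for manual checking.
    """
    code = payload.get("responseCode")
    if code == 1:
        return "resolves"
    if code == 100:
        return "not found"
    return "check manually"

def check_doi(doi):
    """Fetch the handle record over the network and interpret it
    (assumes network access)."""
    with urlopen(doi_api_url(doi)) as resp:
        return interpret_response(json.load(resp))
```

Note that a DOI "resolving" in this sense only means the handle record exists; as discussed above, the landing page it points to may itself have rotted.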
Any publisher which uses DOIs (or another form of PID) should have a plan in place for managing those DOIs in the event that they cease trading and/or managing their website. For instance, if a publisher's DOIs are managed by Crossref, Crossref can redirect them to a successor publisher or to an archival source such as Portico, CLOCKSS, or a national library. For publishers who cease to operate, Crossref will redirect to the digital preservation archive specified by the publisher; if no archive has been specified, the DOI will stop working, because it has nowhere left to point.4 Archival locations must be specified when a publisher deposits their metadata with Crossref, and sadly the feature is underutilised. Moreover, although a publisher may indicate an archive, Crossref currently has no way to verify this at deposit, so the material may not actually be preserved in the specified archive, leading to a broken DOI in the future, according to a January 2024 post by Martin Eve.
Having multiple copies and versions of a work available in a number of archiving or preservation platforms helps with the long-term accessibility of a book, but having a coherent plan for future discoverability can also help. Publishers should consider not only where their catalogue will be archived, but how it will be found in the future.
Link rot of links within a work, or reference rot,5 is more complex. As noted above, if a DOI is not used, 60-70% of links may no longer work within ten years of a book being published. The figure could be even higher (or the rot could set in sooner), because the research may have been completed two or three years before the book was published, meaning a link could already be three or more years old before the work even appears. Even when persistent identifiers such as DOIs are employed, there is still some chance that they will not resolve to the originally intended target.
Some publishers, such as Open Book Publishers, have investigated automated means of mitigating the effects of link rot. In addition, the Internet Archive does some work on material added to its servers.
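An automated checker of the kind mentioned above can be sketched quite simply: extract the URLs from a reference list, then triage each one by its HTTP status. This is an illustrative sketch under my own assumptions (the regex and helper names are not any publisher's actual tooling), and a production workflow would also need rate limiting and politeness towards the servers being checked:

```python
import re
import urllib.error
import urllib.request

# Rough pattern for plain http(s) URLs in reference text (an assumption;
# real bibliographies need more careful parsing).
URL_PATTERN = re.compile(r"""https?://[^\s<>'")\]]+""")

def extract_links(text):
    """Pull http(s) URLs out of a block of reference text, trimming
    trailing punctuation from sentence-final links."""
    return [u.rstrip(".,;") for u in URL_PATTERN.findall(text)]

def classify_status(status_code):
    """Rough triage: 2xx/3xx alive, 404/410 rotted, anything else
    flagged for manual checking."""
    if 200 <= status_code < 400:
        return "alive"
    if status_code in (404, 410):
        return "rotted"
    return "check manually"

def check_links(text, timeout=10):
    """Yield (url, status) pairs for every link found in the text.
    Assumes network access; HEAD requests avoid downloading full pages."""
    for url in extract_links(text):
        try:
            req = urllib.request.Request(url, method="HEAD")
            with urllib.request.urlopen(req, timeout=timeout) as resp:
                yield url, classify_status(resp.status)
        except urllib.error.HTTPError as exc:
            yield url, classify_status(exc.code)
        except (urllib.error.URLError, TimeoutError):
            yield url, "unreachable"
```

Run periodically over a backlist's references, this kind of report would at least surface rot before a reader encounters it.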
It can be hard for publishers to insist that authors use links which are more likely to be robust. One tool which can help is the Internet Archive’s Wayback Machine, which can create a snapshot of a website which can then be cited/referenced. Because this is a snapshot of a moment in time, any interactivity may be lost, as may the context of the wider site; what it does provide, however, is a robust link to the page as it stood at the point of referencing, which helps preserve the scholarly record. Another tool which uses a different methodology to make a link more robust is the “Robust Links” tool.6 Both of these tools, though, rely on the author of a work to make the link robust and long-term, and it can be difficult for publishers to insist their authors use them, which is why an automated process would be preferable.
In year two of the Open Book Futures project we are investigating options to embed automated link rot checkers and tools into the publisher/archiving workflows. We will update via the project’s PubPub pages.
Photo by Bryson Hammer on Unsplash