A prototype exhibition catalogue created using an open-source computational publishing toolset
In collaboration with COPIM’s Computational Book Publishing Pilot Project and a number of other partners, including NFDI4Culture and the Open Science Lab at the German National Library of Science and Technology (TIB) Hanover, we have been developing technical and publishing workflows for the creation of computational books, and building a selection of prototypes along the way. One prototype, a joint publishers’ catalogue pulling in publications from several publishers, has already been discussed here. This blogpost discusses a further prototype, focused on creating exhibition catalogues, which builds on the previously described technical workflow for computational publishing.
The catalogue prototype was created using an open-source computational publishing toolset. Our objective was to test the automatic retrieval of remote media and linked open data sources, and then the auto-typesetting of the collated publication into multiple formats. The prototype is available for community reuse, so that others can make their own publications, and is accompanied by a step-by-step guide.
Our publishing experiment looked at how a painting exhibition catalogue can be produced using modern machine-readable and networked tools, as opposed to conventional publishing tooling and workflows.
The catalogue that was made contains all the parts you would expect of an exhibition catalogue: cover, colophon, essay, a catalogue of artworks, and a bibliography. Art exhibition catalogues are usually among the few records of an exhibition, serving as important cultural documents or as career milestones for those involved; unfortunately, they are also one of many blind spots in the scholarly record and easily get overlooked or even orphaned.
The challenge the experiment took on was to see whether a computational publishing toolset taken from the STEM disciplines and data science could be used to bring together media and content from remote sources: linked open data (LOD) from Wikidata, including paintings and their records; 3D models; video; bibliographic data; an essay; and persistent identifier (PID) data.
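To give a flavour of what such retrieval looks like in practice, here is a minimal sketch of the kind of Wikidata lookup a retrieval notebook might perform. The QID and the query fields are illustrative assumptions, not taken from the prototype's actual notebooks.

```python
# Minimal sketch: fetch a painting's label, creator, and image URL
# from the Wikidata SPARQL endpoint. The QID is a hypothetical example.
import requests

ENDPOINT = "https://query.wikidata.org/sparql"
QUERY = """
SELECT ?paintingLabel ?creatorLabel ?image WHERE {
  VALUES ?painting { wd:Q12345 }            # hypothetical QID
  OPTIONAL { ?painting wdt:P170 ?creator. } # creator
  OPTIONAL { ?painting wdt:P18 ?image. }    # image
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
"""

response = requests.get(
    ENDPOINT,
    params={"query": QUERY, "format": "json"},
    headers={"User-Agent": "catalogue-demo/0.1"},
)
response.raise_for_status()

for row in response.json()["results"]["bindings"]:
    print(row.get("paintingLabel", {}).get("value"),
          "|", row.get("creatorLabel", {}).get("value"),
          "|", row.get("image", {}).get("value"))
```

A notebook cell like this can write its results straight into the publication's content directory, which is what makes the sources machine-readable end to end.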
The sample content used in the publication prototype came from a collection of Baroque frescos and ceiling paintings in Germany catalogued by the Corpus der barocken Deckenmalerei in Deutschland (CbDD, Corpus of Baroque Ceiling Painting in Germany). Our connection to the collection was through existing work by colleagues from Semantic Kompakkt, partners on the German research infrastructure project NFDI4Culture, who were already working on LOD 3D cataloguing of the castles housing the artworks.
The outcome of our publishing experiment was the successful development of a workflow for making a mock online exhibition catalogue publication: a workflow that produced multi-format outputs with automatic heuristic typesetting, automated the collection of remote LOD and media, and stored the content in versioned Git storage. The resulting prototype publication was christened Baroque TOC, after the simple way it brings together the remote sources using an inventory list, or table of contents.
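To illustrate the table-of-contents idea, here is a minimal sketch in which a plain inventory of Wikidata QIDs drives the creation of one Quarto chapter stub per artwork. The QIDs, file names, and directory layout are hypothetical, not the prototype's actual structure.

```python
# Minimal sketch of the "table of contents" idea: a plain inventory of
# Wikidata QIDs drives the creation of one Markdown (.qmd) stub per
# artwork, ready for Quarto to collate. Names here are illustrative.
from pathlib import Path

inventory = ["Q12345", "Q23456", "Q34567"]  # hypothetical QIDs

out_dir = Path("catalogue")
out_dir.mkdir(exist_ok=True)

for position, qid in enumerate(inventory, start=1):
    stub = out_dir / f"{position:02d}-{qid}.qmd"
    stub.write_text(
        f"# Artwork {qid}\n\n"
        f"<!-- metadata and media for {qid} are pulled in by a notebook -->\n"
    )
    print("wrote", stub)
```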
The working process adopted during the project was iterative. First, there was a process of pulling together ideas, defining questions, scoping requirements, accessing tooling, and so on. The next stage was an initial proof of concept to show that, technically, the systems being used delivered on what was promised. After seeing that the systems worked (the bird could fly) we moved on to user testing with workshops and classes. Two hands-on workshops were held, and in addition a class unit was run with eleven students at the Hochschule Hannover, where the students each made their own custom collections.
The project has also produced a workshop and publication guide, as well as a library of Jupyter Notebooks, with each Notebook performing a specific content retrieval process.
Conventionally, a publication is made by collecting content in a directory and then using a word processor or a layout program like InDesign to create a layout design for print and PDF. The content might then be post-processed for the web, e-publications, and deposits.
The model of publishing used in this experiment taps directly into data sources: archives, repositories, linked open data, bibliographic sources, and PID data, cutting out intermediaries and the potential dislocation of manually copied data.
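As one example of such direct tapping, PID data such as a DOI can be resolved straight into a formatted reference using standard DOI content negotiation, with no manual copying of bibliographic details. This is a minimal sketch; the DOI shown is a placeholder to be swapped for a real one.

```python
# Minimal sketch: resolve a DOI to a formatted reference via DOI
# content negotiation at doi.org. The DOI below is a hypothetical
# placeholder; replace it with a real DOI before running.
import requests

doi = "10.5281/zenodo.1234567"  # hypothetical placeholder DOI
response = requests.get(
    f"https://doi.org/{doi}",
    headers={"Accept": "text/x-bibliography; style=apa"},
    timeout=30,
)
response.raise_for_status()
print(response.text)  # an APA-formatted reference string
```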
For the prototype, the ADA Computational Publishing Pipeline was used. It has three components for publishing automation (a sketch of how they chain together follows the list):
Jupyter Notebooks to write scripts to retrieve data,
Quarto for collation and multi-format typesetting, and
Git for storage, collaboration, and publishing.
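As promised above, here is a minimal sketch of how the three components might be chained from a single script. The notebook file name, project layout, and commit message are assumptions for illustration; the actual pipeline wiring may differ.

```python
# Minimal sketch of the three-stage pipeline glue, run from Python.
# File and project names are illustrative assumptions.
import subprocess

def run(cmd):
    print("$", " ".join(cmd))
    subprocess.run(cmd, check=True)

# 1. Jupyter: execute a retrieval notebook in place (hypothetical file).
run(["jupyter", "nbconvert", "--to", "notebook", "--execute",
     "--inplace", "notebooks/fetch-artworks.ipynb"])

# 2. Quarto: collate and typeset the project into its configured formats.
run(["quarto", "render"])

# 3. Git: store the collated content and outputs as a new version.
run(["git", "add", "-A"])
run(["git", "commit", "-m", "Update catalogue from remote sources"])
```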
The project relied on the collaboration of a number of partners, and without their dedicated input, support, and shaping of ideas the positive results would not have been possible. Important to mention here is Open Book Publishers (OBP), who provided the benchmark for what is needed to get a publication over the line. Team COPIM, and especially Simon Bowie, brought a keen interest in the question of computational publishing as well as developer know-how. The NFDI4Culture TA4 Data – Cultural Publications Working Group gave input on the needs of repository managers and on probing the relationship between LOD and library records data. In addition, TIB's Semantic Kompakkt – NFDI4Culture team have been invaluable in breaking ground with the Wikidata and Wikibase implementations this project works on top of.
In conclusion, it is worth commenting on the state of computational publishing for culture. This project could be realised because a technology base and ways of working have been accruing for more than a decade around open science, FAIR data methods, and OpenGLAM. But there is still a very steep skills gap to overcome in working with the tooling and data science. And although there are many initiatives out there to support training, it would appear that, across the board, a lack of data science skills is a barrier for the majority of people, including authors and publishers.
Hence, in an attempt to tackle this in our own modest way, our ongoing contributions will involve workshops, classes, and libraries, and we will also share best practices and workflows such as these for putting computational publishing into practice, which might be helpful for publishers thinking of adopting computational publishing workflows or of experimenting with the publication of computational books. The first class for Publishing from Collections will be at the FORCE11 Scholarly Communication Institute (FSCI) in cooperation with UCLA Library, from July 31 to August 4, 2023, with registration open here.
Here you can find links to the publication and resources produced for the project:
Exhibition catalogue prototype (fork me): Baroque TOC
Step-by-step guide: Automating Exhibition Catalogue Creation — A Guide
Jupyter Notebooks library: ADA Computational Publishing Notebook library
Poster: Baroque TOC: Publishing from Collections Using Computational Publishing and Linked Open Data
Open-source software project: ADA Computational Publishing Pipeline (ADA CP Pipeline)
Initial engineering prototype: https://github.com/NFDI4Culture/cp4c
COPIM Workshop guide and videos (Feb 2023)
By Simon Worthington, NFDI4Culture – TA4 Data Publications @ TIB, Open Science Lab. ORCiD: 0000-0002-8579-9717. Berlin, April 2023.
Reuse: Venus und Cupido, Heinrich Bollandt, between circa 1620 and circa 1630. This work is in the public domain. | Baroque pearl with enamelled gold mounts set with rubies. Creative Commons CC0 1.0 Universal Public Domain Dedication. This file was donated to Wikimedia Commons as part of a project by the Metropolitan Museum of Art.