Details of a technical model for computational book publishing using Jupyter Notebook and Quarto
Applying our technical model for computational publishing to building a publishers' catalogue for ScholarLed presses
Our previous blogpost about computational publishing outlined the technical model that we are using to create computational books that combine text with Python code to pull in datasets and digital media objects. This blogpost discusses a particular use case for this computational publishing model and how we have designed a working prototype of a publishers’ catalogue for ScholarLed.
ScholarLed is a publishing consortium of scholar-led open access book presses and has been a key partner in the COPIM project. During COPIM’s computational publishing pilot project, we have worked closely with ScholarLed (and Open Book Publishers in particular) to get the perspective of a publishing press and to ensure that our outputs can be fit into a traditional publishing workflow.
As a consortium of six open access presses, ScholarLed had a use case for a publishers’ catalogue that would present all their recent book publications in one place. This would involve collecting bibliographic metadata from all of the presses, collating and arranging it, and then publishing the list of published books in a single publication. Fortunately, all the presses have included the metadata for their monograph publications in Thoth, the open metadata management and dissemination platform produced as another COPIM output. This made it straightforward to conceive of a catalogue, published as both a website and a PDF, that pulls in and arranges the bibliographic metadata automatically and that can be updated on a regular basis without manual intervention.
ScholarLed’s use case therefore had two main aspects that our proposed catalogue was able to address. First, ScholarLed specified a catalogue that pulls publication metadata from all the different publishers in the consortium into one shared catalogue to enable collaborative marketing: Thoth provides all the metadata from a single source, makes it available through open APIs that are easy to access, and releases it under a CC0 license that allows us to use and distribute the metadata in any way we want. Second, ScholarLed specified an automated catalogue so that manual labour wasn’t required to update it when new publications were released: our model for computational publishing is fully automatable, so the catalogue can be rebuilt and refreshed on a regular basis and any new publication that appears in Thoth is available in the catalogue within at most 24 hours.
Our computational publishing model and workflow allowed us to put together this catalogue prototype very easily using only a few pieces of readily available open source software: Quarto, Jupyter Notebook, and Git. This gave us an instant framework for web publication that didn’t require editing any HTML or CSS.
Using the model discussed in our previous blogpost, we started by creating a Jupyter Notebook that performs the computational functions we needed, i.e. pulling bibliographic metadata from the Thoth API and arranging it into the form we want to display. Thoth has a well-documented GraphQL API that can retrieve the metadata required and a Python client specifically for accessing Thoth’s APIs. By getting all works in Thoth which have work_status='ACTIVE', which have work_types='MONOGRAPH', and which have publisher IDs corresponding to those of the ScholarLed press we want to retrieve, we get a list of all the books published by that press. We can also order them in descending order of publication date by adding the parameter order='{field: PUBLICATION_DATE, direction: DESC}'. Bringing it all together, here’s a Python code snippet for retrieving all works for punctum books and printing the raw output:
from thothlibrary import ThothClient
# publisher ID variables
punctum = '9c41b13c-cecc-4f6a-a151-be4682915ef5'
publishers_ids = '["' + punctum + '"]'
# calling the Thoth GraphQL API
thoth = ThothClient()
response = thoth.works(publishers=publishers_ids, work_status='ACTIVE', work_types='MONOGRAPH', order='{field: PUBLICATION_DATE, direction: DESC}')
print(response)
For each work in that list of works, we then display the appropriate metadata in the order we want it displayed: full title, contributor and contribution type (most likely authors and editors), publication place, publisher, date of publication, DOI, short abstract if available, and a cover image if there is a URL for a cover image. We also break the list up by month and year so each month’s published titles appear under a heading for that month.
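As a rough sketch of that display step (not a copy of the Notebook’s actual code), the loop below assumes the Thoth client returns each work as a dictionary-like object using Thoth’s GraphQL field names, and renders the metadata as Markdown in the Notebook output:

from IPython.display import Markdown, display

# Illustrative sketch only: field names follow Thoth's GraphQL schema,
# but the layout and fallbacks are assumptions, not the prototype's code.
for work in response:
    # contributors, e.g. "Jane Doe (AUTHOR), John Smith (EDITOR)"
    contributors = ', '.join(
        f"{c['fullName']} ({c['contributionType']})"
        for c in work.get('contributions', [])
    )
    parts = [f"## {work['fullTitle']}", f"*{contributors}*"]
    publisher = work['imprint']['publisher']['publisherName']
    parts.append(f"{work.get('place', '')}: {publisher}, {work.get('publicationDate', '')}")
    if work.get('doi'):
        parts.append(f"DOI: {work['doi']}")
    if work.get('shortAbstract'):
        parts.append(work['shortAbstract'])
    if work.get('coverUrl'):
        parts.append(f"![cover image]({work['coverUrl']})")
    # render the assembled Markdown in the Notebook output cell
    display(Markdown('\n\n'.join(parts)))

Breaking the list up by month and year is then just a matter of tracking the year and month of each publication date and emitting a new heading whenever it changes.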
The whole Jupyter Notebook for punctum books simply combines these steps, from the Thoth API query through to the formatted display; the full Notebook is available in the prototype’s GitHub repository.
Each ScholarLed press has a separate Jupyter Notebook to display its publication output with an additional Notebook to cover the outputs of all the presses combined. Using Quarto, each Notebook is then specified as a ‘chapter’ of our eventual catalogue.
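An illustrative sketch of what the _quarto.yml for such a book might look like (the chapter filenames and options here are placeholders, not the prototype’s exact configuration):

project:
  type: book

book:
  title: "ScholarLed Catalogue"
  chapters:
    - index.qmd                  # introductory text in Markdown
    - scholarled.ipynb           # combined output of all the presses
    - punctum_books.ipynb        # one Notebook per press
    - open_book_publishers.ipynb
    # ... further press Notebooks ...

format:
  html: default
  pdf: default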
To render the Notebook files as a book, we define the book structure in _quarto.yml as above and run Quarto’s rendering process with the Unix command quarto render. As discussed in the previous blogpost, this process takes the input files (Markdown and Jupyter Notebook files) and renders them into a static HTML site and whatever other format we specify (in this case, PDF). We then push the site to GitHub and GitHub Pages displays it as a website. Our prototype catalogue is stored as a repository on my GitHub and displayed on GitHub Pages at https://simonxix.github.io/scholarled_catalogue/.
Every aspect of this workflow can be automated on the Unix command line. By putting the repository on a Linux server and setting a cron job to run the rendering process via a short Bash script (sketched below), we end up with a catalogue of the ScholarLed presses’ books that refreshes its contents and pushes back to GitHub Pages automatically on a regular basis. We’ve set the process to run once a day at 01:00 GMT. It may also be possible to automate this workflow without a dedicated server by using GitHub Actions to spin up a virtual machine that runs the Jupyter Notebook files and renders the output with Quarto.
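A minimal sketch of such a refresh script (the repository path, branch, and commit message are placeholders, not the values used by the prototype) might look like this:

#!/bin/bash
# Illustrative nightly refresh script; paths and messages are placeholders.
cd /home/user/scholarled_catalogue || exit 1

# pick up any changes made to the repository since the last run
git pull origin main

# re-execute the Jupyter Notebooks and render the HTML site and PDF
quarto render --execute

# push the freshly rendered output back to GitHub for GitHub Pages to serve
git add .
git commit -m "automated catalogue refresh $(date +%F)"
git push origin main

A crontab entry along the lines of 0 1 * * * /home/user/refresh_catalogue.sh would then run the refresh once a day at 01:00.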
Our ScholarLed publishers’ catalogue offers a working prototype of an automatically updated book that retrieves the data for its content directly from an API. It does so with readily available open source software that can be installed with relative ease by anyone who wants to use this model to create a publication with computational elements. From this prototype, it’s easy to imagine building publications that display particular book collections from library catalogues or catalogues of works referenced in a particular scholarly work.
The banner image for this blogpost is ‘books on ground photo’ by Laura Kapfer licensed under the Unsplash License.