As part of COPIM's work package on experimental book publishing, we are investigating the emerging area of computational book publishing.
Within the Computational Books pilot project we have previously looked at creating technical models and workflows to publish computational books. Work is ongoing on several prototypes of computational publications, together with author and publisher communities (including ScholarLed and the NFDI4Culture consortium). Together with Rupert Gatti from Open Book Publishers, we have also been exploring what it would mean for a publisher to either adapt their standard print/digital book publishing workflow to be able to incorporate computational books, or create a different workflow that sits alongside the standard publishing workflow. The latter is what we will be focusing on in this post. We have also further discussed issues of archiving and preservation with our COPIM WP7 colleagues Gareth Cole and Miranda Barnes together with TIB colleague Simon Worthington, which we will focus on more in depth in a future blogpost.
Developing a workflow for the publication of computational books raises several questions about the functions of an academic book and what we want to be able to do with a book once it is published, which might differ according to the perspective you take (i.e. that of an author, publisher, or reader). This includes how to balance the potential of the more dynamic and interactive elements that a computational publication can offer with the requirement for more fixed and stable outputs to serve dissemination and preservation purposes. It also includes considering the needs of authors and readers with respect to the kinds of computational aspects that would be useful to them when creating, as opposed to interacting with, book-based research. Finally, any workflow for this type of experimental book publication will have to consider issues of linearity versus modularity, as we have seen throughout our pilot research.
The publishing workflow we have attempted to model below is based on a technical workflow previously outlined by Simon Bowie, using a combination of Jupyter Notebook and Quarto. Quarto is essentially a static site generator that turns Markdown files and Jupyter Notebooks into other formats. In other words, it combines a Markdown workflow with Jupyter Notebook platforms, Pandoc (which allows multi-format outputs), and static site generator output, meaning that it can output in a variety of formats that are relevant for book publishers (e.g. ePub, PDF and HTML, among others). Quarto also has options for revealing or hiding source code for the reader to review if needed, for including interactive content blocks, and for styling and formatting book front matter, the table of contents, etc.
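To make the multi-format step more concrete, the following is a minimal sketch in Python, not the workflow used in the pilot itself. It assumes the Quarto command-line tool is installed and that the book lives in a project directory; the directory name `my-computational-book` and both function names are hypothetical. It relies only on Quarto's `quarto render <input> --to <format>` command.

```python
import shutil
import subprocess
from pathlib import Path

# Output formats mentioned in the post, written as Quarto format identifiers.
FORMATS = ("html", "pdf", "epub")


def build_render_commands(project_dir, formats=FORMATS):
    """Return one `quarto render` command (as an argument list) per output format."""
    project = str(Path(project_dir))
    return [["quarto", "render", project, "--to", fmt] for fmt in formats]


def render_all(project_dir, formats=FORMATS):
    """Run each render command in turn; do nothing if the Quarto CLI is missing."""
    if shutil.which("quarto") is None:
        print("Quarto CLI not found on PATH; nothing rendered.")
        return False
    for command in build_render_commands(project_dir, formats):
        # check=True raises an error if Quarto reports a failed render.
        subprocess.run(command, check=True)
    return True


if __name__ == "__main__":
    # "my-computational-book" is a hypothetical project directory.
    for command in build_render_commands("my-computational-book"):
        print(" ".join(command))
```

Keeping the command construction separate from its execution means a publisher could inspect or log exactly what would be run for each format before rendering anything.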
For the computational book publishing prototypes we have been working on, the ScholarLed catalogue and the Baroque paintings exhibition catalogue, we have experimented with pulling a range of digital objects into Jupyter Notebooks to see how they render in Quarto, including code, Wikidata linked open data, 3D models, high-res images, video and live ORCID queries. One of the main questions is whether the rendering that Quarto offers is always a stabilisation of the computational or interactive objects in Jupyter Notebooks, in other words, whether it is just a display mode in which files are rendered in a directory structure. Would these renderings still function as a ‘computational book’? It seems that although Quarto does indeed stabilise many of the interactive elements, it does keep some interactivity within HTML. Interaction can also still happen independently of the platform, as files can be edited via different tools (online with Binder, or with Jupyter Notebook, etc.). In other words, the underlying files can be interacted with independently of Quarto, as the computational book would basically exist as a GitHub repository. You can run the files in this repository in any programme or platform that runs a Jupyter Notebook. From there you can run the Quarto rendering process again, and it outputs the rendered files into a directory. This process therefore gives publishers a lot of freedom in the tools they use to produce the input files (whether JupyterLab, Visual Studio Code, or GitHub itself). It would also allow reuse, both by forking the files and by creating different and clearly delineated releases of the repository.
One example of this would be the book Aesthetic Programming (2021), authored by Winnie Soon and Geoff Cox and published by Open Humanities Press, which is a book that exists in a GitLab repository, which has also been forked by Mark Marino and Sarah Ciston (Marino, 2021; Marino & Ciston, 2021). The authors are currently working on a Chinese translation of the book as a separate fork.
But what kinds of interactive engagements with the digital objects in a computational book are then possible for a reader when it is rendered as an HTML output (and potentially also as ePub)? In terms of interactivity, the user can for example move and rotate a 3D model and observe it in three dimensions. A small dataset that can be handled in JavaScript can be rendered interactively, which means that custom charts and graphs can be created, for example, which the user can adjust slightly based on the data in that set. However, with different output formats we are also going to lose certain elements of interactivity. Interactions or queries could still happen on the underlying codebase or files via Jupyter Notebook, though. Further to that, interactivity within a rendered output (charts, graphics, 3D models) depends on a JavaScript implementation, which is feasible for HTML but incompatible with ePub.
What is interesting for a publisher is that Quarto collects everything in a directory structure. This is the publication, and something that a publisher can point towards. Within that, Quarto does two things. First, it runs on top of a set of source files (Jupyter Notebooks, Markdown files). These files, which are going to render out to the web, can also be interacted with as notebooks in their own right. Second, Quarto takes these notebooks and renders them to a static or fixed point, which would be a published release or version. If a reader wanted to interact with this version, publishers can enable this in two ways: 1) by linking straight back to those Jupyter Notebooks, for example by annotating a published web or PDF rendering to inform readers that they can go directly to the notebook to interact with or query the digital objects (figures, images); 2) by loading JavaScript libraries into the HTML, which enables interaction with figures, graphs, etc.
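The first of these two routes, linking a rendered figure back to its source notebook, could be sketched as follows. This is an illustration under assumptions rather than a description of the prototypes: it assumes the repository is public on GitHub and opened through the public mybinder.org service, whose launch links take the form `/v2/gh/<owner>/<repo>/<ref>` with a `labpath` query parameter. The repository name, release tag, notebook path and both function names are hypothetical.

```python
from urllib.parse import quote


def binder_url(owner, repo, ref, notebook_path):
    """Build a mybinder.org launch link that opens one notebook in JupyterLab.

    `ref` is a branch, tag or commit, so a published release can be pinned.
    """
    return (
        "https://mybinder.org/v2/gh/"
        f"{quote(owner)}/{quote(repo)}/{quote(ref, safe='')}"
        f"?labpath={quote(notebook_path, safe='')}"
    )


def interaction_note(figure_label, url):
    """Return the sentence a publisher might print beneath a static figure."""
    return (
        f"{figure_label}: to interact with or query this object, "
        f"open the source notebook at {url}"
    )


if __name__ == "__main__":
    # Hypothetical repository, release tag and notebook path.
    url = binder_url("example-press", "example-book", "v1.0", "chapters/01-intro.ipynb")
    print(interaction_note("Figure 1", url))
```

Pinning the link to a release tag rather than a branch keeps the interactive version aligned with the fixed, published rendering it is cited from.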
The key thing to explore here from a publisher perspective would be what inputs and outputs would be needed and which are available through the above described technical workflow. What would be the input, the format a computational book is delivered in, and what output options will there be when we move this book through the distribution chain? We will focus on these inputs and outputs while going through some of the steps required to publish a book with Open Book Publishers (OBP).
Peer Review: what OBP would need at this stage in a standard print or digital workflow is to be able to generate a PDF (which is what most peer reviewers want), including interactive links or a link to interactive elements, which can then be sent to the reviewers. This might also include an HTML version. What is important to consider in this step is the version that the peer reviewer will get and whether there are interactive components in this version, in which case the reviewer will need the ability to interact with these. In the case of pre-publication review, the reviewer therefore needs access to the environment and the database that people can interact with. The publisher (or the author, or both together) might need to spend some time generating this environment or providing access to it for the reviewer. Many experimental publishing platforms are therefore integrating options for peer review into their environments (e.g. PubPub, Manifold), and hypothes.is is often used as an overlay on a publishing environment. With Jupyter Notebooks, you can add users to a private or pre-release repository, so it is quite simple to add reviewers to the interactive environment. Open peer review is quite common in these kinds of open and experimental environments, yet this isn’t always appropriate. Single-blind review seems feasible too (which would involve the publisher ensuring some form of anonymous access for the reviewer), but double-blind review is much more complicated, as it is often impossible to ensure the anonymity of the author in open, online, experimental and interactive environments.
Editing: this stage concerns the formatting, laying out the pages and running titles, making sure the page size is a format that can be printed, etc. This means that styling presets will need to be defined. With respect to copy editing, the question is how editing could be handled in GitHub. Could this be done directly in GitHub via Markdown files (as default)? Technically this wouldn’t be complicated, and copyeditors can be trained to edit in Markdown (although this might be harder for authors). The main difficulty of working in Markdown might be on a human level, because it does not match most editors' visual preferences. Generally speaking, most copyeditors might find it difficult to work directly in Markdown with all the code visible, meaning that a two-way interface would be needed that shows, on one side, how the Markdown will be displayed (also see e.g. Dillinger). Other languages such as LaTeX are also getting better at this. It would be helpful if code fields could be hidden from copyeditors in this context. Some code fields might be useful to display for illustration, but in the majority of cases we wouldn’t want to display them in a PDF output. The ability to add comments or track changes might also be important for copy editing. One drawback of editing in Markdown is the lack of scholarly referencing solutions, tables, citation managers, etc. in Markdown editing environments. One collaborative writing and editing environment that could potentially help with this is Fidus Writer, which offers a Word-like editing and commenting interface online, and could be used to edit the source and then output Markdown files. To edit Jupyter Notebook files, one would need to edit in a notebook interface (containing text cells and code cells), but other tools are available too (Binder connected to JupyterHub, or Visual Studio Code, for example). Nextcloud also offers Markdown editing.
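One way to hide code fields from copyeditors would be to extract only the text cells of a notebook into a separate view. The sketch below is an assumption about how this could be done, not a feature of any of the tools named above. It relies only on the fact that a Jupyter Notebook file is JSON with a list of `cells`, each carrying a `cell_type` and a `source`. The function names and the example notebook are hypothetical.

```python
import json


def prose_cells(notebook):
    """Yield (cell index, text) for every Markdown cell, skipping code cells."""
    for index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") == "markdown":
            source = cell.get("source", "")
            # The notebook format stores source as a string or a list of lines.
            text = source if isinstance(source, str) else "".join(source)
            yield index, text


def copyedit_view(notebook):
    """Join the prose cells into one plain-text document for a copyeditor."""
    return "\n\n".join(f"[cell {i}]\n{text}" for i, text in prose_cells(notebook))


# A hypothetical three-cell notebook: prose, code, prose.
EXAMPLE_NOTEBOOK = json.loads("""
{
  "cells": [
    {"cell_type": "markdown", "source": ["# Chapter 1\\n", "Some prose."]},
    {"cell_type": "code", "source": ["print('query Wikidata')"], "outputs": []},
    {"cell_type": "markdown", "source": "More prose."}
  ]
}
""")

if __name__ == "__main__":
    print(copyedit_view(EXAMPLE_NOTEBOOK))
```

Keeping the original cell index next to each passage means corrections can be carried back to the right cell of the source notebook by hand or by a later script.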
Crediting: at this stage in the publication of computational books, expanded forms of attribution might be important to explore, given the variety of actors (human and non-human) involved in the creation of computational books, where the roles of e.g. developers and designers might become more important to acknowledge. We are moving towards book environments in which more and more actors are involved. This has of course always been the case in a print environment too, but the changing roles of contributors, and the importance of these roles in a digital environment, is something to be aware of. There is the Contributor Roles Taxonomy (CRediT), but this is an incomplete framework, particularly for HSS. Open Book Publishers adds a credit list (which also includes information about the software used to create the book, etc.) at the end of their books, but this information is not kept in the metadata itself. Other presses such as Stanford University Press have adopted attribution methods for some of their projects that are more akin to film credits, which might also be an option to explore.
Printing: at this stage the outputs needed for print are PDF/x 2.1 (which allows a print-quality PDF to be generated). This should be feasible with Pandoc in the backend of Quarto, as long as the images come through. One issue that should be mentioned here is that a printed work clearly does not have any live URLs. At OBP, engagement with multimedia is enabled and encouraged via QR codes, which will mean generating QR codes for the live links that would sit alongside the files, allowing the reader to interact with them directly.
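A first step towards generating those QR codes would be to collect the live links from the Markdown source and assign each one an image file name. The sketch below covers only that collection step, using the Python standard library. Producing the actual images would need an additional tool (a third-party package such as `qrcode` is one option). The function name, file naming scheme and example URLs are hypothetical, and only inline Markdown links of the form `[label](https://...)` are matched.

```python
import re

# Matches inline Markdown links of the form [label](http://... or https://...).
LINK_PATTERN = re.compile(r"\[([^\]]+)\]\((https?://[^)\s]+)\)")


def qr_manifest(markdown_text):
    """List each distinct live link with the file name its QR code would use."""
    manifest = []
    seen = set()
    for label, url in LINK_PATTERN.findall(markdown_text):
        if url in seen:
            continue  # one QR code per URL, even if it is linked several times
        seen.add(url)
        manifest.append(
            {"label": label, "url": url, "image": f"qr-{len(manifest) + 1:03d}.png"}
        )
    return manifest


if __name__ == "__main__":
    # Hypothetical chapter text with example.org placeholder links.
    sample = (
        "See the [3D model](https://example.org/model) and the "
        "[video](https://example.org/video). The [model](https://example.org/model) "
        "can be rotated."
    )
    for entry in qr_manifest(sample):
        print(entry["image"], entry["url"])
```

A manifest like this could also be kept alongside the print files, so that the printed QR codes and the links they resolve to can be checked again at a later date.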
Distribution: at this stage conversion to other forms including AZW and S3 file systems makes a difference to whether the digital book can be distributed to places such as Amazon. But as long as ePub and PDF are available then distribution will be linked to those files. It is more about generating those files and ensuring their reader functionality.
Archiving: at this stage ePub generation becomes important. OBP would want a PDF/A output for archiving, and this would involve considering what the nature of the displayed PDF will be, to ensure that people can download it and that it can ideally be generated identically from the general PDF output and from the print book (with an identical page outline, the same QR codes, etc.). With respect to ePub, it is important to keep in mind that people use this as a standalone file that they can take away without interacting with the web. This preference of readers for a single-file format (with no connection to the web) brings us back to the question of the interactivity of material within the ePub versus the interactivity that is available by accessing the web. What is the functionality of that material in the ePub when it is not connected to the web? How would we indicate that something is interactive in the ePub, and where would readers go to get to that interactivity (do they stay within the ePub or go somewhere else)? HTML would be good to have too, and it would be interesting to know whether an XML version, plus a workflow from there, would be possible.
Header Image: Photo by Shubham Dhage on Unsplash.