Publishing computational narratives has always been a dream of the Jupyter Project, and there is still a lot of work to be done in improving these use-cases. We’ve made a lot of progress in providing open infrastructure for reproducible science with JupyterHub and the Binder Project, but what about the documents themselves? We’ve recently been working on tools like Jupyter Book, which aim to improve the writing and publishing process with the Jupyter ecosystem. This is hopefully the first post of a few that ask how we can best-improve the state of publishing with Jupyter.
Python has a fairly sophisticated publishing tool in its stack. Sphinx has been a staple for publishing documentation for packages for several years now. Interestingly, publishing a book is more similar to publishing a package’s documentation than it is to, say, publishing a blog. Maybe we could use Sphinx more heavily for writing computational narratives.
One of the major challenges with Sphinx is that its default markup language is reStructuredText, a fairly old but battle-tested markup language. The benefit of reStructuredText is that it is a semantic language, meaning that it has ways to store more information about the nature of the text (e.g. something is an “author”, something is a “reference”, etc). It is also a standard that has remained very stable over time (whether that’s a good or bad thing I’ll leave to you to decide).
However, there are a few major problems with reStructuredText that have impeded its adoption by communities outside of the Python documentation world. I recently asked around on Twitter what these problems were. I got some interesting responses! Here is a quick summary of people’s thoughts.
The syntax of rST is too confusing¶
By far the most common response was that rST syntax is simply too confusing. Here were the main pain points.
Link syntax¶
Many folks particularly mentioned that they needed to look up how to construct links every time they wrote rST.
For reference, a link in rST looks like this:
This `Is a link to <https://google.com>`_.
Compared with markdown:
This [is a link to](https://google.com)
Header complexity¶
rST uses “setext” headers, which means that you put a bunch of underline-like characters under (or under+over) the header name itself, like so:
This is my first header
=======================
======================
This is another header
======================
Compare this to “ATX headers”, which markdown uses and look like this:
# This is a header
Setting aside the annoyance of having to hold down the “=” a bunch of times, the big problem with rST headers is that header characters in rST have no single mapping onto header hierarchy. For example:
This is my first header
~~~~~~~~~~~~~~~~~~~~~~~
And this is my second header
============================
is the same as
This is my first header
=======================
And this is my second header
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This is a case where too much flexibility makes life more difficult than it needs to be. Many
responses wished that rST simply used “ATX” headers (using #
in front of titles) or chose
a single hierarchy of header characters.
In-line code¶
Many folks also dislike the fact that in rST, you must denote in-line code blocks with two backticks instead of one. For example, this is rST:
Here is some inline code: ``a = 2``
While here is some markdown
Here is some inline code: `a = 2`
It may seem silly, but markdown’s ubiquity has given most people the assumption that backticks==code. The fact that rST deviates from this adds unnecessary cognitive burden to most users.
Nesting in-line markup¶
Finally, there were several mentions of rST’s strange inability to nest in-line formatting.
E.g. being able to bold a link by nesting **
inside of your link syntax.
A quick summary¶
Here is a quick list of the tweets that touched on the topic of syntax:
- Links syntax is confusing (9 total)
- Links are confusing: https://
twitter .com /asmeurer /status /1212468755768336384 - Bad syntax, esp links: https://
twitter .com /njgoldbaum /status /1212059796142055424 - Links structure: https://
twitter .com /SylvainCorlay /status /1212061834116816898 - External links: https://
twitter .com /WillingCarol /status /1212152304800894976 - Syntax / too complex to write: https://
twitter .com /minrk /status /1212119686009233410 - Syntax, esp links and tables: https://
twitter .com / _JacobTomlinson /status /1212104705809289219 - Improve links: https://
twitter .com /andreazonca /status /1212166686389858306 - Links and heading permalinks: https://
twitter .com /goerz /status /1212610069252263936 - Inline hyperlinks: https://
twitter .com /moorepants /status /1212072806994739200
- Links are confusing: https://
- Header / title syntax (3 total)
- Using underlines for headers: https://
twitter .com /phaustin /status /1212067821976375296 - Titles: https://
twitter .com /mfcabrera /status /1212126606170431489 - Title lines are confusing: https://
twitter .com /alienghic /status /1212119125398437888
- Using underlines for headers: https://
- Two backticks for code
- Two Backticks for code literals: https://
twitter .com /brettsky /status /1212422437683351553
- Two Backticks for code literals: https://
- Nested inline markup (e.g. ‘em’ inside of ‘strong’) (2 total)
- Nested inline markup: https://
twitter .com /asmeurer /status /1212468755768336384 - nested “em” in “strong”: https://
twitter .com /uranusjr /status /1212179877341720577
- Nested inline markup: https://
Error reporting and the complexity of Sphinx¶
The other major complaint people had was in the toolchain itself. Sphinx is an incredibly powerful tool, but this comes with a degree of complexity that many find difficult to work through. This isn’t helped by the fact that the Sphinx documentation is itself incomplete in many sections (the irony of this is not lost on me).
In particular, several people commented about the difficulty in surfacing and debugging errors that happen in the Sphinx build chain. They also mentioned that Sphinx can be slow to build sometimes, which bogs down the development and writing process.
Here are the tweets about the Sphinx toolchain itself:
- Error reporting / complexity of Sphinx itself (3 total)
- Error reporting: https://
twitter .com /asmeurer /status /1212468909078544384 - Error reporting in Sphinx etc: https://
twitter .com /GaelVaroquaux /status /1212058374981988352 - Documentation is bad: https://
twitter .com /AkhmerovAnton /status /1212059839033171968
- Error reporting: https://
There were also a few miscellaneous responses that didn’t quite fit into the above categories:
- Misc (6 total)
- Isn’t a standard: https://
twitter .com /aterrel /status /1212132546307350529 - Newlines in nested lists, not easy to deploy: https://
twitter .com /westurner /status /1212178375571365891 - No notebook support: https://
twitter .com /moorepants /status /1212062811599147008 - Has state: https://
twitter .com /Mbussonn /status /1212069224245551104 - General complexity: https://
twitter .com /mrocklin /status /1212123762595811328
- Isn’t a standard: https://
What can be done?¶
Overall, it seems like reStructuredText could be much-improved with a few minor modifications to its syntax. These don’t seem like they are structurally incompatible with rST, and would alleviate some of the cognitive burden that users report when they use it. Here’s a quick list of simple things:
- rST could use markdown syntax for external links.
- rST could decide on a fixed interpretation of header characters to levels in the header hierarchy
- rST could default to interpreting single backticks as raw code spans
and a list of slightly more complex things:
- rST could support nested styling inside of links and other elements
- Sphinx could improve its error reporting and debugging machinery
- Sphinx could improve its documentation so that it was easier to find an answer to a question
Or could we bring reStructuredText into Markdown?¶
There is another option, of course, which is to go in the opposite direction. Start with markdown,
and then ask “how could we build the flexibility of rST into markdown” rather than bringing the
simplicity of markdown into reStructuredText. I often wonder if the easiest thing to do would be
to simply decide on a markdown syntax that maps on to “directives” and “roles” (perhaps the Pandoc
code fence :::
for directives, and link attributes []{attribute}
for roles). I think that both
are worth exploring.
In summary, I was surprised at the consistency of people’s complains about the rST language. It seems that many people are hung up about the same relatively minor syntax choices, and that making modifications to these choices would improve the experience for many. It’s also clear that Sphinx could use some developer time to make it more robust, debuggable, and well-documented. I hope that we can make some progress on these issues in the coming years.