What do people think about rST?#
Publishing computational narratives has always been a dream of the Jupyter Project, and there is still a lot of work to be done in improving these use-cases. We’ve made a lot of progress in providing open infrastructure for reproducible science with JupyterHub and the Binder Project, but what about the documents themselves? We’ve recently been working on tools like Jupyter Book, which aim to improve the writing and publishing process with the Jupyter ecosystem. This is hopefully the first post of a few that ask how we can best-improve the state of publishing with Jupyter.
Update!
Many of the ideas in this post have now made their way into a new flavor of markdown called Markedly Structured Text, or MyST. It brings all of the features of rST into Markdown. Check it out!
Python has a fairly sophisticated publishing tool in its stack. Sphinx has been a staple for publishing documentation for packages for several years now. Interestingly, publishing a book is more similar to publishing a package’s documentation than it is to, say, publishing a blog. Maybe we could use Sphinx more heavily for writing computational narratives.
One of the major challenges with Sphinx is that its default markup language is reStructuredText, a fairly old but battle-tested markup language. The benefit of reStructuredText is that it is a semantic language, meaning that it has ways to store more information about the nature of the text (e.g. something is an “author”, something is a “reference”, etc). It is also a standard that has remained very stable over time (whether that’s a good or bad thing I’ll leave to you to decide).
However, there are a few major problems with reStructuredText that have impeded its adoption by communities outside of the Python documentation world. I recently asked around on Twitter what these problems were. I got some interesting responses! Here is a quick summary of people’s thoughts.
The syntax of rST is too confusing#
By far the most common response was that rST syntax is simply too confusing. Here were the main pain points.
Link syntax#
Many folks particularly mentioned that they needed to look up how to construct links every time they wrote rST.
For reference, a link in rST looks like this:
This `Is a link to <https://google.com>`_.
Compared with markdown:
This [is a link to](https://google.com)
Header complexity#
rST uses “setext” headers, which means that you put a bunch of underline-like characters under (or under+over) the header name itself, like so:
This is my first header
=======================
======================
This is another header
======================
Compare this to “ATX headers”, which markdown uses and look like this:
# This is a header
Setting aside the annoyance of having to hold down the “=” a bunch of times, the big problem with rST headers is that header characters in rST have no single mapping onto header hierarchy. For example:
This is my first header
~~~~~~~~~~~~~~~~~~~~~~~
And this is my second header
============================
is the same as
This is my first header
=======================
And this is my second header
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This is a case where too much flexibility makes life more difficult than it needs to be. Many
responses wished that rST simply used “ATX” headers (using #
in front of titles) or chose
a single hierarchy of header characters.
In-line code#
Many folks also dislike the fact that in rST, you must denote in-line code blocks with two backticks instead of one. For example, this is rST:
Here is some inline code: ``a = 2``
While here is some markdown
Here is some inline code: `a = 2`
It may seem silly, but markdown’s ubiquity has given most people the assumption that backticks==code. The fact that rST deviates from this adds unnecessary cognitive burden to most users.
Nesting in-line markup#
Finally, there were several mentions of rST’s strange inability to nest in-line formatting.
E.g. being able to bold a link by nesting **
inside of your link syntax.
A quick summary#
Here is a quick list of the tweets that touched on the topic of syntax:
Links syntax is confusing (9 total)
Links are confusing: https://twitter.com/asmeurer/status/1212468755768336384
Bad syntax, esp links: https://twitter.com/njgoldbaum/status/1212059796142055424
Links structure: https://twitter.com/SylvainCorlay/status/1212061834116816898
External links: https://twitter.com/WillingCarol/status/1212152304800894976
Syntax / too complex to write: https://twitter.com/minrk/status/1212119686009233410
Syntax, esp links and tables: https://twitter.com/_JacobTomlinson/status/1212104705809289219
Improve links: https://twitter.com/andreazonca/status/1212166686389858306
Links and heading permalinks: https://twitter.com/goerz/status/1212610069252263936
Inline hyperlinks: https://twitter.com/moorepants/status/1212072806994739200
Header / title syntax (3 total)
Using underlines for headers: https://twitter.com/phaustin/status/1212067821976375296
Titles: https://twitter.com/mfcabrera/status/1212126606170431489
Title lines are confusing: https://twitter.com/alienghic/status/1212119125398437888
Two backticks for code
Two Backticks for code literals: https://twitter.com/brettsky/status/1212422437683351553
Nested inline markup (e.g. ‘em’ inside of ‘strong’) (2 total)
Nested inline markup: https://twitter.com/asmeurer/status/1212468755768336384
nested “em” in “strong”: https://twitter.com/uranusjr/status/1212179877341720577
Error reporting and the complexity of Sphinx#
The other major complaint people had was in the toolchain itself. Sphinx is an incredibly powerful tool, but this comes with a degree of complexity that many find difficult to work through. This isn’t helped by the fact that the Sphinx documentation is itself incomplete in many sections (the irony of this is not lost on me).
In particular, several people commented about the difficulty in surfacing and debugging errors that happen in the Sphinx build chain. They also mentioned that Sphinx can be slow to build sometimes, which bogs down the development and writing process.
Here are the tweets about the Sphinx toolchain itself:
Error reporting / complexity of Sphinx itself (3 total)
Error reporting: https://twitter.com/asmeurer/status/1212468909078544384
Error reporting in Sphinx etc: https://twitter.com/GaelVaroquaux/status/1212058374981988352
Documentation is bad: https://twitter.com/AkhmerovAnton/status/1212059839033171968
There were also a few miscellaneous responses that didn’t quite fit into the above categories:
Misc (6 total)
Isn’t a standard: https://twitter.com/aterrel/status/1212132546307350529
Newlines in nested lists, not easy to deploy: https://twitter.com/westurner/status/1212178375571365891
No notebook support: https://twitter.com/moorepants/status/1212062811599147008
Has state: https://twitter.com/Mbussonn/status/1212069224245551104
General complexity: https://twitter.com/mrocklin/status/1212123762595811328
What can be done?#
Overall, it seems like reStructuredText could be much-improved with a few minor modifications to its syntax. These don’t seem like they are structurally incompatible with rST, and would alleviate some of the cognitive burden that users report when they use it. Here’s a quick list of simple things:
rST could use markdown syntax for external links.
rST could decide on a fixed interpretation of header characters to levels in the header hierarchy
rST could default to interpreting single backticks as raw code spans
and a list of slightly more complex things:
rST could support nested styling inside of links and other elements
Sphinx could improve its error reporting and debugging machinery
Sphinx could improve its documentation so that it was easier to find an answer to a question
Or could we bring reStructuredText into Markdown?#
There is another option, of course, which is to go in the opposite direction. Start with markdown,
and then ask “how could we build the flexibility of rST into markdown” rather than bringing the
simplicity of markdown into reStructuredText. I often wonder if the easiest thing to do would be
to simply decide on a markdown syntax that maps on to “directives” and “roles” (perhaps the Pandoc
code fence :::
for directives, and link attributes []{attribute}
for roles). I think that both
are worth exploring.
In summary, I was surprised at the consistency of people’s complains about the rST language. It seems that many people are hung up about the same relatively minor syntax choices, and that making modifications to these choices would improve the experience for many. It’s also clear that Sphinx could use some developer time to make it more robust, debuggable, and well-documented. I hope that we can make some progress on these issues in the coming years.