---
tags: markup, documentation
category: report
redirect: 2020-01-22-rst-thoughts
date: 2020-01-22
---

# What do people think about rST?

Publishing computational narratives has always been a dream of the Jupyter Project, and there is still a lot of work to be done in improving these use-cases. We've made a lot of progress in providing open infrastructure for reproducible science with [JupyterHub](https://jupyterhub.readthedocs.io/en/stable/) and [the Binder Project](https://mybinder.org/), but what about the documents themselves? We've recently been working on tools like [Jupyter Book](https://jupyterbook.org), which aim to improve the writing and publishing process within the Jupyter ecosystem. This is hopefully the first of a few posts that ask how we can best improve the state of publishing with Jupyter.

:::{admonition} Update!
:class: tip
Many of the ideas in this post have now made their way into a new flavor of markdown called [Markedly Structured Text](https://myst-parser.readthedocs.io), or MyST. It brings all of the features of rST into Markdown. Check it out!
:::

Python has a fairly sophisticated publishing tool in its stack. [Sphinx](http://sphinx-doc.org/) has been a staple for publishing package documentation for several years now. Interestingly, publishing a book is more similar to publishing a package's documentation than it is to, say, publishing a blog. Maybe we could use Sphinx more heavily for writing computational narratives.

One of the major challenges with Sphinx is that its default markup language is reStructuredText, a fairly old but battle-tested markup language. The benefit of reStructuredText is that it is a *semantic language*, meaning that it has ways to store more information about the nature of the text (e.g., something is an "author", something is a "reference", etc.). It is also a standard that has remained very stable over time (whether that's a good or bad thing I'll leave to you to decide).

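To make the "semantic" point a little more concrete: in rST, a bibliographic field such as `:Author:` or an interpreted-text role such as `:ref:` labels *what* a piece of text is, not just how it should look. Docutils turns a bibliographic field like `:Author:` into a dedicated `author` node, and Sphinx roles like `:ref:` become cross-reference nodes. The sketch below is a deliberately simplified, hypothetical illustration: `semantic_labels` is not part of docutils or Sphinx, and its two regular expressions only handle the single-line cases shown. It just shows that those labels are machine-readable.

```python
import re

# Simplified, illustrative patterns only; docutils' real rST parser is far more involved.
# A field-list line looks like ":Author: Jane Doe" (name, closing colon, spaces, body).
FIELD_PATTERN = re.compile(r"^:(?P<name>[^:`\n]+):[ \t]+(?P<body>.+)$", re.MULTILINE)
# An interpreted-text role looks like :ref:`installation` or :py:func:`os.path.join`.
ROLE_PATTERN = re.compile(r":(?P<role>[\w:.-]+):`(?P<content>[^`]+)`")


def semantic_labels(source: str) -> dict:
    """Collect the explicitly labelled pieces of an rST snippet (hypothetical helper)."""
    fields = {m.group("name"): m.group("body") for m in FIELD_PATTERN.finditer(source)}
    roles = [(m.group("role"), m.group("content")) for m in ROLE_PATTERN.finditer(source)]
    return {"fields": fields, "roles": roles}


sample = """\
:Author: Jane Doe
:Date: 2020-01-22

See :ref:`installation` and :py:func:`os.path.join` for details.
"""

print(semantic_labels(sample))
# {'fields': {'Author': 'Jane Doe', 'Date': '2020-01-22'},
#  'roles': [('ref', 'installation'), ('py:func', 'os.path.join')]}
```

A real tool should lean on docutils' own parser rather than regular expressions; the point here is only that the labels survive in a form a program can act on, which plain markdown emphasis or links do not provide.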
However, there are a few major problems with reStructuredText that have impeded its adoption by communities outside of the Python documentation world. I recently [asked around on Twitter](https://twitter.com/choldgraf/status/1212054861132521472) what these problems were, and I got some interesting responses! Here is a quick summary of people's thoughts.

# The syntax of rST is too confusing

By far the most common response was that rST syntax is simply too confusing. Here were the main pain points.

## Link syntax

In particular, many folks mentioned that they need to look up how to construct links every time they write rST. For reference, a link in rST looks like this:

```rst
This `is a link to <https://google.com>`_.
```

Compare this with markdown:

```
This [is a link to](https://google.com)
```

## Header complexity

rST uses "setext"-style headers, which means that you put a row of underline-like characters under (or under and over) the header text itself, like so:

```rst
This is my first header
=======================

======================
This is another header
======================
```

Compare this to "ATX headers", which markdown uses and which look like this:

```
# This is a header
```

Setting aside the annoyance of having to hold down the "=" key, the big problem with rST headers is that **header characters in rST have no single mapping onto the header hierarchy**. Instead, rST assigns levels in the order that each adornment style first appears in a document, so whichever style it sees first becomes the top level. For example:

```rst
This is my first header
~~~~~~~~~~~~~~~~~~~~~~~

And this is my second header
============================
```

is the same as

```rst
This is my first header
=======================

And this is my second header
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
```

This is a case where too much flexibility makes life more difficult than it needs to be. Many respondents wished that rST simply used "ATX" headers (a `#` in front of titles) or chose a single hierarchy of header characters.

## In-line code

Many folks also dislike the fact that in rST, you must denote in-line code with **two** backticks instead of one.
For example, this is rST:

```rst
Here is some inline code: ``a = 2``
```

while here is some markdown:

```
Here is some inline code: `a = 2`
```

It may seem silly, but markdown's ubiquity has led most people to assume that backticks mean code. The fact that rST deviates from this adds unnecessary cognitive burden for most users.

## Nesting in-line markup

Finally, there were several mentions of rST's strange inability to nest in-line formatting, e.g., being able to bold a link by nesting `**` inside of your link syntax.

## A quick summary

Here is a quick list of the tweets that touched on the topic of syntax:

* Link syntax is confusing (9 total)
  * Links are confusing:
  * Bad syntax, esp links:
  * Links structure:
  * External links:
  * Syntax / too complex to write:
  * Syntax, esp links and tables:
  * Improve links:
  * Links and heading permalinks:
  * Inline hyperlinks:
* Header / title syntax (3 total)
  * Using underlines for headers:
  * Titles:
  * Title lines are confusing:
* Two backticks for code
  * Two Backticks for code literals:
* Nested inline markup (e.g. 'em' inside of 'strong') (2 total)
  * Nested inline markup:
  * nested "em" in "strong":

# Error reporting and the complexity of Sphinx

The other major complaint people had was about the toolchain itself. Sphinx is an incredibly powerful tool, but this comes with a degree of complexity that many find difficult to work through. This isn't helped by the fact that the [Sphinx documentation](http://www.sphinx-doc.org/en/master/) is itself incomplete in many sections (the irony of this is not lost on me).

In particular, several people commented on the difficulty of surfacing and debugging errors that happen in the Sphinx build chain. They also mentioned that Sphinx can be slow to build, which bogs down the development and writing process.

Here are the tweets about the Sphinx toolchain itself:

* Error reporting / complexity of Sphinx itself (3 total)
  * Error reporting:
  * Error reporting in Sphinx etc:
  * Documentation is bad:

There were also a few miscellaneous responses that didn't quite fit into the above categories:

* Misc (6 total)
  * Isn't a standard:
  * Newlines in nested lists, not easy to deploy:
  * No notebook support:
  * Has state:
  * General complexity:

# What can be done?

Overall, it seems like **reStructuredText** could be much improved with a few minor modifications to its syntax. These changes don't seem structurally incompatible with rST, and they would alleviate some of the cognitive burden that users report. Here's a quick list of simple things:

* rST could use markdown syntax for external links.
* rST could decide on a fixed mapping from header characters to levels in the header hierarchy.
* rST could default to interpreting single backticks as inline code spans.

and a list of slightly more complex things:

* rST could support nested styling inside of links and other elements.
* Sphinx could improve its error reporting and debugging machinery.
* Sphinx could improve its documentation so that it is easier to find answers to questions.

# Or could we bring reStructuredText into Markdown?

There is another option, of course, which is to go in the opposite direction: start with markdown and ask "how could we build the flexibility of rST into markdown?" rather than bringing the simplicity of markdown into reStructuredText. I often wonder if the easiest thing to do would be to simply decide on a markdown syntax that maps onto "directives" and "roles" (perhaps Pandoc's fenced-div syntax `:::` for directives, and bracketed-span attributes `[]{attribute}` for roles). I think that both approaches are worth exploring.

In summary, I was surprised at the consistency of people's complaints about the rST language.
It seems that many people are hung up on the same relatively minor syntax choices, and that changing these choices would improve the experience for many. It's also clear that Sphinx could use some developer time to make it more robust, debuggable, and well-documented. I hope that we can make some progress on these issues in the coming years.