Re-building my blog with MySTMD

Wow, it has been a long time since I last wrote here. It turns out that having two small children and a very demanding job means you don’t have as much time for blogging. But that’s a whole different blog post...

I’ve decided to convert my blog to use the new MyST Document Engine. This is part of a dogfooding experiment to see what’s possible with MyST, since it’s where the Jupyter Book project is heading, and I want to see how close to “production-ready” we are already.

To begin, I wanted to share a few things I learned today as I tried to re-tool my old blog for use with MyST MD.

There’s no blog functionality in MyST yet

As MyST is still quite young, it has no out-of-the-box blog functionality (see jupyter-book/mystmd#840 for the issue tracking that). So, I wanted to accomplish at least two things with my initial transfer:

Generate a list of recent blog posts that I can insert into a few places in my site.
Generate an RSS feed that keeps the same URLs, content, etc.

What didn’t work: using the parsed MyST documents

My first thought was to use a MyST plugin that defines a directive I could use to insert a list of blog posts. However, I learned that MyST plugins have no way to access all of the parsed documents at build time (see this issue about accessing all of the parsed documents to track that one).

Fortunately, @rowanc1 made me realize that I could just manually parse my blog files and use that to build up something like a blog list. So that’s what the rest of this post is about.

You can run scripts in JavaScript as part of your MyST build

The MySTMD plugins documentation shows a few examples for how to define your own MyST plugins. These are little JavaScript files that get executed every time you build your MyST site.

The easiest thing to do here would be to write a JavaScript plugin for MyST that I can attach to my site build. I could use a JS library to parse the markdown files in blog/, grab their YAML metadata, and return MyST AST structure that would be inserted into the document. But I’m not too comfortable with JavaScript, and I found two ways that are much hackier, but much more accessible for somebody who is comfortable with Python 😎.

Write a MyST extension in Python

I bet most folks don’t know that you can write MyST extensions entirely in Python (or any other language that you can execute locally). Here is the MyST documentation on writing an executable MyST extension.

Executable extensions are treated like a black box in MyST - the MyST build process simply executes a file that you specify in myst.yml, and treats anything printed to stdout as MyST AST that will be inserted back into the MyST document.
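To make that protocol concrete, here is a minimal sketch of the two modes such a script has to handle, modeled on the full plugin later in this post. The `hello` directive and the `respond` helper are made up for illustration, and the sketch returns the JSON as a string instead of printing it so that it is easy to try out.

```python
import json

# Plugin description, returned when the script is called with no arguments.
# The "hello" directive is hypothetical; the field names mirror the full
# plugin script later in this post.
SPEC = {
    "name": "Hello example",
    "directives": [
        {
            "name": "hello",
            "doc": "Insert a few greeting paragraphs.",
            "options": {
                "number": {"type": "int", "doc": "How many greetings to insert"}
            },
        }
    ],
}


def respond(argv, stdin_text=""):
    """Return the JSON text the executable would write to stdout."""
    if "--directive" in argv:
        # The directive node arrives as JSON on stdin
        data = json.loads(stdin_text)
        opts = data["node"].get("options", {})
        number = int(opts.get("number", 1))
        nodes = [
            {
                "type": "paragraph",
                "children": [{"type": "text", "value": f"Hello number {i}"}],
            }
            for i in range(1, number + 1)
        ]
        return json.dumps(nodes)
    # No arguments: describe the plugin
    return json.dumps(SPEC)
```

In a real executable, the last step would be to print the result of `respond(sys.argv[1:], ...)` to stdout, reading stdin only when `--directive` is present.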

How do you know what MyST AST looks like?

I mention “all you need to do is output MyST AST”, but what does that even mean? The MyST AST is the abstract structure of a MyST document. It has all of the relevant information about the content in a document, as well as all the semantic tags for different types of content that can exist (e.g., “italics”, or “admonition boxes”).

When a MyST Markdown document is parsed, the result is MyST AST. You can see a ton of examples of AST for various MyST markdown in the MyST guide. Just look for the little “interactive demo” boxes that show off sample MyST Markdown.
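As a rough illustration, here is what the AST for a one-line paragraph looks like when written as a Python dictionary. The node shapes follow the ones shown in the MyST guide demos, but treat this as a sketch rather than the exact parser output; `node_types` is just a helper I made up to show the tree structure.

```python
import json

# Roughly the AST for the Markdown source:  Hello *world*
paragraph = {
    "type": "paragraph",
    "children": [
        {"type": "text", "value": "Hello "},
        {"type": "emphasis", "children": [{"type": "text", "value": "world"}]},
    ],
}


def node_types(node):
    """Return the type of every node in the tree, depth first."""
    types = [node["type"]]
    for child in node.get("children", []):
        types.extend(node_types(child))
    return types


# The JSON form is what an executable plugin would print
print(json.dumps(paragraph, indent=2))
print(node_types(paragraph))
```

Every node has a `type`, leaf nodes carry a `value`, and everything else nests under `children`.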

In my case, I needed a list of blog posts, so I found the relevant AST for what this looks like at the following locations:

The list items documentation showed me the AST for lists.
The Definition lists documentation had sample AST for an internal link.
The Inline text formatting documentation had examples for things like bold, italics, etc.
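Putting those pieces together, a bulleted list of links can be built from plain dictionaries. This is a sketch based on the node types in those demos (`list`, `listItem`, `paragraph`, `link`, `text`); the helper names and the example title and URL are hypothetical.

```python
def text(value):
    """A plain text node."""
    return {"type": "text", "value": value}


def link(url, label):
    """A link node whose visible label is plain text."""
    return {"type": "link", "url": url, "children": [text(label)]}


def bullet_list(items):
    """Build an unordered list with one linked item per (title, url) pair."""
    return {
        "type": "list",
        "ordered": False,
        "children": [
            {
                "type": "listItem",
                "children": [
                    {"type": "paragraph", "children": [link(url, title)]}
                ],
            }
            for title, url in items
        ],
    }


# Hypothetical post title and URL, for illustration only
ast = bullet_list([("An example post", "/blog/2024/example")])
```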

Turning this into a Python plugin for MyST

With this in mind, I wrote a little Python extension that:

At build time, parses all of my blog post markdown files and extracts metadata from their YAML headers.
Defines a bloglist directive that will insert a list of blog posts where it exists.
Manually converts the blog post metadata into MyST AST that it prints to stdout.

As a result, when I call the directive in my blog, MyST replaces it with whatever AST is spit out by the Python script. You can take a look at the entire Python script here.
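The first step, pulling the YAML header and a short summary out of each post, can be sketched with nothing but string handling. The function names here are made up, and the sketch leaves the header as raw text; the full script passes it to `yaml.safe_load` to get a dictionary.

```python
SKIP_PREFIXES = ("#", "--", "%", "++")


def split_front_matter(text):
    """Split a post into (yaml_header, body), or return None if there is no header."""
    if not text.startswith("---"):
        return None
    parts = text.split("---", 2)
    if len(parts) < 3:
        return None
    _, header, body = parts
    return header.strip(), body


def summarize(body, n_words=50):
    """Return the first n_words words, ignoring headings, comments, and markers."""
    lines = [
        line for line in body.splitlines()
        if not line.startswith(SKIP_PREFIXES)
    ]
    return " ".join(" ".join(lines).split()[:n_words])


post = "---\ntitle: Hi\ndate: 2024-01-01\n---\n# Heading\nSome words here\n"
header, body = split_front_matter(post)
summary = summarize(body, n_words=2)
```

Returning `None` for files without a header makes it easy to skip them, which is what the full script does with its `try`/`except`.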

Now I can insert lists of blog posts wherever I like. For example, here’s a list of the latest three:

Why open source foundations try to fund systems, not development

This is a brief reflection on something that I've been hearing consistently from the Linux Foundation and its member projects as part of serving on the [Board of the Jupyter Foundation](https://jupyterfoundation.org). Here's a point that originally surprised me when I heard it: > Most foundations within the Linux Foundation network recommend

Jupyter can align the needs of its community and its foundation by enabling contribution

This week was my first time attending the [Linux Foundation Member Summit](https://events.linuxfoundation.org/lf-member-summit/). This is an annual meeting for all of the Linux Foundation member organizations and projects. I joined because of my new role [on the Jupyter Executive Council](./jec.md), and so I tried to go into this meeting with the

The relationship between the Jupyter Executive Council, Software Steering Council, and Foundation

This is a question that I've been asked many times now that I'm serving on the JEC and the JF. I'm writing up a quick response so that I have something to refer back to and align my own thinking on. **How most Linux Foundation projects seem to be structured**. Linux

Adding an RSS feed

Because I’m running Python to define my MyST plugin, I can also use Python to build a custom RSS feed! This was relatively easy since I’d already parsed the metadata from each file.

I found the python-feedgen package, which is a little helper package for generating RSS feeds in Python. Since my MyST plugin was already written in Python, I just added a few more lines to do so.

The whole Python plugin script:

blogpost.py

#!/usr/bin/env python3
import argparse
import json
import sys
from yaml import safe_load
from pathlib import Path
import pandas as pd
from feedgen.feed import FeedGenerator
import unist as u

DEFAULTS = {"number": 10}

root = Path(__file__).parent.parent

# Aggregate all posts from the markdown files under blog/
posts = []
for ifile in root.rglob("blog/**/*.md"):
    if "drafts" in str(ifile):
        continue

    text = ifile.read_text()
    try:
        _, meta, content = text.split("---", 2)
    except Exception:
        print(f"Skipping file with error: {ifile}", file=sys.stderr)
        continue

    # Load in YAML metadata
    meta = safe_load(meta)
    meta["path"] = ifile.relative_to(root).with_suffix("")
    if "title" not in meta:
        lines = text.splitlines()
        for ii in lines:
            if ii.strip().startswith("#"):
                meta["title"] = ii.replace("#", "").strip()
                break
    
    # Summarize content
    skip_lines = ["#", "--", "%", "++"]
    content = "\n".join(ii for ii in content.splitlines() if not any(ii.startswith(char) for char in skip_lines))
    N_WORDS = 50
    words = " ".join(content.split(" ")[:N_WORDS])
    if "author" not in meta or not meta["author"]:
        meta["author"] = "Chris Holdgraf"
    meta["content"] = meta.get("description", words)
    posts.append(meta)
posts = pd.DataFrame(posts)
posts["date"] = pd.to_datetime(posts["date"]).dt.tz_localize("US/Pacific")
posts = posts.dropna(subset=["date"])
posts = posts.sort_values("date", ascending=False)

# Generate an RSS feed
fg = FeedGenerator()
fg.id("http://chrisholdgraf.com")
fg.title("Chris Holdgraf's blog")
fg.author({"name": "Chris Holdgraf", "email": "choldgraf@gmail.com"})
fg.link(href="http://chrisholdgraf.com", rel="alternate")
fg.logo("http://chrisholdgraf.com/_static/profile.jpg")
fg.subtitle("Chris' personal blog!")
fg.link(href="http://chrisholdgraf.com/rss.xml", rel="self")
fg.language("en")

# Add all my posts to it
for ix, irow in posts.iterrows():
    fe = fg.add_entry()
    fe.id(f"http://chrisholdgraf.com/{irow['path']}")
    fe.published(irow["date"])
    fe.title(irow["title"])
    fe.link(href=f"http://chrisholdgraf.com/{irow['path']}")
    fe.content(content=irow["content"])

# Write an RSS feed with latest posts
fg.atom_file(root / "atom.xml", pretty=True)
fg.rss_file(root / "rss.xml", pretty=True)

plugin = {
    "name": "Blog Post list",
    "directives": [
        {
            "name": "postlist",
            "doc": "A directive that inserts a list of recent blog posts.",
            "alias": ["bloglist"],
            "arg": {},
            "options": {
                "number": {
                    "type": "int",
                    "doc": "The number of posts to include",
                }
            },
        }
    ],
}

children = []
for ix, irow in posts.iterrows():
    children.append(
        {
          "type": "card",
          "url": f"/{irow['path'].with_suffix('')}",
          "children": [
            {
              "type": "cardTitle",
              "children": [u.text(irow["title"])]
            },
            {
              "type": "paragraph",
              "children": [u.text(irow['content'])]
            },
            {
              "type": "footer",
              "children": [
                u.strong([u.text("Date: ")]), u.text(f"{irow['date']:%B %d, %Y} | "),
                u.strong([u.text("Author: ")]), u.text(f"{irow['author']}"),
              ]
            },
          ]
        }
    )


def declare_result(content):
    """Declare result as JSON to stdout

    :param content: content to declare as the result
    """

    # Format result and write to stdout
    json.dump(content, sys.stdout, indent=2)
    # Successfully exit
    raise SystemExit(0)


def run_directive(name, data):
    """Execute a directive with the given name and data

    :param name: name of the directive to run
    :param data: data of the directive to run
    """
    assert name == "postlist"
    opts = data["node"].get("options", {})
    number = int(opts.get("number", DEFAULTS["number"]))
    output = children[:number]
    return output


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    group = parser.add_mutually_exclusive_group()
    group.add_argument("--role")
    group.add_argument("--directive")
    group.add_argument("--transform")
    args = parser.parse_args()

    if args.directive:
        data = json.load(sys.stdin)
        declare_result(run_directive(args.directive, data))
    elif args.transform:
        raise NotImplementedError
    elif args.role:
        raise NotImplementedError
    else:
        declare_result(plugin)

Annoyingly, you cannot just tell MyST to put a file in a particular location (see jupyter-book/mystmd#1196 tracking this one). So I had to manually copy the two feed files to my build output folder in my GitHub action. Hopefully this functionality gets updated soon. Here’s what that looks like:

deploy.yml

    # Move RSS feeds to output folder
    - name: Move RSS feeds
      run: |
        cp atom.xml _build/html/atom.xml
        cp rss.xml _build/html/rss.xml

    # If we've pushed to main, push the book's HTML to github-pages
