Wow, it has been a long time since I last wrote here. It turns out that having two small children and a very demanding job means you don’t have as much time for blogging. But that’s a whole different blog post...
I’ve decided to convert my blog to use the new MyST Document Engine. This is part of a dogfooding experiment to see what’s possible with MyST, since it’s where the Jupyter Book project is heading, and I want to see how close to “production-ready” we are already.
To begin, I wanted to share a few things I learned today as I tried to re-tool my old blog for use with MyST MD.
There’s no blog functionality in MyST yet¶
As MyST is still quite young, there’s no out-of-the-box blog functionality for MyST (see jupyter
- Generate a list of recent blog posts that I can insert into a few places in my site.
- Generate an RSS feed that keeps the same URLs, content, etc.
What didn’t work: using the parsed MyST documents¶
My first thought was to use a MyST plugin that defines a directive I could use to insert a list of blog posts. However, I learned that MyST plugins have no way to access all of the parsed documents at build time (see this issue about accessing all of the parsed documents to track that one).
Fortunately, @rowanc1 made me realize that I could just manually parse my blog files and use that to build up something like a blog list. So that’s what the rest of this post is about.
You can run scripts in JavaScript as part of your MyST build¶
The MySTMD plugins documentation shows a few examples for how to define your own MyST plugins. These are little JavaScript files that get executed every time you build your MyST site.
The easiest thing to do here would be to write a JavaScript plugin for MyST that I can attach to my site build. You could use a JS library to parse the markdown files in blog/, grab their YAML metadata, and return a MyST AST structure that would be inserted into the document. But I’m not too comfortable with JavaScript, and I found two ways that are much hackier, but much more accessible for somebody who is comfortable with Python 😎.
Write a MyST extension in Python¶
I bet most folks don’t know that you can write MyST extensions entirely in Python (or any other language that you can execute locally). Here is the MyST documentation on writing an executable MyST extension.
Executable extensions are treated like a black box in MyST - the MyST build process simply executes a file that you specify in myst.yml, and treats anything printed to stdout as MyST AST that will be inserted back into the MyST document.
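To make the "black box" idea concrete, here is a minimal sketch of an executable extension. It mirrors the calling convention used by the full script later in this post: with no arguments the script prints a description of the plugin, and with `--directive NAME` it reads the directive's data as JSON from stdin and prints AST nodes to stdout. The `hello` directive and its greeting are hypothetical, and the injectable `stdin`/`stdout` parameters are only there to make the sketch easy to try out.

```python
#!/usr/bin/env python3
"""Minimal sketch of an executable MyST plugin (hypothetical `hello` directive)."""
import argparse
import json
import sys

# Printed when the script is called with no arguments, so that MyST can
# discover which directives the plugin provides.
PLUGIN_SPEC = {
    "name": "Hello plugin",
    "directives": [{"name": "hello", "doc": "Insert a short greeting.", "arg": {}}],
}


def run_directive(name, data):
    """Return a list of MyST AST nodes for the named directive."""
    if name != "hello":
        raise ValueError(f"Unknown directive: {name}")
    text = {"type": "text", "value": "Hello from Python!"}
    return [{"type": "paragraph", "children": [text]}]


def main(argv, stdin=sys.stdin, stdout=sys.stdout):
    """Print either the plugin description or the AST for one directive."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--directive")
    args, _unknown = parser.parse_known_args(argv)
    if args.directive:
        data = json.load(stdin)  # the directive node and its options
        result = run_directive(args.directive, data)
    else:
        result = PLUGIN_SPEC
    json.dump(result, stdout, indent=2)


if __name__ == "__main__":
    main(sys.argv[1:])
```

Because everything goes through stdin and stdout, a script like this can be tried outside of a MyST build by piping a small JSON document into it with `--directive hello`.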
How do you know what MyST AST looks like?¶
I mention “all you need to do is output MyST AST”, but what does that even mean? The MyST AST is the abstract structure of a MyST document. It has all of the relevant information about the content in a document, as well as all the semantic tags for different types of content that can exist (e.g., “italics”, or “admonition boxes”).
When a MyST Markdown document is parsed, the result is MyST AST. You can see a ton of examples of AST for various MyST markdown in the MyST guide. Just look for the little “interactive demo” boxes that show off sample MyST Markdown.
In my case, I needed a list of blog posts, so I found the relevant AST for what this looks like at the following locations:
- The list items documentation showed me the AST for lists.
- The Definition lists documentation had sample AST for an internal link.
- The Inline text formatting documentation had examples for things like bold, italics, etc.
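Put together, those pieces are enough to describe a list of links as plain Python dictionaries. The sketch below uses the same node types as the full script later in this post (`list`, `listItem`, `link`, `text`); the helper names and the example URL are hypothetical. Note that this sketch places each `listItem` directly under the `list`, which is slightly flatter than what the full script emits.

```python
import json


def post_list_item(title, url, date_text):
    """Build one `listItem` node: a link to the post followed by its date."""
    return {
        "type": "listItem",
        "spread": True,
        "children": [
            {"type": "link", "url": url, "children": [{"type": "text", "value": title}]},
            {"type": "text", "value": f" - {date_text}"},
        ],
    }


def post_list(posts):
    """Wrap the items in an unordered `list` node."""
    return {
        "type": "list",
        "ordered": False,
        "spread": False,
        "children": [post_list_item(*post) for post in posts],
    }


# The URL below is a made-up placeholder, not a real path on the site.
example_posts = [
    ("Re-building my blog with MySTMD", "/blog/2024/example-post", "November 01, 2024"),
]
print(json.dumps(post_list(example_posts), indent=2))
```

Serializing with `json.dumps` is all that is needed to hand this structure back to MyST from an executable extension.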
Turning this into a Python plugin for MyST¶
With this in mind, I wrote a little Python extension that:
- At build time, parses all of my blog post markdown files and extracts metadata from their YAML headers.
- Defines a bloglist directive that will insert a list of blog posts where it exists.
- Manually converts the blog post metadata into MyST AST that it prints to stdout.
As a result, when I call the directive in my blog, it will replace the directive with whatever AST is spit out by the Python script. You can take a look at the entire Python script here.
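The first step in that list, reading metadata from each post's YAML header, can be sketched with the standard library alone. This is a simplified illustration rather than the script itself: the real script uses `yaml.safe_load`, while the hypothetical `parse_simple_header` below only understands flat `key: value` lines. It does show the same fallback of using the first heading as the title when the header has none.

```python
def split_front_matter(text):
    """Split a Markdown file into (header, body).

    Returns an empty header when the text has no leading `---` block.
    """
    if not text.startswith("---"):
        return "", text
    parts = text.split("---", 2)
    if len(parts) < 3:
        return "", text
    return parts[1].strip(), parts[2].lstrip("\n")


def parse_simple_header(header):
    """Parse flat `key: value` lines (a stand-in for yaml.safe_load)."""
    meta = {}
    for line in header.splitlines():
        if ":" in line and not line.lstrip().startswith("#"):
            key, value = line.split(":", 1)
            meta[key.strip()] = value.strip().strip("\"'")
    return meta


def first_heading(body):
    """Return the text of the first Markdown heading, or None."""
    for line in body.splitlines():
        if line.startswith("#"):
            return line.lstrip("#").strip()
    return None


sample = """---
date: 2024-11-01
---
# Re-building my blog with MySTMD

Wow it has been a long time.
"""
header, body = split_front_matter(sample)
meta = parse_simple_header(header)
meta.setdefault("title", first_heading(body))
print(meta)
# → {'date': '2024-11-01', 'title': 'Re-building my blog with MySTMD'}
```

Checking that the file actually starts with `---` avoids an `IndexError` on posts that have no header at all, which a bare `text.split("---")[1]` would raise.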
Now I can insert lists of blog posts wherever I like, for example here’s a list of the latest three:
- Re-building my blog with MySTMD - November 01, 2024
- Generate MyST with Jupyter and insert it into content programmatically - October 04, 2024
- A few random opportunities in AI for Social Good - October 02, 2023
Adding an RSS feed¶
Because I’m running Python to define my MyST plugin, I can also use Python to build a custom RSS feed! This was relatively easy since I’d already parsed the metadata from each file.
I found the python-feedgen package, which is a little helper package for generating RSS feeds in Python. Since my MyST plugin was already written in Python, I just added a few more lines to do so.
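For a sense of what ends up in the feed, here is a rough standard-library sketch that builds a tiny RSS 2.0 document from the same per-post fields (path, title, date, content). It deliberately swaps python-feedgen for `xml.etree.ElementTree` so that it runs without extra dependencies, so treat it as an illustration of the output rather than the method used on this blog. The `build_rss` helper and the example post path are hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime


def build_rss(site_url, title, posts):
    """Return an RSS 2.0 document (as a string) with one <item> per post."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = site_url
    ET.SubElement(channel, "description").text = title
    for post in posts:
        url = f"{site_url}/{post['path']}"
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = post["title"]
        ET.SubElement(item, "link").text = url
        ET.SubElement(item, "guid").text = url
        # RSS expects RFC 822 style dates, which format_datetime produces
        ET.SubElement(item, "pubDate").text = format_datetime(post["date"])
        ET.SubElement(item, "description").text = post["content"]
    return ET.tostring(rss, encoding="unicode")


# Made-up example post; the path is a placeholder.
example = [
    {
        "path": "blog/2024/example-post",
        "title": "Re-building my blog with MySTMD",
        "date": datetime(2024, 11, 1, tzinfo=timezone.utc),
        "content": "A short summary of the post.",
    }
]
print(build_rss("http://chrisholdgraf.com", "Chris Holdgraf's blog", example))
```

A library like python-feedgen takes care of the same bookkeeping and can also write an Atom file from the same entries, which is why the script below uses it.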
Click to see the whole Python plugin script
#!/usr/bin/env python3
import argparse
import json
import sys
from yaml import safe_load
from pathlib import Path
import pandas as pd
from feedgen.feed import FeedGenerator

DEFAULTS = {"number": 10}

root = Path(__file__).parent.parent

# Aggregate all posts from the markdown files
posts = []
for ifile in root.rglob("blog/**/*.md"):
    text = ifile.read_text()
    # The YAML header sits between the first two "---" markers
    yaml = safe_load(text.split("---")[1])
    yaml["path"] = ifile.relative_to(root).with_suffix("")
    # Fall back to the first heading if the header has no title
    if "title" not in yaml:
        lines = text.splitlines()
        for ii in lines:
            if ii.strip().startswith("#"):
                yaml["title"] = ii.replace("#", "").strip()
                break
    # Use the description, or else the first N_WORDS words, as the summary
    content = text.split("---", 2)[-1]
    content = "\n".join(ii for ii in content.splitlines() if not ii.startswith("#"))
    N_WORDS = 100
    words = " ".join(content.split(" ")[:N_WORDS])
    yaml["content"] = yaml.get("description", words)
    posts.append(yaml)
posts = pd.DataFrame(posts)
posts["date"] = pd.to_datetime(posts["date"]).dt.tz_localize("US/Pacific")
posts = posts.dropna(subset=["date"])
posts = posts.sort_values("date", ascending=False)

# Generate an RSS feed
fg = FeedGenerator()
fg.id("http://chrisholdgraf.com")
fg.title("Chris Holdgraf's blog")
fg.author({"name": "Chris Holdgraf", "email": "choldgraf@gmail.com"})
fg.link(href="http://chrisholdgraf.com", rel="alternate")
fg.logo("http://chrisholdgraf.com/_static/profile.jpg")
fg.subtitle("Chris' personal blog!")
fg.link(href="http://chrisholdgraf.com/rss.xml", rel="self")
fg.language("en")

# Add all my posts to it
for ix, irow in posts.iterrows():
    fe = fg.add_entry()
    fe.id(f"http://chrisholdgraf.com/{irow['path']}")
    fe.published(irow["date"])
    fe.title(irow["title"])
    fe.link(href=f"http://chrisholdgraf.com/{irow['path']}")
    fe.content(content=irow["content"])

# Write an RSS feed with latest posts
fg.atom_file(root / "atom.xml", pretty=True)
fg.rss_file(root / "rss.xml", pretty=True)

plugin = {
    "name": "Blog Post list",
    "directives": [
        {
            "name": "postlist",
            "doc": "A directive that inserts a list of recent blog posts.",
            "alias": ["bloglist"],
            "arg": {},
            "options": {
                "number": {
                    "type": "int",
                    "doc": "The number of posts to include",
                }
            },
        }
    ],
}

# Build one list item (link + date) per post, newest first
children = []
for ix, irow in posts.iterrows():
    children.append(
        {
            "type": "listItem",
            "spread": True,
            "children": [
                {
                    "type": "link",
                    "url": f"/{irow['path'].with_suffix('')}",
                    "children": [
                        {
                            "type": "text",
                            "value": irow["title"],
                        }
                    ],
                },
                {"type": "text", "value": f' - {irow["date"]:%B %d, %Y}'},
            ],
        }
    )


def declare_result(content):
    """Declare result as JSON to stdout

    :param content: content to declare as the result
    """
    # Format result and write to stdout
    json.dump(content, sys.stdout, indent=2)
    # Successfully exit
    raise SystemExit(0)


def run_directive(name, data):
    """Execute a directive with the given name and data

    :param name: name of the directive to run
    :param data: data of the directive to run
    """
    assert name == "postlist"
    opts = data["node"].get("options", {})
    number = int(opts.get("number", DEFAULTS["number"]))
    output = (
        {
            "type": "list",
            "ordered": False,
            "spread": False,
            "children": [
                {
                    "type": "listItem",
                    "spread": True,
                    "children": children[:number],
                }
            ],
        },
    )
    return output


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    group = parser.add_mutually_exclusive_group()
    group.add_argument("--role")
    group.add_argument("--directive")
    group.add_argument("--transform")
    args = parser.parse_args()

    if args.directive:
        data = json.load(sys.stdin)
        declare_result(run_directive(args.directive, data))
    elif args.transform:
        raise NotImplementedError
    elif args.role:
        raise NotImplementedError
    else:
        declare_result(plugin)
Annoyingly, you cannot just tell MyST to put a file in a particular location (see jupyter
# Move RSS feeds to output folder
- name: Move RSS feeds
  run: |
    cp atom.xml _build/html/atom.xml
    cp rss.xml _build/html/rss.xml
# If we've pushed to main, push the book's HTML to github-pages