https://www.jonashietala.se/blog/2022/08/29/rewriting_my_blog_in_rust_for_fun_and_profit/

Jonas Hietala

Rewriting my blog in Rust for fun and profit
August 29, 2022

Background

I've used Hakyll as my static site generator for around 9 years now.
Before that I think I used Jekyll, and also more dynamic pages with
Mojolicious in Perl and Kohana in PHP, but I can't be completely sure
as the git history doesn't go back that far. But all good things come
to an end, and I've now migrated to my own custom site generator
written in Rust.

Problems with my previous setup

These are the main annoyances I wanted to solve with this rewrite:

1. It was starting to get slow.

   On my crappy laptop a full site rebuild took 75 seconds (not
   compiling the generator, just generating the site). I only have
   240 posts, so I don't think it should be that slow. While there is
   a good caching system and a watch command that only updates the
   changed post during editing, it was still annoyingly slow.

2. Several external dependencies.

   While the site generator itself is written in Haskell, there were
   other dependencies apart from a number of Haskell libs. My blog
   helper script was written in Perl, I used sassc to convert Sass,
   pygments in Python for syntax highlighting, and s3cmd for
   uploading the generated site to S3. It's annoying to install all
   of these and keep them up-to-date; it would be great to have just
   a single thing to worry about.

3. Setup problems.

   Related to the previous point, sometimes things just break and I
   have to spend time debugging and fixing them. This is especially
   frustrating when I have a good idea for something to write about,
   only to find out that I have some issue with my site generator.
   You might think, what could possibly go wrong? Well, sometimes I
   update some packages and things break in weird ways. For example:

   + After GHC is updated, it can no longer find cabal packages.
   + When I run the Haskell binary I get this:

         [ERROR] Prelude.read: no parse

     (Only on my desktop; it works fine on my laptop.)

   + Or this Perl error:

         Magic.c: loadable library and perl binaries are mismatched
         (got handshake key 0xcd00080, needed 0xeb00080)

     (Only on my laptop; it works fine on my desktop.)

   + When Hakyll changed Pandoc arguments between versions, it broke
     code rendering in my Atom feed.

   I know that these are all solvable, but I just want something that
   just works(tm).

4. Mental overhead with Haskell.

   I kind of like Haskell--especially the purely functional parts of
   it--and I am a fan of the declarative approach Hakyll takes to
   site configuration. Take the generation of static (i.e.
   standalone) pages for instance:

       match "static/*.markdown" $ do
           route staticRoute
           compile $ pandocCompiler streams
               >>= loadAndApplyTemplate "templates/static.html" siteCtx
               >>= loadAndApplyTemplate "templates/site.html" siteCtx
               >>= deIndexUrls

   Even if you don't understand the $ and >>=, I still think it's
   clear that we're finding files from the static/ folder, sending
   them to pandocCompiler (to convert from markdown), then to some
   templates, and then de-indexing urls (to avoid links ending with
   index.html). Simple and clear!

   But I haven't used Haskell in years, and the overhead for me to
   add slightly more complex things to the site is quite large. For
   example, I wanted to add next/prev links to posts, but then I had
   to spend some time relearning Haskell and Hakyll. And even then
   the solution I came up with was super slow, because it did a
   linear search to find the next/previous posts, and I couldn't
   figure out how to do this properly with the way Hakyll is set up.
   I'm sure you can do it in an efficient manner, but it was too much
   effort for such a small feature for me to bother.

Why Rust?

1. I enjoy using Rust, and that's important for a hobby project like
   this.

2. Rust is quite performant, and transforming text is a task it
   should be good at.

3. Cargo is fire and forget.
   As long as you have Rust installed, it should be enough to just
   do cargo build, and it should just work(tm).

Why reinvent the wheel again?

I wanted to write my own static site generator as a fun and
interesting project. It shouldn't be that hard, and it would give me
complete control over the site, which should give me a bit more
flexibility than if I used an existing site generator. (Yes, I know
that projects like cobalt allow you to make any kind of
transformation, using whatever language you want, to your pages.)

Implementation details

I'm not going to describe everything I did here; take a look at the
source code if you're interested.

Delegate the hard things

At first I was worried that it would be hard to replicate all the
features I enjoyed with Hakyll, like the templating engine, the
syntax highlighting for numerous languages, or the watch command
that automatically regenerates any pages you edit and acts as a file
server so I can view the post in the browser during the writing
process.

Turns out that it's easy if you let existing crates handle the hard
parts. Here are some of the libraries I used to good effect:

* tera for a templating engine. It's more powerful than what Hakyll
  provides, as it can do things like loop over items.
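  For instance, a tera loop over posts might look like this (a
  hedged sketch; the posts variable and its fields are my
  assumptions, not the site's actual template data):

  ```jinja
  <ul>
  {% for post in posts %}
    <li><a href="{{ post.url }}">{{ post.title }}</a></li>
  {% endfor %}
  </ul>
  ```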
* pulldown-cmark for parsing Markdown. It's really for CommonMark,
  which is a standard, unambiguous syntax specification for
  Markdown. While fast, it doesn't support as many things as Pandoc
  does, so I had to extend it with some more features. More on that
  later.

* syntect for syntax highlighting, supporting Sublime Text's
  grammars.

* yaml-front-matter to parse metadata from posts.

* grass as a Sass compiler purely in Rust.

* axum to create a static file server used to host the site locally.

* hotwatch to watch for file changes, so we can update pages
  whenever a file is updated.

* scraper to parse the generated html. Used in some of my tests and
  for some specific transformations.

* rust-s3 to upload the generated site to S3 storage.

Even with these libraries, the Rust source files themselves
contained over 6000 lines. In some cases Rust can be verbose, and my
code is really not beautiful, but writing this project was still
more work than expected. (Isn't that always the case?)

Markdown transformations

While it would be easy if my posts were just standard markdown, over
the years I've included various features and extensions that my
markdown parser of choice, pulldown-cmark, doesn't support. So I had
to code them myself.

Preprocessing

I have a preprocessing step that I use to create figures with
multiple images. It's a generalized processing step that takes the
form of a ::: block, which is transformed into the corresponding
tag while preserving the parsed markdown.
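As an illustration, such a block might look like this (a
hypothetical example; the Flex tag name and the image paths are made
up, not taken from the site):

```
::: Flex
/images/one.png
/images/two.png
:::
```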
While there are existing crates that add code highlighting using
syntect, I wrote my own that wraps the highlighted code in a tag and
supports inline code highlighting. For example, "Inside row: let x =
2;" is produced by:

    Inside row: `let x = 2;`rust
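A rough sketch of how such a trailing language annotation could be
picked apart, assuming the raw backtick-delimited text is available
(the function name and exact rules here are my own, not the site's
actual implementation):

```rust
// Hypothetical helper: split "`let x = 2;`rust" into the code span
// and an optional trailing language annotation.
fn split_inline_code(raw: &str) -> Option<(&str, Option<&str>)> {
    // Require an opening backtick, then find the closing one.
    let rest = raw.strip_prefix('`')?;
    let end = rest.find('`')?;
    let code = &rest[..end];
    // Anything after the closing backtick is the language name.
    let lang = &rest[end + 1..];
    let lang = if lang.is_empty() { None } else { Some(lang) };
    Some((code, lang))
}

fn main() {
    assert_eq!(
        split_inline_code("`let x = 2;`rust"),
        Some(("let x = 2;", Some("rust")))
    );
    assert_eq!(split_inline_code("`plain`"), Some(("plain", None)));
}
```

The annotated span can then be fed to the highlighter with the named
syntax instead of being emitted as plain inline code.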
Performance improvements
I didn't spend that much time on improving the performance, but two
things had a significant impact.
The first thing is that if you use syntect and have custom syntaxes,
you really should compress the SyntaxSet to a binary format, so it
isn't re-parsed from the syntax definitions on every run.
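A minimal sketch of what that can look like with syntect's dumps
module, assuming a syntaxes/ folder of .sublime-syntax files (the
paths are my assumptions):

```rust
use syntect::dumps::{dump_to_file, from_dump_file};
use syntect::parsing::{SyntaxSet, SyntaxSetBuilder};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Slow path, done once ahead of time: parse the syntax
    // definitions and serialize the SyntaxSet to disk.
    let mut builder = SyntaxSetBuilder::new();
    builder.add_from_folder("syntaxes", true)?;
    let ss = builder.build();
    dump_to_file(&ss, "syntaxes.bin")?;

    // Fast path, done on every site build: load the binary dump.
    let ss: SyntaxSet = from_dump_file("syntaxes.bin")?;
    println!("loaded {} syntaxes", ss.syntaxes().len());
    Ok(())
}
```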
The other thing was to parallelize rendering using rayon. Rendering
here means parsing the markdown, applying templates, and creating
the output file. Rayon is great when the task is CPU-bound, and it's
very easy to use (if the code is structured correctly). This is for
instance a simplified view of the rendering:
    fn render(&self) -> Result<()> {
        let mut items = Vec::new();

        // Add posts, archives, and all other files that should be
        // generated here.
        for post in &self.content.posts {
            items.push(post.as_ref());
        }

        // Render all items.
        items
            .iter()
            .try_for_each(|item| self.render_item(*item))
    }
To parallelize this, all we need to do is change iter() into
par_iter():
    use rayon::iter::{IntoParallelRefIterator, ParallelIterator};

    items
        .par_iter() // This line changed.
        .try_for_each(|item| self.render_item(*item))
And that's it!
Admittedly, my performance improvements are quite minor, and the big
performance gains come from the libraries I use. For instance, my
old site used an external pygments process written in Python for
syntax highlighting, while I now have a highlighter in Rust that's
much faster and can easily be parallelized.
Sanity checks
One thing that always bothered me with the site is how easy it was to
make a mistake. For example linking to a non-existent page or image,
or not defining a link reference at all with [my link][todo], and
forgetting to update it before publishing.
So in addition to testing basic functionality like the watch
command, I also parse the whole site and check that all internal
links exist and are correct (so it validates the some-title fragment
in /blog/my-post#some-title too). I also check external links, but
that requires a manual command.
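To illustrate the fragment check, here's a stdlib-only sketch (the
helper names and the naive attribute scanning are my assumptions;
the real site parses the html with scraper):

```rust
use std::collections::HashSet;

// Collect every value of the given attribute, e.g. id="..." or
// href="...", by naively scanning the html text.
fn attr_values<'a>(html: &'a str, attr: &str) -> Vec<&'a str> {
    let needle = format!("{attr}=\"");
    let mut out = Vec::new();
    let mut rest = html;
    while let Some(pos) = rest.find(&needle) {
        rest = &rest[pos + needle.len()..];
        match rest.find('"') {
            Some(end) => {
                out.push(&rest[..end]);
                rest = &rest[end + 1..];
            }
            None => break,
        }
    }
    out
}

// Return every internal href="#..." that doesn't match an id on the
// page, i.e. a link to a non-existent fragment.
fn broken_fragments(html: &str) -> Vec<String> {
    let ids: HashSet<&str> = attr_values(html, "id").into_iter().collect();
    attr_values(html, "href")
        .into_iter()
        .filter(|href| href.starts_with('#') && !ids.contains(&href[1..]))
        .map(|s| s.to_string())
        .collect()
}

fn main() {
    let page = r##"<h2 id="some-title">T</h2>
        <a href="#some-title">ok</a>
        <a href="#typo">bad</a>"##;
    assert_eq!(broken_fragments(page), vec!["#typo".to_string()]);
}
```

The same idea extends to cross-page links by building the id set per
generated page and resolving /blog/my-post#some-title against it.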
During generation I also take a hard stance and panic early and
often, just to reduce the risk that something weird slips by.
How did it go?
At the beginning of this post I listed some issues I had with my
previous setup, so let's see whether I managed to improve on them.
1. Performance
On my crappy laptop a full site rebuild (not including
compilation time) now takes 4 seconds. An 18x performance
improvement is not too shabby, I'd say. I'm sure this could be
improved further--for example, I use rayon for file IO where
async would be more beneficial, and I don't have a caching
system, so I regenerate all files every time I build. (During
watching it's smarter though.)
Please note that this is not to say that Rust is this much
faster than Haskell; instead, see it as a comparison of two
implementations. I'm sure someone could make it super fast in
Haskell.
2. A single dependency
Now I've got everything in Rust, with no external scripts or
tools I need to install and keep working.
3. Cargo "just works"
As long as I have Rust on the system, cargo build is just rock
solid. I think this is low-key one of Rust's biggest
strengths--the build system just works.
You don't have to manually hunt down missing dependencies,
sacrifice a child to make it cross-platform or pull your hair out
when the build system automagically pulls down an update that
breaks everything again. You just lean back and wait for your
code to compile.
4. I had an easier time with Rust
While I did find it easier to implement things like series or
prev/next links (which aren't live yet as I haven't styled them
properly), I don't think that means Rust is simpler or easier
than Haskell; it just means that I personally had an easier time
grokking Rust than Haskell.
The biggest reason why probably comes down to practice. While
I've been using Rust a fair bit lately, I've barely touched
Haskell since I started using it for this site around a decade
ago.
I'm sure that if I don't use Rust for 10 years and you ask me to
use it again, I'll struggle a fair bit as well.
Overall, I'm quite pleased. It was a fun--albeit larger than I had
anticipated--project that also removed some of my annoyances with
this site.
Posted in Rust, Webpage, Hakyll, Haskell.
Check out the source code for the site.