Soupault 4.0.0: as extensible as Jekyll, still statically linked

Estimated reading time: 12 minutes.
Date: 2022-05-14

* Overview
* Introducing soupault "blueprints"
  + Blog
  + Book
  + Future blueprints
* Breaking changes
  + Index views have no default template anymore
  + absolute_links widgets without prefix cause errors now
  + Empty build_dir is no longer allowed
* New features
  + Required index fields
  + Processing certain pages before all other
  + Page processing hooks
  + Lua index processors
  + Accessing the index entry of the page from plugins
  + Per-view index entry sorting settings
  + Index view actions
  + Inline Lua plugins
  + Support for all HTML elements in relative_links and absolute_links
  + New Lua plugin API functions and variables
    o New variables
    o New functions
    o Unicode string functions
  + New command line options
  + Misc
* Bug fixes
* Behaviour changes
* Website improvements
* Future plans
  + Caching and incremental builds
  + Embedded scripting language improvements
  + Multicore support
* Acknowledgements

- Overview

Soupault 4.0.0 is available for download from my own server and from
GitHub releases. It introduces page processing hooks, Lua index
processors (which allow creating taxonomies and paginated indices), a
new option to make the site index available to content pages, an option
to process certain pages before all others, a way to mark certain index
fields as required, and a bunch of new Lua plugin API functions. Well,
and a few bug fixes, of course. In short, this release takes
extensibility to a new level, comparable to static site generators
written in interpreted languages.

If you allow me a digression, I find it odd that the static site
generator "scene" tends to treat extensibility and native code
implementation as mutually exclusive properties. On one end of the
spectrum, we have projects like Jekyll or Nikola that are infinitely
extensible and allow plugins to redefine almost any built-in behavior
but need an interpreter and have a lot of runtime dependencies. If some
functionality is not there, you find or write a plugin that adds it
(from scratch or using a library from gems/pypi/etc). On the other end,
there are projects like Hugo or Zola that are available as native,
statically linked executables but don't offer any extensibility other
than a Turing-complete template language. If something is not there,
all you can do is hope for its inclusion in the mainline code, maintain
a fork, or look elsewhere.

Soupault was already the most extensible static site generator
available as a native, statically linked executable. By operating at
the HTML element tree level, it made all functionality work the same
for any source format, and it allows automatically loading any format
with user-defined page preprocessors. By embedding a Lua interpreter,
it allows manipulating the page element tree ("DOM") in the same ways
as client-side JS (without interactivity, of course). It can also pipe
element tree nodes through external helpers and inject their output
back into the page--in place of or in addition to the original node.

Still, quite a few things were only possible with inelegant
workarounds. For example, taxonomies and paginated index pages could
only be made with a LaTeX-like approach: run soupault --index-only to
produce a JSON dump of the index data, then run a custom script to
generate new pages in site/ from it, and finally run soupault again to
render a complete site. That approach is certainly workable; I used it
for my own blog for a long time.
But it's also clunky and takes more build setup than many people are
willing to tolerate for a static site generator, especially when other
projects provide easy (if not always very flexible) ways to do the
same.

I could add a "magical" pagination and taxonomy generator that would
suit most common needs. However, my goal for soupault is to give every
user complete control over the website generation process. Whenever I
want new functionality, I start with thinking about a general mechanism
that would allow me to do that if I couldn't modify soupault itself.

Now there are two major features that allow doing those things without
external tools and multi-pass workflows:

* It's now possible to write index generators in Lua and make Lua code
generate new pages (e.g., taxonomies and paginated indices).
* There are now page processing hooks that allow Lua code to take over
specific processing steps or run between them: pre-parse, pre-process,
post-index, render, and save.

Read on for details. But before that, let's discuss a closely related
development.

- Introducing soupault "blueprints"

A common complaint about soupault is that it's hard to get started with
or even see its full capabilities. It's a fair point.

The usual way for static site generators to give people a quick start
is to maintain a repository of "themes". I find the word "theme"
deceptive, though. In reality, they are more like applications on top
of the SSG framework that include both presentation and logic. Even
themes with similar structures may not be interchangeable. Theme
compatibility with newer SSG versions is also a very real issue for
some SSGs. In any case, once you choose and download a theme, switching
to another theme isn't guaranteed to be simple, especially if you
modify anything.

I suppose the real reason soupault users aren't actively making
reusable themes is not that they share my view that a theme repository
doesn't solve every beginner problem.
People likely turn to soupault for use cases that no other SSG
supports, and they were going to make their own setup from the start.

However, the quick start problem does exist. This is why I took the
time to prepare two reusable soupault setups that you can take and
build upon: one for blogs and another for books. There's also a wiki
setup in development, but it needs more design effort and work on the
code.

Instead of "themes", I'm calling them "blueprints". So far, there are
blueprints for making blogs and books. They both require soupault 4.0.0
and serve as showcases for its new features, such as Lua index
generators and the index_first option.

- Blog

The blog blueprint is a full-featured blog setup with support for tags
and Atom feeds (global and per-tag). Code-wise, it's more or less a
reusable version of my own blog, while the visual design resembles this
website instead.

- Book

The book blueprint is meant for creating online books. You can see a
real book project based on it at ocamlbook.org. It also features
automatic type checking for OCaml code snippets, so you can use it for
inspiration if you are writing your own programming book.

It's similar to mdBook in functionality, but doesn't require a
summary.md file: the chapter list in the sidebar is generated
automatically. Chapter numbers are stored in file names, like
01_introduction.md. One Lua hook extracts that number from the file
name to add it to the page metadata, and another hook removes it from
the target path, so that its URL becomes just /introduction.

- Future blueprints

I'm also working on a wiki blueprint inspired by MediaWiki (e.g., it
will have categories--a very underused wiki feature). It's not ready
yet and needs more work, so stay tuned for updates.

I also have an idea to move the old sample site to a tongue-in-cheek
homepage blueprint.

Now let's move on to the actual release changelog.

- Breaking changes

Before we get to new features, let's discuss the breaking changes that
triggered the major version bump to 4.0.0. They are minor and only
affect edge cases, but semver is semver.^1

- Index views have no default template anymore

The index view option index_item_template does not have a default value
anymore. If you had an index view that didn't have either
index_template, index_item_template, or index_processor and it was
working fine for you, you need to add its original default value to
your view explicitly to make it work like before.

    [index.views.some_view]
      index_item_template = '''
      '''

The reason for removing its implicit default value is that it continued
to refer to the {{title}} field long after the content model
dehardcoding in soupault 2.0.0. The {{url}} field is a built-in, one of
the technical fields that is guaranteed to be present. The {{title}}
field is a remnant from the built-in content model of soupault 1.x.x
that made title, date, and author fields special. Now that the content
model is completely user-defined, it's very strange to have a hardcoded
reference to a field that may not exist in the user's configuration, so
I decided to remove it.

My survey of existing websites built with soupault shows that everyone
specifies that option explicitly, so this change shouldn't affect many
people.

On a side note, I briefly thought of removing the index_item_template
option itself because index_item_template = '' is merely syntactic
sugar for:

    index_template = '''
      {% for e in entries %}
      {% endfor %}
    '''

However, my survey shows that surprisingly many people use
index_item_template in their configs, and if people see value in it, I
definitely shouldn't remove it.

- absolute_links widgets without prefix cause errors now

The old implementation of the absolute_links widget would just do
nothing if you forgot to specify the site URL in prefix, so this
configuration was valid:

    [widgets.add_site_url]
      widget = "absolute_links"
      # prefix = "https://example.com/~jrandomhacker"

Now it causes an error because in most cases, it's likely a
configuration mistake that shouldn't be silently ignored. Explicitly
setting the prefix to "" will not cause any errors.

- Empty build_dir is no longer allowed

This is another behavior that was accidentally allowed rather than
intentionally designed. While an absolute_links widget with an
unspecified prefix simply does nothing, build_dir = "" is more
annoying: it makes soupault output pages into the current working
directory.
I doubt it's a behavior anyone would want, and I believe programs that
are supposed to have a dedicated output directory should never pollute
any other location, so this is an error now. If anyone ever wants it
back, they can use build_dir = ".".

- New features

- Required index fields

It's now possible to mark specific index fields as required. If a
required field is not present in a page, soupault will display an error
and stop.

Example:

    [index.fields.title]
      selector = ["h1#post-title", "h1"]
      required = true

- Processing certain pages before all other

By default, soupault processes content pages in essentially random
order. What if a page serves as a source of persistent data for a Lua
plugin? Or what if you are experimenting with some new code on that
page and want the build to fail very early?

Now there's an option to tell soupault which pages to process first:

    [settings]
      process_pages_first = ["about.md"]

Note that it must be a content page, not an index page (typically, not
a page named index.*, unless you set your own settings.index_page
option). Index pages are always processed after content pages so that
extracted metadata can be inserted into them.

- Page processing hooks

The first big feature in this release is the long-promised system of
page processing hooks.

This release has the following hooks: pre-parse, pre-process,
post-index, render, and save.

* pre-parse: operates on the page text before it's parsed, must place
the modified page source in the page_source variable.
* pre-process: operates on the page element tree just after parsing,
may modify the page variable and set target_dir and target_file
variables.
* post-index: operates on the page element tree after index data
extraction, can add more fields, and override fields in the
index_entry variable.
* render: takes over the rendering process, must put rendered page text
in the page_source variable.
* save: takes over the page output process.
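
As a rough sketch of what a pre-process hook might look like, the fragment below drops a leading chapter number such as 01_ from the output path, in the spirit of the book blueprint. It is not the blueprint's actual code. The file naming pattern and the use of Regex.replace (which replaces only the first match) are assumptions made for illustration, and the sketch relies on target_dir and target_file being writable from this hook, as the list above says.

```toml
[hooks.pre-process]
  lua_source = '''
-- Illustrative only: strip the first "NN_" sequence from the output paths,
-- so that 01_introduction.md is not published under /01_introduction.
-- The "[0-9]+_" pattern is an assumption about how files are named;
-- it would also match digits followed by "_" elsewhere in the path.
target_dir = Regex.replace(target_dir, "[0-9]+_", "")
target_file = Regex.replace(target_file, "[0-9]+_", "")
Log.debug("pre-process hook set target_file to " .. target_file)
'''
```

A real setup would probably want a stricter pattern that only matches at the start of the file name, since an unanchored pattern can hit an unrelated part of the build path.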
For example, this is how you can do global variable substitution with a
pre-parse hook:

    [hooks.pre-parse]
      lua_source = '''
    soupault_release = soupault_config["custom_options"]["latest_soupault_version"]
    Log.debug("running pre-parse hook")
    page_source = Regex.replace_all(page_source, "\\$SOUPAULT_RELEASE\\$", soupault_release)
    '''

- Lua index processors

The second big feature is an option to write site index rendering code
in Lua, in addition to the old options to specify a Jingoo template or
use an external script.

Here's a reimplementation of the built-in index_template option in Lua:

    [index.views.blog]
      index_selector = "div#blog-index"
      index_template = """
        {% for e in entries %}
can be converted to <
by relative_links or to
by absolute_links.
- New Lua plugin API functions and variables
- New variables
* target_file (path to the output file, relative to the current
working directory).
* index_entry (the index entry of the page being processed if
index.index_first = true, otherwise it's nil).
- New functions
* String.slugify_soft(string) replaces all whitespace with hyphens,
but doesn't touch any other characters.
* String.url_encode(str) and String.url_decode(str) for working
with percent-encoded URLs.
* String.join_url as an alias for String.join_path_unix.
* HTML.to_string(etree) and HTML.pretty_print(etree) return string
representations of element trees, useful for save hooks.
* HTML.create_document() creates an empty element tree.
* HTML.clone_document(etree) makes a copy of a complete element
tree.
* HTML.append_root(etree, node) adds a node after the last element.
* HTML.child_count(elem) returns the number of children of an
element.
* HTML.unwrap(elem) yanks the child elements out of a parent and
inserts them in its former place.
* Table.take(table, limit) removes up to limit items from a table
and returns them.
* Table.chunks(table, size) splits a table into chunks of up to
size items.
* Table.has_value(table, value) returns true if value is present in
table.
* Table.apply_to_values(func, table) applies function func to every
value in table (a simpler version of Table.apply if you don't
care about keys).
* Table.get_nested(table, {"key", "sub-key", ...}) and
Table.get_nested_default(table, {"key", "sub-key", ...}, default)
for easily retrieving values from nested tables (handy for
getting config options).
* Table.keys(table) returns a list of all keys in table.
* Sys.list_dir(path) returns a list of all files in path.
* Value.repr(value) returns a string representation of a Lua value
for debug output (similar to Python's __repr__ in spirit).
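
To make a couple of these functions concrete, here is a hedged variant of a pre-parse hook that reads an option with Table.get_nested_default instead of indexing soupault_config directly, so that a missing [custom_options] table does not abort the hook. The option name latest_soupault_version and the "unknown" fallback are illustrative choices, and the fragment would replace any existing [hooks.pre-parse] table rather than sit next to it.

```toml
[hooks.pre-parse]
  lua_source = '''
-- Look up custom_options.latest_soupault_version, falling back to
-- "unknown" (an arbitrary placeholder) if any key along the path is missing.
release = Table.get_nested_default(soupault_config, {"custom_options", "latest_soupault_version"}, "unknown")

-- Value.repr gives a printable form of any Lua value for the debug log.
Log.debug("latest_soupault_version = " .. Value.repr(release))

-- A pre-parse hook leaves its result in page_source.
page_source = Regex.replace_all(page_source, "\\$SOUPAULT_RELEASE\\$", release)
'''
```

This assumes the option is a string. A numeric value in the config would need converting before it is passed to Regex.replace_all.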
- Unicode string functions
* String.length is now Unicode-aware; the old ASCII-only version is
now available as String.length_ascii.
* String.truncate is now Unicode-aware; the old ASCII-only version
is now available as String.truncate_ascii.
- New command line options
* --dump-index-json