https://nts.strzibny.name/ruby-for-ebook-publishing/


Ruby for ebook publishing

20 Sep 2021

People often ask what Ruby is good for apart from Rails. Ruby is
great for tasks across several different domains, and today I would
like to share how anybody can use Ruby to publish ebooks.

Since I used some Ruby tasks in publishing my first-ever ebook
Deployment from Scratch, it crossed my mind to write down why I think
Ruby is great for publishing ebooks.

PDF publishing

Asciidoctor is a whole Ruby toolkit for publishing technical content
written in AsciiDoc. It's a great way to produce PDF, EPUB 3, or even
manual pages.

Here's a list of what Asciidoctor can do for you in terms of a PDF
(stolen from their page):

  * Custom fonts (TTF or OTF)
  * Full SVG support (thanks to prawn-svg)
  * PDF document outline (i.e., bookmarks)
  * Title page
  * Table of contents page(s)
  * Document metadata (title, authors, subject, keywords, etc.)
  * Configurable page size (e.g., A4, Letter, Legal, etc)
  * Internal cross-reference links
  * Syntax highlighting with Rouge (preferred), Pygments, or CodeRay
  * Cover pages
  * Page background color or page background image with named scaling
  * Page numbering
  * Double-sided (aka prepress) printing mode (i.e., margins
    alternate on recto and verso pages)
  * Customizable running content (header and footer)
  * "Keep together" blocks (i.e., page breaks avoided in certain
    block content)
  * Orphaned section titles avoided
  * Autofit verbatim blocks (as permitted by base_font_size_min
    setting)
  * Table border settings honored
  * Font-based icons
  * Auto-generated index
  * Automatic hyphenation (when enabled)
  * Permissive line breaking for CJK languages
  * Compression / optimization of output file

If you are thinking of publishing your first technical ebook, it's a
strong contender. Just get familiar with the limitations before
starting. You would use AsciiDoc the same way as Markdown, although
the syntax is different:

= Hello, AsciiDoc!
Doc Writer <doc@example.com>

An introduction to http://asciidoc.org[AsciiDoc].

== First Section

* item 1
* item 2

[source,ruby]
puts "Hello, World!"

You then save AsciiDoc content with the .adoc extension and convert
it by running asciidoctor (default backend generates HTML):

$ gem install asciidoctor-pdf
$ asciidoctor -b docbook5 mysample.adoc
$ asciidoctor -r asciidoctor-pdf -b pdf mysample.adoc

And because this post is about Ruby, you can call it from Ruby:

require 'asciidoctor'

Asciidoctor.convert_file 'mysample.adoc'

And also work with the generated content directly:

html = Asciidoctor.convert_file 'mysample.adoc', to_file: false, header_footer: true
puts html

My journey went from an old gitbook version that could still generate
a PDF from Markdown to Pandoc, which let me keep the Markdown I had
and enhance it with LaTeX. For anything new, I would look into
Asciidoctor first. You can start with their AsciiDoc Writer's Guide.

As a side note, Asciidoctor uses the Prawn toolkit, which you can use
directly for several different things. I used Prawn to build
InvoicePrinter for example.

Text transformations

Since I ended up with Pandoc and a mixture of Markdown and LaTeX, my
default EPUB version didn't look good. See, in my text, I might have
the following:

### Third headline

Text paragraph.

\cat{Something interesting that Tiger shares.}

I made the fictional character Tiger the Cat to make the heavily
technical book feel lighter and more entertaining. I needed to draw a
box with a Tiger picture and text next to it. And I made a LaTeX \cat
macro to do just that.

To my surprise, the conversion to EPUB worked, but the result was
horrible. So I needed to replace this macro with an HTML snippet for
the EPUB version before the transformation happened.

So I wrote a little script to find these occurrences and produce new
sources for the EPUB:

#!/usr/bin/ruby
require 'fileutils'

FileUtils.rm_rf 'epub_chapters'
FileUtils.mkdir 'epub_chapters'

Dir.glob('chapters/*.md') do |file|
  chapter = File.read file
  chapter.gsub!(
    /^\\cat\{(.*)\}$/,
    '<div class="cat"><div class="tiger"><img src=".." /></div>\1</div>'
  )
  name = File.basename file
  File.write "epub_chapters/#{name}", chapter
end

There are many ways to pre- or post-process text, but this was my
quick way to fix the EPUB version.
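
To see what the substitution does without touching any files, here is
a minimal sketch that applies the same kind of pattern to an in-memory
string. The method name replace_cat_macro and the image path tiger.png
are hypothetical placeholders, not the ones used for the book.

```ruby
# Hypothetical helper: swaps a LaTeX-style \cat{...} line for an HTML box.
# The default image path is a placeholder chosen for this example.
CAT_MACRO = /^\\cat\{(.*)\}$/

def replace_cat_macro(markdown, image: 'tiger.png')
  # The block form sidesteps backslash escaping rules in the replacement
  markdown.gsub(CAT_MACRO) do
    text = Regexp.last_match(1)
    %(<div class="cat"><div class="tiger"><img src="#{image}" /></div>#{text}</div>)
  end
end

chapter = <<~MARKDOWN
  ### Third headline

  Text paragraph.

  \\cat{Something interesting that Tiger shares.}
MARKDOWN

puts replace_cat_macro(chapter)
```

Keeping the pattern in one constant and the replacement in one method
makes it easy to try the transformation on a single paragraph before
running it over every chapter.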

Landing page

Your book might sell better with an attractive landing page. If you
plan on a full-blown website, I recommend looking into Jekyll, a Ruby
static site generator for which I wrote some tips.

When I built my simple landing page, I realized that my chapter list
became outdated while I continued to publish beta content. To that
end, I decided to keep the short chapter description within the
chapter Markdown source like this:

# Processes

<!--
headline: A closer look at Linux processes. CPU and virtual memory, background processes, monitoring, debugging, systemd, system logging, and scheduled processes.
-->

Running a web application is essentially running a suite of related programs concurrently as processes. Spawning a program process can be as simple as typing its name into a terminal, but how do we ensure that this program won't stop at some point? We need to take a closer look at what Linux processes are and how to bring them back to life from failures.
...

And I wrote a Ruby script that takes this meta information, makes
HTML out of it, and updates the landing page:

#!/usr/bin/ruby
require 'redcarpet'

BOOK_DIR="/home/strzibny/Projects/deploymentfromscratch"
CHAPTER_DIR="#{BOOK_DIR}/chapters"

class Chapter
  attr_reader :index, :title, :headline, :html

  def initialize(index:, title:, headline:, html:)
    @index = index
    @title = title
    @headline = headline
    @html = html
  end
end

chapters = []

Dir.glob("#{CHAPTER_DIR}/*.md").sort.drop(1).each.with_index(1) do |file, index|
  content = File.read(file)
  title = content.scan(/^\# (.*)$/).first&.first
  headline = content.scan(/^headline: (.*)$/).first&.first

  if title
    html = Redcarpet::Markdown.new(Redcarpet::Render::HTML.new).render(content)
    chapters << Chapter.new(index: index, title: title, headline: headline, html: html)
  end
end

# ...and later...

BOOK_PAGE = "../index.html"

sections = chapters.map do |chapter|
  <<-EOF
    <div class="chapter">
      <strong>#{chapter.index}. #{chapter.title}</strong>
      <p>
        #{chapter.headline}
      </p>
    </div>
  EOF
end.join("\n")

page = File.read BOOK_PAGE
new_page = page.gsub(
  /<!--CHAPTERS START-->.*<!--CHAPTERS END-->/m,
  "<!--CHAPTERS START-->\n#{sections}\n<!--CHAPTERS END-->"
)
File.open(BOOK_PAGE, "w") { |file| file.puts new_page }

So remember, you can work with your sources and automate the landing
page management. I used the redcarpet gem for Markdown processing, and
there are also other useful gems like front_matter_parser.
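
The two core steps, reading the metadata and replacing the content
between the marker comments, can be tried out on plain strings. This is
a minimal sketch under that assumption; the helper names chapter_meta
and update_chapter_list and the sample page are made up for the
example.

```ruby
# Hypothetical helpers that work on strings instead of files.

# Pulls the title and the "headline:" comment out of a chapter source.
def chapter_meta(markdown)
  {
    title: markdown[/^\# (.*)$/, 1],
    headline: markdown[/^headline: (.*)$/, 1]
  }
end

# Replaces everything between the two marker comments with new HTML.
# The block form keeps backslashes in the new HTML from being read
# as backreferences.
def update_chapter_list(page, sections)
  page.sub(/<!--CHAPTERS START-->.*<!--CHAPTERS END-->/m) do
    "<!--CHAPTERS START-->\n#{sections}\n<!--CHAPTERS END-->"
  end
end

source = <<~MARKDOWN
  # Processes

  <!--
  headline: A closer look at Linux processes.
  -->

  Running a web application...
MARKDOWN

page = "<h1>Book</h1>\n<!--CHAPTERS START-->\nold list\n<!--CHAPTERS END-->\n"

meta = chapter_meta(source)
section = "<strong>1. #{meta[:title]}</strong><p>#{meta[:headline]}</p>"
puts update_chapter_list(page, section)
```

Because the markers are written back together with the new content,
the update can be repeated on every build without the page drifting.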

PDF previews

While I was writing the alpha and beta releases of Deployment from
Scratch, I wanted to send a preview from time to time. The obvious
way is to limit the pages you render and perhaps use a PDF editor to
insert something else. Or you can use Ruby.

The Ruby ecosystem features a nice PDF toolkit called HexaPDF that can
be used to cut the pages you want and interleave them with other pages
(an introduction, a call to action, a reminder, or final words for
the preview). An example:

#!/usr/bin/ruby
require 'hexapdf'

demo = HexaPDF::Document.open("output/book.pdf")

preview = HexaPDF::Document.new

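# Zero-based indexes of the book pages to keep (an example selection)
preview_pages = (0..9)

demo.pages.each_with_index do |page, page_index|
  # Put an extra page in front of the listed book pages.
  # 'Amiri' is not a built-in PDF font, so this assumes it has been
  # made available to HexaPDF (for example through its font map).
  if [0].include? page_index
    blank = preview.pages.add.canvas
    blank.font('Amiri', size: 25, variant: :bold)
    blank.text("This is a preview of Deployment from Scratch", at: [20, 800])
    blank.font('Amiri', size: 20)
    blank.text("Follow the book updates at https://deploymentfromscratch.com/.", at: [20, 550])
    blank.text("Write me what you think at strzibny@gmail.com.", at: [20, 500])
    blank.text("Or catch me on Twitter at https://twitter.com/strzibnyj.", at: [20, 450])
    blank.font('Amiri', size: 10)
    blank.text("Copyright by Josef Strzibny. All rights reserved.", at: [20, 20])
  end

  # Copy the selected book pages into the preview
  preview.pages << preview.import(page) if preview_pages.include?(page_index)
end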

preview.write("output/preview.pdf", optimize: false)

If you don't need to add custom content, you can also use HexaPDF
from the command line to just merge various pages from one or many
PDFs:

$ hexapdf merge output/toc.pdf --pages 1-10 output/book.pdf --force

Image previews

I covered cutting out PDF previews, but I also wanted to include nice
little image previews for my landing page. To that end, I separated
the final PDF into individual PDF pages and converted them to images.

Although there are various PDF utilities, it's easy to stick with
HexaPDF for the first part of the job:

#!/usr/bin/ruby
require 'fileutils'
require 'hexapdf'

FileUtils.rm_rf 'preview'
FileUtils.mkdir 'preview'

file = "output/deploymentfromscratch.pdf"

pdf = HexaPDF::Document.open(file)

pdf.pages.each_with_index do |page, index|
  target = HexaPDF::Document.new
  target.pages << target.import(page)
  target.write("preview/#{index+1}.pdf", optimize: true)
end

Once I have individual PDFs, I go through them again and convert them
to images with the Ruby binding to vips:

#!/usr/bin/ruby
require 'fileutils'
require 'vips'

Dir.glob('preview/*.pdf') do |file|
  im = Vips::Image.new_from_file file, scale: 2.5
  im.write_to_file("#{file}.jpg")
end

Note that I had to increase the scale; otherwise, the result is of
poor quality.

Once I have individual images, I just insert them into my landing page.
But you can also extend your Ruby task to do it for you
automatically.
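
As a starting point for that automation, here is a minimal sketch that
turns a list of preview image paths into HTML. The helper name
preview_gallery and the markup are assumptions for illustration; the
file names follow the 1.pdf.jpg pattern produced by the scripts above.

```ruby
# Hypothetical helper: turns preview image paths into <img> tags.
# Sorting by the leading page number keeps 10.pdf.jpg after 2.pdf.jpg,
# which a plain alphabetical sort would not.
def preview_gallery(paths)
  paths
    .sort_by { |path| File.basename(path).to_i }
    .map { |path| %(<img src="#{path}" alt="Page #{File.basename(path).to_i}" />) }
    .join("\n")
end

# In a real task the list would come from Dir.glob('preview/*.jpg')
puts preview_gallery(['preview/10.pdf.jpg', 'preview/2.pdf.jpg', 'preview/1.pdf.jpg'])
```

The resulting string can then be written between a pair of marker
comments on the landing page, the same way the chapter list is updated.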

Customer management

I built a waitlist of more than 600 people before releasing
Deployment from Scratch, and many of the people on the list became
customers. But you see, I use Gumroad for selling the book and
Mailchimp for the waitlist. Two different products and two separate
lists.

What if I want to send a reminder or special offer to people who
haven't bought the book yet? I certainly don't want to bother my
current customers with an email they don't need. Or what if I want to
find out the total conversion rate of the waitlist?

Both tools let you export the data, so all we need is a little bit of
Ruby:

#!/usr/bin/ruby
require 'csv'

# Customers from Gumroad
customers = 'customers_sep14_2021.csv'

# Waitlist
list = 'subscribed_segment_export_48995a2a64.csv'

customer_rows = CSV.read(customers)
buyer_emails = []

(1..customer_rows.count-1).each do |num|
  email = customer_rows[num][4]
  buyer_emails << email if email
end

list_emails = []
list_rows = CSV.read(list)

(1..list_rows.count-1).each do |num|
  email = list_rows[num][0]
  list_emails << email if email
end

# Who hasn't bought the book yet
not_bought = list_emails - buyer_emails

puts not_bought.count

If this is something you might want to do often, you can extend this
to use the APIs directly without going through the manual download of
the dataset.
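
The conversion rate mentioned earlier can be computed from the same
two lists. The following is a minimal sketch that reads columns by
header name instead of position and normalizes the addresses before
comparing them. The column names and the sample rows are made up; real
exports may use different headers.

```ruby
require 'csv'
require 'set'

# Hypothetical helper: collects normalized emails from CSV text.
# The column name depends on the export format and is an assumption.
def emails_from(csv_text, column)
  CSV.parse(csv_text, headers: true)
     .map { |row| row[column]&.strip&.downcase }
     .compact
     .reject(&:empty?)
     .to_set
end

# Share of waitlist members who became customers, in percent
def conversion_rate(waitlist, buyers)
  return 0.0 if waitlist.empty?

  ((waitlist & buyers).size * 100.0 / waitlist.size).round(1)
end

# Made-up sample data standing in for the two exports
waitlist_csv = <<~ROWS
  Email Address,First Name
  ann@example.com,Ann
  Bob@Example.com,Bob
  cat@example.com,Cat
  dan@example.com,Dan
ROWS

customers_csv = <<~ROWS
  Name,Email
  Bob,bob@example.com
  Eve,eve@example.com
ROWS

waitlist = emails_from(waitlist_csv, 'Email Address')
buyers = emails_from(customers_csv, 'Email')

puts "Not bought yet: #{(waitlist - buyers).to_a.join(', ')}"
puts "Waitlist conversion: #{conversion_rate(waitlist, buyers)}%"
```

Lowercasing matters because the same person may have typed the address
differently in each tool, and a plain comparison would count them as
two people.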

Maintainable tasks

Although I started with a Makefile to stay on top of all these tasks,
if Make is not in your blood, there is nothing easier than writing
these tasks as Rake tasks:

task :generate_pdf do
  `asciidoctor -r asciidoctor-pdf -b pdf mysample.adoc`
end

task :prepare_preview do
  # ..
end

And call it by running rake:

$ rake generate_pdf
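
Rake can also track dependencies between tasks, much like Make. Here
is a minimal sketch, assuming the PDF is written next to its source as
mysample.pdf; the constant names and the task layout are illustrative
only.

```ruby
# Rakefile sketch; task and file names are illustrative assumptions.
require 'rake'

SOURCE = 'mysample.adoc'
PDF = 'mysample.pdf'

# A file task only runs when the PDF is missing or older than its source
file PDF => SOURCE do
  # sh raises if the command fails, unlike backticks
  sh 'asciidoctor', '-r', 'asciidoctor-pdf', '-b', 'pdf', SOURCE
end

desc 'Generate the book PDF'
task generate_pdf: PDF

desc 'Cut a preview out of the generated PDF'
task prepare_preview: :generate_pdf do
  # ..
end

task default: :generate_pdf
```

With this layout, running rake prepare_preview would first rebuild the
PDF if the source changed, and a failing command stops the chain
instead of passing unnoticed.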

Build environment

It's a good idea to keep your book production environment intact. For
one, new versions of various tools can break your original workflow
and rendering. Or you might forget all the LaTeX packages that got
installed in the process.

As with other projects, you can use Vagrant and its Ruby-powered
Vagrantfile to keep your project in the same state and survive the
unexpected. A starting Vagrantfile can be kept simple with just a
little bit of Ruby and Bash:

Vagrant.configure(2) do |config|
  config.vm.box = "fedora-33-cloud"

  config.vm.synced_folder ".", "/vagrant", type: :nfs, nfs_udp: false

  config.vm.provision "shell", inline: <<-SHELL
sudo dnf update -y || :

# Install dependencies...
sudo dnf install ruby pandoc -y || :
SHELL

end

Conclusion

So there you have it - yet another domain where Ruby can help you,
and perhaps even shines above the competition. A whole publishing
toolkit, a Make-like build utility, PDF toolkits, and Ruby's power to
write simple scripts for text manipulation.
