Ruby for ebook publishing

20 Sep 2021

A lot of times, people ask what Ruby is good for apart from Rails. Ruby is great for various tasks from several different domains, and today, I would like to share how anybody can use Ruby in publishing ebooks.

Since I used some Ruby tasks in publishing my first-ever ebook Deployment from Scratch, it crossed my mind to write down why I think Ruby is great for publishing ebooks.

PDF publishing

There is a whole Ruby toolkit to publish technical content in AsciiDoc called Asciidoctor. It's a great toolkit to produce PDF, EPUB 3, or even manual pages.

Here's a list of what Asciidoctor can do for you in terms of a PDF (stolen from their page):

* Custom fonts (TTF or OTF)
* Full SVG support (thanks to prawn-svg)
* PDF document outline (i.e., bookmarks)
* Title page
* Table of contents page(s)
* Document metadata (title, authors, subject, keywords, etc.)
* Configurable page size (e.g., A4, Letter, Legal, etc)
* Internal cross-reference links
* Syntax highlighting with Rouge (preferred), Pygments, or CodeRay
* Cover pages
* Page background color or page background image with named scaling
* Page numbering
* Double-sided (aka prepress) printing mode (i.e., margins alternate on recto and verso pages)
* Customizable running content (header and footer)
* "Keep together" blocks (i.e., page breaks avoided in certain block content)
* Orphaned section titles avoided
* Autofit verbatim blocks (as permitted by base_font_size_min setting)
* Table border settings honored
* Font-based icons
* Auto-generated index
* Automatic hyphenation (when enabled)
* Permissive line breaking for CJK languages
* Compression / optimization of output file

If you are thinking of publishing your first technical ebook, it's a strong contender. Just get familiar with the limitations before starting.

You would use AsciiDoc the same way as Markdown, although the syntax is different:

    = Hello, AsciiDoc!
    Doc Writer

    An introduction to http://asciidoc.org[AsciiDoc].

    == First Section

    * item 1
    * item 2

    [source,ruby]
    puts "Hello, World!"

You then save AsciiDoc content with the .adoc extension and convert it by running asciidoctor (default backend generates HTML):

    $ gem install asciidoctor-pdf
    $ asciidoctor -b docbook5 mysample.adoc
    $ asciidoctor -r asciidoctor-pdf -b pdf mysample.adoc

And because this post is about Ruby, you can call it from Ruby:

    require 'asciidoctor'

    Asciidoctor.convert_file 'mysample.adoc'

And also work with the generated content directly:

    html = Asciidoctor.convert_file 'mysample.adoc', to_file: false, header_footer: true
    puts html

My journey went from an old gitbook version that could still generate a PDF from Markdown to Pandoc to keep the Markdown I had and enhance it with LaTeX. For anything new, I would look into Asciidoctor first. You can start with their AsciiDoc Writer's Guide.

As a side note, Asciidoctor PDF is built on the Prawn toolkit, which you can use directly for several different things. I used Prawn to build InvoicePrinter, for example.

Text transformations

Since I ended up with Pandoc and a mixture of Markdown and LaTeX, my default EPUB version didn't look good. See, in my text, I might have the following:

    ### Third headline

    Text paragraph.

    \cat{Something interesting that Tiger shares.}

I made the fictional character Tiger the Cat to make the heavily technical book feel lighter and more entertaining. I needed to draw a box with a Tiger picture and text next to it. And I made a LaTeX \cat macro to do just that.

To my surprise, the conversion to EPUB worked, but the result was horrible. So I needed to replace this macro with an HTML snippet for the EPUB version before the transformation happened.
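The substitution itself can be tried on a single string before touching any files. This is only a minimal sketch: the `div` wrapper and its `cat` class are hypothetical stand-ins, since the book's actual EPUB markup may look different, and `replace_cat_macro` is a helper name made up for illustration.

```ruby
# Matches a whole line of the form \cat{...} and captures the text inside.
CAT_MACRO = /^\\cat\{(.*)\}$/

# Hypothetical helper: swaps every \cat{...} line for an HTML snippet.
# The <div class="cat"> wrapper is an assumption, not the book's real markup.
def replace_cat_macro(markdown)
  markdown.gsub(CAT_MACRO) { %(<div class="cat">#{Regexp.last_match(1)}</div>) }
end

# Single-quoted heredoc, so the backslash in \cat stays literal.
sample = <<~'MARKDOWN'
  ### Third headline

  Text paragraph.

  \cat{Something interesting that Tiger shares.}
MARKDOWN

puts replace_cat_macro(sample)
```

Using the block form of `gsub` avoids having to escape backslashes in a replacement string, which is easy to get wrong when the pattern itself is about a backslash macro.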
So I wrote a little script to find these occurrences and produce new sources for the EPUB:

    #!/usr/bin/ruby
    require 'fileutils'

    # Start from an empty output directory
    FileUtils.rm_rf 'epub_chapters'
    FileUtils.mkdir 'epub_chapters'

    Dir.glob('chapters/*.md') do |file|
      chapter = File.read file

      # Replace each \cat{...} macro line, keeping the captured text (\1)
      chapter.gsub!(
        /^\\cat{(.*)}$/,
        '
    \1
    '
      )

      # Write the converted chapter under the same file name
      name = File.basename file
      File.write "epub_chapters/#{name}", chapter
    end

There are many ways to pre- or post-process text, but this was my quick way to fix the EPUB version.

Landing page

Your book might sell better with an attractive landing page. If you plan on a full-blown website, I recommend looking into the Ruby static site generator Jekyll, for which I wrote some tips.

When I built my simple landing page, I realized that my chapter list went out of date while I continued to publish beta content. To that end, I decided to keep the short chapter description within the chapter Markdown source like this:

    # Processes

    Running a web application is essentially running a suite of related programs concurrently as processes. Spawning a program process can be as simple as typing its name into a terminal, but how do we ensure that this program won't stop at some point? We need to take a closer look at what Linux processes are and how to bring them back to life from failures.

    ...

And I wrote a Ruby script that takes this meta information, makes HTML out of it, and updates the landing page:

    #!/usr/bin/ruby
    require 'redcarpet'

    BOOK_DIR = "/home/strzibny/Projects/deploymentfromscratch"
    CHAPTER_DIR = "#{BOOK_DIR}/chapters"

    class Chapter
      attr_reader :index, :title, :headline, :html

      def initialize(index:, title:, headline:, html:)
        @index = index
        @title = title
        @headline = headline
        @html = html
      end
    end

    chapters = []

    # Skip the first file, then number the remaining chapters from 1
    Dir.glob("#{CHAPTER_DIR}/*.md").sort.drop(1).each.with_index(1) do |file, index|
      content = File.read(file)
      title = content.scan(/^\# (.*)$/).first&.first
      headline = content.scan(/^headline: (.*)$/).first&.first

      # Only files with a top-level title count as chapters
      if title
        html = Redcarpet::Markdown.new(Redcarpet::Render::HTML.new).render(content)
        chapters << Chapter.new(index: index, title: title, headline: headline, html: html)
      end
    end

    # ...and later...

    BOOK_PAGE = "../index.html"

    sections = chapters.map do |chapter|
      <<-EOF
    #{chapter.index}. #{chapter.title}

    #{chapter.headline}

      EOF
    end.join("\n")

    page = File.read BOOK_PAGE
    new_page = page.gsub(
      /.*/m,
      "\n#{sections}\n"
    )

    File.open(BOOK_PAGE, "w") { |file| file.puts new_page }

So remember, you can work with your sources and automate the landing page management. I used the redcarpet gem for Markdown processing, and there are also other useful gems like front_matter_parser.

PDF previews

While I was writing the alpha and beta releases of Deployment from Scratch, I wanted to send a preview from time to time. The obvious way is to limit the pages you render and perhaps use a PDF editor to insert something else. Or you can use Ruby.

The Ruby ecosystem features a nice PDF toolkit called HexaPDF that can be used to cut the pages you want and interleave them with other pages (an introduction, a call to action, a reminder, or final words for the preview). An example:

    #!/usr/bin/ruby
    require 'hexapdf'

    demo = HexaPDF::Document.open("output/book.pdf")
    preview = HexaPDF::Document.new

    demo.pages.each_with_index do |page, page_index|
      # At the chosen page positions, add a custom page with preview notes
      if [0].include? page_index
        blank = preview.pages.add.canvas
        blank.font('Amiri', size: 25, variant: :bold)
        blank.text("This is a preview of Deployment from Scratch", at: [20, 800])
        blank.font('Amiri', size: 20)
        blank.text("Follow the book updates at https://deploymentfromscratch.com/.", at: [20, 550])
        blank.text("Write me what you think at strzibny@gmail.com.", at: [20, 500])
        blank.text("Or catch me on Twitter at https://twitter.com/strzibnyj.", at: [20, 450])
        blank.font('Amiri', size: 10)
        blank.text("Copyright by Josef Strzibny. All rights reserved.", at: [20, 20])
      end
    end

    preview.write("output/preview.pdf", optimize: false)

If you don't need to add custom content, you can also use HexaPDF from the command line to just merge various pages from one or many PDFs:

    $ hexapdf merge output/toc.pdf --pages 1-10 output/book.pdf --force

Image previews

I covered cutting out PDF previews, but I also wanted to include nice little image previews for my landing page.
To that end, I separated the final PDF into individual PDF pages and converted them to images. Although there are various PDF utilities, it's easy to stick with HexaPDF for the first part of the job:

    #!/usr/bin/ruby
    require 'fileutils'
    require 'hexapdf'

    # Start from an empty preview directory
    FileUtils.rm_rf 'preview'
    FileUtils.mkdir 'preview'

    file = "output/deploymentfromscratch.pdf"
    pdf = HexaPDF::Document.open(file)

    # Write every page into its own single-page PDF (1.pdf, 2.pdf, ...)
    pdf.pages.each_with_index do |page, index|
      target = HexaPDF::Document.new
      target.pages << target.import(page)
      target.write("preview/#{index+1}.pdf", optimize: true)
    end

Once I have individual PDFs, I go through them again and convert them to images with the Ruby binding to vips:

    #!/usr/bin/ruby
    require 'fileutils'
    require 'vips'

    Dir.glob('preview/*.pdf') do |file|
      # Render at 2.5x scale for a sharper image
      im = Vips::Image.new_from_file file, scale: 2.5
      im.write_to_file("#{file}.jpg")
    end

Note that I had to increase the scale; otherwise, the result is of poor quality.

Once I have individual images, I just insert them into my landing page. But you can also extend your Ruby task to do it for you automatically.

Customers' management

I built a waitlist of more than 600 people before releasing Deployment from Scratch, and many of the people on the list became customers. But you see, I use Gumroad for selling the book and Mailchimp for the waitlist. Two different products and two separate lists.

What if I want to send a reminder or special offer to people who haven't bought the book yet? I certainly don't want to bother my current customers with an email they don't need. Or what if I want to find out the total conversion rate of the waitlist?
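Both questions come down to set operations on email addresses. Here is a minimal sketch of the idea with made-up inline data. The column names `Email Address` and `Email` are assumptions for illustration, not the real export headers, and lowercasing the addresses is an extra precaution in case the two services store them with different casing.

```ruby
require 'csv'

# Made-up sample exports; real files would be read with CSV.read instead.
WAITLIST_CSV = <<~DATA
  Email Address,First Name
  alice@example.com,Alice
  Bob@Example.com,Bob
  carol@example.com,Carol
DATA

CUSTOMERS_CSV = <<~DATA
  Purchase ID,Email
  1,bob@example.com
  2,dave@example.com
DATA

# Collect the normalized, de-duplicated emails from one named column.
def emails(csv_text, column)
  CSV.parse(csv_text, headers: true)
     .map { |row| row[column]&.strip&.downcase }
     .compact
     .uniq
end

waitlist = emails(WAITLIST_CSV, 'Email Address')
buyers = emails(CUSTOMERS_CSV, 'Email')

# People on the waitlist who have not bought yet
not_bought = waitlist - buyers

# Share of the waitlist that turned into customers, in percent
converted = waitlist & buyers
conversion_rate = (converted.size.to_f / waitlist.size * 100).round(1)

puts "Still to reach: #{not_bought.join(', ')}"
puts "Waitlist conversion: #{conversion_rate}%"
```

Reading columns by header name rather than by position keeps the script working if the export layout gains or reorders columns.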
Both tools offer to export the dataset, so all we need is a little bit of Ruby:

    #!/usr/bin/ruby
    require 'csv'

    # Customers from Gumroad
    customers = 'customers_sep14_2021.csv'

    # Waitlist
    list = 'subscribed_segment_export_48995a2a64.csv'

    # Collect buyer emails (fifth column), skipping the header row
    customer_rows = CSV.read(customers)
    buyer_emails = []
    (1..customer_rows.count-1).each do |num|
      email = customer_rows[num][4]
      buyer_emails << email if email
    end

    # Collect waitlist emails (first column), skipping the header row
    list_emails = []
    list_rows = CSV.read(list)
    (1..list_rows.count-1).each do |num|
      email = list_rows[num][0]
      list_emails << email if email
    end

    # Who hasn't bought the book yet
    not_bought = list_emails - buyer_emails
    puts not_bought.count

If this is something you might want to do often, you can extend this to use the APIs directly without going through the manual download of the dataset.

Maintainable tasks

Although I started with a Makefile to stay on top of all these tasks, if Make is not in your blood, there is nothing easier than writing these tasks as Rake tasks:

    task :generate_pdf do
      `asciidoctor -r asciidoctor-pdf -b pdf mysample.adoc`
    end

    task :prepare_preview do
      # ..
    end

And call it by running rake:

    $ rake generate_pdf

Build environment

It's a good idea to keep your book production environment intact. For one, new versions of various tools can break your original workflow and rendering. Or you might forget all the LaTeX packages that got installed in the process.

As with other projects, you can use Vagrant and its Ruby-powered Vagrantfile to keep your project in the same state and survive the unexpected.

A starting Vagrantfile can be kept simple with just a little bit of Ruby and Bash:

    Vagrant.configure(2) do |config|
      config.vm.box = "fedora-33-cloud"
      config.vm.synced_folder ".", "/vagrant", type: :nfs, nfs_udp: false

      config.vm.provision "shell", inline: <<-SHELL
        sudo dnf update -y || :
        # Install dependencies...
        sudo dnf install ruby pandoc -y || :
      SHELL
    end

Conclusion

So there you have it: yet another domain where Ruby can help you, and perhaps even shines above the competition. A whole publishing toolkit, a Make-like build utility, PDF toolkits, and Ruby's power to write simple scripts for text manipulation.