publish webdump post - www.codemadness.org - www.codemadness.org saait content files
(HTM) git clone git://git.codemadness.org/www.codemadness.org
---
(DIR) commit a958fcb1d4d7d22302bdb51fb1cdcda3afdd55b8
(DIR) parent 5138e644ee86e41f39538e43005ba6429e94e27f
(HTM) Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date: Fri, 28 Jun 2024 10:45:32 +0200
publish webdump post
The draft version was already linked from:
https://www.bttr-software.de/forum/board_entry.php?id=21923
Diffstat:
M config.cfg | 2 +-
M output/atom.xml | 14 +++++++++++++-
M output/atom_content.xml | 121 ++++++++++++++++++++++++++++++-
M output/index | 1 +
M output/index.html | 1 +
M output/rss.xml | 8 ++++++++
M output/rss_content.xml | 114 +++++++++++++++++++++++++++++++
M output/sitemap.xml | 4 ++++
M output/twtxt.txt | 1 +
M output/urllist.txt | 1 +
A pages/webdump.cfg | 6 ++++++
A pages/webdump.md | 135 +++++++++++++++++++++++++++++++
12 files changed, 405 insertions(+), 3 deletions(-)
---
(DIR) diff --git a/config.cfg b/config.cfg
@@ -1,5 +1,5 @@
# last updated the site.
-siteupdated = 2024-05-18
+siteupdated = 2024-06-28
sitetitle = Codemadness
siteurl = https://www.codemadness.org
(DIR) diff --git a/output/atom.xml b/output/atom.xml
@@ -2,7 +2,7 @@
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
<title>Codemadness</title>
<subtitle>blog with various projects and articles about computer-related things</subtitle>
- <updated>2024-05-18T00:00:00Z</updated>
+ <updated>2024-06-28T00:00:00Z</updated>
<link rel="alternate" type="text/html" href="https://www.codemadness.org" />
<id>https://www.codemadness.org/atom.xml</id>
<link rel="self" type="application/atom+xml" href="https://www.codemadness.org/atom.xml" />
@@ -43,6 +43,18 @@
<summary>Improved Youtube Atom feed by adding video duration and filtering away shorts</summary>
</entry>
<entry>
+ <title>webdump HTML to plain-text converter</title>
+ <link rel="alternate" type="text/html" href="https://www.codemadness.org/webdump.html" />
+ <id>https://www.codemadness.org/webdump.html</id>
+ <updated>2023-11-20T00:00:00Z</updated>
+ <published>2023-11-20T00:00:00Z</published>
+ <author>
+ <name>Hiltjo</name>
+ <uri>https://www.codemadness.org</uri>
+ </author>
+ <summary>webdump HTML to plain-text converter</summary>
+</entry>
+<entry>
<title>Setup your own mail paste service</title>
<link rel="alternate" type="text/html" href="https://www.codemadness.org/mailservice.html" />
<id>https://www.codemadness.org/mailservice.html</id>
(DIR) diff --git a/output/atom_content.xml b/output/atom_content.xml
@@ -2,7 +2,7 @@
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
<title>Codemadness</title>
<subtitle>blog with various projects and articles about computer-related things</subtitle>
- <updated>2024-05-18T00:00:00Z</updated>
+ <updated>2024-06-28T00:00:00Z</updated>
<link rel="alternate" type="text/html" href="https://www.codemadness.org" />
<id>https://www.codemadness.org/atom_content.xml</id>
<link rel="self" type="application/atom+xml" href="https://www.codemadness.org/atom_content.xml" />
@@ -512,6 +512,125 @@ feeds() {
]]></content>
</entry>
<entry>
+ <title>webdump HTML to plain-text converter</title>
+ <link rel="alternate" type="text/html" href="https://www.codemadness.org/webdump.html" />
+ <id>https://www.codemadness.org/webdump.html</id>
+ <updated>2023-11-20T00:00:00Z</updated>
+ <published>2023-11-20T00:00:00Z</published>
+ <author>
+ <name>Hiltjo</name>
+ <uri>https://www.codemadness.org</uri>
+ </author>
+ <summary>webdump HTML to plain-text converter</summary>
+ <content type="html"><![CDATA[<h1>webdump HTML to plain-text converter</h1>
+ <p><strong>Last modification on </strong> <time>2023-11-20</time></p>
+ <p>webdump is (yet another) HTML to plain-text converter tool.</p>
+<p>It reads HTML in UTF-8 from stdin and writes plain-text to stdout.</p>
+<h2>Goals and scope</h2>
+<p>The main goal of this tool for me is to use it for converting HTML mails to
+plain-text and to convert HTML content in RSS feeds to plain-text.</p>
+<p>The tool will only convert HTML to stdout, similarly to links -dump or lynx
+-dump but simpler and more secure.</p>
+<ul>
+<li>HTML and XHTML will be supported.</li>
+<li>There will be some workarounds and quirks for broken and legacy HTML code.</li>
+<li>It will be usable and secure for reading HTML from mails and RSS/Atom feeds.</li>
+<li>No remote resources which are part of the HTML will be downloaded:
+images, video, audio, etc. But these may be visible as a link reference.</li>
+<li>Data will be written to stdout. Intended for plain-text or a text terminal.</li>
+<li>No support for Javascript, CSS, frame rendering or form processing.</li>
+<li>No HTTP or network protocol handling: HTML data is read from stdin.</li>
+<li>Listings for references and some options to extract them in a list that is
+usable for scripting. Some references are: link anchors, images, audio, video,
+HTML (i)frames, etc.</li>
+<li>Security: on OpenBSD it uses pledge("stdio", NULL).</li>
+<li>Keep the code relatively small, simple and hackable.</li>
+</ul>
+<h2>Features</h2>
+<ul>
+<li>Support for word-wrapping.</li>
+<li>A mode to enable basic markup: bold, underline, italic and blink ;)</li>
+<li>Indentation of headers, paragraphs, pre and list items.</li>
+<li>Basic support to query elements or hide them.</li>
+<li>Show link references.</li>
+<li>Show link references and resources such as img, video, audio, subtitles.</li>
+<li>Export link references and resources to a TAB-separated format.</li>
+</ul>
+<h2>Usage examples</h2>
+<pre><code>url='https://codemadness.org/sfeed.html'
+
+curl -s "$url" | webdump -r -b "$url" | less
+
+curl -s "$url" | webdump -8 -a -i -l -r -b "$url" | less -R
+
+curl -s "$url" | webdump -s 'main' -8 -a -i -l -r -b "$url" | less -R
+</code></pre>
+<p>Yes, all these option flags look ugly, a shellscript wrapper could be used :)</p>
+<h2>Practical examples</h2>
+<p>To use webdump as a HTML to text filter for example in the mutt mail client,
+change in ~/.mailcap:</p>
+<pre><code>text/html; webdump -i -l -r < %s; needsterminal; copiousoutput
+</code></pre>
+<p>In mutt you should then add:</p>
+<pre><code>auto_view text/html
+</code></pre>
+<p>Using webdump as a HTML to text filter for sfeed_curses (otherwise the default is lynx):</p>
+<pre><code>SFEED_HTMLCONV="webdump -d -8 -r -i -l -a" sfeed_curses ~/.sfeed/feeds/*
+</code></pre>
+<h2>Query/selector examples</h2>
+<p>The query syntax using the -s option is a bit inspired by CSS (but much more limited).</p>
+<p>To get the title from a HTML page:</p>
+<pre><code>url='https://codemadness.org/sfeed.html'
+
+title=$(curl -s "$url" | webdump -s 'title' -b "$url")
+printf '%s\n' "$title"
+</code></pre>
+<p>List audio and video-related content from a HTML page, redirect fd 3 to fd 1 (stdout):</p>
+<pre><code>url="https://media.ccc.de/v/051_Recent_features_to_OpenBSD-ntpd_and_bgpd"
+curl -s "$url" | webdump -x -s 'audio,video' -b "$url" 3>&1 >/dev/null | cut -f 2
+</code></pre>
+<h2>Clone</h2>
+<pre><code>git clone git://git.codemadness.org/webdump
+</code></pre>
+<h2>Browse</h2>
+<p>You can browse the source-code at:</p>
+<ul>
+<li><a href="https://git.codemadness.org/webdump/">https://git.codemadness.org/webdump/</a></li>
+<li><a href="gopher://codemadness.org/1/git/webdump">gopher://codemadness.org/1/git/webdump</a></li>
+</ul>
+<h2>Build and install</h2>
+<pre><code>$ make
+# make install
+</code></pre>
+<h2>Dependencies</h2>
+<ul>
+<li>C compiler.</li>
+<li>libc + some BSDisms.</li>
+</ul>
+<h2>Trade-offs</h2>
+<p>All software has trade-offs.</p>
+<p>webdump processes HTML in a single pass. It does not buffer the full DOM tree,
+although due to the nature of HTML/XML some parts, like attributes, need to be
+buffered.</p>
+<p>Rendering tables in webdump is very limited. Twibright Links has really nice
+table rendering. However, implementing a similar feature in the current design of
+webdump would make the code much more complex. Twibright Links
+processes a full DOM tree and processes the tables in multiple passes (to
+measure the table cells) etc. Of course tables can also be nested, or HTML tables
+can be used for creating layouts (these are mostly older webpages).</p>
+<p>These trade-offs and preferences are chosen for now. They may change in the
+future. Fortunately there are the usual good suspects for HTML to plain-text
+conversion, each with their own chosen trade-offs of course:</p>
+<ul>
+<li>twibright links: <a href="http://links.twibright.com/">http://links.twibright.com/</a></li>
+<li>lynx: <a href="https://lynx.invisible-island.net/">https://lynx.invisible-island.net/</a></li>
+<li>w3m: <a href="https://w3m.sourceforge.net/">https://w3m.sourceforge.net/</a></li>
+<li>xmllint (part of libxml2): <a href="https://gitlab.gnome.org/GNOME/libxml2/-/wikis/home">https://gitlab.gnome.org/GNOME/libxml2/-/wikis/home</a></li>
+<li>xmlstarlet: <a href="https://xmlstar.sourceforge.net/">https://xmlstar.sourceforge.net/</a></li>
+</ul>
+]]></content>
+</entry>
+<entry>
<title>Setup your own mail paste service</title>
<link rel="alternate" type="text/html" href="https://www.codemadness.org/mailservice.html" />
<id>https://www.codemadness.org/mailservice.html</id>
(DIR) diff --git a/output/index b/output/index
@@ -14,6 +14,7 @@ i codemadness.org 70
12024-02-02 Chess puzzle book generator /phlog/chess-puzzles codemadness.org 70
12023-11-22 xargs: an example for parallel batch jobs /phlog/xargs codemadness.org 70
12023-11-20 Improved Youtube RSS/Atom feed /phlog/youtube-feed codemadness.org 70
+12023-11-20 webdump HTML to plain-text converter /phlog/webdump codemadness.org 70
12023-10-25 Setup your own mail paste service /phlog/mailservice codemadness.org 70
12022-07-01 A simple TODO application /phlog/todo codemadness.org 70
12022-03-23 2FA TOTP without crappy authenticator apps /phlog/totp codemadness.org 70
(DIR) diff --git a/output/index.html b/output/index.html
@@ -43,6 +43,7 @@
<tr><td><time>2024-02-02</time></td><td><a href="chess-puzzles.html">Chess puzzle book generator</a></td></tr>
<tr><td><time>2023-11-22</time></td><td><a href="xargs.html">xargs: an example for parallel batch jobs</a></td></tr>
<tr><td><time>2023-11-20</time></td><td><a href="youtube-feed.html">Improved Youtube RSS/Atom feed</a></td></tr>
+<tr><td><time>2023-11-20</time></td><td><a href="webdump.html">webdump HTML to plain-text converter</a></td></tr>
<tr><td><time>2023-10-25</time></td><td><a href="mailservice.html">Setup your own mail paste service</a></td></tr>
<tr><td><time>2022-07-01</time></td><td><a href="todo-application.html">A simple TODO application</a></td></tr>
<tr><td><time>2022-03-23</time></td><td><a href="totp.html">2FA TOTP without crappy authenticator apps</a></td></tr>
(DIR) diff --git a/output/rss.xml b/output/rss.xml
@@ -31,6 +31,14 @@
<description>Improved Youtube Atom feed by adding video duration and filtering away shorts</description>
</item>
<item>
+ <title>webdump HTML to plain-text converter</title>
+ <link>https://www.codemadness.org/webdump.html</link>
+ <guid>https://www.codemadness.org/webdump.html</guid>
+ <dc:date>2023-11-20T00:00:00Z</dc:date>
+ <author>Hiltjo</author>
+ <description>webdump HTML to plain-text converter</description>
+</item>
+<item>
<title>Setup your own mail paste service</title>
<link>https://www.codemadness.org/mailservice.html</link>
<guid>https://www.codemadness.org/mailservice.html</guid>
(DIR) diff --git a/output/rss_content.xml b/output/rss_content.xml
@@ -497,6 +497,120 @@ feeds() {
]]></description>
</item>
<item>
+ <title>webdump HTML to plain-text converter</title>
+ <link>https://www.codemadness.org/webdump.html</link>
+ <guid>https://www.codemadness.org/webdump.html</guid>
+ <dc:date>2023-11-20T00:00:00Z</dc:date>
+ <author>Hiltjo</author>
+ <description><![CDATA[<h1>webdump HTML to plain-text converter</h1>
+ <p><strong>Last modification on </strong> <time>2023-11-20</time></p>
+ <p>webdump is (yet another) HTML to plain-text converter tool.</p>
+<p>It reads HTML in UTF-8 from stdin and writes plain-text to stdout.</p>
+<h2>Goals and scope</h2>
+<p>The main goal of this tool for me is to use it for converting HTML mails to
+plain-text and to convert HTML content in RSS feeds to plain-text.</p>
+<p>The tool will only convert HTML to stdout, similarly to links -dump or lynx
+-dump but simpler and more secure.</p>
+<ul>
+<li>HTML and XHTML will be supported.</li>
+<li>There will be some workarounds and quirks for broken and legacy HTML code.</li>
+<li>It will be usable and secure for reading HTML from mails and RSS/Atom feeds.</li>
+<li>No remote resources which are part of the HTML will be downloaded:
+images, video, audio, etc. But these may be visible as a link reference.</li>
+<li>Data will be written to stdout. Intended for plain-text or a text terminal.</li>
+<li>No support for Javascript, CSS, frame rendering or form processing.</li>
+<li>No HTTP or network protocol handling: HTML data is read from stdin.</li>
+<li>Listings for references and some options to extract them in a list that is
+usable for scripting. Some references are: link anchors, images, audio, video,
+HTML (i)frames, etc.</li>
+<li>Security: on OpenBSD it uses pledge("stdio", NULL).</li>
+<li>Keep the code relatively small, simple and hackable.</li>
+</ul>
+<h2>Features</h2>
+<ul>
+<li>Support for word-wrapping.</li>
+<li>A mode to enable basic markup: bold, underline, italic and blink ;)</li>
+<li>Indentation of headers, paragraphs, pre and list items.</li>
+<li>Basic support to query elements or hide them.</li>
+<li>Show link references.</li>
+<li>Show link references and resources such as img, video, audio, subtitles.</li>
+<li>Export link references and resources to a TAB-separated format.</li>
+</ul>
+<h2>Usage examples</h2>
+<pre><code>url='https://codemadness.org/sfeed.html'
+
+curl -s "$url" | webdump -r -b "$url" | less
+
+curl -s "$url" | webdump -8 -a -i -l -r -b "$url" | less -R
+
+curl -s "$url" | webdump -s 'main' -8 -a -i -l -r -b "$url" | less -R
+</code></pre>
+<p>Yes, all these option flags look ugly, a shellscript wrapper could be used :)</p>
+<h2>Practical examples</h2>
+<p>To use webdump as a HTML to text filter for example in the mutt mail client,
+change in ~/.mailcap:</p>
+<pre><code>text/html; webdump -i -l -r < %s; needsterminal; copiousoutput
+</code></pre>
+<p>In mutt you should then add:</p>
+<pre><code>auto_view text/html
+</code></pre>
+<p>Using webdump as a HTML to text filter for sfeed_curses (otherwise the default is lynx):</p>
+<pre><code>SFEED_HTMLCONV="webdump -d -8 -r -i -l -a" sfeed_curses ~/.sfeed/feeds/*
+</code></pre>
+<h2>Query/selector examples</h2>
+<p>The query syntax using the -s option is a bit inspired by CSS (but much more limited).</p>
+<p>To get the title from a HTML page:</p>
+<pre><code>url='https://codemadness.org/sfeed.html'
+
+title=$(curl -s "$url" | webdump -s 'title' -b "$url")
+printf '%s\n' "$title"
+</code></pre>
+<p>List audio and video-related content from a HTML page, redirect fd 3 to fd 1 (stdout):</p>
+<pre><code>url="https://media.ccc.de/v/051_Recent_features_to_OpenBSD-ntpd_and_bgpd"
+curl -s "$url" | webdump -x -s 'audio,video' -b "$url" 3>&1 >/dev/null | cut -f 2
+</code></pre>
+<h2>Clone</h2>
+<pre><code>git clone git://git.codemadness.org/webdump
+</code></pre>
+<h2>Browse</h2>
+<p>You can browse the source-code at:</p>
+<ul>
+<li><a href="https://git.codemadness.org/webdump/">https://git.codemadness.org/webdump/</a></li>
+<li><a href="gopher://codemadness.org/1/git/webdump">gopher://codemadness.org/1/git/webdump</a></li>
+</ul>
+<h2>Build and install</h2>
+<pre><code>$ make
+# make install
+</code></pre>
+<h2>Dependencies</h2>
+<ul>
+<li>C compiler.</li>
+<li>libc + some BSDisms.</li>
+</ul>
+<h2>Trade-offs</h2>
+<p>All software has trade-offs.</p>
+<p>webdump processes HTML in a single pass. It does not buffer the full DOM tree,
+although due to the nature of HTML/XML some parts, like attributes, need to be
+buffered.</p>
+<p>Rendering tables in webdump is very limited. Twibright Links has really nice
+table rendering. However, implementing a similar feature in the current design of
+webdump would make the code much more complex. Twibright Links
+processes a full DOM tree and processes the tables in multiple passes (to
+measure the table cells) etc. Of course tables can also be nested, or HTML tables
+can be used for creating layouts (these are mostly older webpages).</p>
+<p>These trade-offs and preferences are chosen for now. They may change in the
+future. Fortunately there are the usual good suspects for HTML to plain-text
+conversion, each with their own chosen trade-offs of course:</p>
+<ul>
+<li>twibright links: <a href="http://links.twibright.com/">http://links.twibright.com/</a></li>
+<li>lynx: <a href="https://lynx.invisible-island.net/">https://lynx.invisible-island.net/</a></li>
+<li>w3m: <a href="https://w3m.sourceforge.net/">https://w3m.sourceforge.net/</a></li>
+<li>xmllint (part of libxml2): <a href="https://gitlab.gnome.org/GNOME/libxml2/-/wikis/home">https://gitlab.gnome.org/GNOME/libxml2/-/wikis/home</a></li>
+<li>xmlstarlet: <a href="https://xmlstar.sourceforge.net/">https://xmlstar.sourceforge.net/</a></li>
+</ul>
+]]></description>
+</item>
+<item>
<title>Setup your own mail paste service</title>
<link>https://www.codemadness.org/mailservice.html</link>
<guid>https://www.codemadness.org/mailservice.html</guid>
(DIR) diff --git a/output/sitemap.xml b/output/sitemap.xml
@@ -13,6 +13,10 @@
<lastmod>2023-11-20</lastmod>
</url>
<url>
+ <loc>https://www.codemadness.org/webdump.html</loc>
+ <lastmod>2023-11-20</lastmod>
+</url>
+<url>
<loc>https://www.codemadness.org/mailservice.html</loc>
<lastmod>2024-02-10</lastmod>
</url>
(DIR) diff --git a/output/twtxt.txt b/output/twtxt.txt
@@ -1,6 +1,7 @@
2024-02-02T00:00:00Z Chess puzzle book generator: https://www.codemadness.org/chess-puzzles.html
2023-11-22T00:00:00Z xargs: an example for parallel batch jobs: https://www.codemadness.org/xargs.html
2023-11-20T00:00:00Z Improved Youtube RSS/Atom feed: https://www.codemadness.org/youtube-feed.html
+2023-11-20T00:00:00Z webdump HTML to plain-text converter: https://www.codemadness.org/webdump.html
2023-10-25T00:00:00Z Setup your own mail paste service: https://www.codemadness.org/mailservice.html
2022-07-01T00:00:00Z A simple TODO application: https://www.codemadness.org/todo-application.html
2022-03-23T00:00:00Z 2FA TOTP without crappy authenticator apps: https://www.codemadness.org/totp.html
(DIR) diff --git a/output/urllist.txt b/output/urllist.txt
@@ -1,6 +1,7 @@
https://www.codemadness.org/chess-puzzles.html
https://www.codemadness.org/xargs.html
https://www.codemadness.org/youtube-feed.html
+https://www.codemadness.org/webdump.html
https://www.codemadness.org/mailservice.html
https://www.codemadness.org/todo-application.html
https://www.codemadness.org/totp.html
(DIR) diff --git a/pages/webdump.cfg b/pages/webdump.cfg
@@ -0,0 +1,6 @@
+title = webdump HTML to plain-text converter
+id = webdump
+description = webdump HTML to plain-text converter
+keywords = webdump, HTML to plain-text, converter, formatter
+created = 2023-11-20
+updated = 2023-11-20
(DIR) diff --git a/pages/webdump.md b/pages/webdump.md
@@ -0,0 +1,135 @@
+webdump is (yet another) HTML to plain-text converter tool.
+
+It reads HTML in UTF-8 from stdin and writes plain-text to stdout.
+
+
+## Goals and scope
+
+The main goal of this tool for me is to use it for converting HTML mails to
+plain-text and to convert HTML content in RSS feeds to plain-text.
+
+The tool will only convert HTML to stdout, similarly to links -dump or lynx
+-dump but simpler and more secure.
+
+* HTML and XHTML will be supported.
+* There will be some workarounds and quirks for broken and legacy HTML code.
+* It will be usable and secure for reading HTML from mails and RSS/Atom feeds.
+* No remote resources which are part of the HTML will be downloaded:
+ images, video, audio, etc. But these may be visible as a link reference.
+* Data will be written to stdout. Intended for plain-text or a text terminal.
+* No support for Javascript, CSS, frame rendering or form processing.
+* No HTTP or network protocol handling: HTML data is read from stdin.
+* Listings for references and some options to extract them in a list that is
+ usable for scripting. Some references are: link anchors, images, audio, video,
+ HTML (i)frames, etc.
+* Security: on OpenBSD it uses pledge("stdio", NULL).
+* Keep the code relatively small, simple and hackable.
+
+
+## Features
+
+* Support for word-wrapping.
+* A mode to enable basic markup: bold, underline, italic and blink ;)
+* Indentation of headers, paragraphs, pre and list items.
+* Basic support to query elements or hide them.
+* Show link references.
+* Show link references and resources such as img, video, audio, subtitles.
+* Export link references and resources to a TAB-separated format.
+
+
+## Usage examples
+
+ url='https://codemadness.org/sfeed.html'
+
+ curl -s "$url" | webdump -r -b "$url" | less
+
+ curl -s "$url" | webdump -8 -a -i -l -r -b "$url" | less -R
+
+ curl -s "$url" | webdump -s 'main' -8 -a -i -l -r -b "$url" | less -R
+
+Yes, all these option flags look ugly, a shellscript wrapper could be used :)
+
+
+## Practical examples
+
+To use webdump as a HTML to text filter for example in the mutt mail client,
+change in ~/.mailcap:
+
+ text/html; webdump -i -l -r < %s; needsterminal; copiousoutput
+
+In mutt you should then add:
+
+ auto_view text/html
+
+
+Using webdump as a HTML to text filter for sfeed_curses (otherwise the default is lynx):
+
+ SFEED_HTMLCONV="webdump -d -8 -r -i -l -a" sfeed_curses ~/.sfeed/feeds/*
+
+
+## Query/selector examples
+
+The query syntax using the -s option is a bit inspired by CSS (but much more limited).
+
+To get the title from a HTML page:
+
+ url='https://codemadness.org/sfeed.html'
+
+ title=$(curl -s "$url" | webdump -s 'title' -b "$url")
+ printf '%s\n' "$title"
+
+List audio and video-related content from a HTML page, redirect fd 3 to fd 1 (stdout):
+
+ url="https://media.ccc.de/v/051_Recent_features_to_OpenBSD-ntpd_and_bgpd"
+ curl -s "$url" | webdump -x -s 'audio,video' -b "$url" 3>&1 >/dev/null | cut -f 2
+
+
+## Clone
+
+ git clone git://git.codemadness.org/webdump
+
+
+## Browse
+
+You can browse the source-code at:
+
+* <https://git.codemadness.org/webdump/>
+* <gopher://codemadness.org/1/git/webdump>
+
+
+## Build and install
+
+ $ make
+ # make install
+
+
+## Dependencies
+
+* C compiler.
+* libc + some BSDisms.
+
+
+## Trade-offs
+
+All software has trade-offs.
+
+webdump processes HTML in a single pass. It does not buffer the full DOM tree,
+although due to the nature of HTML/XML some parts, like attributes, need to be
+buffered.
+
+Rendering tables in webdump is very limited. Twibright Links has really nice
+table rendering. However, implementing a similar feature in the current design of
+webdump would make the code much more complex. Twibright Links
+processes a full DOM tree and processes the tables in multiple passes (to
+measure the table cells) etc. Of course tables can also be nested, or HTML tables
+can be used for creating layouts (these are mostly older webpages).
+
+These trade-offs and preferences are chosen for now. They may change in the
+future. Fortunately there are the usual good suspects for HTML to plain-text
+conversion, each with their own chosen trade-offs of course:
+
+* twibright links: <http://links.twibright.com/>
+* lynx: <https://lynx.invisible-island.net/>
+* w3m: <https://w3m.sourceforge.net/>
+* xmllint (part of libxml2): <https://gitlab.gnome.org/GNOME/libxml2/-/wikis/home>
+* xmlstarlet: <https://xmlstar.sourceforge.net/>
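---
Note on the usage examples in pages/webdump.md: the post remarks that a shellscript wrapper could hide the option flags. The following is only a minimal sketch of such a wrapper for a POSIX sh, not something shipped with webdump. The function name `wd` and the `FETCH` and `WEBDUMP` override variables are hypothetical; the webdump options are the ones that appear in the post's own examples.

```shell
#!/bin/sh
# wd: hypothetical wrapper around the flag combination used in the post.
# usage: wd url [selector]
# FETCH and WEBDUMP are assumed override hooks (not part of webdump) so the
# fetcher or converter can be swapped, for example for testing offline.

wd() {
	url="${1:-}"
	selector="${2:-}"

	if [ -z "$url" ]; then
		echo "usage: wd url [selector]" >&2
		return 1
	fi

	# Same options as in the usage examples: -8 -a -i -l -r and -b with the URL.
	set -- -8 -a -i -l -r -b "$url"
	# An optional selector is passed with -s, as in the 'main' example.
	if [ -n "$selector" ]; then
		set -- -s "$selector" "$@"
	fi

	# Intentionally unquoted so the default "curl -s" splits into command and flag.
	${FETCH:-curl -s} "$url" | ${WEBDUMP:-webdump} "$@"
}
```

It could then be used as `wd 'https://codemadness.org/sfeed.html' main | less -R`.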