.Dd October 6, 2023 .Dt WEBDUMP 1 .Os .Sh NAME .Nm webdump .Nd convert HTML to plain-text .Sh SYNOPSIS .Nm .Op Fl 8adiIlrx .Op Fl b Ar baseurl .Op Fl s Ar selector .Op Fl u Ar selector .Op Fl w Ar termwidth .Sh DESCRIPTION .Nm reads UTF-8 HTML data from stdin. It converts and writes the output as plain-text to stdout. A .Ar baseurl can be specified if the links in the feed are relative URLs. This must be an absolute URI. .Pp The options are as follows: .Bl -tag -width Ds .It Fl 8 Use UTF-8 symbols for certain items like bullet items and rulers to make the output fancier. .It Fl a Toggle ANSI escape codes usage, by default it is not enabled. .It Fl b Ar baseurl Base URL of links. This is used to make links absolute. The specified URL is always preferred over the value in a tag. .It Fl d Deduplicate link references. When a duplicate link reference is found reuse the same link reference number. .It Fl i Toggle if link reference numbers are displayed inline or not, by default it is not enabled. .It Fl I Toggle if URLs for link reference are displayed inline or not, by default it is not enabled. .It Fl l Toggle if link references are displayed at the bottom or not, by default it is not enabled. .It Fl r Toggle if line-wrapping mode is enabled, by default it is not enabled. .It Fl s CSS-like selectors, this sets a reader mode to show only content matching the selector, see the section .Sx SELECTOR SYNTAX for the syntax. Multiple selectors can be specified by separating them with a comma. .It Fl u CSS-like selectors, this sets a reader mode to hide content matching the selector, see the section .Sx SELECTOR SYNTAX for the syntax. Multiple selectors can be specified by separating them with a comma. .It Fl w Ar termwidth The terminal width. The default is 77 characters. .It Fl x Write resources as TAB-separated lines to file descriptor 3. .El .Sh SELECTOR SYNTAX The syntax has some inspiration from CSS, but it is more limited. Some examples: .Bl -item .It "main" would match on the "main" tags. .It "#someid" would match on any tag which has the id attribute set to "someid". .It ".someclass" would match on any tag which has the class attribute set to "someclass". .It "main#someid" would match on the "main" tag which has the id attribute set to "someid". .It "main.someclass" would match on the "main" tags which has the class attribute set to "someclass". .It "ul li" would match on any "li" tag which also has a parent "ul" tag. .It "li@0" would match on any "li" tag which is also the first child element of its parent container. Note that this differs from filtering on a collection of "li" elements. .El .Sh EXIT STATUS .Ex -std .Sh EXAMPLES .Bd -literal url='https://codemadness.org/sfeed.html' curl -s "$url" | webdump -r -b "$url" | less curl -s "$url" | webdump -8 -a -i -l -r -b "$url" | less -R curl -s "$url" | webdump -s 'main' -8 -a -i -l -r -b "$url" | less -R .Ed .Pp To use .Nm as a HTML to text filter for example in the mutt mail client, change in ~/.mailcap: .Bd -literal text/html; webdump -i -l -r < %s; needsterminal; copiousoutput .Ed .Sh SEE ALSO .Xr curl 1 , .Xr xmllint 1 , .Xr xmlstarlet 1 , .Xr ftp 1 .Sh AUTHORS .An Hiltjo Posthuma Aq Mt hiltjo@codemadness.org