.\" Process this file with
.\" groff -man -Tutf8 akfscraper.en.man
.\"
.
.TH "akfscraper" 1 2025-01-06 akfnetz
.nh
.
.\" Macros .TQ .EX .EE from groff an-ext.tmac
.\" Copyright (C) 2007, 2009 Free Software Foundation, Inc.
.\" You may freely use, modify and/or distribute this file.
.
.de TQ
.br
.ns
.TP \\$1
..
.
.\" Start of example
.de EX
.  nr mE \\n(.f
.  nf
.  nh
.  ft CW
..
.
.
.\" End of example
.de EE
.  ft \\n(mE
.  fi
.  hy \\n(HY
..
.
.SH NAME
akfscraper \- convert HTML/XHTML into plain text
.
.SH SYNOPSIS
.B akfscraper
.RI [ options "] [" HTML\~files ]
.
.SH DESCRIPTION
.
The program
.B akfscraper
converts HTML or XHTML data into plain text (Markdown).
.PP
When no files are given, it reads from standard input.
Unless the option
.I -o
is used, it writes to standard output.
This means it can be used as a filter.
.PP
The output is always encoded in UTF-8.
The supported input encodings are
UTF-8, ISO-8859-1, and Windows codepage\~1252.
.PP
Links can be added at the end as references
(see the options
.I -l
and
.IR -L ).
.
.SH OPTIONS
.
.TP
-h
.TQ
--help
.TQ
--Hilfe
show a short help text
.PP
.TP
-V
.TQ
--version
.TQ
--Version
show the version
.PP
.TP
-u
interpret the input as UTF-8
unless a meta tag or an XML declaration specifies another encoding
.PP
.TP
.RI "-c " charset
.TQ
.RI "--charset " charset
.TQ
.RI "--Zeichensatz " charset
if
.I charset
is UTF-8 or UTF8,
interpret the input as UTF-8
unless a meta tag or an XML declaration specifies another encoding.
(This is a longer variant of -u, provided for compatibility.)
.PP
.TP
-l
.TQ
--Links
.TQ
--links
add external links at the end as references
.PP
.TP
.RI "-L " number
like -l, but ignore the first
.I number
links
.PP
.TP
-n
do not wrap lines automatically
(produce long lines)
.PP
.TP
-e
.TQ
--extended
.TQ
--erweitert
add terminal control codes (SGR\~codes) to the output
.PP
.TP
.RI "-o " \[dq]file\[dq]
.TQ
.RI "--output=" \[dq]file\[dq]
.TQ
.RI "--Ausgabe=" \[dq]file\[dq]
.TQ
.RI "--Datei=" \[dq]file\[dq]
write the output to
.I file
instead of standard output
.PP
.TP
-f
.TQ
--force
.TQ
--forciere
overwrite existing files
.
.SH EXAMPLES
.
.EX
akfscraper -l -o article.text page1.html page2.html page3.html
.EE
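.PP
To replace an output file that already exists, the option
.I -f
can be added; a possible invocation (the file names are placeholders):
.EX
akfscraper -f -o article.text article.html
.EE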
.PP
.EX
akfscraper -e -l article.html | less -r
.EE
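.PP
Used as a filter, reading standard input and writing standard output
(the file names are placeholders):
.EX
akfscraper < page.html > page.text
.EE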
.PP
.EX
curl -sL "https://gnu.org" | akfscraper -e -l | less -r
.EE
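.PP
To ignore the first links of a page, the option
.I -L
can be used instead of
.IR -l ;
the number 10 here is only an arbitrary example value:
.EX
curl -sL "https://gnu.org" | akfscraper -L 10 | less
.EE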
.PP
An entry for a mailcap file:
.EX
text/html; akfscraper -l -c %{charset}; copiousoutput
.EE
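.PP
To treat the input as UTF-8 when neither a meta tag nor an XML declaration
states its encoding (the file names are placeholders):
.EX
akfscraper -u -o page.text page.html
.EE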
.
.SH AUTHORS
.
Copyright \(co 2025 Andreas K. F\[:o]rster
.PP
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
.PP
This program is distributed in the hope that it will be useful, but
.BR "WITHOUT ANY WARRANTY" ;
without even the implied warranty of
.BR MERCHANTABILITY " or " "FITNESS FOR A PARTICULAR PURPOSE" .
See the GNU General Public License for more details.
.PP
You should have received a copy of the GNU General Public License
along with this program.
If not, see <http://www.gnu.org/licenses/>.
.
.SH "SEE ALSO"
.BR more (1),
.BR less (1),
.BR akfweb-dl (1),
.BR curl (1)
.PP
https://akfoerster.de/p/akfnetz/
