man page improvements (sync) - tscrape - twitter scraper
commit b0413f42bd2bc31cbbb5e338093de51b94cfd028
parent 423d3f5ad6023be3eb50ebe2f9504309bfe3d940
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date: Fri, 20 Mar 2020 12:03:26 +0100
man page improvements (sync)
- tscraperc.5: use the same order as executed in the tscrape_update file.
- tscraperc.5: reference curl, which is optional, but used by default.
- tscraperc.5: use a .Sh VARIABLES section for tscrapepath and maxjobs.
- tscrape_update.1: split config format-specific documentation and reference it.
- just use the term "url" instead of "uri".
- shorten some texts, increasing readability.
- document exit status of tools.
fix:
- do not reference RSS/Atom.
Diffstat:
M tscrape.1 | 4 +++-
M tscrape.5 | 6 +++---
M tscrape_html.1 | 4 +++-
M tscrape_plain.1 | 4 +++-
M tscrape_update.1 | 56 ++++++++++---------------------
M tscraperc.5 | 61 ++++++++++++++++++-------------
6 files changed, 65 insertions(+), 70 deletions(-)
---
diff --git a/tscrape.1 b/tscrape.1
@@ -1,4 +1,4 @@
-.Dd May 11, 2018
+.Dd March 20, 2020
.Dt TSCRAPE 1
.Os
.Sh NAME
@@ -35,6 +35,8 @@ Item Retweet ID.
.It item is pinned
Item is pinned or not? 0 or 1.
.El
+.Sh EXIT STATUS
+.Ex -std
.Sh EXAMPLES
.Bd -literal -offset left
curl --http1.0 -H 'User-Agent:' -s 'https://twitter.com/namehere' | tscrape
diff --git a/tscrape.5 b/tscrape.5
@@ -1,4 +1,4 @@
-.Dd July 20, 2019
+.Dd March 20, 2020
.Dt TSCRAPE 5
.Os
.Sh NAME
@@ -21,8 +21,8 @@ Control characters are replaced by a single space.
.Pp
The order and content of the fields are:
.Bl -tag -width 17n
-.It UNIX timestamp
-UNIX timestamp in UTC+0.
+.It timestamp
+UNIX timestamp in UTC+0, empty on parse failure.
.It username
Twitter username (can be a retweet).
.It fullname
diff --git a/tscrape_html.1 b/tscrape_html.1
@@ -1,4 +1,4 @@
-.Dd July 20, 2019
+.Dd March 20, 2020
.Dt TSCRAPE_HTML 1
.Os
.Sh NAME
@@ -26,6 +26,8 @@ is empty.
.Pp
Items with a timestamp from the last day compared to the system time at the
time of formatting are counted and marked as new.
+.Sh EXIT STATUS
+.Ex -std
.Sh SEE ALSO
.Xr tscrape 1 ,
.Xr tscrape_plain 1 ,
diff --git a/tscrape_plain.1 b/tscrape_plain.1
@@ -1,4 +1,4 @@
-.Dd July 20, 2019
+.Dd March 20, 2020
.Dt TSCRAPE_PLAIN 1
.Os
.Sh NAME
@@ -38,6 +38,8 @@ per rune, using
.Xr mbtowc 3
and
.Xr wcwidth 3 .
+.Sh EXIT STATUS
+.Ex -std
.Sh SEE ALSO
.Xr tscrape 1 ,
.Xr tscrape_html 1 ,
diff --git a/tscrape_update.1 b/tscrape_update.1
@@ -1,4 +1,4 @@
-.Dd August 17, 2019
+.Dd March 20, 2020
.Dt TSCRAPE_UPDATE 1
.Os
.Sh NAME
@@ -9,65 +9,43 @@
.Op Ar tscraperc
.Sh DESCRIPTION
.Nm
-updates feeds files and merges the new data with the previous files.
-These are the files in the directory
+writes TAB-separated feed files and merges new items with the items in any
+existing files.
+The items are stored in one file per feed in the directory
.Pa $HOME/.tscrape/feeds
by default.
+The directory can be changed in the
+.Xr tscraperc 5
+file.
.Sh OPTIONS
.Bl -tag -width 17n
.It Ar tscraperc
-Config file, if not specified uses the path
+Config file.
+The default is
.Pa $HOME/.tscrape/tscraperc
-by default.
-See the
-.Sx FILES READ
-section for more information.
.El
.Sh FILES READ
.Bl -tag -width 17n
.It Ar tscraperc
-Config file, see the tscraperc.example file for an example.
This file is evaluated as a shellscript in
.Nm .
-.Pp
-Atleast the following functions can be overridden per feed:
-.Bl -tag -width 17n
-.It Fn fetch
-to use
-.Xr wget 1 ,
-OpenBSD
-.Xr ftp 1
-or an other download program.
-.It Fn merge
-to change the merge logic.
-.It Fn filter
-to filter on fields.
-.It Fn order
-to change the sort order.
-.El
-.Pp
-The
-.Fn feeds
-function is called to process the feeds.
-The default
-.Fn feed
-function is executed concurrently as a background job in your
+See also the
.Xr tscraperc 5
-config file to make updating faster.
-The variable
-.Va maxjobs
-can be changed to limit or increase the amount of concurrent jobs (8 by
-default).
+man page for a detailed description of the format and an example file.
.El
.Sh FILES WRITTEN
.Bl -tag -width 17n
.It feedname
-TAB-separated format containing all items per feed.
+TAB-separated
+.Xr tscrape 5
+format containing all items per feed.
The
.Nm
script merges new items with this file.
-The filename cannot contain '/' characters, they will be replaced with '_'.
+The feedname cannot contain '/' characters, they will be replaced with '_'.
.El
+.Sh EXIT STATUS
+.Ex -std
.Sh EXAMPLES
To update your feeds and format them in various formats:
.Bd -literal
diff --git a/tscraperc.5 b/tscraperc.5
@@ -1,4 +1,4 @@
-.Dd July 14, 2019
+.Dd March 20, 2020
.Dt TSCRAPERC 5
.Os
.Sh NAME
@@ -8,30 +8,36 @@
.Nm
is the configuration file for
.Xr tscrape_update 1 .
-.Pp
-The variable
-.Va tscrapepath
-can be set for the directory to store the TAB-separated feed files,
-by default this is
+.Sh VARIABLES
+.Bl -tag -width Ds
+.It Va tscrapepath
+can be set for the directory to store the TAB-separated feed files.
+The default is
.Pa $HOME/.tscrape/feeds .
-.
+.It Va maxjobs
+can be used to change the amount of concurrent
+.Fn feed
+jobs.
+The default is 8.
+.El
.Sh FUNCTIONS
-The following functions must be defined in a
-.Nm
-file:
.Bl -tag -width Ds
.It Fn feeds
-This function is like a "main" function called from
+This function is the required "main" entry-point function called from
.Xr tscrape_update 1 .
.It Fn feed "name" "feedurl"
-Function to process the feed, its arguments are in the order:
+Inside the
+.Fn feeds
+function feeds can be defined by calling the
+.Fn feed
+function, its arguments are:
.Bl -tag -width Ds
.It Fa name
Name of the feed, this is also used as the filename for the TAB-separated
feed file.
-The filename cannot contain '/' characters, they will be replaced with '_'.
+The feedname cannot contain '/' characters, they will be replaced with '_'.
.It Fa feedurl
-Uri to fetch the RSS/Atom data from, usually a HTTP or HTTPS uri.
+Url to fetch the data from, usually a HTTP or HTTPS url.
.El
.El
.Sh OVERRIDE FUNCTIONS
@@ -40,16 +46,28 @@ Because
is a shellscript each function can be overridden to change its behaviour,
notable functions are:
.Bl -tag -width Ds
-.It Fn fetch "name" "uri" "feedfile"
+.It Fn fetch "name" "url" "feedfile"
Fetch feed from url and writes data to stdout, its arguments are:
.Bl -tag -width Ds
.It Fa name
Specified name in configuration file (useful for logging).
-.It Fa uri
-Uri to fetch.
+.It Fa url
+Url to fetch.
.It Fa feedfile
Used feedfile (useful for comparing modification times).
.El
+.Pp
+By default the tool
+.Xr curl 1
+is used.
+.It Fn filter "name"
+Filter
+.Xr tscrape 5
+data from stdin, write to stdout, its arguments are:
+.Bl -tag -width Ds
+.It Fa name
+Feed name.
+.El
.It Fn merge "name" "oldfile" "newfile"
Merge data of oldfile with newfile and writes it to stdout, its arguments are:
.Bl -tag -width Ds
@@ -60,14 +78,6 @@ Old file.
.It Fa newfile
New file.
.El
-.It Fn filter "name"
-Filter
-.Xr tscrape 5
-data from stdin, write to stdout, its arguments are:
-.Bl -tag -width Ds
-.It Fa name
-Feed name.
-.El
.It Fn order "name"
Sort
.Xr tscrape 5
@@ -92,6 +102,7 @@ feeds() {
}
.Ed
.Sh SEE ALSO
+.Xr curl 1 ,
.Xr sh 1 ,
.Xr tscrape_update 1
.Sh AUTHORS
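
The tscraperc.5 text in this diff describes two variables (tscrapepath, maxjobs) and a required feeds() function that calls feed "name" "feedurl". A minimal standalone sketch of that shape follows. It is not the shipped example file: the values, feed names and urls are made up, and the feed() stub (including its use of tr to replace '/' with '_') is only an assumption standing in for what tscrape_update(1) provides.

```shell
#!/bin/sh
# Hypothetical sketch of a tscraperc-style config, runnable on its own.
# Only the names tscrapepath, maxjobs, feeds and feed come from the man page;
# every value below is an illustrative assumption.

# Directory for the TAB-separated feed files (the documented default).
tscrapepath="$HOME/.tscrape/feeds"
# Number of concurrent feed jobs (documented default is 8; 4 is arbitrary).
maxjobs=4

# Stand-in for the real feed() from tscrape_update(1), for demonstration only.
# It prints the filename that would be used and the url, TAB-separated.
# The man page says '/' in the feed name is replaced with '_'; using tr for
# that here is an assumption, not the actual implementation.
feed() {
	filename=$(printf '%s' "$1" | tr '/' '_')
	printf '%s\t%s\n' "$filename" "$2"
}

# Required entry-point: each feed is declared by calling feed "name" "feedurl".
feeds() {
	feed "namehere" "https://twitter.com/namehere"
	feed "some/name" "https://twitter.com/otherhere"
}

# tscrape_update(1) would call feeds itself; it is called here only so the
# sketch can be run directly.
feeds
```

In a real config file the feed() stub and the final feeds call would be left out, since tscrape_update(1) supplies and invokes them.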