tscrape_update: sync improvements from sfeed_update - tscrape - twitter scraper
---
commit 51995d6fc4760fadac68650bb82773b9bf9eae79
parent db47c97bea3370886d011a2c950ead2551cf3fbc
Author: Hiltjo Posthuma <hiltjo@codemadness.org>
Date: Fri, 2 Aug 2019 18:33:10 +0200
tscrape_update: sync improvements from sfeed_update
- change order of functions in script and documentation to match the execution
order.
- improve a comment about the parallel processing behaviour (performance stall).
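The stall mentioned in the second item can be reproduced with a small sketch. It is not the actual tscrape_update code: run() and job() are hypothetical stand-ins, and only the modulo check with wait mirrors the feed() logic in the diff. Jobs are started in batches of ${maxjobs}, and the next batch starts only after every job of the current batch has finished, so one slow job holds back the queue.

```sh
#!/bin/sh
# Sketch only: batch-style job limiting with the portable "wait" builtin.
# maxjobs and curjobs mirror names used in the script; job() and run()
# are hypothetical helpers for illustration.
maxjobs=2
curjobs=0

# job(name, seconds): stand-in for fetching and parsing one feed.
job() {
	sleep "$2"
	echo "done $1"
}

# run(name, seconds): start a job in the background. Each time a full
# batch of maxjobs jobs has been started, wait for all of them first.
run() {
	[ $((curjobs % maxjobs)) -eq 0 ] && wait
	curjobs=$((curjobs + 1))
	job "$1" "$2" &
}

run a 1 # slow item
run b 0 # finishes at once, but its slot stays unused until "a" is done
run c 0 # only started after both "a" and "b" have finished
run d 1
wait
# With these sleep times the output order is: done b, done a, done c, done d.
```

A job pool that refills a slot as soon as any single job exits would avoid the stall, but plain POSIX sh has no portable way to wait for "any one" child, which is presumably why the simpler approach is kept.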
Diffstat:
M README | 4 ++--
M tscrape_update | 16 ++++++++--------
2 files changed, 10 insertions(+), 10 deletions(-)
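The execution order that the first item of the commit message refers to (merge, then order) can be tried on sample data. This is a sketch under assumptions: the separator is taken to be a TAB, matching the TAB-separated format described in the README, and the sample rows are invented placeholders in which only field 1 (timestamp), field 5 (id) and field 8 (retweetid) carry meaning.

```sh
#!/bin/sh
# Sketch only: merge two raw files, then order the result by timestamp.
# Assumption: fields are TAB-separated; the sample rows are made up.
tab=$(printf '\t')

# merge(name, oldfile, newfile): unique sort by id (5) and retweetid (8).
merge() {
	sort -t "$tab" -u -k5,5 -k8,8 "$2" "$3" 2>/dev/null
}

# order(name): sort by timestamp (field 1), numeric, descending.
order() {
	sort -t "$tab" -k1rn,1
}

dir="${TMPDIR:-/tmp}/merge_sketch.$$"
mkdir -p "$dir"
# Placeholder rows with 8 fields each; "x" marks fields not used here.
printf '100\tx\tx\tx\t11\tx\tx\t0\n200\tx\tx\tx\t12\tx\tx\t0\n' > "$dir/old"
printf '200\tx\tx\tx\t12\tx\tx\t0\n300\tx\tx\tx\t13\tx\tx\t0\n' > "$dir/new"

# The item with id 12 is present in both files and is kept once.
merge sample "$dir/old" "$dir/new" | order sample | cut -f 1
# Prints the timestamps 300, 200, 100, one per line.

rm -rf "$dir"
```

Merging before ordering means the duplicate removal works on the id keys first, and the final sort only has to arrange the already unique items by time.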
---
diff --git a/README b/README
@@ -6,8 +6,8 @@ Twitter feed HTML scraper.
It scrapes HTML from stdin and outputs it to a TAB-separated format that can be
easier parsed with various (UNIX) tools. There are formatting programs included
to convert this TAB-separated format to various other formats. There are also
-some programs and scripts included to import and export OPML and to update,
-sort, filter and merge feed items.
+some programs and scripts included to import and export OPML and to fetch,
+filter, merge and order items.
Build and install
diff --git a/tscrape_update b/tscrape_update
@@ -50,23 +50,23 @@ filter() {
cat
}
-# order by timestamp (descending).
-# order(name)
-order() {
- sort -t ' ' -k1rn,1
-}
-
# merge raw files: unique sort by id, retweetid.
# merge(name, oldfile, newfile)
merge() {
sort -t ' ' -u -k5,5 -k8,8 "$2" "$3" 2>/dev/null
}
+# order by timestamp (descending).
+# order(name)
+order() {
+ sort -t ' ' -k1rn,1
+}
+
# fetch and parse feed.
# feed(name, feedurl)
feed() {
- # wait until ${maxjobs} are finished: throughput using this logic is
- # non-optimal, but it is simple and portable.
+ # wait until ${maxjobs} are finished: will stall the queue if an item
+ # is slow, but it is portable.
[ ${signo} -ne 0 ] && return
[ $((curjobs % maxjobs)) -eq 0 ] && wait
[ ${signo} -ne 0 ] && return