codemadness.org

       tscrape_update: sync improvements from sfeed_update - tscrape - twitter scraper
       git clone git://git.codemadness.org/tscrape
       ---
       commit 51995d6fc4760fadac68650bb82773b9bf9eae79
       parent db47c97bea3370886d011a2c950ead2551cf3fbc
       Author: Hiltjo Posthuma <hiltjo@codemadness.org>
       Date:   Fri,  2 Aug 2019 18:33:10 +0200
       
       tscrape_update: sync improvements from sfeed_update
       
       - change order of functions in script and documentation to match the execution
         order.
       - improve a comment about the parallel processing behaviour (performance stall).
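
The stall mentioned in the second point can be reproduced with a small
stand-alone sketch. It is not the tscrape_update code: job() and run() are
hypothetical names, sleep stands in for a slow fetch, and the place where the
counter is incremented is an assumption. It only illustrates why a wait after
every batch of ${maxjobs} jobs holds back the queue when one job is slow.

```shell
#!/bin/sh
# Batch-wait throttle, sketched under the assumptions named above.
maxjobs=2
curjobs=0
log="$(mktemp)"

# job(name, seconds): hypothetical stand-in for fetching one feed.
job() {
	sleep "$2"
	echo "$1" >> "$log"
}

# run(name, seconds): wait for ALL running jobs each time the counter is a
# multiple of maxjobs, then start the next job in the background.
run() {
	[ $((curjobs % maxjobs)) -eq 0 ] && wait
	curjobs=$((curjobs + 1))
	job "$1" "$2" &
}

run slow 1
run fast1 0
# fast1 is done almost immediately, but the next job still has to wait for
# "slow" because wait without arguments waits for every child.
run fast2 0
run fast3 0
wait

# Completion order: fast1, slow, then fast2 and fast3 in either order.
cat "$log"
```

Waiting on the whole batch keeps the script portable to any POSIX shell, at the
cost of leaving a free job slot unused until the slowest job of the batch has
finished.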
       
       Diffstat:
         M README                              |       4 ++--
         M tscrape_update                      |      16 ++++++++--------
       
       2 files changed, 10 insertions(+), 10 deletions(-)
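
The merge and order steps that this commit moves around can be tried in
isolation. The following is a minimal sketch: the two function bodies follow
the diff, with the TAB delimiter built via printf, while the sample rows, the
"-" placeholder fields and the file names are made up for illustration. Only
field 1 (timestamp), field 5 (id) and field 8 (retweetid) matter here.

```shell
#!/bin/sh
tab="$(printf '\t')"

# merge(name, oldfile, newfile): unique sort by id (field 5), retweetid (field 8).
merge() {
	sort -t "$tab" -u -k5,5 -k8,8 "$2" "$3" 2>/dev/null
}

# order(name): order by timestamp (field 1), descending.
order() {
	sort -t "$tab" -k1rn,1
}

old="$(mktemp)"
new="$(mktemp)"

# Hypothetical sample rows; id2 is present in both files.
printf '100\t-\t-\t-\tid1\t-\t-\t-\n' >  "$old"
printf '200\t-\t-\t-\tid2\t-\t-\t-\n' >> "$old"
printf '200\t-\t-\t-\tid2\t-\t-\t-\n' >  "$new"
printf '300\t-\t-\t-\tid3\t-\t-\t-\n' >> "$new"

# Merge first, then order: the same sequence the script documents after
# this commit. Keep only timestamp and id for display.
result="$(merge example "$old" "$new" | order example | cut -f 1,5)"
printf '%s\n' "$result"

rm -f "$old" "$new"
```

The duplicate id2 row is dropped by the merge step, and the order step puts the
newest timestamp first.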
       ---
       diff --git a/README b/README
       @@ -6,8 +6,8 @@ Twitter feed HTML scraper.
        It scrapes HTML from stdin and outputs it to a TAB-separated format that can be
        easier parsed with various (UNIX) tools. There are formatting programs included
        to convert this TAB-separated format to various other formats. There are also
       -some programs and scripts included to import and export OPML and to update,
       -sort, filter and merge feed items.
       +some programs and scripts included to import and export OPML and to fetch,
       +filter, merge and order items.
        
        
        Build and install
       diff --git a/tscrape_update b/tscrape_update
       @@ -50,23 +50,23 @@ filter() {
                cat
        }
        
       -# order by timestamp (descending).
       -# order(name)
       -order() {
       -        sort -t '	' -k1rn,1
       -}
       -
        # merge raw files: unique sort by id, retweetid.
        # merge(name, oldfile, newfile)
        merge() {
                sort -t '	' -u -k5,5 -k8,8 "$2" "$3" 2>/dev/null
        }
        
       +# order by timestamp (descending).
       +# order(name)
       +order() {
       +        sort -t '	' -k1rn,1
       +}
       +
        # fetch and parse feed.
        # feed(name, feedurl)
        feed() {
       -        # wait until ${maxjobs} are finished: throughput using this logic is
       -        # non-optimal, but it is simple and portable.
       +        # wait until ${maxjobs} are finished: will stall the queue if an item
       +        # is slow, but it is portable.
                [ ${signo} -ne 0 ] && return
                [ $((curjobs % maxjobs)) -eq 0 ] && wait
                [ ${signo} -ne 0 ] && return