danq.me

       SUBSCRIBING TO FORWARD USING FRESHRSS'S XPATH SCRAPING
       
       2023-03-28
       
       As I've mentioned before, I'm a fan of Tailsteak's Forward comic. I'm not a
       fan of the author's weird aversion to RSS, so I hacked a way around it first
       using an exploit in webcomic reader app Comic Chameleon (accidentally getting
       access to comics weeks in advance of their publication as a side-effect) and
       later by using my own tool RSSey.
       
       But now that I'm able to use my favourite feed reader FreshRSS to scrape
       websites directly - like I've done for The Far Side - I should switch to
       using this approach to subscribe to Forward, too:
       
 (IMG) Screenshot showing RSS feed items: recent Forward episodes including their numbers, titles, and publication dates.
       
       Here are the settings I came up with:
       * Feed URL: http://forwardcomic.com/list.php
       * Type of feed source: HTML + XPath (Web scraping)
       * XPath for finding news items: //a[starts-with(@href,'archive.php')]
       * Item title: .
       * Item link (URL): ./@href
       * Item date: ./following-sibling::text()[1]
       * Custom date/time format: - Y.m.d
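
       To make it clearer what those rules pick out, here's a rough Python sketch
       of the same selection logic. It isn't how FreshRSS does it internally: the
       SAMPLE markup and the scrape() helper are invented for illustration, and
       the real list.php may well be structured differently. Python's built-in
       ElementTree can't evaluate starts-with() or following-sibling::text(), so
       the sketch stands in for them with a string check and the element's .tail
       text, which matches only when the date directly follows the link. The
       strptime pattern "- %Y.%m.%d" is my assumed equivalent of the "- Y.m.d"
       format above, and resolving relative links against the feed URL is
       likewise an assumption.

```python
from datetime import datetime
from urllib.parse import urljoin
import xml.etree.ElementTree as ET

BASE_URL = "http://forwardcomic.com/list.php"

# Hypothetical, well-formed stand-in for the archive page (not the real markup).
SAMPLE = """
<div>
  <a href="index.php">Latest comic</a><br/>
  <a href="archive.php?num=2">Episode 2: Example Title</a> - 2023.03.20<br/>
  <a href="archive.php?num=1">Episode 1: Another Title</a> - 2023.03.13<br/>
</div>
"""


def scrape(markup, base_url=BASE_URL):
    """Emulate the feed settings: one item per link to archive.php..."""
    root = ET.fromstring(markup)
    items = []
    for link in root.iter("a"):
        href = link.get("href", "")
        # Stand-in for //a[starts-with(@href,'archive.php')]
        if not href.startswith("archive.php"):
            continue
        # Stand-in for ./following-sibling::text()[1]: text right after </a>
        raw_date = (link.tail or "").strip()
        items.append({
            # "." as the title: all the text inside the link
            "title": "".join(link.itertext()).strip(),
            # "./@href" as the link, made absolute against the feed URL
            "link": urljoin(base_url, href),
            # Assumed Python equivalent of the "- Y.m.d" date format
            "date": datetime.strptime(raw_date, "- %Y.%m.%d").date(),
        })
    return items


if __name__ == "__main__":
    for item in scrape(SAMPLE):
        print(item["date"].isoformat(), item["title"], item["link"])
```

       Note that the index.php link in the sample is skipped, just as the
       most-recent comic is skipped by the real item selector.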
       
 (IMG) Annotated screenshot showing how each XPath directive maps to each part of the page. The item selector finds each hyperlink that begins with "archive.php" (notably missing the most-recent comic at any given time, which is found at index.php), and the date is found in the text node that immediately follows it, in a slightly-unusual variation on ISO8601.
       
       I continue to love this "killer feature" of FreshRSS, but I'm beginning to see
       how it could go further - I wish I had the free time to contribute to its
       development!
       
       I'd love to see a mechanism for exporting/importing feed configurations like
       this so that I could share them more easily, for example. I'd also be
       delighted if I could expand on my XPath rules to load pages referenced by the
       results and get data from them, too, e.g. so I could use an image found by
       XPath on the "item link" page as the thumbnail image! These are things RSSey
       could do for me, but FreshRSS can't... yet!
       
       LINKS
       
 (HTM) My blog post promoting Forward as it reached episode #100
 (HTM) Tailsteak
 (HTM) Forward
 (HTM) Tailsteak posts in his official forums perhaps at the moment that he first fell out of love with RSS?
 (HTM) My blog post about hacking Comic Chameleon
 (HTM) My RSSey code to turn Forward Comic into an RSS feed
 (HTM) My blog post about using FreshRSS's XPath feature to subscribe to my friend Beverley's weblog
 (HTM) FreshRSS
 (DIR) My blog post about using FreshRSS's XPath scraping to subscribe to The Far Side