SUBSCRIBING TO FORWARD USING FRESHRSS'S XPATH SCRAPING
2023-03-28
As I've mentioned before, I'm a fan of Tailsteak's Forward comic. I'm not a
fan of the author's weird aversion to RSS, so I hacked a way around it first
using an exploit in webcomic reader app Comic Chameleon (accidentally getting
access to comics weeks in advance of their publication as a side-effect) and
later by using my own tool RSSey.
But now that I'm able to use my favourite feed reader FreshRSS to scrape
websites directly - like I've done for The Far Side - I should switch to using
this approach to subscribe to Forward, too:
(IMG) Screenshot showing RSS feed items: recent Forward episodes including their numbers, titles, and publication dates.
Here are the settings I came up with:
* Feed URL: http://forwardcomic.com/list.php
* Type of feed source: HTML + XPath (Web scraping)
* XPath for finding news items: //a[starts-with(@href,'archive.php')]
* Item title: .
* Item link (URL): ./@href
* Item date: ./following-sibling::text()[1]
* Custom date/time format: - Y.m.d
(IMG) Annotated screenshot showing how each XPath directive maps to each part of the page. The item selector finds each hyperlink whose href begins with "archive.php" (notably missing the most-recent comic at any given time, which is found at index.php), and the date is found in the text node that immediately follows it, in a slightly unusual variation on ISO 8601.
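If it helps to see those rules outside of FreshRSS, here's a minimal Python sketch of the same idea using only the standard library. It isn't how FreshRSS does it internally: the SAMPLE markup, the "archive.php?num=" link shape, the titles and the dates are all made-up placeholders, and the scrape() function is a hypothetical name of my own. It also assumes tidy, well-formed markup, which a real page may not provide. The XPath rules are approximated in plain Python: startswith() stands in for starts-with(), and an element's "tail" stands in for following-sibling::text()[1].

```python
import xml.etree.ElementTree as ET
from datetime import datetime
from urllib.parse import urljoin

FEED_URL = "http://forwardcomic.com/list.php"

# Hypothetical, simplified stand-in for the list page (NOT the real markup;
# link shape, titles and dates are placeholders for illustration only).
SAMPLE = """<div>
<a href="index.php">Latest comic</a><br/>
<a href="archive.php?num=1">Placeholder title one</a> - 2018.02.05<br/>
<a href="archive.php?num=2">Placeholder title two</a> - 2018.02.12<br/>
</div>"""


def scrape(markup, base=FEED_URL):
    """Return (title, link, date) tuples, mimicking the settings above."""
    items = []
    for a in ET.fromstring(markup).iter("a"):
        href = a.get("href", "")
        # Item selector: //a[starts-with(@href,'archive.php')]
        if not href.startswith("archive.php"):
            continue
        # Item title: "." (the text content of the link itself)
        title = "".join(a.itertext()).strip()
        # Item link: ./@href, resolved against the feed URL
        link = urljoin(base, href)
        # Item date: the text right after the link, e.g. " - 2018.02.05".
        # "%Y.%m.%d" is the strptime counterpart of the "Y.m.d" pattern.
        date = datetime.strptime(a.tail, " - %Y.%m.%d").date()
        items.append((title, link, date))
    return items


for title, link, date in scrape(SAMPLE):
    print(date.isoformat(), title, link)
```

Note that the index.php link is skipped, just as the most-recent comic is skipped by the real selector. The leading " - " in the date format matters for the same reason it does in the settings above: the text node following each link starts with a space, a dash and another space before the date itself.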
I continue to love this "killer feature" of FreshRSS, but I'm beginning to see
how it could go further - I wish I had the free time to contribute to its
development!
I'd love to see a mechanism for exporting/importing feed configurations like
this so that I could share them more easily, for example. I'd also be
delighted if I could expand on my XPath rules to load pages referenced by the
results and get data from them, too, e.g. so I could use an image found by
XPath on the "item link" page as the thumbnail image! These are things RSSey
could do for me, but FreshRSS can't... yet!
LINKS
(HTM) My blog post promoting Forward as it reached episode #100
(HTM) Tailsteak
(HTM) Forward
(HTM) Tailsteak posts in his official forums perhaps at the moment that he first fell out of love with RSS?
(HTM) My blog post about hacking Comic Chameleon
(HTM) My RSSey code to turn Forward Comic into an RSS feed
(HTM) My blog post about using FreshRSS's XPath feature to subscribe to my friend Beverley's weblog
(HTM) FreshRSS
(DIR) My blog post about using FreshRSS's XPath scraping to subscribe to The Far Side