Parse JSON instead of HTML to retrieve the title and URL of the latest articles. - gophercgis - Collection of gopher CGI/DCGI for geomyidae
(HTM) hg clone https://bitbucket.org/iamleot/gophercgis
(DIR) Log
(DIR) Files
(DIR) Refs
(DIR) README
(DIR) LICENSE
---
(DIR) changeset ea8197da47392d16091d275ab2e16cebd8d71a16
(DIR) parent 621411a3bc8c7ae7e520d594581d6a16f4ff6360
(HTM) Author: Leonardo Taccari <iamleot@gmail.com>
Date: Sun, 26 Aug 2018 14:08:34
Parse JSON instead of HTML to retrieve the title and URL of the latest articles.
XXX: Only the recent-articles section is implemented at the moment; the URL
XXX: does not seem valid for the other sections.
XXX: Pagination is not implemented yet, but it should now be easier to add.
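The new data source is a timestamped stream_data URL built from the section name. A minimal sketch of that construction, with the path shape taken from the diff below (only "ultimi-articoli" is known to be valid at the moment):

```shell
# Build the timestamped stream_data URL that sections.dcgi now fetches.
# The path components (items/<section>/0/0/<timestamp>.json) follow the
# diff; the timestamp format matches the date(1) call in the new code.
section="ultimi-articoli"
url="https://data.internazionale.it/stream_data/items/${section}/0/0/$(date +'%Y-%m-%d_%H-%M-%S').json"
echo "${url}"
```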
Diffstat:
internazionale/sections.dcgi | 46 ++++++++++++++-----------------------------
1 files changed, 15 insertions(+), 31 deletions(-)
---
diff -r 621411a3bc8c -r ea8197da4739 internazionale/sections.dcgi
--- a/internazionale/sections.dcgi Sun Aug 26 11:45:36 2018 +0200
+++ b/internazionale/sections.dcgi Sun Aug 26 14:08:34 2018 +0200
@@ -1,24 +1,17 @@
#!/bin/sh
-#
-# It seems that in order to enable pagination the following HTTP GET requests
-# are done:
-#
-# <https://data.internazionale.it/stream_data/items/ultimi-articoli/0/0/$(date +'%Y-%m-%d_%H-%M-%S').json>
-#
-# Instead of scraping the HTML page only for the last articles this can be
-# reused in order to get more data to build the DCGI and to enable
-# pagination.
-#
-
ARTICLE_CGI="/cgi/internazionale/article.cgi"
section="$2"
case "${section}" in
- ultimi-articoli | i-piu-letti | reportage | opinioni | savagelove )
- url="https://www.internazionale.it/${section}"
+ ultimi-articoli)
+ url="https://data.internazionale.it/stream_data/items/${section}/0/0/$(date +'%Y-%m-%d_%H-%M-%S').json"
+ ;;
+ i-piu-letti | reportage | opinioni | savagelove )
+ # TODO
+ exit 1
;;
*)
exit 1
@@ -29,24 +22,15 @@
echo "Internazionale"
echo ""
-/usr/pkg/bin/curl -sgL "${url}" |
-awk '
-/class="box-article-title"/ {
- if (!match($0, /href="[^"]*"/)) {
- next
- }
- url = substr($0, RSTART + 6, RLENGTH - 7)
- url = "https://www.internazionale.it" url
-
- title = $0
- sub(/^ *<a href="[^"]*" class="box-article-title">/, "", title)
- sub(/<\/a>.*$/, "", title)
-
- gsub("\\|", "\\|", url)
- gsub("\\|", "\\|", title)
-
- printf("[0|%s|'"${ARTICLE_CGI}?"'%s|server|port]\n", title, url)
-}
+/usr/bin/ftp -V -o - "${url}" |
+/usr/pkg/bin/jq -r '
+.items[] | (
+"[0|" +
+ "\(.title | gsub("\\|"; "\\|") )" + "|" +
+ "'"${ARTICLE_CGI}?"'" + "https://www.internazionale.it" +
+ "\(.url | gsub("\\|"; "\\|") )" + "|" +
+ "server|port]"
+)
'
echo ""
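For reference, the new jq pipeline can be exercised against canned input without hitting the live endpoint. In this sketch the JSON shape (an .items array whose entries carry .title and relative .url fields) is inferred from the jq filter in the diff above; the item values are invented sample data:

```shell
# Feed the jq filter from sections.dcgi a hand-written item and show the
# gophermap selector line it produces.  Pipes in the title/url are escaped
# as \| so they do not break the gophermap field separators.
ARTICLE_CGI="/cgi/internazionale/article.cgi"
json='{"items":[{"title":"Pipe | in title","url":"/notizie/2018/08/26/esempio"}]}'
line="$(printf '%s\n' "${json}" | jq -r '
.items[] | (
"[0|" +
  "\(.title | gsub("\\|"; "\\|") )" + "|" +
  "'"${ARTICLE_CGI}?"'" + "https://www.internazionale.it" +
  "\(.url | gsub("\\|"; "\\|") )" + "|" +
  "server|port]"
)')"
printf '%s\n' "${line}"
```

The output is one geomyidae gophermap line per item, with the article URL passed as the query string to article.cgi.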