Add an extract_section() function in order to ignore a lot of noise of The Downlo… - gophercgis - Collection of gopher CGI/DCGI for geomyidae
(HTM) hg clone https://bitbucket.org/iamleot/gophercgis
---
(DIR) changeset 339fa69fe9b4346b793f1807dcc76644ec76dadf
(DIR) parent 28798a698908c8067a7f2f352f5764078d1ac645
(HTM) Author: Leonardo Taccari <iamleot@gmail.com>
Date: Mon, 27 Aug 2018 22:12:17
Add an extract_section() function in order to ignore a lot of the noise in The Download articles.
Diffstat:
technologyreview/article.cgi | 18 ++++++++++++++++++
1 files changed, 18 insertions(+), 0 deletions(-)
---
diff -r 28798a698908 -r 339fa69fe9b4 technologyreview/article.cgi
--- a/technologyreview/article.cgi Mon Aug 27 02:30:05 2018 +0200
+++ b/technologyreview/article.cgi Mon Aug 27 22:12:17 2018 +0200
@@ -62,7 +62,25 @@
url=$2
+
+case "${url}" in
+*/the-download/*)
+ extract_section()
+ {
+ awk '
+ /class="download__text"/,/class="download__source-wrapper"/ {
+ print
+ }
+ '
+ }
+ ;;
+*)
+ extract_section() { cat; }
+ ;;
+esac
+
/usr/pkg/bin/curl -sL "${url}" |
{ /usr/pkg/bin/xmllint --html --format --xpath '//main' - 2>/dev/null ; } |
+ extract_section |
filter_html |
html_to_text
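
The awk range pattern added above can be tried in isolation. The following is a minimal sketch, not part of the changeset: the sample_html helper and its markup are hypothetical stand-ins for what xmllint might emit for a "The Download" page, and only the two class names are taken from the diff. It shows that the range prints every line from the first match of the start pattern through the matching end pattern, inclusive, and drops everything else.

```shell
#!/bin/sh

# Hypothetical sample input: only the two class names come from the diff,
# the surrounding structure is assumed for illustration.
sample_html()
{
	cat <<'EOF'
<main>
<div class="download__header">Navigation and other noise</div>
<div class="download__text">
<p>Body of the item.</p>
</div>
<div class="download__source-wrapper">Source link</div>
<div class="download__footer">More noise</div>
</main>
EOF
}

# Same awk range pattern as in the changeset: a "start,end" pattern selects
# every line from a start match through the next end match, both inclusive.
extract_section()
{
	awk '
	/class="download__text"/,/class="download__source-wrapper"/ {
		print
	}
	'
}

result=$(sample_html | extract_section)
printf '%s\n' "${result}"
```

Note that the line matching the end pattern is itself printed, and that the range would start again if a later line matched the start pattern. For URLs outside */the-download/* the changeset defines extract_section as a plain cat, so the pipeline stays the same in both cases.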