Follow RFC 3986 regarding scheme more closely in extract_urls(). - plumb - Open certain URL patterns with an ad-hoc opener (plumber)
hg clone https://bitbucket.org/iamleot/plumb
---
changeset 6014611a02b449c4d2f5f0e285d26fd2ba525b66
parent 5a0b90c4bd7d2f0f60197a9c7bf06781122d1137
Author: Leonardo Taccari <iamleot@gmail.com>
Date: Wed, 28 Mar 2018 17:03:24
Follow RFC 3986 regarding scheme more closely in extract_urls().
According to RFC 3986, the scheme part of a URI is defined as:
scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
Adjust the URI-related REs accordingly.
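
To illustrate what the stricter scheme RE changes, here is a minimal sketch that runs the old and the new fallback expressions on the same line. The helper name first_url, the variables old and new, and the example.org URL are assumptions made for this illustration; they are not part of plumb. It also assumes an awk that supports POSIX character classes.

```sh
#!/bin/sh
# Hypothetical helper (not part of plumb): print the first match of the
# RE given as $1 on every input line that contains "://".
first_url() {
	awk -v re="$1" '
	/:\/\// {
		if (match($0, re)) {
			print substr($0, RSTART, RLENGTH)
		}
	}'
}

# Fallback RE before this changeset: scheme is one or more alnum characters.
old='[[:alnum:]]+://.+'
# Fallback RE after this changeset: ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
new='[[:alpha:]][[:alnum:]+.-]*://.+'

# Assumed sample input; the scheme contains a "+".
line='clone git+ssh://example.org/repo'

printf '%s\n' "$line" | first_url "$old"   # prints ssh://example.org/repo
printf '%s\n' "$line" | first_url "$new"   # prints git+ssh://example.org/repo
```

With the old RE the "+" ends the run of alphanumerics, so the match starts at "ssh" and the "git+" prefix is lost. The new RE accepts "+", "-" and "." after the first letter, so the whole scheme is kept.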
Diffstat:
dplumb | 12 ++++++------
1 files changed, 6 insertions(+), 6 deletions(-)
---
diff -r 5a0b90c4bd7d -r 6014611a02b4 dplumb
--- a/dplumb Tue Mar 27 22:14:23 2018 +0200
+++ b/dplumb Wed Mar 28 17:03:24 2018 +0200
@@ -76,13 +76,13 @@
'
/:\/\// {
# Extract URLs inside possible delimiters
- if (match($0, /\<[[:alnum:]]+:\/\/[^>]+\>/) ||
- match($0, /\([[:alnum:]]+:\/\/[^)]+\)/) ||
- match($0, /\[[[:alnum:]]+:\/\/[^]]+\]/) ||
- match($0, /"[[:alnum:]]+:\/\/[^]]+"/) ||
- match($0, /'"'"'[[:alnum:]]+:\/\/[^]]+'"'"'/)) {
+ if (match($0, /\<[[:alpha:]][[:alnum:]+.-]*:\/\/[^>]+\>/) ||
+ match($0, /\([[:alpha:]][[:alnum:]+.-]*:\/\/[^)]+\)/) ||
+ match($0, /\[[[:alpha:]][[:alnum:]+.-]*:\/\/[^]]+\]/) ||
+ match($0, /"[[:alpha:]][[:alnum:]+.-]*:\/\/[^]]+"/) ||
+ match($0, /'"'"'[[:alpha:]][[:alnum:]+.-]*:\/\/[^]]+'"'"'/)) {
print substr($0, RSTART + 1, RLENGTH - 2)
- } else if (match($0, /[[:alnum:]]+:\/\/.+/)) {
+ } else if (match($0, /[[:alpha:]][[:alnum:]+.-]*:\/\/.+/)) {
print substr($0, RSTART, RLENGTH)
}
}