hngopher.com

       [HN Gopher] The Pains of Path Parsing
       ___________________________________________________________________
        
       The Pains of Path Parsing
        
       Author : lukastyrychtr
       Score  : 22 points
       Date   : 2021-04-30 09:20 UTC (13 hours ago)
        
 (HTM) web link (www.fpcomplete.com)
 (TXT) w3m dump (www.fpcomplete.com)
        
       | mlex wrote:
       | Great article; in particular I hadn't thought about empty path
       | components too closely and how websites usually just omit them
       | when you try to go to e.g. https://github.com/nodejs//node
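
A minimal sketch of the empty-segment behavior mentioned above, assuming a router that splits the path on "/" and simply discards empty segments. The helper name `path_segments` and its `drop_empty` flag are hypothetical, and this is not a claim about how GitHub or any particular framework implements it.

```python
from typing import List
from urllib.parse import urlsplit


def path_segments(url: str, drop_empty: bool = True) -> List[str]:
    """Split the path component of `url` on "/" into segments."""
    path = urlsplit(url).path
    segments = path.split("/")
    # A leading "/" yields one empty string in front of the first segment;
    # that one is an artifact of splitting, not an empty path component.
    if path.startswith("/"):
        segments = segments[1:]
    if drop_empty:
        # Treat "//" like "/" by discarding the empty segments.
        segments = [s for s in segments if s]
    return segments


print(path_segments("https://github.com/nodejs//node"))
print(path_segments("https://github.com/nodejs//node", drop_empty=False))
```

With `drop_empty=False` the empty component between the two slashes is kept, which is the case a stricter router would have to decide how to handle.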
        
       | michael1999 wrote:
       | Once you accept that urls don't even specify the character
       | encoding, you realize it's impossible in general.
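
To illustrate the encoding point, here is a small sketch using Python's `urllib.parse.unquote`. The path `/caf%C3%A9` is an invented example; the same percent-encoded bytes decode to different text depending on which charset the receiver assumes, and nothing in the URL itself says which one is right.

```python
from urllib.parse import unquote

# Hypothetical path: the bytes 0xC3 0xA9 are percent-encoded.
encoded = "/caf%C3%A9"

# Same bytes, two different assumptions about the character encoding.
as_utf8 = unquote(encoded, encoding="utf-8")
as_latin1 = unquote(encoded, encoding="latin-1")

print(as_utf8)
print(as_latin1)
print(as_utf8 == as_latin1)
```

Both decodings succeed without error, so a parser cannot detect the mismatch from the URL alone.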
        
       | saurik wrote:
       | > Whether to include trailing slashes in URLs has been an old
       | argument on the internet. Personally, because I consider the
       | parsing-into-segments concept to be central to path parsing, I
       | prefer excluding the trailing slash. And in fact, Yesod's default
       | (and, at least for now, routetype-rs's default) is to treat such
       | a URL as non-canonical and redirect away from it. I felt even
       | more strongly about that when I realized lots of frameworks have
       | special handling for "final segments with filename extensions."
       | For example, /blog/bananas/ is good with a trailing slash, but
       | /images/bananas.png should not have a trailing slash.
       | 
       | So, this is not an argument in which people can really have an
       | opinion: the URLs have fundamentally different semantics and
       | behaviors with respect to relative paths. If you are talking
       | about "the subresource bananas under the resource blog" then you
       | _must_ use /blog/bananas, and the trailing slash is incorrect;
       | in such a case, if you were to have a relative link to "apples"
       | it would bring you to /blog/apples. In contrast, if you are
       | wanting some kind of "default" resource--say, the moral
       | equivalent of an "index.html" as is implemented in many web
       | servers (but has nothing to do with the actual information model
       | of the web)--as a subresource of the subresource bananas, then
       | you _must_ use /blog/bananas/; in such a case, if you were to
       | have a relative link to apples, it would bring you to
       | /blog/bananas/apples and to link to /blog/apples you'd have to
       | use ../apples.
       | 
       | FWIW, I absolutely agree that "for the narrow question of blog
       | posts, is a blog post a file or a folder?" is an interesting
       | argument for which one might have a different opinion than
       | someone else for a reasonable reason, but generalizing it to the
       | concept of URL routing itself is wrong: if you believe a blog
       | post is semantically a folder--which is very reasonable, as a
       | blog post might "contain" a number of media attachments--and the
       | post itself is part of that folder it would simply be wrong to
       | elide the trailing slash, and web frameworks or content
       | management systems that return the representation of a folder
       | from "inside" that folder without a trailing slash deserve a
       | special circle of www hell :/. My hope is that this author is
       | _actually_ just expressing an opinion on the semantics of a blog
       | post, not some general notion about URLs, but it is certainly
       | written as the latter and it seems like the software they work on
       | is general purpose.
       | 
       | As an example, I personally find the usage on GitHub to not just
       | be "incorrect" but "flagrantly ridiculous": it has decided to
       | have no opinion on whether a trailing slash has any semantic
       | meaning or not, and so relative paths essentially make no sense
       | in the context of their website. Is the landing page of my
       | repository "inside" the folder of my repository, or is my
       | repository itself a resource of sorts that happens to also
       | contain subresources? In the former case, the landing pages of
       | other repositories in my organization are siblings of my
       | repository's landing page, and the other information about my
       | repository is a subresource of said landing page; while, in the
       | latter case, the landing page of other repositories in my
       | organization is the aunt/uncle of my repository's landing page,
       | and other information about my repository is a sibling of my
       | repository's landing page. Only one of these is supposed to be
       | true!
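
The relative-link behavior described in this comment can be checked with Python's `urllib.parse.urljoin`, which follows the standard reference-resolution rules. This is only a sketch; the host `example.com` is a placeholder, and the paths are the ones from the comment.

```python
from urllib.parse import urljoin

# Base URLs that differ only in the trailing slash (placeholder host).
without_slash = "https://example.com/blog/bananas"
with_slash = "https://example.com/blog/bananas/"

# Without the slash, "bananas" is the last segment and gets replaced.
print(urljoin(without_slash, "apples"))

# With the slash, "bananas/" acts as a folder and the link lands inside it.
print(urljoin(with_slash, "apples"))

# From inside the folder, reaching the sibling needs "../".
print(urljoin(with_slash, "../apples"))
```

The first and third calls resolve to `/blog/apples`, while the second resolves to `/blog/bananas/apples`, so serving the same page at both base URLs makes relative links ambiguous.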
        
         | slver wrote:
         | Regarding blog post, resources and trailing slashes: we have
         | base tags.
        
       | gopalv wrote:
       | About 17 years ago, I had to solve the same problem, but since I
       | used regexes, I had two problems at the end of it - memory usage
       | and performance.
       | 
       |     /* RFC2396 : Appendix B
       |        As described in Section 4.3, the generic URI syntax is
       |        not sufficient to disambiguate the components of some
       |        forms of URI.  Since the "greedy algorithm" described in
       |        that section is identical to the disambiguation method
       |        used by POSIX regular expressions, it is natural and
       |        commonplace to use a regular expression for parsing
       | 
       |        ^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?
       |         12            3  4          5       6  7        8 9
       | 
       |        (Modified to support mailto: syntax as well)
       |     */
       | 
       | But at least it wasn't the problem I started with.
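
For reference, a minimal sketch of using the quoted Appendix B expression with Python's `re` module. The function name `split_uri` is hypothetical, and the pattern is the one shown in the comment as-is; whatever mailto-specific modification the commenter made is not visible in the thread, so none is attempted here.

```python
import re

# The regular expression quoted above, copied verbatim.
URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")


def split_uri(uri):
    """Return the five URI components using the group numbers from the RFC."""
    # Every part of the pattern is optional, so it matches any string.
    m = URI_RE.match(uri)
    return {
        "scheme": m.group(2),
        "authority": m.group(4),
        "path": m.group(5),
        "query": m.group(7),      # None when there is no "?" part
        "fragment": m.group(9),   # None when there is no "#" part
    }


print(split_uri("http://www.ics.uci.edu/pub/ietf/uri/#Related"))
print(split_uri("mailto:user@example.com"))
```

Components that are absent come back as `None` rather than an empty string, which lets a caller tell "no query" apart from "empty query".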
        
         | SavantIdiot wrote:
         | That is a nightmare of a regex, IETF be damned. Splitting
         | regexes up into smaller chunks helps readability, support,
         | memory usage -and- performance. I suspect that example was
         | provided as a definition and not actually intended to be
         | implemented. Although there are better notations for defining
         | a grammar, like BNF.
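
One way to read the "smaller chunks" suggestion is to build the same pattern from named pieces. The sketch below does that in Python; the constant names are invented for illustration, and it only shows the readability side of the argument, with no measurement of memory usage or performance.

```python
import re

# Each URI component gets its own small, named sub-pattern.
SCHEME = r"(?:(?P<scheme>[^:/?#]+):)?"
AUTHORITY = r"(?://(?P<authority>[^/?#]*))?"
PATH = r"(?P<path>[^?#]*)"
QUERY = r"(?:\?(?P<query>[^#]*))?"
FRAGMENT = r"(?:#(?P<fragment>.*))?"

# Concatenating the pieces gives a pattern equivalent in structure to the
# single-line expression quoted in the parent comment.
URI_PARTS_RE = re.compile("^" + SCHEME + AUTHORITY + PATH + QUERY + FRAGMENT)

# Placeholder URL used only to exercise the pattern.
parts = URI_PARTS_RE.match("https://example.com/a/b?x=1#top").groupdict()
print(parts)
```

Named groups replace the numbered legend under the original expression, so callers read `parts["path"]` instead of remembering that the path is group 5.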
        
         | the-dude wrote:
         | _Regular Expressions: Now You Have Two Problems_
         | 
         | https://blog.codinghorror.com/regular-expressions-now-you-ha...
        
       ___________________________________________________________________
       (page generated 2021-04-30 23:01 UTC)