[HN Gopher] Web servers should refuse requests for random, unnec...
       ___________________________________________________________________
        
       Web servers should refuse requests for random, unnecessary URLs
        
       Author : ingve
       Score  : 67 points
       Date   : 2023-07-06 08:52 UTC (14 hours ago)
        
 (HTM) web link (utcc.utoronto.ca)
 (TXT) w3m dump (utcc.utoronto.ca)
        
       | paulnpace wrote:
       | Depending on various factors, I tend to prefer that any request
       | that doesn't resolve to a file or a redirect gets dropped (without
       | being logged). In reviewing my logs, more than 99.997% of 404
       | responses are garbage noise requests for /ty.php, //xmlrpc.php,
       | //wp-includes/wlwmanifest.php, and similar: bots scanning for
       | vulnerabilities that can't exist on the server.
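
A minimal Go sketch of the "drop it without answering" approach described above. The allowlist `knownPaths` and the helper names are assumptions made for illustration; the only library feature relied on is `http.Hijacker`, which is not available on HTTP/2 connections, so the sketch falls back to a plain 404 there.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// knownPaths is a hypothetical allowlist of what this server really serves.
var knownPaths = map[string]bool{"/": true, "/metrics": true}

// dropUnknown closes the TCP connection without writing any response
// when the requested path is not in the allowlist.
func dropUnknown(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if knownPaths[r.URL.Path] {
			next.ServeHTTP(w, r)
			return
		}
		hj, ok := w.(http.Hijacker)
		if !ok { // hijacking unsupported (e.g. HTTP/2): answer 404 instead
			http.NotFound(w, r)
			return
		}
		conn, _, err := hj.Hijack()
		if err != nil {
			return
		}
		conn.Close() // no status line, no headers, no body
	})
}

// probe reports whether a GET to url produced any HTTP response at all.
func probe(url string) bool {
	resp, err := http.Get(url)
	if err != nil {
		return false
	}
	resp.Body.Close()
	return true
}

// demo starts a throwaway server and probes one known and one unknown path.
func demo() (knownAnswered, unknownAnswered bool) {
	ok := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "ok")
	})
	srv := httptest.NewServer(dropUnknown(ok))
	defer srv.Close()
	return probe(srv.URL + "/"), probe(srv.URL + "/ty.php")
}

func main() {
	known, unknown := demo()
	fmt.Println("known path answered:", known)
	fmt.Println("unknown path answered:", unknown)
}
```

Whether silently dropping is preferable to a 404 is a judgment call; legitimate clients that mistype a URL get no feedback either.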
        
       | jakub_g wrote:
       | A followup blog also worth checking:
       | 
       | https://utcc.utoronto.ca/~cks/space/blog/web/URLPresenceNotG...
       | 
       | > _URLPresenceNotGoodSignal_
       | 
       | > There are a variety of situations where you (in the sense of
       | programs and systems) want to know if a web server supports
       | something or is under someone's control. One traditional way is
       | to require the publication of specific URLs on the web server,
       | often URLs with partially random names. The simplest way to
       | implement this is to simply require the URL to exist and be
       | accessible, which is to say that fetching it returns a HTTP 200
       | response. However, in light of web server implementations which
       | will return HTTP 200 responses for any URL, or at least many of
       | them, this simple check is clearly not sufficient in practice.
       | _The mere 'presence' of a URL on a web server proves very
       | little._
       | 
       | > _If you need to implement this sort of protocol, you need to
       | require the URL to contain some specific contents._
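
The content requirement in the quoted post could look roughly like the Go sketch below. The challenge path, the token value, and the function names are hypothetical; the point is only that the verifier compares the body against an expected value instead of trusting a bare 200.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

// verifyChallenge fetches url and reports whether the body is exactly the
// expected token (ignoring surrounding whitespace). A 200 alone is not enough.
func verifyChallenge(url, token string) (bool, error) {
	resp, err := http.Get(url)
	if err != nil {
		return false, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return false, nil
	}
	// Cap the read so a misbehaving server cannot stream unbounded data.
	body, err := io.ReadAll(io.LimitReader(resp.Body, 4096))
	if err != nil {
		return false, err
	}
	return strings.TrimSpace(string(body)) == token, nil
}

// demo checks a catch-all server (200 for every URL) and a server that
// really publishes the token at the hypothetical challenge path.
func demo() (catchAllPasses, publishedPasses bool) {
	const token = "example-token-123"
	const path = "/.well-known/example-challenge"

	catchAll := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "<h1>generic front page</h1>")
	}))
	defer catchAll.Close()

	published := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Path != path {
			http.NotFound(w, r)
			return
		}
		fmt.Fprintln(w, token)
	}))
	defer published.Close()

	catchAllPasses, _ = verifyChallenge(catchAll.URL+path, token)
	publishedPasses, _ = verifyChallenge(published.URL+path, token)
	return
}

func main() {
	catchAll, published := demo()
	fmt.Println("catch-all server passes:", catchAll)
	fmt.Println("publishing server passes:", published)
}
```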
        
       | Sarkie wrote:
       | Solving the wrong problem
        
       | gumby wrote:
       | I feel like a 203 would work for this case.
        
       | pacificpendant wrote:
       | I had regular emails from a security testing tool telling me that
       | internal IP addresses were being exposed on a webpage, in reality
       | the page was a forum post where someone had pasted some console
       | output including an IP address they were working with. In the end
       | I blocked the emails from the tool because I wasn't allowed to
       | mark things as false positives.
       | 
       | If a tool wants to remain relevant it should try to minimise
       | false positives; in some cases this might mean removing rules
       | that are going to throw false positives significantly more often
       | than true positives. Tools should also be run such that anyone
       | who receives alerts can flag false positives with minimal
       | effort.
       | 
       | The response to this false positive could be to fix Prometheus,
       | but if you end up having to fix lots of things it's more of a
       | sign of a bad rule that is making you concentrate on things with
       | a low value to the goal of improving security.
        
         | tetha wrote:
         | Oh, you remind me of that day when our IDS went bonkers.
         | Something was hammering us with SQL injections, it said. Like,
         | 1-2 SQL injections per minute. And it gave successful HTTP
         | responses, and actual JSON responses. The sky must be falling!
         | We must be doomed!
         | 
         | After a brief amount of panic, we figured out that we had a new
         | customer for our knowledge base. This was an MSP and they were
         | busy uploading their MSSQL and PostgreSQL runbooks into our
         | knowledge base. Entirely beautiful documentation I have to say,
         | clear steps, great instructions, smart queries to check, act
         | and validate. We eventually had a good call about Postgres and
         | such with those guys. But our IDS hated it.
        
           | technion wrote:
            | I keep referring to the situation where a supplier sold the
            | Cisco select range. If you clicked the page on their site,
            | "select" showed up in the URL and their WAF blocked your
            | connection.
        
         | bombcar wrote:
         | 192.168.0.1
         | 
         | 10.10.10.10
         | 
         | 172.16.31.5
         | 
         | I've exposed internal IPs!
        
           | Joker_vD wrote:
           | Oh no! You should contact your localhost's administrator ASAP
           | and tell him to change those!
        
             | bbarnett wrote:
              | You're joking, but the joke's on him! I can _ping_ some of
              | those IPs, right now!
             | 
             | A good example of how security through obscurity helps,
             | they'd better get that fixed!
        
               | bombcar wrote:
               | Ah, fond memories of getting people to DOS themselves out
               | of IRC by reporting that my IP was 127.63.78.41 or
               | similar.
        
       | tester756 wrote:
       | in this thread:
       | 
       | way too many people being pissed off due to software's behaviour
       | which messes with bots
        
         | 3np wrote:
         | The web runs on bots. Your point being?
        
           | tester756 wrote:
           | Not on those trying to access /admin.php and similar.
        
       | ranting-moth wrote:
       | If Prometheus says it's by design to accept everything incoming,
       | I'd suggest they rename it from Prometheus to The Whore of
       | Babylon.
        
       | [deleted]
        
       | jonas-w wrote:
       | Maybe the scanner should also look at the "resource-that-should-
       | not-exist-whose-status-code-should-not-be-200 well-known URI"
       | 
       | https://w3c.github.io/webappsec-change-password-url/response...
        
         | tremon wrote:
         | Wow. If I were a webserver, I'd return 204 for that URL, just
         | for shits and giggles.
        
           | wakamoleguy wrote:
            | Despite the URL using "200" and not "2xx", any status between
            | 200-299 inclusive would flag the server as having unreliable
            | status codes.
            | https://w3c.github.io/webappsec-change-password-url/response...
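
A rough Go sketch of how a scanner might use that well-known URI before trusting any other 200 from the same host. The function names are made up for illustration, and the handling of redirects and errors here is an assumption rather than a faithful rendering of the W3C draft; only the 200-299 rule comes from the comment above.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// probePath is the well-known URI named in the thread.
const probePath = "/.well-known/resource-that-should-not-exist-whose-status-code-should-not-be-200"

// statusCodesReliable reports false when the server answers the probe with
// any 2xx status, meaning its status codes cannot be trusted.
func statusCodesReliable(baseURL string) (bool, error) {
	resp, err := http.Get(baseURL + probePath)
	if err != nil {
		return false, err
	}
	defer resp.Body.Close()
	unreliable := resp.StatusCode >= 200 && resp.StatusCode <= 299
	return !unreliable, nil
}

// demo probes a server that answers 204 to everything and one that 404s.
func demo() (always204, honest bool) {
	a := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusNoContent)
	}))
	defer a.Close()
	b := httptest.NewServer(http.NotFoundHandler())
	defer b.Close()

	always204, _ = statusCodesReliable(a.URL)
	honest, _ = statusCodesReliable(b.URL)
	return
}

func main() {
	always204, honest := demo()
	fmt.Println("204-for-everything server reliable:", always204)
	fmt.Println("404 server reliable:", honest)
}
```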
        
       | bostik wrote:
       | > _Security scanners and other tools could adopt various
       | heuristics to detect this sort of situation and reduce false
       | positives [...]_
       | 
       | Wrong. Flat out wrong.
       | 
       | Security scanners should adopt LESS heuristics and focus more on
       | _VALIDATING_ their perceived findings. Getting a 200 response for
       | a missing file is a web server misconfiguration, but unless the
       | returned data actually matches, it's a false positive.
       | 
       | I may be repeating myself but any scanner (or researcher) who
       | blindly raises a finding without validating it has as much to do
       | with security as ripping wings off a fly has to do with
       | bioengineering.
        
         | dqv wrote:
         | That lack of validation is frustrating in the case of an
         | automated scanner you can't control.
         | 
         | "Your nginx version is exposed! Add `server_tokens off` to your
         | config"
         | 
         | Oh yeah, what version am I running then?
        
         | tantalic wrote:
         | This is a problem of misaligned incentives: if you are making a
         | security scanner the last thing you want to do is miss a
         | vulnerability. The result is many false positives.
        
         | adamckay wrote:
         | > Security scanners should adopt LESS heuristics and focus more
         | on VALIDATING their perceived findings
         | 
         | Aren't you saying the same thing?
         | 
         | In the context of the blog post I understood it to mean that an
         | automated scanner, on retrieving content on a /.bash_history
         | request, should look at the content and attempt to determine if
         | it's full of bash commands and a correct finding (e.g.
         | plaintext, full of new lines, lines start with common commands
         | such as cd/ls, etc, ...), similar to how AV software makes
         | (very) educated guesses on whether software is malicious or
         | not. [1]
         | 
         | Interested to hear your distinction between the definitions of
         | heuristics and validations.
         | 
         | 1 -
         | https://en.wikipedia.org/wiki/Heuristic_(computer_science)#A...
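
The content check suggested above (plaintext, line-oriented, lines starting with common commands) might be sketched in Go as follows. The command list and the 50% threshold are arbitrary assumptions chosen for illustration, not values taken from any real scanner.

```go
package main

import (
	"fmt"
	"strings"
)

// commonCommands is a small, hypothetical sample of commands one would
// expect to see at the start of lines in a shell history file.
var commonCommands = map[string]bool{
	"cd": true, "ls": true, "cat": true, "git": true, "vim": true,
	"sudo": true, "ssh": true, "rm": true, "cp": true, "mv": true,
	"grep": true, "curl": true, "make": true, "exit": true,
}

// looksLikeBashHistory guesses whether body resembles a .bash_history file:
// not HTML, and at least half of its non-empty lines begin with a command
// from the sample list.
func looksLikeBashHistory(body string) bool {
	lower := strings.ToLower(body)
	if strings.Contains(lower, "<html") || strings.Contains(lower, "<!doctype") {
		return false // a generic front page, not a history file
	}
	hits, total := 0, 0
	for _, line := range strings.Split(body, "\n") {
		line = strings.TrimSpace(line)
		// Skip blanks and "#<epoch>" timestamp lines that bash can write.
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		total++
		if commonCommands[strings.Fields(line)[0]] {
			hits++
		}
	}
	return total > 0 && hits*2 >= total
}

func main() {
	history := "cd /var/www\nls -la\ngit status\nvim index.html\n"
	frontPage := "<!DOCTYPE html><html><body>Node Exporter</body></html>"
	fmt.Println("history sample flagged:", looksLikeBashHistory(history))
	fmt.Println("front page flagged:", looksLikeBashHistory(frontPage))
}
```

This is a heuristic in the sense discussed in the replies: it can be fooled in both directions, but it would at least stop a generic HTML front page from being reported as an exposed history file.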
        
           | naikrovek wrote:
           | heuristics are imperfect; validation checks to make sure that
           | you've found what you're looking for.
           | 
           | the rub there, though, is that validation is extremely rigid
           | and heuristics aren't. an attacker could (ideally) change a
           | single bit in their signature to falsely pass validation but
           | a heuristic will probably still catch it.
        
       | CipherThrowaway wrote:
       | I would extend this to unused query parameters and trailing
       | slashes or lack thereof.
        
         | lifeonlars wrote:
         | This breaks the principle of being liberal in what you accept.
        
           | tzs wrote:
           | In retrospect I think we'd have been better off without that
           | principle.
           | 
           | Too often what happens is that the receiver is liberal, which
           | results in senders not having to bother to fix violations of
           | whatever spec the parties purport to be following.
           | 
            | As the sending code continues to be developed, new deviations
            | from the actual spec, and from whatever it is that current
            | liberal receivers accept, occasionally creep in, and the
            | receivers get even more liberal to deal with them.
           | 
           | A few years of this and writing a receiver for what should
           | have been a fairly simple and easy to implement format
           | requires huge parsers that handle a bazillion weird cases.
        
             | lifeonlars wrote:
             | Perhaps your hindsight is selective. After HTTP/HTML and
             | related technologies such as JavaScript have become hugely
             | successful following this principle, it's easy to look back
             | and say "the technology stack which beat all others and
             | became ubiquitous is now hard to develop against because
             | it's too permissive".
        
           | Omin wrote:
           | I think op goes too far by including trailing slashes, but
           | being liberal in what you accept (Postel's law) is a bad idea
           | as the last few decades of the web have shown. Once you
           | accept things, you are locked into supporting them forever
           | lest you break compatibility. When there are multiple
           | different implementations of a standard, they grow
           | incompatible over time.
        
             | lifeonlars wrote:
             | Well, the web has been very successful in the last few
             | decades. Arguably at least part of this success has been
             | due to Postel's law since, for example, browsers
             | interpreting HTML on a best effort basis has allowed for
             | both diversity and innovation. Comparing the rapid adoption
             | of the web to the failures of numerous closed and strongly
             | specified protocols provides some empirical basis for this.
        
             | cellularmitosis wrote:
             | Accepting and ignoring doesn't lock you into anything.
        
           | CipherThrowaway wrote:
           | That's not a principle. That's just a random thing that
           | people thought back in the 80s and 90s because of some clever
           | little comment in a spec, that sounded correct at the time
           | but proved absolutely ruinous in the decades since.
           | 
           | Systems that are liberal in what they accept are
           | paradoxically harder to develop against, undermine standards
           | and encourage incorrect, fragile implementations. Query
           | parameters are a classic. In practice, unrecognized query
           | parameters almost always represent a bug in the client.
           | Better to find out immediately with a 400.
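
As an illustration of the "find out immediately with a 400" idea, here is a small Go middleware sketch. The parameter names `q` and `page` and the helper names are hypothetical; a real service would derive the allowed set from its own API definition.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// strictQuery rejects any request carrying a query parameter that is not in
// the allowed set, instead of silently ignoring it.
func strictQuery(allowed map[string]bool, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		for name := range r.URL.Query() {
			if !allowed[name] {
				http.Error(w, "unrecognized query parameter: "+name, http.StatusBadRequest)
				return
			}
		}
		next.ServeHTTP(w, r)
	})
}

// statusFor returns the status code the strict handler gives for target.
func statusFor(target string) int {
	search := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "results")
	})
	h := strictQuery(map[string]bool{"q": true, "page": true}, search)
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, target, nil))
	return rec.Code
}

func main() {
	fmt.Println(statusFor("/search?q=go&page=2"))
	fmt.Println(statusFor("/search?q=go&pgae=2")) // typo in "page"
}
```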
        
       | bradley13 wrote:
       | The requested URL did not exist, so the server _must_ return some
       | response other than 200. It could be not-found, it could be
       | forwarding, or it could be something else.
       | 
       | However, 200 (success) is simply wrong.
        
         | andrewaylett wrote:
         | And the penalty for being "wrong" is..?
         | 
         | You can't write code relying on random third-party code doing
         | what it's "mandated" to do by the standard, because malicious
         | code exists and so do malicious services. So as a client, if
         | you're connecting to services you don't control, you need to be
         | able to handle misbehaving third-party code.
         | 
         | And what it means for a request to be "successful" is entirely
         | at the discretion of the author of the software. I think it's
         | fairly clear that the software looked for a response, found
         | one, and successfully returned it to the client.
         | 
         | Which isn't to say that the code is optimal, necessarily, but
         | unless you have a contract to say otherwise then I don't think
         | you've any basis to claim that it _must_ do _anything_ in
         | particular.
        
           | shkkmo wrote:
           | > I don't think you've any basis to claim that it must do
           | anything in particular.
           | 
           | The relevant part of the spec says that returning a 4XX code
           | is a SHOULD [0] not a MUST. That means:
           | 
           | > This word, or the adjective "RECOMMENDED", mean that there
           | may exist valid reasons in particular circumstances to ignore
           | a particular item, but the full implications must be
           | understood and carefully weighed before choosing a different
           | course [1]
           | 
            | [0] https://www.rfc-editor.org/rfc/rfc9110.html#name-client-erro...
           | 
           | [1] https://www.rfc-editor.org/rfc/rfc2119
           | 
           | So yes, there is no requirement to return a 4XX message, but
           | you should think carefully and understand the consequences of
           | not doing so.
           | 
           | One consequence is creating false positives in most
           | vulnerability scanners.
           | 
           | Another consequence is potentially messing up various
           | crawlers, including those that help search engines index your
           | content.
        
             | andrewaylett wrote:
             | In order to conform to the spec, yes. But almost by
             | definition, people who don't care to conform to the spec
             | aren't going to worry overmuch about whether what they're
             | doing is conformant with the spec.
             | 
             | As a client, you've _got_ to be ready to deal with non-
             | conformant implementations. As a service owner, there is no
             | authority who will hunt you down for deploying a service
              | that doesn't conform to the spec. Let me restate my claim
              | slightly: acknowledging that it's impolite to claim to
              | conform to a spec and then not do so, no-one has a right to
              | claim that any arbitrary person _must_ write code that
              | conforms to any arbitrary RFC.
             | 
             | > ["MUST",] "REQUIRED" or "SHALL", mean that the definition
             | is an absolute requirement _of the specification_. (my
             | emphasis)
             | 
             | I may choose to return whatever response code I want in
             | whatever circumstances I feel like, and there's absolutely
             | nothing you can do to stop me. I probably won't, because
             | that would be silly. But there's no _must_.
        
               | shkkmo wrote:
               | > As a client, you've got to be ready to deal with non-
               | conformant implementations
               | 
               | The whole point of the "MUST" vs "SHOULD" is that you can
               | assume that you don't have to handle implementations that
               | aren't compliant (e.g. don't do the things they _must_ do
               | to be in compliance) but do need to worry about handling
               | edge-cases where implementations are in compliance but
               | not doing things the way they _should_.
               | 
               | Both tools (the vulnerability scanner and Prometheus)
               | should make changes but I would consider the Prometheus
               | change to be a bug report (which they can clearly choose
               | to not fix and still be compliant) and the vulnerability
               | scanner would be a feature request to check file contents
               | rather than just relying on the status.
        
         | pravus wrote:
         | How did the URL not exist? An empty resource is still a valid
         | resource and 200 is an appropriate response. You can argue that
         | Prometheus has a broken data model, but the request and
         | response are actually valid in my book.
        
           | [deleted]
        
         | wang_li wrote:
         | >However, 200 (success) is simply wrong.
         | 
         | What makes it wrong? HTTP/HTTPS URLs/URIs don't mean files on
         | the disk. You can return 200 as a default if you like.
        
           | shkkmo wrote:
           | Returning a 200 response for a resource that doesn't exist is
           | wrong.
           | 
           | The correct response code would be a 3XX or 4XX response
           | code, depending on the client behavior that you want.
           | 
           | In this specific case, a 303 response that redirects to the
           | homepage would be most correct.
           | 
           | Alternatively, you could return a 404 response alongside the
           | default content.
        
             | toast0 wrote:
             | > The correct response code would be a 3XX or 4XX response
             | code, depending on the client behavior that you want.
             | 
             | If the client behavior you want is to display your styled
             | error page, that has historically meant returning a 200
             | status code, because some user-agents prefer their own
             | error displays.
        
             | phlakaton wrote:
             | And how do you know the resource doesn't exist in this
             | case? The server says it does.
             | 
              | The point is, "resource" is an abstraction that servers may
              | implement as they wish. The only requirements, per HTTP, are
             | that (1) it's something that can be identified with a URI
             | (satisfied in this case), and (2) it has information
             | associated with it that can be retrieved and/or managed via
             | the HTTP protocol (satisfied in this case).
             | 
             | We might imagine that the set of resources should be
             | finite, or strictly mapped to extant data records in our
             | system - but these are not requirements.
        
               | cellularmitosis wrote:
               | We can argue about the semantics of "wrong" all day, but
               | the lesson here is that not following standards and
               | conventions creates extra work for everyone downstream.
        
               | shkkmo wrote:
               | > We might imagine that the set of resources should be
               | finite, or strictly mapped to extant data records in our
               | system - but these are not requirements.
               | 
               | Clearly they aren't requirements as I can think of many
               | valid use cases that violate them.
               | 
               | To me, creating an infinite number of identical aliases
               | for a single resource is clearly wrong. The only reason
               | to do it is either incompetence or laziness.
        
             | pravus wrote:
             | URLs refer to resources, not files.
        
             | eddythompson80 wrote:
             | > _depending on the client behavior that you want._
             | 
             | What if you want to confuse the client?
        
             | tremon wrote:
              | _Returning a 200 response for a resource that doesn't
              | exist is wrong_
             | 
             | Isn't that just begging the question? What defines whether
             | a resource exists? Is it the client's knowledge about the
             | webserver backend configuration, or the server's? If a
             | server returns a 200 response, the resource exists, by
             | definition. It may or may not have data, but again, that's
             | not up to the client to decide. If the server says the
             | resource exists and has no data, then that's the
             | authoritative answer.
        
           | dekhn wrote:
            | I had to look this up recently, but Tim Berners-Lee
           | specifically said that URIs should be treated more or less
           | identically to files on disk. That would mean that accessing
           | a web server that is hosting a static directory should return
           | not found for missing files. But this was just a guideline
           | and it was never followed in practice.
        
           | mock-possum wrote:
            | RFC 9110 -
            | https://httpwg.org/specs/rfc9110.html#overview.of.status.cod...
           | 
           | It's like saying "what makes it wrong to signal left but turn
           | right?"
           | 
           | It makes the component you're operating behave less
           | predictably, resulting in a less stable system overall.
           | Follow the spec.
        
       | jfoutz wrote:
       | OK
        
       | fastest963 wrote:
       | This is a consequence of the default ServeMux in Go. If you
       | register "/" then effectively all URLs are served by that
       | handler, and in this case the handler is an index page.
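
The ServeMux behaviour described above can be reproduced with a few lines of Go. The handler and helper names are invented for this sketch; the path check in `exactRoot` is the long-standing idiom for making a "/" registration match only the root, and is shown as one possible fix rather than what Prometheus does.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// frontPage stands in for a generic index page.
func frontPage(w http.ResponseWriter, r *http.Request) {
	fmt.Fprintln(w, "<h1>front page</h1>")
}

// exactRoot wraps a handler so that it only answers for "/" itself and
// returns 404 for every other path that falls through to it.
func exactRoot(h http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Path != "/" {
			http.NotFound(w, r)
			return
		}
		h.ServeHTTP(w, r)
	})
}

// newLooseMux registers "/" directly, so it acts as a catch-all.
func newLooseMux() *http.ServeMux {
	mux := http.NewServeMux()
	mux.Handle("/", http.HandlerFunc(frontPage))
	return mux
}

// newStrictMux registers the same page behind the exact-match wrapper.
func newStrictMux() *http.ServeMux {
	mux := http.NewServeMux()
	mux.Handle("/", exactRoot(http.HandlerFunc(frontPage)))
	return mux
}

// status returns the response code h gives for a GET of path.
func status(h http.Handler, path string) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code
}

func main() {
	fmt.Println("loose  /.bash_history:", status(newLooseMux(), "/.bash_history"))
	fmt.Println("strict /.bash_history:", status(newStrictMux(), "/.bash_history"))
	fmt.Println("strict /:", status(newStrictMux(), "/"))
}
```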
        
       | gmuslera wrote:
       | For Prometheus, maybe, or maybe not. But taking that as a generic
       | rule for web servers and whatever they serve will break things
       | too. Not everything is a file on disk or a full endpoint. That it
       | may not be the best for your particular use case doesn't mean that
       | there aren't other use cases for which that policy would be wrong.
        
         | 3np wrote:
         | Policies and generic rules can still have valid exceptions.
         | Just keep the red tape under control.
        
       | vkaku wrote:
       | What about /favicon.ico - or those pre-flight requests?
       | 
       | The only real way to refuse requests is to just close the
       | connection, no 404s or whatever should be sent.
       | 
       | The deal with the way most people have designed the web is that,
       | from a client point of view, a lot of it works by convention -
       | that's where this stuff starts to break.
       | 
       | cgi-bin was a fairly well accepted convention a couple decades
       | ago - and today those URIs are considered malicious.
       | 
       | Unless people start deprecating conventions sooner, this problem
       | will have false positives of its own.
        
         | nubinetwork wrote:
         | In my tests, favicon gets loaded after the page, and 404ing it
         | seems to go ignored by browsers.
        
       | nubinetwork wrote:
       | > What was happening instead is that the Prometheus host agent's
       | HTTP server code will give you a HTTP 200 answer (with a generic
       | front page) for any URL except the special URL for its metrics
       | endpoint.
       | 
       | > Neither party is exactly wrong here, but the result is not
       | ideal.
       | 
       | No, Prometheus has a bug if this is the case, and cks says it
       | himself further on...
       | 
       | > all it would need to do is only give a HTTP 200 response for
       | '/' and then a 404 (with the same HTML) for everything else that
       | it answers with the generic front page
        
         | matsemann wrote:
         | What might happen is that it's a single page app. So whatever
         | URL you access the same frontend is served. And then the SPA
         | shows the correct content based on the path in the URL. Then
         | it's hard for the backend to know the correct status code to
         | send.
         | 
         | Of course you might still consider that a bug or wrong, but
         | it's reality for a lot of web pages.
        
           | nubinetwork wrote:
           | I would argue that is still wrong... if the page doesn't
           | exist, you either 404, or 301 to a page that does exist.
        
             | Timon3 wrote:
             | That would require you to duplicate all routing between
             | backend and frontend. As long as it's not an API I don't
             | see the need.
        
               | wruza wrote:
               | Wouldn't it be better to auto-extract a list of frontend
               | routes for backend? Our stacks lack the obvious, but
               | instead of fixing them we are searching for good angles.
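
One way the "extract a list of frontend routes for the backend" idea could work, sketched in Go. The route list is hard-coded here and is purely hypothetical; in practice it would be generated from the frontend router at build time. Unknown paths still receive the same HTML shell, so the SPA can render its own error view, but with a 404 status.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// spaRoutes is a hypothetical route list that a build step would extract
// from the frontend router.
var spaRoutes = map[string]bool{"/": true, "/dashboard": true, "/settings": true}

// indexHTML stands in for the single-page app's HTML shell.
const indexHTML = `<!doctype html><div id="app"></div>`

// serveSPA always sends the same shell, but uses status 404 when the path
// is not a route the frontend knows about.
func serveSPA(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("Content-Type", "text/html; charset=utf-8")
	if !spaRoutes[r.URL.Path] {
		w.WriteHeader(http.StatusNotFound)
	}
	io.WriteString(w, indexHTML)
}

// fetch returns the status code and body serveSPA produces for path.
func fetch(path string) (int, string) {
	rec := httptest.NewRecorder()
	serveSPA(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code, rec.Body.String()
}

func main() {
	code, _ := fetch("/dashboard")
	fmt.Println("/dashboard:", code)
	code, _ = fetch("/.bash_history")
	fmt.Println("/.bash_history:", code)
}
```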
        
               | Timon3 wrote:
               | It depends on your use case. My use case doesn't warrant
               | it.
        
               | shkkmo wrote:
               | > As long as it's not an API I don't see the need.
               | 
                | And I don't see the need to use a SPA almost anywhere. If
                | you are gonna use a SPA for stuff that really doesn't
                | require it, I think it behooves you to still at least try
                | to conform to the HTTP spec.
        
               | wruza wrote:
               | I see the need for SPA on every site that has internal
               | links and dynamic state of any kind. Seeing a flash is
               | annoying and so is tapping back button only to see the
               | cached version that is not true anymore and forcing a
               | reload. Interop between web pages (and tabs) is
               | practically non-existent. Even right now on HN I'm about
               | to post a comment, and the whole "go back" stack will
               | become irrelevant.
               | 
               | SPA is ironically not needed mostly on single-page sites.
               | When I see comments like this it sincerely puzzles me
               | what your common user stories are and/or how you managed
               | to train a blind eye to all this.
        
               | Timon3 wrote:
               | > And I don't see the need to use a SPA almost anywhere.
               | 
               | Did I advocate for that in any way?
               | 
                | > If you are gonna use a SPA for stuff that really
                | doesn't require it, I think it behooves you to still
                | at least try to conform to the HTTP spec
               | 
               | Why? Give me a good reason why it's worth the effort for
               | small tools I write.
        
               | shkkmo wrote:
               | > Why? Give me a good reason why it's worth the effort
               | for small tools I write
               | 
               | Why are you using SPAs with internal routing for small
               | tools?
               | 
               | An SPA only needs to use urls to represent states that
               | should be externally reachable. If you want those states
               | to be externally reachable, then you probably want those
               | states to be indexable by search engines. You also want
               | users to know if a url is mistyped/miscopied and a 404
               | response page does a good job of that.
               | 
                | In the end, given the spec, it is on you to carefully
                | consider and have a good reason to go against the
                | recommendations.
        
               | Timon3 wrote:
               | > Why are you using SPAs with internal routing for small
               | tools?
               | 
               | Because I don't use a server if I can help it. Client-
               | side routing is wonderfully easy to use and provides a
                | great user experience. What good does it do me to pre-
                | render the sites or host it on a server where I can route
               | server-side if client-side routing works perfectly for
               | what I write?
               | 
               | > An SPA only needs to use urls to represent states that
               | should be externally reachable.
               | 
               | "Needs" does a lot here. My tools integrate with browser
               | history for navigation, so I need URLs for that as well.
               | But I often make state available in the query, so it's
               | externally reachable.
               | 
               | > If you want those states to be externally reachable,
               | then you probably want those states to be indexable by
               | search engines.
               | 
               | No.
               | 
               | > You also want users to know if a url is
               | mistyped/miscopied and a 404 response page does a good
               | job of that.
               | 
               | I do, and client-side routing gives users a wonderful 404
               | page. No need for serving a status 404.
               | 
                | > In the end, given the spec, it is on you to carefully
                | consider and have a good reason to go against the
                | recommendations.
               | 
               | I've given good reasons. You haven't so far.
        
         | lifeonlars wrote:
         | ..or the vuln detection could be less stupid and try to decide
         | whether the content it has retrieved is a bash history or not?
        
           | nubinetwork wrote:
           | I would agree in principle, but that would require an
           | extensive parser for files it thinks it finds... it's easier
           | to deal with this on the server side than the client side.
        
             | lifeonlars wrote:
             | No it's not. Dealing with it on the server side means that
             | all tools and servers in the world have to go along with
             | this decision. Posting on a blog saying that people
             | 'SHOULD' do something and actually making it so that this
             | is a reasonable expectation are two very different things.
             | 
             | A vulnerability scanner having to implement hairy
             | heuristics to decide what's a vuln and what's a common
             | false positive is literally its whole job.
        
               | jjav wrote:
               | > Posting on a blog saying that people 'SHOULD' do
               | something
               | 
               | To be fair, it's not about some rando on a blog post. The
               | HTTP response codes and their semantics are in the RFCs.
               | And while it's true the RFCs say SHOULD and not MUST,
               | there are also now ~3 decades of experience and
               | expectation that a non-existent resource is more likely a
               | 404 not 200.
               | 
               | Sure a server can always do what it likes but can't
               | expect it to play well with the outside world if it goes
               | against both convention and documentation.
        
               | JoBrad wrote:
               | There is a standard for HTTP status codes that has
               | dictated what web servers "should" do for quite a while,
               | now. Most web servers respond properly with a 404 "out of
               | the box", when asked to serve up content that doesn't
               | exist.
        
               | semi wrote:
               | I think that's still somewhat beside the point though. in
               | the case of Prometheus yes it should return a 404. but
               | what if it was nginx routing all paths to some app? or
               | even just some actual file being served on that path? in
               | either case the vuln scanner says you have an exposed
               | home directory, and that's a false positive.
        
               | Dah00n wrote:
               | But should != Must.
        
               | lifeonlars wrote:
               | Almost any web server _can_ be configured to provide a
               | generic response to a specific request, for example by
               | ignoring some or all of the url path, and in practice I
               | would bet that a majority of actual instances do this for
               | at least some sets of requests. (To confirm my theory in
               | 20 seconds I checked if
               | https://news.ycombinator.com/user?id=nonexistent_user_1620
               | returns a 404 or a 200 - it's
               | the latter.)
               | 
               | It's silly to pretend that the use of a 404 in this type
               | of circumstance is either clearcut in the standards or
               | ubiquitous in practice.
        
               | shkkmo wrote:
               | > It's silly to pretend that the use of a 404 in this
               | type of circumstance is either clearcut in the standards
               | or ubiquitous in practice.
               | 
               | The standards seem pretty clear to me.
               | 
               | I would point out that technically, the path portion of
               | the HN URI does indeed point to a valid endpoint, it is
               | the query portion of the URI (usually not used by the
               | server to do any routing) that points to a non-existent
               | resource.
               | 
               | Still, HN is wrong here and should be returning a 404
               | status.
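To make the path-versus-query distinction concrete, here is a minimal Python sketch of a handler that routes on the path but still answers 404 when the query names a resource that does not exist. The `/user` endpoint mirrors the URL in the thread; `KNOWN_USERS`, `status_for`, and the choice of 400 for a malformed query are assumptions, not HN's actual behaviour.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical user store standing in for a real database lookup.
KNOWN_USERS = {"alice", "bob"}

def status_for(url: str) -> int:
    """Pick a status code from both the path and the query of a request URL."""
    parts = urlsplit(url)
    if parts.path != "/user":
        return 404  # no such endpoint
    ids = parse_qs(parts.query).get("id", [])
    if len(ids) != 1:
        return 400  # assumed: a missing or repeated id is a malformed request
    # The endpoint exists, but the resource it identifies may not.
    return 200 if ids[0] in KNOWN_USERS else 404
```

The page body can still be a friendly "no such user" message; only the status line changes.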
        
       | colonelpopcorn wrote:
       | We could extend this the other way too: security research tools
       | shouldn't assert vulnerability based solely on HTTP status code.
       | I think most SPAs require a setup like the one mentioned in the
       | article.
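One way a scanner could avoid trusting the status code alone is a control request: ask for a random path that cannot exist and see whether the server still says 200. The Python sketch below assumes a caller-supplied `fetch_status` callable that returns the HTTP status for a path; that callable and the function name are hypothetical.

```python
import secrets
from typing import Callable

def status_is_meaningful(fetch_status: Callable[[str], int]) -> bool:
    """Return False when the server answers 200 even for a random, nonexistent path."""
    # 32 hex characters of randomness make an accidental match very unlikely.
    probe = "/" + secrets.token_hex(16)
    return fetch_status(probe) != 200
```

If this returns False (as it would for an SPA or other catch-all setup), a 200 for `/.bash_history` says nothing by itself, and the scanner has to look at the content instead.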
        
         | pixl97 wrote:
         | "Security tools shouldn't"
         | 
         | The bane of my existence here working with customers. See they
         | like doing dumb things like having unified Nessus policies that
         | alert if you have hyper threading on, so they disable HT on all
         | their servers, including ones that don't run untrusted code.
         | Then at the same time they complain that their expenses are
         | nearly 50% higher than expected in execution costs on my highly
         | multithreaded app.
         | 
         | Reasonable policy and security don't really work well because
         | there's not enough people trained in making this work properly
         | across workloads in the enterprise.
        
       | gonzo41 wrote:
       | What's the ethics / law on handing back zip bombs as png's in gz
       | files when you get scanned from rando's on the internet? Asking
       | for a friend
        
         | JoBrad wrote:
         | I don't think there's a rule that you have to respond with zip
         | bombs, but it's plainly impolite not to.
         | 
         | This whole comment section has me wondering if an internet-
         | connected tea pot should respond to all requests with a 418
         | status code.
        
         | pravus wrote:
         | They asked for the resource. You are free to give it to them.
        
         | rwmj wrote:
         | The legal and ethical risk is approximately zero.
         | 
         | I think the most likely problem is that by trying to be clever
         | with your web server configuration you'll accidentally
         | introduce an insecurity at your end.
         | 
         | On my personal webserver, the vast majority of "rando" accesses
         | are from search engine spiders that I've never heard of (as
         | well as the ones you have heard of). These spiders really ought
         | to be secure against anything a server can throw at them, and
         | if they aren't, that's their problem.
        
       | 28304283409234 wrote:
       | "an HTTP server", not "a HTTP server". On a university website,
       | no less.
        
         | tenebrisalietum wrote:
         | If "http" is not pronounced as its letters but as "Hypertext
         | Transport Protocol Server", "a HTTP server" is fine.
        
           | brickteacup wrote:
           | yes but nobody pronounces it like that...
        
         | stirfish wrote:
         | Both are correct
        
           | TacticalCoder wrote:
           | Oh!? Even though GP is being a bit pedantic I thought he was
           | correct. Why is "a HTTP server" cromulent here? Do acronyms
           | enjoy special rules? (FWIW English is not my native language)
        
             | nubinetwork wrote:
             | H is a consonant. "An" is used in this case if the next
             | word starts with a vowel.
        
               | mnw21cam wrote:
               | It's not as simple as that. In general English usage,
               | when we have an acronym, we tend to look at whether it
               | _sounds_ like it starts with a vowel. So for instance the
               | acronym RTFM is pronounced "arr tee eff em", which
               | starts with a vowel.
               | 
               | Which brings us back to this one - both "a" and "an" are
               | valid, because it depends whether you're pronouncing your
               | "h" as an Australian "Haich" or a British/American
               | "Aich".
        
               | pm215 wrote:
               | It's used if the next word starts with a vowel _sound_,
               | so it depends on how you pronounce 'H' as a standalone
               | letter. Personally I say it as "aitch", so for me "an" is
               | correct here and what I would say.
        
               | nubinetwork wrote:
               | What about L (ell)? You don't say "an LAMP server", it's
               | "a LAMP server".
        
               | stirfish wrote:
               | An ell ay em pee server
               | 
               | A lamp server
        
               | xnorswap wrote:
               | Because LAMP is an acronym so is pronounced as a whole
               | word.
               | 
               | You would say "An LLP" for example.
               | 
               | HTTP is (usually) pronounced "Aitch tee tee pee", so
               | starts with a vowel.
        
               | DonHopkins wrote:
               | And then there's "an historic event" with a historically
               | British pronunciation, which sounds like a weird non-
               | intuitive exception to Americans, because Brits drop the
               | voiceless glottal fricative "h" sound and start the word
               | with the vowel "i", like "an 'istoric event". Saying "a
               | historic event" by pronouncing the "h" sounds natural to
               | a non-pedantic American (even though it's against the
               | official rules of British English), because Americans
               | pronounce the consonant voiceless glottal fricative "h"
               | instead of dropping it.
               | 
               | https://www.thesaurus.com/e/grammar/an-historic-vs-a-
               | histori...
               | 
               | https://en.wikipedia.org/wiki/H-dropping
               | 
               | H-dropping
               | 
               | H-dropping or aitch-dropping is the deletion of the
               | voiceless glottal fricative or "H-sound", [h]. The
               | phenomenon is common in many dialects of English, and is
               | also found in certain other languages, either as a purely
               | historical development or as a contemporary difference
               | between dialects. Although common in most regions of
               | England and in some other English-speaking countries, and
               | linguistically speaking a neutral evolution in languages,
               | H-dropping is often stigmatized as a sign of careless or
               | uneducated speech.
               | 
               | The reverse phenomenon, H-insertion or H-adding, is found
               | in certain situations, sometimes as an allophone or
               | hypercorrection by H-dropping speakers, and sometimes as
               | a spelling pronunciation or out of perceived etymological
               | correctness. A particular example of this is the spread
               | of 'haitch' for 'aitch'.
               | 
               | [...]
               | 
               | H-insertion
               | 
               | The opposite of H-dropping, called H-insertion or
               | H-adding, sometimes occurs as a hypercorrection in
               | typically H-dropping accents of English. It is commonly
               | noted in literature from late Victorian times to the
               | early 20th century that some lower-class people
               | consistently drop h in words that should have it, while
               | adding h to words that should not have it. An example
               | from the musical My Fair Lady is, "In 'Artford, 'Ereford,
               | and 'Ampshire, 'urricanes 'ardly hever 'appen". Another
               | is in C. S. Lewis' The Magician's Nephew: "Three cheers
               | for the Hempress of Colney 'Atch". In practice, however,
               | it would appear that h-adding is more of a stylistic
               | prosodic effect, being found on some words receiving
               | particular emphasis, regardless of whether those words
               | are h-initial or vowel-initial in the standard language.
               | 
               | Some English words borrowed from French may begin with
               | the letter <h> but not with the sound /h/. Examples
               | include heir, and, in many regional pronunciations, hour,
               | hono(u)r and honest. In some cases, spelling
               | pronunciation has introduced the sound /h/ into such
               | words, as in humble, hotel and (for most speakers)
               | historic. Spelling pronunciation has also added /h/ to
               | the British English pronunciation of herb, /hɜːb/, while
               | American English retains the older pronunciation /ɜːrb/.
               | Etymology may also serve as a motivation for H-addition,
               | as in the words horrible, habit and harmony; these were
               | borrowed into Middle English from French without an /h/
               | (orrible, abit, armonie), but all three derive from Latin
               | words with an /h/ and would later acquire an /h/ in
               | English as an etymological "correction".[13] The name of
               | the letter H itself, "aitch", is subject to H-insertion
               | in some dialects, where it is pronounced "haitch". (In
               | Hiberno-English, "haitch" has come to be considered
               | standard, consistent with its not-an-H-dropping
               | dialects).[14]
        
               | pnut wrote:
               | But what about LED?
               | 
               | The point is whether you're saying the letter or a word
               | that starts with the letter.
        
               | DonHopkins wrote:
               | (In response to your pre-edit post): It's a lemon, not an
               | ellemon. Or a lamp, not an ellamp.
        
               | TRiG_Ireland wrote:
               | You say _an_ if
               | 
               | * the next word begins with a vowel sound; or
               | 
               | * the next word begins with an _h_ sound, _and_ the
               | initial syllable of the next word is unstressed, _and_
               | you're an old-fashioned, slightly upper class Brit.
        
               | 28304283409234 wrote:
               | I am Dutch, actually. But indeed: old and fashioned.
        
               | ExoticPearTree wrote:
               | While generally true that "an" goes before vowels and "a"
               | goes before consonants, it actually matters on how the
               | consonant is pronounced - yeah, I know, English is funny
               | like this.
               | 
               | So in this particular case you can use "an" and "a"
               | depending on how you pronounce the letter H.
        
               | SturgeonsLaw wrote:
               | Would you say "a XML file" or "an XML file"?
        
               | stirfish wrote:
               | I just try not to talk about XML
        
               | DonHopkins wrote:
               | It depends on if you're in a CDATA section or not.
               | 
               | I Wanna Be <![CDATA[ -- Sung to the tune of "I Wanna Be
               | Sedated", with apologies to The Ramones:
               | 
               | https://donhopkins.medium.com/i-wanna-be-cdata-3406e14d4f21
        
               | wizofaus wrote:
               | There's not really any way you can pronounce XML starting
               | with a consonant sound is there? So always "an", even if
               | you spell out the whole initialism. But..."xaml" is
               | fairly typically pronounced as "zaml", so "a xaml file"
               | would be the norm.
        
               | Joker_vD wrote:
               | I personally pronounce it as "ks-em-el", with the "a"
               | article before it. Admittedly, it sounds almost as if I
               | pronounce it as "eks-em-el" without any article at all.
        
             | aesh2Xa1 wrote:
             | Some people pronounce the letter "H" differently. The use
             | of "a" or "an" depends upon the vowel sound.
             | 
             | an aitch
             | 
             | a haitch
             | 
             | https://www.bbc.com/news/magazine-11642588
        
               | wizofaus wrote:
               | "Haitch" is now so common in Australia it _almost_ sounds
               | normal, despite my years of trying to correct anyone
               | using that pronunciation (insisting if they are going to
               | call it that, then it should also be 'feff', 'lell',
               | 'mem' etc. - thankfully I never got as far as
               | extrapolating to w-double u). I'm not sure I'll ever
               | accept it as "correct" (whatever that means) but it is
               | undoubtedly the way many people have learned to say it
               | and are unlikely to change. The "'an' only before a vowel
               | sound" rule seems to be much more deeply wired in, so yeah,
               | it would be "a http server" if you're a haitcher.
        
               | JoBrad wrote:
               | Most of the changed emphasis in the article is very
               | similar to American English.
               | 
               | But "says" with a long a is an odd one. I've heard it
               | pronounced that way by a few Brits, and it still takes me
               | a minute.
        
             | [deleted]
        
             | dekhn wrote:
             | because HTTP is pronounced "hittip", obviously, and in the
             | US, we use 'a' for words that start with hard H sounds.
        
       | pravus wrote:
       | I got tired of this sort of thing recently and now my web server
       | returns 402 for everything that doesn't exist or has an invalid
       | request. Most of what I do involves API and static data so I just
       | serve everything from memory and avoid all disk access. It really
       | bothered me that tons of stupid bots eat up resources asking the
       | kernel for ENOENT.
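A minimal Python sketch of the approach described above: everything is answered from an in-memory table, so a request for an unknown path never touches the filesystem. The table contents and the names `RESOURCES`, `MISS_STATUS`, and `respond` are made up for illustration; 402 is the commenter's choice, and 404 would be the conventional one.

```python
# Hypothetical in-memory table; paths and payloads are invented for this sketch.
RESOURCES: dict[str, tuple[str, bytes]] = {
    "/": ("text/plain", b"hello\n"),
    "/api/status": ("application/json", b'{"ok": true}\n'),
}

MISS_STATUS = 402  # assumed from the comment above; swap in 404 for the usual behaviour

def respond(path: str) -> tuple[int, str, bytes]:
    """Answer from memory only, so unknown paths cost one dict lookup and no syscalls."""
    entry = RESOURCES.get(path)
    if entry is None:
        return MISS_STATUS, "text/plain", b""
    content_type, body = entry
    return 200, content_type, body
```

A bot asking for `/ty.php` gets an empty 402 after a single dictionary miss.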
        
       | stevage wrote:
       | I'm just surprised that anyone would implement the problematic
       | behaviour of returning 200 by default, even if there's no such
       | file to return.
       | 
       | I mean, if you write a server in Express (NodeJS) or set up
       | Nginx, it would be harder to get it wrong than to get it right.
        
         | nailer wrote:
         | This is an odd article. I thought it must have been written in
         | the nineties for the combination of obvious (see above) and
         | bizarre ("web browsers and people mostly don't care about or
         | notice the HTTP return code" advice). Your JS is absolutely
         | likely to check response codes.
        
           | stevage wrote:
           | Yeah, I started doubting myself - I do often write code that
           | checks response codes.
           | 
           | I have to admit, I'm actually not sure what a browser does if
           | a server is sending normal content but with 404 status codes.
           | At the very least, I guess it messes up the caching?
        
             | nubinetwork wrote:
             | I know google treats it as a "soft 404"...
             | 
             | There was a video on YouTube that talked about how browsers
             | and bots handled different status codes, but I'm having a
             | hard time finding it.
             | 
             | Found it. https://youtu.be/4OztMJ4EL1s
        
             | shkkmo wrote:
             | From the http spec about 4xx errors:
             | 
             | > Except when responding to a HEAD request, the server
             | SHOULD send a representation containing an explanation of
             | the error situation, and whether it is a temporary or
             | permanent condition.
             | 
             | So sending content alongside a 4xx error should be the
             | standard behavior and browsers should display that content.
             | Not sending/displaying that content is the non-standard
             | behavior.
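As a small illustration of a 404 that still carries a body, the Python sketch below starts a throwaway local server and fetches a missing page from it. The handler name, the message text, and the `demo` helper are assumptions for this example only.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class NotFoundWithBody(BaseHTTPRequestHandler):
    """Answers every GET with a 404 status plus an explanatory body."""

    def do_GET(self):
        body = b"No such page. This is a permanent condition.\n"
        self.send_response(404)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet

def demo() -> tuple[int, bytes]:
    """Start the server on a free local port, fetch one URL, and return (status, body)."""
    server = HTTPServer(("127.0.0.1", 0), NotFoundWithBody)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1], timeout=5)
        conn.request("GET", "/missing")
        resp = conn.getresponse()
        result = (resp.status, resp.read())
        conn.close()
        return result
    finally:
        server.shutdown()
        server.server_close()
```

The client sees both pieces: the 404 status that code and crawlers can act on, and the text that a browser would render for a person.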
        
       | danw1979 wrote:
       | > the Prometheus host agent's HTTP server code will give you a
       | HTTP 200 answer (with a generic front page) for any URL except
       | the special URL for its metrics endpoint.
       | 
       | > Neither party is exactly wrong here
       | 
       | Prometheus is in the wrong here, IMHO.
       | 
       | I'm not sure why the article titled "Web servers should refuse
       | requests for random, unnecessary URLs" would start out sitting on
       | the fence.
        
       | ExoticPearTree wrote:
       | Seems like a Prometheus "issue" on how I would guess
       | node_exporter handles requests for unknown resources. The easy
       | fix would be to answer with 404 for everything except the
       | configured /metrics endpoint.
       | 
       | Any webserver, configured correctly, will return an appropriate
       | code for nonexistent resources.
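The difference between the reported behaviour and the proposed fix can be sketched in a few lines of Python. This is not node_exporter's code (which is written in Go); `METRICS_PATH` and both function names are assumptions for illustration.

```python
METRICS_PATH = "/metrics"  # assumed default; the comment calls it "the configured /metrics endpoint"

def catch_all_status(path: str) -> int:
    """Behaviour as described in the article: metrics or a generic front page, always 200."""
    return 200

def strict_status(path: str) -> int:
    """The proposed fix: only the metrics endpoint exists, everything else is a 404."""
    return 200 if path == METRICS_PATH else 404
```

Under the strict version, a scanner probing `/.bash_history` gets a 404 and has nothing to misreport.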
        
       ___________________________________________________________________
       (page generated 2023-07-06 23:01 UTC)