[HN Gopher] You cannot simply publicly access private secure lin...
       ___________________________________________________________________
        
       You cannot simply publicly access private secure links, can you?
        
       Author : vin10
       Score  : 243 points
       Date   : 2024-03-07 16:29 UTC (6 hours ago)
        
 (HTM) web link (vin01.github.io)
 (TXT) w3m dump (vin01.github.io)
        
       | internetter wrote:
       | The fundamental issue is that links without any form of access
       | control are presumed private, simply because there is no public
       | index of the available identifiers.
       | 
       | Just last month, a story with a premise of discovering AWS
       | account ids via buckets[0] did quite well on HN. The consensus
       | established in the comments is that if you are relying on your
       | account identifier being private as some form of security by
       | obscurity, you are doing it wrong. The same concept applies here.
       | This isn't a novel security issue, this is just another method of
       | dorking.
       | 
       | [0]: https://news.ycombinator.com/item?id=39512896
        
         | ta1243 wrote:
         | The problem is links leak.
         | 
          | In theory a 256 hex-character link (so 1024 bits) is near-
          | infinitely more secure than a 32-character username and
          | 32-character password, as to guess
          | 
          | https://site.com/[256chars]
          | 
          | you'd face 2^1024 combinations. You'd never brute force it.
          | 
          | vs
          | 
          | https://site,com/[32chars] with a password of [32chars]
          | 
          | where there are 2^256 combinations. Again you can't brute
          | force it, but it's more likely than the 2^1024 combinations.
         | 
         | Imagine it's
         | 
         | https://site,com/[32chars][32chars] instead.
         | 
         | But while guessing the former is harder than the latter, URLs
         | leak a lot, far more than passwords.
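          | 
          | A sketch of minting such a token, assuming Node's crypto
          | module - generating it is the easy part; everything that
          | happens to the URL afterwards is the hard part:
          | 
          |   import { randomBytes } from "node:crypto";
          | 
          |   // 128 random bytes = 256 hex chars = 1024 bits
          |   const token = randomBytes(128).toString("hex");
          |   const link = `https://site.com/${token}`;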
        
           | internetter wrote:
           | Dorking is the technique of using public search engine
           | indexes to uncover information that is presumed to be
           | private. It has been used to uncover webcams, credit card
           | numbers, confidential documents, and even spies.
           | 
            | The problem is the website administrators who are encoding
           | authentication tokens into URL state, _not_ the naive
           | crawlers that find them.
        
             | shkkmo wrote:
             | It can be OK to put authentication tokens in urls, but
             | those tokens need to (at a bare minimum) have short
             | expirations.
        
               | knome wrote:
               | >It can be OK to put authentication tokens in urls
               | 
               | When would this ever be necessary? URL session tokens
               | have been a bad idea ever since they first appeared.
               | 
               | The only things even near to auth tokens I can reasonably
               | see stuffed into a URL are password reset and email
               | confirmation tokens sent to email for one time short
               | expiration use.
               | 
               | Outside of that, I don't see any reason for it.
        
               | dylanowen wrote:
               | They're useful for images when you can't use cookies and
               | want the client to easily be able to embed them.
        
               | albert_e wrote:
               | "presigned" URLs[1] are a pretty standard and recommended
               | way of providing users access to upload/download content
               | to Amazon S3 buckets without needing other forms of
               | authentication like IAM credential pair, or STS token,
               | etc
               | 
               | Web Applications do utilize this pattern very frequently
               | 
               | But as noted i previous comment these do have short
               | expiry times (configurable) so that there is no permanent
               | or long-term risk on the lines of the OP article
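                | 
                | A minimal sketch with the AWS SDK for JavaScript v3
                | (bucket and key names invented); expiresIn caps the
                | URL's lifetime in seconds:
                | 
                |   import { S3Client, GetObjectCommand }
                |     from "@aws-sdk/client-s3";
                |   import { getSignedUrl }
                |     from "@aws-sdk/s3-request-presigner";
                | 
                |   const s3 = new S3Client({});
                |   // URL stops working 300s after signing
                |   const url = await getSignedUrl(
                |     s3,
                |     new GetObjectCommand({
                |       Bucket: "my-bucket",
                |       Key: "report.pdf",
                |     }),
                |     { expiresIn: 300 }
                |   );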
               | 
               | [1]: https://docs.aws.amazon.com/AmazonS3/latest/userguid
               | e/using-...
        
               | vin10 wrote:
               | You are right about short expiry times but another catch
               | here is that if pre-signed URLs are being leaked in an
               | automated fashion, these services also keep the
               | downloaded content from these URLs around. I found
               | various such examples where links no longer work, but
               | PDFs downloaded from pre-signed URLs were still stored by
               | scanning services.
               | 
               | From https://urlscan.io/blog/2022/07/11/urlscan-pro-
               | product-updat...
               | 
               | > In the process of scanning websites, urlscan.io will
               | sometimes encounter file downloads triggered by the
               | website. If we are able to successfully download the
               | file, we will store it, hash it and make it available for
               | downloading by our customers.
        
               | knome wrote:
               | Interesting. I haven't built on s3, and if I did my first
               | instinct would probably have been to gate things through
               | a website.
               | 
               | Thanks for sharing your knowledge in that area.
        
             | layer8 wrote:
             | I wonder if there would be a way to tag such URLs in a
             | machine-recognizable, but not text-searchable way. (E.g.
             | take every fifth byte in the URL from after the authority
             | part, and have those bytes be a particular form of hash of
             | the remaining bytes.) Meaning that crawlers and tools in
             | TFA would have a standardized way to recognize when a URL
             | is meant to be private, and thus could filter them out from
             | public searches. Of course, being recognizable in that way
             | may add new risks.
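              | 
              | A hypothetical sketch of that idea (every name here is
              | invented): interleave a checksum derived from the rest
              | of the token, so a cooperating crawler can recognize
              | and drop such URLs:
              | 
              |   import { createHmac } from "node:crypto";
              | 
              |   // Well-known constant, not a secret: the goal
              |   // is recognizability, not authentication.
              |   const TAG_KEY = "private-url-convention-v1";
              | 
              |   const tagOf = (token: string) =>
              |     createHmac("sha256", TAG_KEY)
              |       .update(token).digest("hex");
              | 
              |   // Insert a tag char after every 4th token char.
              |   function markPrivate(token: string): string {
              |     const tag = tagOf(token);
              |     let out = "";
              |     for (let i = 0; i < token.length; i++) {
              |       out += token[i];
              |       if (i % 4 === 3)
              |         out += tag[(i >> 2) % tag.length];
              |     }
              |     return out;
              |   }
              | 
              |   // Crawlers strip every 5th char and verify it.
              |   function looksPrivate(s: string): boolean {
              |     let token = "", marks = "";
              |     for (let i = 0; i < s.length; i++) {
              |       if (i % 5 === 4) marks += s[i];
              |       else token += s[i];
              |     }
              |     const tag = tagOf(token);
              |     return marks.length > 0 && [...marks].every(
              |       (c, i) => c === tag[i % tag.length]);
              |   }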
        
               | loa_in_ wrote:
               | We already have robots.txt in theory.
        
               | layer8 wrote:
               | I didn't think robots.txt would be applicable to URLs
               | being copied around, but actually it might be, good
               | point. Though again, collecting that robots.txt
               | information could make it easier to search for such URLs.
        
               | internetter wrote:
                | We already have a solution to this: not including
                | authentication information within URLs.
                | 
                | Even if search engines knew to honor it, would every
                | insecure place a user puts a link know about it? Bad
                | actors with their own indexes certainly wouldn't care.
        
               | layer8 wrote:
               | How do you implement password-reset links otherwise? I
               | mean, those should be short-lived, but still.
        
               | andersa wrote:
               | You could send the user a code that they must copy paste
               | onto the page rather than sending them a link.
        
               | dmurray wrote:
               | Also, it would allow bad actors to just opt out of
               | malware scans - the main vector whereby these insecure
               | URLs were leaked.
        
               | tsimionescu wrote:
               | Actually, there are cases where this is more or less
               | unavoidable.
               | 
               | For example, if you want a web socket server that is
               | accessible from a browser, you need authentication, and
               | can't rely on cookies, the only option is to encode the
               | Auth information in the URL (since browsers don't allow
               | custom headers in the initial HTTP request for
               | negotiating a web socket).
        
               | zer00eyz wrote:
               | Authentication: Identify yourself
               | 
               | Authorization: Can you use this service.
               | 
               | Access Control/Tokenization: How long can this service be
               | used for.
               | 
               | I swipe my badge on the card reader. The lock unlocks.
               | 
               | Should we leave a handy door stopper or 2x4 there, so you
               | can just leave it propped open? Or should we have tokens
               | that expire in a reasonable time frame.. say a block of
               | ice (in our door metaphor) so it disappears at some point
               | in future? Nonce tokens have been a well understood
               | pattern for a long time...
               | 
                | It's not that these things are unavoidable, it's that
                | security isn't a first principle, or easy to embed,
                | due to issues of design.
        
               | skissane wrote:
               | > the only option is to encode the Auth information in
               | the URL (since browsers don't allow custom headers in the
               | initial HTTP request for negotiating a web socket).
               | 
               | Put a timestamp in the token and sign it with a private
               | key, so that the token expires after a defined time
               | period.
               | 
                | If the URL is only valid for the next five minutes,
                | the odds that it will leak and be exploited in that
                | five-minute window are very low.
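                | 
                | A minimal sketch (names invented; an HMAC with a
                | server-side secret standing in for the private key):
                | 
                |   import { createHmac, timingSafeEqual }
                |     from "node:crypto";
                | 
                |   const SECRET = process.env.TOKEN_SECRET!;
                | 
                |   const sign = (payload: string) =>
                |     createHmac("sha256", SECRET)
                |       .update(payload).digest("base64url");
                | 
                |   // Token carries its own expiry (now + 5 min),
                |   // e.g. wss://example.com/ws?token=...
                |   function issueToken(userId: string): string {
                |     const exp = Date.now() + 5 * 60 * 1000;
                |     const body = `${userId}.${exp}`;
                |     return `${body}.${sign(body)}`;
                |   }
                | 
                |   function verifyToken(token: string) {
                |     const [userId, exp, sig] = token.split(".");
                |     if (!userId || !exp || !sig) return null;
                |     const want = sign(`${userId}.${exp}`);
                |     if (sig.length !== want.length ||
                |         !timingSafeEqual(Buffer.from(sig),
                |                          Buffer.from(want)))
                |       return null;
                |     return Date.now() <= Number(exp)
                |       ? userId : null;
                |   }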
        
             | FrustratedMonky wrote:
             | "public search engine indexes"
             | 
             | Then it should be the search engine at fault.
             | 
              | Leaving your house unlocked is one thing.
              | 
              | But a company trying everyone's doors and then posting a
              | sign in the yard saying "this house is unlocked" has to
              | count for something.
        
               | lmm wrote:
               | A plain URL is an open door not a closed one. Most
               | websites are public and expected to be public.
        
               | FrustratedMonky wrote:
               | Isn't that the point of the post?
               | 
                | There are URLs that are out there 'as-if' public, but
                | really should be private.
               | 
               | And some people argue they should be treated as private,
               | even if it is just a plain URL and public.
        
           | 4death4 wrote:
           | Passwords are always private. Links are only sometimes
           | private.
        
             | QuinnyPig wrote:
             | Yup. There's a reason putting credentials into url
             | parameters is considered dangerous.
        
             | bachmeier wrote:
             | Well-chosen passwords stored properly are always private.
             | Passwords also tend to have much longer lifetimes than
             | links.
        
           | masom wrote:
            | You won't find a specific link, but at some point, if you
            | generate millions of URLs, the 1024 bits will start to
            | return values pretty quickly through brute force.
           | 
           | The one link won't be found quickly, but a bunch of links
           | will. You just need to fetch all possibilities and you'll get
           | data.
        
             | blueflow wrote:
             | 1024 bits seems a bit too much for the birthday problem to
             | be a thing.
             | 
             | I looked at [1] to do the calculation but (2^1024)! is a
             | number too large for any of my tools. If someone has a math
             | shortcut to test this idea properly...
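              | 
              | One shortcut: for n samples from N values, the usual
              | approximation is p ≈ 1 - e^(-n^2/2N), and for tiny x,
              | 1 - e^-x ≈ x, so you can stay in log2 space entirely:
              | 
              |   // log2(p) ≈ 2*log2(n) - log2(N) - 1
              |   const log2N = 1024;
              |   const log2p = (log2n: number) =>
              |     2 * log2n - log2N - 1;
              | 
              |   // Even after 2^128 random URLs: p ≈ 2^-769,
              |   // so no birthday effect to speak of.
              |   console.log(log2p(128)); // -769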
             | 
             | [1] https://en.wikipedia.org/wiki/Birthday_problem#Calculat
             | ing_t...
        
               | saagarjha wrote:
               | Stirling's approximation?
        
             | duskwuff wrote:
             | > You won't find a specific link, but at some point if you
             | generate millions of urls the 1024 bits will start to
             | return values pretty quick through bruteforce.
             | 
             | Not even close. 1024 bits is a really, really big address
             | space.
             | 
             | For the sake of argument and round numbers, let's say that
             | there are 4.2 billion (2^32) valid URLs. That means that
             | one out of every 2^992 randomly generated URLs is valid.
             | Even if you guessed billions of URLs every second, the
             | expected time to come up with a valid one (~2^960 seconds)
             | is still many orders of magnitude greater than the age of
             | the universe (~2^59 seconds).
        
             | charleslmunger wrote:
             | I'm not sure your math checks out. With 1024 bits of
             | entropy and, say, 1 trillion valid links, your chances of
             | any one link being valid are 1/2^984
             | 
              | So test a million links - your probability of finding a
              | real one is 1 - (1-1/2^984)^1000000. That's around a
              | 1/10^290 chance of hitting a valid URL with a million
              | tries. Even if you avoid ever checking the same URL
              | twice it will still take you an impractical amount of
              | time.
        
               | mbrumlow wrote:
               | All this is fine and dandy until your link shows up in a
               | log at /logs.
        
             | eknkc wrote:
             | We call 128 bit random data "universally" unique ids. 1024
             | bits won't ever get close to returning any random hits.
        
           | Y_Y wrote:
           | > site-comma-com
           | 
           | Did you do that just to upset me?
        
           | hiddencost wrote:
           | No. In theory they are both totally insecure.
        
           | noahtallen wrote:
            | You can easily rate-limit an authentication attempt, to
            | make brute-forcing account access practically impossible,
            | even for a relatively insecure password.
            | 
            | How would you do that for the URLs? 5 requests to
            | site.com/[256chars] which all 404 block your IP because
            | you don't have a real link? I guess the security relies
            | on the fact that only a very small percentage of the
            | total possible links is ever used? Though the likelihood
            | of randomly guessing a link is the same as the percentage
            | of addressable links in use.
        
           | ablob wrote:
           | Which alphabet did you take as a basis to reach 2^256
           | combinations?
        
         | bo1024 wrote:
         | There's probably details I'm missing, but I think the
         | fundamental issue is that "private" messages between people are
         | presumed private, but actually the platforms we use to send
         | messages do read those messages and access links in them. (I
         | mean messages in a very broad sense, including emails, DMs,
         | pasted links in docs, etc.)
        
           | internetter wrote:
           | URL scanners are not scanning links contained within
           | platforms that require access control. They haven't guessed
           | your password, and to my knowledge no communications platform
           | is feeding all links behind authentication into one of these
           | public URL scanning databases. As the article acknowledged in
           | the beginning, these links are either exposed as the result
           | of deliberate user action, or misconfigured extensions (that,
           | I might add, are suffering from this exact same
           | misconception).
           | 
            | If the actual websites were configured not to use the URL
            | as the authentication state, all of this would be avoided.
        
             | tobyjsullivan wrote:
             | The suggestion (in both the article and the parent) is that
             | the platforms themselves are submitting URLs. For example,
             | if I send a link in Discord[0] DM, it might show the
             | recipient a message like "warning: this link is malicious".
             | How does it know that? It submitted the url to one of these
             | services without your explicit consent.
             | 
             | [0] Discord is a hypothetical example. I don't know if they
             | have this feature. But an increasing number of platforms
             | do.
        
               | internetter wrote:
               | Where in the article does it suggest this? The two bullet
                | points at the very top of TFA are what I cited to
                | discredit this notion; I read it again and still haven't
               | found anything suggesting the communication platforms are
               | submitting this themselves.
        
               | bombcar wrote:
               | Falcon Sandbox is explicitly mentioned - which is a
               | middleware that can be installed on various communication
               | platforms (usually enterprise):
               | https://www.crowdstrike.com/products/threat-
               | intelligence/fal...
               | 
               | Microsoft has "safe links":
               | https://learn.microsoft.com/en-
               | us/microsoft-365/security/off... - Chrome has its own
               | thing, but there are also tons of additional hand-rolled
               | similar features.
               | 
               | My main annoyance is when they kill a one-time use URL.
        
               | anonymousDan wrote:
               | Do you know if safe links is guilty of the issue in the
               | OP?
        
               | bombcar wrote:
               | I suspect not because Microsoft is using their own
               | internal system.
               | 
               | However, it likely exposes the content internally to
               | Microsoft.
        
               | tobyjsullivan wrote:
               | I thought I read it in the article but I may have
               | unconsciously extrapolated from and/or misread this part:
               | 
               | "I came across this wonderful analysis by Positive
               | Security[0] who focused on urlscan.io and used canary
               | tokens to detect potential automated sources (security
               | tools scanning emails for potentially malicious [links])"
               | 
               | I don't see any mention of messaging platforms generally.
               | It only mentions email and does not suggest who might be
               | operating the tooling (vendor or end users). So I seem to
               | have miscredited that idea.
               | 
               | [0] https://positive.security/blog/urlscan-data-leaks
        
             | nightpool wrote:
             | The article says "Misconfigured scanners". Many, many
             | enterprise communication tools have such a scanner, and if
             | your IT team is using the free plan of whatever url scan
             | tool they signed up for, it's a good bet that these links
             | may end up being public.
        
         | mikepurvis wrote:
         | Bit of a tangent, but I was recently advised by a consultant
         | that pushing private Nix closures to a publicly-accessible S3
         | bucket was fine since each NAR file has a giant hash in the
         | name. I didn't feel comfortable with it so we ended up going a
          | different route, but I've continued to think about it since:
          | how different is it _really_ to have the "secret" be in the
          | URL vs in a token you submit as part of the request for the
         | URL?
         | 
         | And I think for me it comes down to the fact that the tokens
         | can be issued on a per-customer basis, and access logs can be
         | monitored to watch for suspicious behaviour and revoke
         | accordingly.
         | 
         | Also, as others have mentioned, there's just a different
         | mindset around how much it matters that the list of names of
         | files be kept a secret. On the scale of things Amazon might
         | randomly screw up, accidentally listing the filenames sitting
         | in your public bucket sounds pretty low on the priority list
         | since 99% of their users wouldn't care.
        
           | johnmaguire wrote:
           | > how different is it really to have the "secret" be in the
           | URL vs in a token you submit as part of the request for the
           | URL?
           | 
           | I'm not sure I grok this. Do you mean, for example, sending a
           | token in the POST body, or as a cookie / other header?
           | 
           | One disadvantage to having a secret in the URL, versus in a
           | header or body, is that it can appear in web service logs,
           | unless you use a URI fragment. Even then, the URL is visible
           | to the user, and will live in their history and URL bar -
           | from which they may copy and paste it elsewhere.
        
             | mikepurvis wrote:
             | In this case it's package archives, so they're never
             | accessed from a browser, only from the Nix daemon for
             | binary substitution [1]:
             | https://nixos.wiki/wiki/Binary_Cache
        
           | nmadden wrote:
           | I wrote about putting secrets in URLs a few years ago:
           | https://neilmadden.blog/2019/01/16/can-you-ever-safely-
           | inclu...
        
             | Sn0wCoder wrote:
              | Question: in the Waterken-Key flow with the token in the
              | URL fragment, the URL looks like
              | https://www.example.com/APP/#mhbqcmmva5ja3 - but in the
              | diagram it's hitting example.com/API/#mhbqcmmva5ja3. Is
              | this a typo, OR are we mapping APP to API with the
              | proxy, so the user thinks they are going to the APP
              | with their key? Or does the browser do this for us
              | automatically when it sees APP in the URL and then
              | stores the key in window.location.hash? I am confused
              | and might just find the answer on Google, but since you
              | appear to be the author maybe you can answer the
              | question here.
        
         | bachmeier wrote:
         | > The fundamental issue is that links without any form of
         | access control are presumed private, simply because there is no
         | public index of the available identifiers.
         | 
         | Is there a difference between a private link containing a
         | password and a link taking you to a site where you input the
         | password? Bitwarden Send gives a link that you can hand out to
         | others. It has # followed by a long random string. I'd like to
         | know if there are security issues, because I use it regularly.
         | At least with the link, I can kill it, and I can automatically
         | have it die after a few days. Passwords generally don't work
         | that way.
        
           | koolba wrote:
           | If there's a live redirect at least there's the option to
           | revoke the access if the otherwise public link is leaked. I
           | think that's what sites like DocuSign do with their public
           | links. You can always regenerate it and have it resent to the
            | intended recipient's email, but it expires after some fixed
           | period of time to prevent it from being public forever.
        
           | 7952 wrote:
           | There is a difference in that people intuitively know that
           | entering passwords gives access. Also, it may be different
           | legally as the user could reasonably be expected to know that
           | they are not supposed to access something.
        
           | PeterisP wrote:
           | Yes, the difference is in what all our tools and
           | infrastructure presume to be more or less sensitive.
           | 
           | Sending a GET request to a site for the password-input screen
            | and POST'ing the password will get very different treatment
           | than sending the same amount of "authorization bits" in the
           | URL; in the first case, your browser won't store the secret
           | in the history, the webserver and reverse proxy won't include
           | it in their logs, various tools won't consider it appropriate
           | to cache, etc, etc.
           | 
           | Our software infrastructure is built on an assumption that
           | URLs aren't really sensitive, not like form content, and so
           | they get far more sloppy treatment in many places.
           | 
           | If the secret URL is short-lived or preferably single-use-
           | only (as e.g. many password reset links) then that's not an
           | issue, but if you want to keep something secret long-term,
            | then using it in a URL means it's very likely to get placed
           | in various places which don't really try to keep things
           | secret.
        
         | fddrdplktrew wrote:
         | legend.
        
         | XorNot wrote:
         | Worked for a company which ran into an S3 bucket naming
         | collision when working with a client - turns out that both
         | sides decided hyphenated-company-name was a good S3 bucket name
         | (my company lost that race obviously).
         | 
          | One of those little formative experiences: every time I do
          | AWS now, the bucket names are all named
          | <project>-<deterministic hash from a seed value>.
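          | 
          | Something like this, say (a sketch; the seed handling is
          | invented):
          | 
          |   import { createHash } from "node:crypto";
          | 
          |   // Deterministic and hard to guess, but still
          |   // reproducible from project name plus seed.
          |   const bucketName = (project: string, seed: string) =>
          |     project + "-" + createHash("sha256")
          |       .update(`${seed}:${project}`)
          |       .digest("hex").slice(0, 16);
          | 
          |   // bucketName("billing", "s3cr3t-seed")
          |   //   => "billing-" + 16 hex chars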
         | 
         | If it's really meant to be private then you encrypt the
         | project-name too and provide a script to list buckets with
         | "friendly" names.
         | 
          | There's always a weird tradeoff with hosted services where
          | the technically perfect thing (totally random identifiers)
          | is mostly an operational burden compared to the imperfect
          | thing (descriptive names).
        
       | scblock wrote:
       | When it comes to the internet if something like this is not
       | protected by anything more than a random string in a URL then
       | they aren't really private. Same story with all the internet
       | connected web cams you can find if you go looking. I thought we
       | knew this already. Why doesn't the "Who is responsible" section
       | even mention this?
        
         | AnotherGoodName wrote:
         | Such links are very useful in an 'it's OK to have security
         | match the use case' type of way. You don't need maximum
         | security for everything. You just want a barrier to widespread
         | sharing in some cases.
         | 
          | As an example I hit 'create link share' on a photo in my photo
         | gallery and send someone the link to that photo. I don't want
         | them to have to enter a password. I want the link to show the
         | photo. It's ok for the link to do this. One of the examples
         | they have here is exactly that and it's fine for that use case.
         | In terms of privacy fears the end user could re-share a
         | screenshot at that point anyway even if there was a login. The
         | security matches the use case. The user now has a link to a
          | photo, they could reshare but I trust they won't intentionally
         | do this.
         | 
         | The big issue here isn't the links imho. It's the security
         | analysis tools scanning all links a user received via email and
         | making them available to other users in that community. That's
          | more re-sharing than I intended when I sent someone a photo.
        
           | nonrandomstring wrote:
           | > Such links are very useful in an 'it's OK to have security
           | match the use case'
           | 
           | I think you give the most sensible summary. It's about
           | "appropriate and proportional" security for the ease of use
           | trade-off.
           | 
           | > the user now has a link to a photo, they could reshare but
           | i trust they won't intentionally do this.
           | 
           | Time limits are something missing from most applications to
           | create ephemeral links. Ideally you'd want to choose from
           | something like 1 hour, 12 hours, 24 hours, 72 hours... Just
           | resend if they miss the message and it expires.
           | 
           | A good trick is to set a cron job on your VPS to clear
           | /www/tmp/ at midnight every other day.
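            | 
            | Something like this in the crontab, say (assuming
            | /www/tmp holds only the ephemeral files):
            | 
            |   # at midnight on odd days, drop expired shares
            |   0 0 */2 * * find /www/tmp -mindepth 1 -delete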
           | 
           | > The big issue here isn't the links imho. It's the security
           | analysis tools scanning all links a user received via email
           | 
           | You have to consider anything sent to a recipient of Gmail,
           | Microsoft, Apple - any of the commercial providers - to be
           | immediately compromised. If sending between private domains
           | on unencrypted email then it's immediately compromised by
           | your friendly local intelligence agency. If using PGP or am
           | E2E chat app, assume it _will_ be compromised at the end
           | point eventually, so use an ephemeral link.
        
           | marcosdumay wrote:
           | The situation is greatly improved if you make the link short-
           | lived and if you put the non-public data in a region of the
           | URL that expects non-public data, like in the password, as in
           | "https://anonymous:32_chars_hash@myphotolibrary.example.com/u
           | ...".
        
       | victorbjorklund wrote:
        | Can someone smarter explain to me the difference between:
       | 
       | 1) domain.com/login user: John password: 5 char random password
       | 
       | 2) domain.com/12 char random url
       | 
       | If we assume both either have the same bruteforce/rate limiting
       | protection (or none at all). Why is 1 more safe than 2?
        
         | amanda99 wrote:
         | Two things:
         | 
         | 1. "Password" is a magic word that makes people less likely to
         | just paste it into anything.
         | 
         | 2. Username + passwords are two separate pieces of information
         | that are not normally copy-pasted at the same time or have a
         | canonical way of being stored next to each other.
        
           | victorbjorklund wrote:
            | 1) Makes sense. 2) Not sure about that. If someone shares
           | their password with someone else they probably share both the
           | username/email and the password
        
             | amanda99 wrote:
             | Yes, people share usernames and passwords, but there's no
             | single canonical string, like
             | "username=amanda99&password=hithere". For example most of
             | the time when I share user/pass combos, they are in
             | separate messages on Signal. You type them into two
             | different boxes, so you normally copy the username, then
             | the password in separate actions.
        
               | nightpool wrote:
               | I mean, for HTTP Basic there literally _is_ a single
                | canonical string, and it's not uncommon to see people
               | send you links like
               | https://user:somepasswordhere@example.com.
               | 
               | I think the arguments other commenters have made about
               | logging, browser history storage, etc are more convincing
        
         | munk-a wrote:
          | Assuming that 5-char password is handled in a reasonable
          | way, that data is not part of the publicly visible portion
          | of the request, which anyone along the chain of the
          | communication can trivially eavesdrop on. In a lot of cases
          | the mere existence of that password (even if there's no
          | significant data in it) will transform a request from a
          | cacheable one into an uncacheable one, so intermediate
          | servers won't keep a copy of the response in case anyone
          | else wants the document (there are other ways to do this,
          | but a password will also force it to be the case).
        
         | koliber wrote:
         | From the information theory angle, there is no difference.
         | 
         | In practice, there is.
         | 
         | There is a difference between something-you-have secrets and
         | something-you-know secrets.
         | 
          | A URL is something you have. It can be taken from you if you
         | leave it somewhere accessible. Passwords are something-you-know
         | and if managed well can not be taken (except for the lead pipe
         | attack).
         | 
         | There is also something-you-are, which includes retina and
         | fingerprint scans.
        
         | kube-system wrote:
         | The difference is that people (and software that people write)
         | often treat URLs differently than a password field. 12
         | characters might take X amount of time to brute force, but if
         | you already have the 12 characters, that time drops to zero.
        
         | rkangel wrote:
         | This article is the exact reason why.
         | 
         | (1) Requires some out-of-band information to authenticate.
         | Information that people are used to keeping safe.
         | 
         | On the other hand the URLs in (2) are handled as URLs. URLs are
         | often logged, recorded, shared, passed around. E.g. your work
         | firewall logging the username and password you used to log into
         | a service would obviously be bad, but logging URLs you've
          | accessed would probably seem fine.
         | 
         | [the latter case is just an example - the E2E guarantees of TLS
         | mean that neither should be accessible]
        
         | wetpaste wrote:
          | In the context of this article, it is that the security
          | scanning software that companies/users are using seems to
          | be indexing some of the 12-char links out of emails, which
          | in some cases end up on public scan pages. Additionally, if
          | domain.com/12-char-password is requested without https,
          | even if there is a redirect, that initial request went over
          | the wire unencrypted and therefore could be MITM'd, whereas
          | with a login page, there
         | are more ways to guarantee that the password submit would only
         | ever happen over https.
        
         | jarofgreen wrote:
         | As well as what the others have said, various bits of software
         | make the assumption that 1) may be private and to be careful
         | with it and 2) isn't.
         | 
          | eg Your web browser will automatically save any URLs to its
         | history for any user of the computer to see but will ask first
         | before saving passwords.
         | 
          | eg Any web proxies your traffic goes through, or other
          | software inspecting it like virus scanners, will probably
          | log URLs but
         | probably won't log form contents (yes HTTPS makes this one more
         | complicated but still).
        
         | hawski wrote:
          | You can easily make a regex to filter URLs out of logs.
          | There is no universal regex (other than maybe a costly LLM)
          | to match the URL together with the username and the
          | password.
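          | 
          | The first half is the easy bit - a sketch (patterns are
          | illustrative only):
          | 
          |   // Redact anything URL-shaped that may carry a
          |   // token before the line hits the logs. There's
          |   // no equivalent for a username+password pair.
          |   const scrub = (line: string) => line
          |     .replace(/([?&](token|key|sig)=)[^&\s]+/gi,
          |              "$1[REDACTED]")
          |     .replace(/\/[A-Za-z0-9_-]{32,}/g,
          |              "/[REDACTED]");
          | 
          |   console.log(scrub(
          |     "GET /dl?token=eyJhbGciOi HTTP/1.1"));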
        
         | ApolloFortyNine wrote:
         | I researched this a while ago when I was curious if you could
         | put auth tokens as query params.
         | 
         | One of the major issues is that many logging applications will
          | log the full URL somewhere, so now you're logging 'passwords'.
        
           | laurels-marts wrote:
            | You can definitely pass a JWT as a query param (and they
            | often are in embedded scenarios) and no, it's not the
            | same as logging
           | passwords unless you literally place the password in the
           | payload (which would be stupid).
        
       | amanda99 wrote:
       | Off topic: but that links to cloudflare radar which apparently
       | mines data from 1.1.1.1. I was under the impression that 1.1.1.1
       | did not use user data for any purposes?
        
         | kube-system wrote:
         | CF doesn't sell it or use it for marketing, but the entire way
         | they even got the addresses was because APNIC wanted to study
         | the garbage traffic to 1.1.1.1.
        
           | amanda99 wrote:
           | > CF doesn't sell it or use it for marketing
           | 
           | Any source for this? Do you work there? I checked their docs
           | and they say they don't "mine user data", so I wouldn't trust
           | anything they say, at least outside legal documents.
        
             | kube-system wrote:
             | https://1.1.1.1/dns/
             | 
             | > We will never sell your data or use it to target ads.
             | 
             | https://developers.cloudflare.com/1.1.1.1/privacy/public-
             | dns...
             | 
             | > Cloudflare will not sell or share Public Resolver users'
             | personal data with third parties or use personal data from
             | the Public Resolver to target any user with advertisements.
             | 
             | There's a lot of transparency on that page in particular,
             | down to the lists of the fields in the logs.
        
       | boxed wrote:
       | Outlook.com leaks links to bing. At work it's a constant attack
       | surface that I have to block by looking at the user agent string.
       | Thankfully they are honest in the user agent!
        
       | overstay8930 wrote:
       | Breaking news: Security by obscurity isn't actually security
        
         | makapuf wrote:
         | Well, I like my password/ssh private key to be kept in
         | obscurity.
        
           | fiddlerwoaroof wrote:
           | Yeah, I've always hated this saying because all security
           | involves something that is kept secret, or "obscure". Also,
           | obscurity is a valid element of a defense in depth strategy
        
             | koito17 wrote:
             | To play devil's advocate, people discourage "security by
             | obscurity" but not "security with obscurity". That is to
             | say, secrets or "obscurity" as part of a layer in your
             | overall security model isn't what gets contested, it's
             | solely relying on obscure information staying obscure that
             | gets contested.
             | 
             | e.g. configuring an sshd accepting password auth and
             | unlimited retries to listen on a non-22 port is "security
             | by obscurity". configuring an sshd to disallow root logins,
             | disallow password authentication, only accept connections
             | from a subset of "trustworthy" IP addresses, and listen on
             | a non-22 port, is "security with obscurity"
        
             | maxcoder4 wrote:
              | The idea behind "security through obscurity" is that
              | even if the adversary knows everything about your setup
              | *except the secret keys*, you should be secure. Security
              | through obscurity is any method of protection other than
              | the secret key, for example:
              | 
              |   * serving ssh on a random high port
              |   * using a custom secret encryption algorithm
              |   * hosting an unauthenticated service on a secret
              |     subdomain in the hope nobody will find out, or
              |     behind a long directory name
              | 
              | Some security through obscurity is OK (for example high
              | ports or port knocking help buy time when protecting
              | from a zeroday on the service). It's just that relying
              | only on security through obscurity is bad.
              | 
              | In this case, I wouldn't call URLs with an embedded key
              | security through obscurity, just poor key management.
        
           | overstay8930 wrote:
           | If you use an HSM you wouldn't have to worry about that
           | either
        
         | panic wrote:
         | "Security by obscurity" means using custom, unvetted
         | cryptographic algorithms that you believe others won't be able
         | to attack because they're custom (and therefore obscure).
         | Having a key you are supposed to keep hidden isn't security by
         | obscurity.
        
       | QuercusMax wrote:
       | I've always been a bit suspicious of infinite-use "private"
       | links. It's just security thru obscurity. At least when you share
       | a Google doc or something there's an option that explicitly says
       | "anyone with the URL can access this".
       | 
       | Any systems I've built that need this type of thing have used
       | Signed URLs with a short lifetime - usually only a few minutes.
       | And the URLs are generally an implementation detail that's not
       | directly shown to the user (although they can probably see them
       | in the browser debug view).
        
         | empath-nirvana wrote:
         | There's functionally no difference between a private link and a
         | link protected by a username and password or an api key, as
         | long as the key space is large enough.
        
           | ses1984 wrote:
            | You can't revoke an individual user's access to a
            | hard-to-guess link.
        
             | colecut wrote:
             | You can if it's one link per user
        
               | vel0city wrote:
               | Lots of platforms I've used with these public share links
               | don't really support multiple share links, and if they do
               | the management of it is pretty miserable. Clicking share
               | multiple times just gives the same link.
        
               | ses1984 wrote:
               | True but if you're generating one link per user, at what
               | point do you lift up your head and wonder if it wouldn't
               | be easier to just use authentication?
        
               | jddj wrote:
               | The friction that semi-private links remove is that the
               | recipient doesn't need an account for your service.
               | 
               | Any tradeoffs should be viewed in that context.
        
               | anonymousDan wrote:
               | I like how google docs does it. You can specify the email
               | of a user allowed to access the link (doesn't need to be
               | gmail). When they click it they will be told to check for
               | a validation email containing a link to the actual
               | document.
        
           | kriops wrote:
           | There is a big difference in how the browser treats the
           | information, depending on how you provide it. Secrets in URLs
           | leak more easily.
        
           | rfoo wrote:
            | Most developers are aware that usernames and passwords are
            | PII and that if they log them they are likely to get
            | fired.
            | 
            | Meanwhile our HTTP servers happily log every URI they
            | receive in access logs. Oh, and if you ever send a link in
            | a non-E2EE messenger, it's likely their server generated
            | the link preview
           | for you.
        
           | vel0city wrote:
           | There's one big functional difference. People don't normally
           | have their username and password or API key directly in the
           | URL.
           | 
           | Example 1:
           | 
           | Alice wants Bob to see CoolDocument. Alice generates a URL
           | that has the snowflake in the URL and gives it to Bob. Eve
           | manages to see the chat, and can now access the document.
           | 
           | Example 2:
           | 
           | Alice wants Bob to see CoolDocument. Alice clicks "Share with
           | Bob" in the app, grabs the URL to the document with no
           | authentication encoded within and sends it to Bob. Bob clicks
           | the link, is prompted to login, Bob sees the document. Eve
           | manages to see the chat, follows the link, but is unable to
           | login and thus cannot see the document.
           | 
           | Later, Alice wants to revoke Bob's access to the document.
           | Lots of platforms don't offer great tools to revoke
           | individual generated share URLs, so it can be challenging to
           | revoke Bob's access without potentially cutting off other
           | people's access in Example 1, as that link might have been
           | shared with multiple people. In example 2, Alice just removes
            | Bob's access to the document and now his login doesn't
            | have permissions to see it. Granted, better link
            | management tools could solve this, but it often seems
            | like these snowflake
           | systems don't really expose a lot of control over multiple
           | share links.
        
           | LaGrange wrote:
           | I mean, there's a functional difference if your email client
           | will try to protect you by submitting the URL to a public
           | database. Which is incredible and mind-boggling, but also
           | apparently the world we live in.
        
           | nkrisc wrote:
           | There's a big difference. The latter requires information not
           | contained in the URL to access the information.
        
             | ironmagma wrote:
             | That's not a fundamental difference but a difference of
             | convention. A lot of us have been in the convention long
             | enough that it seems like a fundamental.
        
             | deathanatos wrote:
              | > _Here's the URL to the thing:
             | https://example.com/a/url?secret=hunter2_
             | 
             | This is indexable by search engines.
             | 
              | > _Here's the URL to the thing: https://example.com/a/url
             | and the password is "hunter2"._
             | 
             | This is indexable by search engines.
             | 
             | Yes, the latter is marginally harder, but you're still
             | leaning on security through obscurity, here.
             | 
              | The number of times I have had "we need to securely
              | transmit this data!" end with exactly this, or
              | something equivalent: emailing an encrypted ZIP _with
              | the password in the body of the email_ (or sometimes,
              | some other insecure channel...) ...
        
               | nkrisc wrote:
               | Sure if you're comparing worst case of one to best case
               | of the other it's functionally similar, but if the
               | password is strong and handled properly then they are not
               | functionally similar at all.
        
               | pests wrote:
               | Right, but you settled on the answer as well. You must
               | communicate the password via a different medium, which is
               | impossible with links.
        
           | OtherShrezzing wrote:
           | There's at least one critical functional difference: The URL
           | stays in the browser's history after it's been visited.
        
         | voiper1 wrote:
         | >At least when you share a Google doc or something there's an
         | option that explicitly says "anyone with the URL can access
         | this".
         | 
          | Unfortunately, it's based on the document ID, so you can't
          | revoke the old link and re-enable access with a new URL.
        
           | nightpool wrote:
           | Not true, as you may have heard they closed this loophole in
           | 2021 by adding a "resource key" (that can be rotated) to
           | every shared URL: https://9to5google.com/2021/07/28/google-
           | drive-security-upda....
        
       | sbr464 wrote:
        | All media/photos you upload to a private airtable.com app get
        | public links: no authentication required if you know the URL.
        
         | internetter wrote:
         | This is actually fairly common for apps using CDNs - not just
         | airtable. I agree it's potentially problematic
        
           | blue_green_maps wrote:
           | Yes, this is the case for images uploaded through GitHub
           | comments, I think.
        
       | ttymck wrote:
       | Zoom meeting links often have the password appended as a query
       | parameter. Is this link a "private secure" link? Is the link
       | without the password "private secure"?
        
         | bombcar wrote:
         | If the password is randomized for each meeting, the URL link is
         | not so bad, as the meeting will be dead and gone by the time
         | the URL appears elsewhere.
         | 
         | But in reality, nobody actually cares and just wants a "click
         | to join" that doesn't require fumbling around - but the
         | previous "just use the meeting ID" was too easily guessed.
        
           | runeb wrote:
            | Unless it's a recurring meeting
        
       | godelski wrote:
       | There's a clear UX problem here. If you submit a scan it doesn't
       | tell you it is public.
       | 
        | There's a simple fix: make it clear that the scan is public!
       | When submitting a scan it isn't clear, as the article shows. But
       | you have the opportunity to also tell the user that it is public
       | during the scan, which takes time. You also have the opportunity
       | to tell them AFTER the scan is done. There should be a clear
       | button to delist.
       | 
       | urlscan.io does a bit better but the language is not quite clear
       | that it means the scan is visible to the public. And the colors
        | just blend in. If something doesn't catch your eye, it might
        | as well be treated as invisible. If there is a way to easily
        | misinterpret language, it will always be misinterpreted. If
        | you have to scroll to find something, it'll never be found.
        
         | heipei wrote:
         | Thanks for your feedback. We show the Submit button on our
         | front page as "Public Scan" to indicate that the scan results
         | will be public. Once the scan has finished it will also contain
         | the same colored banner that says "Public Scan". On each scan
         | result page there is a "Report" button which will immediately
         | de-list the scan result without any interaction from our side.
         | If you have any ideas on how to make the experience more
         | explicit I would be happy to hear it!
        
           | godelski wrote:
           | I understand, but that is not clear enough. "Public scan" can
           | easily be misinterpreted. Honestly, when I looked at it, I
           | didn't know what it meant. Just looked like idk maybe a
           | mistranslation or something? Is it a scan for the public? Is
           | the scanning done in public? Are the results public? Who
           | knows. Remember that I'm not tech literate and didn't make
           | the project.
           | 
           | I'd suggest having two buttons, "public scan" "private scan".
            | That would contextualize the public scan, clarifying that
            | what you are scanning is publicly __listed__. And different
           | colors. I think red for "public" would actually be the better
           | choice.
           | 
           | Some information could be displayed while scanning. Idk put
           | something like "did you know, using the public scan makes the
           | link visible to others? This helps security researchers. You
           | can delist it by clicking ____" or something like that and do
           | the inverse. It should stand out. There's plenty of time
           | while the scan happens.
           | 
           | > On each scan result page there is a "Report" button which
           | will immediately de-list the scan result without any
           | interaction from our side.
           | 
           | "Report" is not clear. That makes me think I want to report a
           | problem. Also I think there is a problem with the color
            | scheme. The palette is nice but at least for myself, it all
           | kinda blends in. Nothing pops. Which can be nice at times,
           | but we want to draw the user to certain things, right? I
            | actually didn't see the report button at first. I looked
            | around, scrolled, and then even felt embarrassed when
           | I did find it because it is in an "obvious" spot. One that I
           | even looked at! (so extra embarrassing lol)
           | 
           | I think this is exactly one of those problems where when you
           | build a tool everything seems obvious and taken care of. You
           | clearly thought about these issues (far better than most!)
           | but when we put things out into public, we need to see how
           | they get used and where our assumptions miss the mark.
           | 
           | I do want to say thank you for making this. I am criticizing
           | not to put you down or dismiss any of the work you've done.
           | You've made a great tool that helps a lot of people. You
           | should feel proud for that! I am criticizing because I want
           | to help make the tool the best tool it can be. Of course
           | these are my opinions. My suggestion would be to look at
           | other opinions as well and see if there are common themes.
           | Godelski isn't right, they're just one of many voices that
           | you have to parse. Keep up the good work :)
        
             | heipei wrote:
             | Thanks, that is great feedback and we'll try to improve how
             | the scan visibility is shown and what it actually means.
             | The suggestion of adding a text to the loading page is a
             | great idea, and the feedback about the colors on the result
             | page is totally valid.
             | 
             | I'm the last person who wants to see private data
             | accidentally leak into the public domain. However
             | experience has shown that combating the massive amounts of
             | fraud and malicious activity on the web nowadays requires
             | many eyes that are able to access that data and actually do
             | something about it. That is the reason we have these public
             | scans in the first place.
        
               | godelski wrote:
               | And thank you for being receptive and listening! I hope
               | my thoughts and others can help make your tools better.
               | 
               | I really appreciate that people like you are out there
               | trying to defend our data and privacy. I know it is such
               | a difficult problem to solve and you got a lot of work
               | ahead of you. But appreciation is often not said enough
               | and left implied. So I want to make it explicit: Thank
               | you.
               | 
               | (and I'll say this interaction is the best advertisement
               | you could make, at least to me haha)
        
             | vin10 wrote:
             | This is a very well formulated suggestion. Nicely written!
        
       | r2b2 wrote:
       | To create private shareable links, store the private part in the
       | hash of the URL. The hash is not transmitted in DNS queries or
       | HTTP requests.
       | 
       | Ex. When links.com?token=<secret> is visited, that link will
       | be transmitted and potentially saved (search parameters
       | included) by intermediaries like Cloudflare.
       | 
       | Ex. When links.com#<secret> is visited, the hash portion will not
       | leave the browser.
       | 
       |  _Note: It's often nice to work with data in the hash portion
       | by encoding it as a URL-safe Base64 string (i.e. JS object ->
       | JSON string -> URL-safe Base64 string)._
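       | 
       | A minimal sketch of that encoding, assuming only standard
       | browser APIs; the payload object is a hypothetical example:
       | 
       |     // Encode: JS object -> JSON string -> URL-safe Base64.
       |     const payload = { token: "secret-value" };
       |     const b64 = btoa(JSON.stringify(payload))
       |       .replace(/\+/g, "-").replace(/\//g, "_")
       |       .replace(/=+$/, "");
       |     const link = `https://links.com/view#${b64}`;
       | 
       |     // Decode on page load: the fragment never left the
       |     // browser, so intermediaries never saw it.
       |     const raw = location.hash.slice(1)
       |       .replace(/-/g, "+").replace(/_/g, "/");
       |     const data = JSON.parse(atob(raw));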
        
         | eterm wrote:
         | If it doesn't leave the browser, how would the server know to
         | serve the private content?
        
           | jadengeller wrote:
           | The client web app makes a POST request: the secret
           | leaves the browser, but not in the URL.
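           | 
           | A minimal sketch of that exchange, assuming a hypothetical
           | /api/resolve endpoint; the secret travels in the POST
           | body, never in a request URL:
           | 
           |     // (inside an async function or module script)
           |     const secret = location.hash.slice(1);
           |     const res = await fetch("/api/resolve", {
           |       method: "POST",
           |       headers: { "Content-Type": "application/json" },
           |       body: JSON.stringify({ secret }),
           |     });
           |     const doc = await res.json();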
        
         | klabb3 wrote:
         | Thanks, finally some thoughts about how to solve the issue.
         | In particular, email-based login/account reset is the main
         | use case I can think of.
         | 
         | Do bots that follow links in emails (for whatever reason)
         | execute JS? Is there a risk they activate the thing with a
         | JS-induced POST?
        
           | 369548684892826 wrote:
           | Yes, I've seen this bot JS problem; it does happen.
        
           | r2b2 wrote:
           | To somewhat mitigate the _link-loading bot_ issue, the link
           | can land on a  "confirm sign in" page with a button the user
           | must click to trigger the POST request that completes
           | authentication.
           | 
           | Another way to mitigate this issue is to store a secret in
           | the browser that initiated the link-request (Ex. local
           | storage). However, this can easily break in situations like
           | private mode, where a new tab/window is opened without access
           | to the same session storage.
           | 
           | An alternative to the in-browser secret is a browser
           | fingerprint match: if the browser that opens the link
           | doesn't match the fingerprint of the browser that
           | requested the link, fail authentication. This also has
           | pitfalls.
           | 
           | Unfortunately, if your threat model requires blocking
           | _bots that click too_, you're likely stuck adding some
           | semblance of a second factor (pin/password, biometric,
           | hardware key, etc.).
           | 
           | In any case, when using link-only authentication, best to at
           | least put sensitive user operations (payments, PII, etc.)
           | behind a second factor at the time of operation.
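           | 
           | A minimal sketch of the confirm-button idea, assuming a
           | hypothetical /auth/complete endpoint; merely loading the
           | landing page has no side effects, and only the explicit
           | click consumes the token:
           | 
           |     // Landing page script for a page with a
           |     // <button id="confirm"> element.
           |     const token =
           |       new URL(location.href).searchParams.get("token");
           |     document.querySelector("#confirm")
           |       ?.addEventListener("click", async () => {
           |         await fetch("/auth/complete", {
           |           method: "POST",
           |           body: new URLSearchParams({ token: token ?? "" }),
           |         });
           |         location.href = "/dashboard";
           |       });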
        
             | klabb3 wrote:
             | > a button the user must click
             | 
             | Makes sense. No action until the user clicks something on
             | the page. One extra step but better than having "helpful
             | bots" wreak havoc.
             | 
             | > to store a secret in the browser [...] a browser
             | fingerprint match
             | 
             | I get the idea but I really dislike this. Assuming the
             | user will use the same device or browser is an
             | anti-pattern that causes problems, especially when
             | people cross the mobile-desktop boundary. Generally,
             | web functionality shouldn't be browser-dependent,
             | especially hidden state like that.
        
         | andix wrote:
         | Is there a feature of DNS I'm unaware of that queries more
         | than just the domain part? https://example.com?token=<secret>
         | should only lead to a DNS query for "example.com".
        
           | erikerikson wrote:
           | The problem in the GP isn't DNS. DNS will happily supply
           | the IP address for a CDN. The HTTP[S] request will
           | thereafter be sent by the caller to the CDN (in the case
           | of Cloudflare, Akamai, etc.), where it will be handled
           | and potentially logged before the result is retrieved
           | from the cache or the configured origin (i.e. backing
           | server).
        
           | r2b2 wrote:
           | Correct, DNS only queries the hostname portion of the URL.
           | 
           |  _Maybe my attempt to be thorough - noting DNS alongside
           | HTTP, since it's part of the browser - network - server
           | request diagram - was too thorough._
        
         | jmholla wrote:
         | > Ex. When links.com?token=<secret> is visited, that link
         | will be transmitted and potentially saved (search
         | parameters included) by intermediaries like Cloudflare.
         | 
         | Note: When over HTTPS, the parameter string (and path) is
         | encrypted, so the intermediaries in question need to be
         | able to decrypt your traffic to read that secret.
         | 
         | Everything else is right. Just wanted to provide some nuance.
        
           | r2b2 wrote:
           | Good to point out. This distinction is especially
           | important to keep in mind when thinking about where
           | and/or by whom TLS/SSL is terminated for your service,
           | and any relevant threat models the service might have
           | for the portion of the HTTP request _after_ termination.
        
           | mschuster91 wrote:
           | Cloudflare, Akamai, and AWS CloudFront are all legitimate
           | intermediaries.
        
         | loginatnine wrote:
         | It's called a fragment FYI!
        
           | shiomiru wrote:
           | However, window.location calls it "hash". (Also, the query
           | string is "search". I wonder why Netscape named them this
           | way...)
        
         | nightpool wrote:
         | The secret is still stored in the browser's history DB in
         | this case, which may be unencrypted (I believe it is for
         | Chrome on Windows, last I checked). The cookie DB, on the
         | other hand, is (I think) always encrypted using the OS's
         | TPM, so it's harder for malicious programs to crack.
        
         | phyzome wrote:
         | Huge qualifier: Even otherwise benign JavaScript running on
         | that page can pass the fragment anywhere on the internet.
         | Putting stuff in the fragment helps, but it's not perfect. And
         | I don't just mean this in an ideal sense -- I've actually seen
         | private tokens leak from the fragment this way multiple times.
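         | 
         | For example, a third-party analytics snippet that reports
         | the full URL will happily include the fragment; the
         | endpoint below is hypothetical:
         | 
         |     // location.href includes the #fragment, so this one
         |     // call ships the "private" token to a third party.
         |     navigator.sendBeacon(
         |       "https://analytics.example/collect",
         |       location.href,
         |     );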
        
       | rpigab wrote:
       | Links that are not part of a fast redirect loop will be copied
       | and pasted to be shared because that's what URLs are for, they're
       | universal, they facilitate access to a resource available on a
       | protocol.
       | 
       | Access control on anything that is not short-lived must be done
       | outside of the url.
       | 
       | When you share links on any channel that is not e2ee, the
       | first agent to access that url is not the person you're
       | sending it to, it is the channel's service. It can be
       | legitimate, like Bitwarden looking for favicons to enhance
       | UX, or malicious, like the FB Messenger crawler that wants to
       | know more about what you are sharing in private messages.
       | 
       | Tools like these scanners won't get better UX, because if you
       | explicitly tell users that the scans are public, some of them
       | will think twice about using the service, and this is bad for
       | business, whether they're using it for free or paying for a
       | pro license.
        
       | qudat wrote:
       | Over at pico.sh we are experimenting with an entirely new type of
       | private link by leveraging ssh local forward tunnels:
       | https://pgs.sh/
       | 
       | We are just getting started but so far we are loving the
       | ergonomics.
        
       | zzz999 wrote:
       | You can if you use E2EE and not CAs
        
       | Terr_ wrote:
       | A workaround for this "email-based authentication" problem
       | (without going to a full "make an account with a password" step)
       | is to use temporary one-time codes, so that it doesn't matter if
       | the URL gets accidentally shared.
       | 
       | 1. User visits "private" link (Or even a public link where they
       | re-enter their e-mail.)
       | 
       | 2. Site e-mails user _again_ with time-limited single-use code.
       | 
       | 3. User enters temporary code to confirm ownership of e-mail.
       | 
       | 4. Flow proceeds (e.g. with HTTP cookies/session data) with
       | reasonable certainty that the e-mail account owner is involved.
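       | 
       | A minimal sketch of steps 2-3, assuming an in-memory code
       | store; the step that actually e-mails the code is left out:
       | 
       |     import crypto from "node:crypto";
       | 
       |     // Step 2: issue a short-lived, single-use code.
       |     const codes =
       |       new Map<string, { email: string; exp: number }>();
       | 
       |     function issueCode(email: string): string {
       |       const code = crypto.randomInt(0, 1_000_000)
       |         .toString().padStart(6, "0");
       |       // 10-minute expiry; tune to taste.
       |       codes.set(code, { email, exp: Date.now() + 600_000 });
       |       return code; // e-mail this to the user
       |     }
       | 
       |     // Step 3: verify and consume -- single use either way.
       |     function verifyCode(email: string, code: string): boolean {
       |       const entry = codes.get(code);
       |       codes.delete(code);
       |       return !!entry && entry.email === email
       |         && entry.exp > Date.now();
       |     }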
        
       | andix wrote:
       | A while ago I started to only send password-protected links
       | via email, with the plaintext password inside the email. This
       | might seem absurd and unsafe at first glance, but it safely
       | prevents exactly this kind of attack. Adding an expiration
       | time is also a good idea, even if it is as long as a few
       | months.
        
       | figers wrote:
       | We have used one-time query-string codes at the end of a URL,
       | sent to the user's email address or as a text message, to
       | allow for this...
        
       | kgeist wrote:
       | Tried it with the local alternative to Google Drive. Oh my...
       | Immediately found lots of private data, including photos of
       | credit cards (with security codes), scans of IDs,
       | passports... How do you report a site?
        
       | rvba wrote:
       | Reminds me of how some people would search for bitcoin
       | wallets via Google and Kazaa.
       | 
       | On a side note, can someone remind me what the name of the
       | file was? I think I have some tiny fraction of a bitcoin on
       | an old computer.
        
       | snthd wrote:
       | "private secure links" are indistinguishable from any other link.
       | 
       | With HTTP auth links you know the password is a password, so
       | these tools would know which part to hide from public display:
       | 
       | > https://username:password@example.com/page
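       | 
       | A sketch of that redaction using the standard URL API, which
       | parses the userinfo part out for you:
       | 
       |     const u = new URL(
       |       "https://username:password@example.com/page");
       |     u.username = "";
       |     u.password = "";     // or substitute "***"
       |     console.log(u.href); // https://example.com/page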
        
       ___________________________________________________________________
       (page generated 2024-03-07 23:00 UTC)