[HN Gopher] Curl-impersonate: Special build of curl that can impersonate the major browsers
___________________________________________________________________
Curl-impersonate: Special build of curl that can impersonate the
major browsers
Author : mmh0000
Score : 499 points
Date   : 2025-04-03 15:24 UTC (1 day ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| pvg wrote:
| Showhn at the time https://news.ycombinator.com/item?id=30378562
| croemer wrote:
| Back then (2022) it was Firefox only
| jchw wrote:
| I'm rooting for Ladybird to gain traction in the future.
| Currently, it is using cURL proper for networking. That is
| probably going to have some challenges (I think cURL is still
| limited in some ways, e.g. I don't think it can do WebSockets
| over h2 yet) but on the other hand, having a rising browser
| engine might eventually remove this avenue for fingerprinting
| since legitimate traffic will have the same fingerprint as stock
| cURL.
| rhdunn wrote:
| It would be good to see Ladybird's cURL usage improve cURL
| itself, such as the WebSocket over h2 example you mention. It
| is also a good test of cURL to see and identify what
| functionality cURL is missing w.r.t. real-world browser
| workflows.
| eesmith wrote:
| I'm hoping this means Ladybird might support ftp URLs.
| navanchauhan wrote:
| and even the Gopher protocol!
| nonrandomstring wrote:
| When I spoke to these guys [0] we touched on those quirks and
| foibles that make a signature (including TCP stack stuff beyond
| control of any userspace app).
|
| I love this curl, but I worry that if a component takes on the
| role of deception in order to "keep up" it accumulates a legacy
| of hard to maintain "compatibility" baggage.
|
| Ideally it should just say... "hey I'm curl, let me in"
|
| The problem of course lies with a server that is picky about
| dress codes, and that problem in turn is caused by crooks
| sneaking in disguise, so it's rather a circular chicken and egg
| thing.
|
| [0] https://cybershow.uk/episodes.php?id=39
| immibis wrote:
| What should instead happen is that Chrome should stop sending
| as much of a fingerprint, so that sites won't be able to
| fingerprint. That won't happen, since it's against Google's
| interests.
| gruez wrote:
| This is a fundamental misunderstanding of how TLS
| fingerprinting works. The "fingerprint" isn't from chrome
| sending a "fingerprint: [random uuid]" attribute in every
| TLS negotiation. It's derived from various properties of
| the TLS stack, like what ciphers it can accept. You can't
| make "stop sending as much of a fingerprint", without every
| browser agreeing on the same TLS stack. It's already
| minimal as it is, because there's basically no aspect of
| the TLS stack that users can configure, and chrome bundles
| its own, so you'd expect every chrome user to have the same
| TLS fingerprint. It's only really useful to distinguish
| "fake" chrome users (eg. curl with custom header set, or
| firefox users with user agent spoofer) from "real" chrome
| users.
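|
| A rough illustration of that derivation: JA3, one widely
| used scheme, concatenates the ClientHello fields and hashes
| them with MD5. A minimal sketch in Python (the field values
| below are made up, not a real Chrome capture):
|
|     import hashlib
|
|     # ClientHello properties as seen by the server: TLS
|     # version, cipher suites, extensions, curves and point
|     # formats, each in the order the client sent them.
|     tls_version = "771"                  # TLS 1.2 on the wire
|     ciphers = [4865, 4866, 4867, 49195]  # offered suite IDs
|     extensions = [0, 23, 65281, 10, 11]  # client's order
|     curves = [29, 23, 24]                # supported groups
|     point_formats = [0]
|
|     # JA3 joins each list with '-', the five fields with ',',
|     # and hashes the resulting string with MD5.
|     ja3 = ",".join([
|         tls_version,
|         "-".join(map(str, ciphers)),
|         "-".join(map(str, extensions)),
|         "-".join(map(str, curves)),
|         "-".join(map(str, point_formats)),
|     ])
|     print(hashlib.md5(ja3.encode()).hexdigest())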
| dochtman wrote:
| Part of the fingerprint is stuff like the ordering of
| extensions, which Chrome could easily do but AFAIK
| doesn't.
|
| (AIUI Google's Play Store is one of the biggest TLS
| fingerprinting culprits.)
| gruez wrote:
| What's the advantage of randomizing the order, when all
| chrome users already have the same order? Practically
| speaking there's a bazillion ways to fingerprint Chrome
| besides TLS cipher ordering, that it's not worth adding
| random mitigations like this.
| shiomiru wrote:
| Chrome has randomized its ClientHello extension order for
| two years now.[0]
|
| The companies to blame here are solely the ones employing
| these fingerprinting techniques, and those relying on
| services of these companies (which is a worryingly large
| chunk of the web). For example, after the Chrome change,
| Cloudflare just switched to a fingerprinter that doesn't
| check the order.[1]
|
| [0]: https://chromestatus.com/feature/5124606246518784
|
| [1]: https://blog.cloudflare.com/ja4-signals/
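|
| That randomization is also why hashing a _sorted_ extension
| list, as the JA4 family reportedly does, still works: the
| fingerprint becomes order-invariant. A toy illustration in
| Python (extension IDs made up):
|
|     import hashlib
|
|     def fp(ext_ids, order_sensitive):
|         ids = ext_ids if order_sensitive else sorted(ext_ids)
|         s = "-".join(map(str, ids))
|         return hashlib.sha256(s.encode()).hexdigest()[:12]
|
|     a = [0, 23, 65281, 10, 11]  # one randomized ClientHello
|     b = [10, 0, 11, 65281, 23]  # same client, shuffled order
|
|     print(fp(a, True) == fp(b, True))    # False: order-based
|     print(fp(a, False) == fp(b, False))  # True: sorted, stable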
| nonrandomstring wrote:
| > blame here are solely the ones employing these
| fingerprinting techniques,
|
| Sure. And it's a tragedy. But when you look at the bot
| situation and the sheer magnitude of resource abuse out
| there, you have to see it from the other side.
|
| FWIW, in the conversation mentioned above, we acknowledged
| that and moved on to talk about _behavioural_
| fingerprinting and why it makes sense not to focus on the
| browser/agent alone but on what gets done with it.
| NavinF wrote:
| Last time I saw someone complaining about scrapers, they
| were talking about 100GiB/month. That's 300kbps. Less
| than $1/month in IP transit and ~$0 in compute.
| Personally I've never noticed bots show up on a resource
| graph. As long as you don't block them, they won't bother
| using more than a few IPs, and they'll back off when
| they're throttled.
| marcus0x62 wrote:
| For some sites, things are a lot worse. See, for example,
| Jonathan Corbet's report[0].
|
| 0 - https://social.kernel.org/notice/AqJkUigsjad3gQc664
| lmz wrote:
| How can you say it's $0 in compute without knowing if the
| data returned required any computation?
| nonrandomstring wrote:
| Didn't rachelbythebay post recently that her blog was
| being swamped? I've heard that from a few self-hosting
| bloggers now. And Wikipedia has recently said more than
| half of its traffic is now bots. Are you claiming this
| isn't a real problem?
| fc417fc802 wrote:
| > The companies to blame here are solely the ones
| employing these fingerprinting techniques,
|
| Let's not go blaming vulnerabilities on those exploiting
| them. Exploitation is _also_ bad but being exploitable is
| a problem in and of itself.
| shiomiru wrote:
| > Let's not go blaming vulnerabilities on those
| exploiting them. Exploitation is also bad but being
| exploitable is a problem in and of itself.
|
| There's "vulnerabilities" and there's "inherent
| properties of a complex protocol that is used to transfer
| data securely". One of the latter is that metadata may
| differ from client to client for various reasons, inside
| the bounds accepted in the standard. If you discriminate
| based on such metadata, you have effectively invented a
| new proprietary protocol that certain existing browsers
| just so happen to implement.
|
| It's like the UA string, but instead of just copying a
| single HTTP header, new browsers now have to reverse
| engineer the network stack of existing ones to get an
| identical user experience.
| fc417fc802 wrote:
| I get that. I don't condone the behavior of those doing
| the fingerprinting. But what I'm saying is that the fact
| that it is possible to fingerprint should in pretty much
| all cases be viewed as a sort of vulnerability.
|
| It isn't necessarily a critical vulnerability. But it is
| a problem on _some_ level nonetheless. To the extent
| possible you should not be leaking information that you
| did not intend to share.
|
| A protocol that can be fingerprinted is similar to a
| water pipe with a pinhole leak. It still works, it isn't
| (necessarily) catastrophic, but it definitely would be
| better if it wasn't leaking.
| Jubijub wrote:
| I'm sorry, but your comment shows you never had to fight
| this problem at scale. The challenge is not small-time
| crawlers; the challenge is blocking large / dedicated
| actors. The problem is simple: if there is more than X
| volume of traffic per <aggregation criteria>, block it.
| Problem: most aggregation criteria are trivially
| spoofable, or very cheap to change:
|
| - IP: with IPv6 it is not an issue to rotate your IP often
|
| - UA: changing this is scraping 101
|
| - SSL fingerprint: easy to use the same one as everyone
| else
|
| - IP stack fingerprint: also easy to use a common one
|
| - request / session tokens: it's cheap to create a new
| session
|
| You can force login, but then you have a spam account
| creation challenge, with the same issues as above, and
| depending on your infra this can become heavy.
|
| Add to this that the minute you use a signal for
| detection, you "burn" it as adversaries will avoid using
| it, and you lose measurement thus the ability to know if
| you are fixing the problem at all.
|
| I worked on this kind of problem for a FAANG service,
| whoever claims it's easy clearly never had to deal with
| motivated adversaries
| RKFADU_UOFCCLEL wrote:
| What? Just fix the ciphers to a list of what's known to
| work + some safety margin. Each user needing some
| different specific cipher (like a cipher for horses, and
| one for dogs) is not a thing.
| gruez wrote:
| >Just fix the ciphers to a list of what's known to work +
| some safety margin.
|
| That's already the case. The trouble is that NSS (what
| firefox uses) doesn't support the same cipher suites as
| boringssl (what chrome uses?).
| thaumasiotes wrote:
| > Ideally it should just say... "hey I'm curl, let me in"
|
| What? Ideally it should just say "GET /path/to/page".
|
| Sending a user agent is a bad idea. That shouldn't be
| happening at all, from any source.
| Tor3 wrote:
| Since the first browser appeared I've always thought that
| sending a user agent ID was a really bad idea. It breaks
| the fundamental idea of the web protocol: that it's the
| server's responsibility to provide data and it's the
| client's responsibility to present it to the user. The
| server does not need to know anything about the client.
| Including the user agent in this whole thing was a huge
| mistake, as it allowed web site designers to code for
| specific quirks in browsers. I can to some extent accept a
| capability list from the client, but I'm not so sure even
| that is necessary.
| nonrandomstring wrote:
| Absolutely, yes! A protocol should not be tied to client
| details. Where did "User Agent" strings even come from?
| darrenf wrote:
| They're in the HTTP/1.0 spec.
| https://www.rfc-editor.org/rfc/rfc1945#section-10.15
|
|     10.15 User-Agent
|
|     The User-Agent request-header field contains
|     information about the user agent originating the
|     request. This is for statistical purposes, the
|     tracing of protocol violations, and automated
|     recognition of user agents for the sake of tailoring
|     responses to avoid particular user agent limitations.
| userbinator wrote:
| _but on the other hand, having a rising browser engine might
| eventually remove this avenue for fingerprinting_
|
| If what I've seen from Cloudflare et al. is any indication,
| it's the exact opposite --- the amount of fingerprinting and
| "exploitation" of implementation-defined behaviour has
| _increased_ significantly in the past few months, likely in an
| attempt to kill off other browser engines; the incumbents do
| not like competition at all.
|
| The enemy has been trying to spin it as "AI bots DDoSing" but
| one wonders how much of that was their own doing...
| hansvm wrote:
| Hold up, one of those things is not like the other. Are we
| really blaming webmasters for 100x increases in costs from a
| huge wave of poorly written and maliciously aggressive bots?
| refulgentis wrote:
| > Are we really blaming...
|
| No, they're discussing increased fingerprinting / browser
| profiling recently and how it affects low-market-share
| browsers.
| hansvm wrote:
| I saw that, but I'm still not sure how this fits in:
|
| > The enemy has been trying to spin it as "AI bots
| DDoSing" but one wonders how much of that was their own
| doing...
|
| I'm reading that as `enemy == fingerprinters`, `that ==
| AI bots DDoSing`, and `their own == webmasters, hosting
| providers, and CDNs (i.e., the fingerprinters)`, which
| sounds pretty straightforwardly like the fingerprinters
| are responsible for the DDoSing they're receiving.
|
| That interpretation doesn't seem to match the rest of the
| post though. Do you happen to have a better one?
| userbinator wrote:
| "their own" = CloudFlare and/or those who have vested
| interests in closing up the Internet.
| jillyboel wrote:
| Your costs only went up 100x if you built your site poorly
| hansvm wrote:
| I'll bite. How do you serve 100x the traffic without 100x
| the costs? It costs something like 1e-10 dollars to serve
| a recipe page with a few photos, for example. If you
| serve it 100x more times, how does that not scale up?
| jillyboel wrote:
| It might scale up but if you're anywhere near efficient
| you're way overprovisioned to begin with. The compute
| cost should be minuscule due to caching and bandwidth is
| cheap if you're not with one of the big clouds. As an
| example, according to dang HN runs on a single server and
| yet many websites that get posted _to_ HN, and thus
| receive a fraction of the traffic, go down due to the
| load.
| SoftTalker wrote:
| It's entirely deliberate. CloudFlare could certainly
| distinguish low-volume but legit web browsers from bots, as
| much as they can distinguish chrome/edge/safari/firefox from
| bots. That is if they cared to.
| cyanydeez wrote:
| I don't think they're doing this to kill off browser
| engines; they're trying to sift browsers into "user" and
| "AI slop" so they can prioritize users.
|
| This is entirely a web crawler 2.0 apocalypse.
| nicman23 wrote:
| man i just want a bot to buy groceries for me
| baq wrote:
| That's one of the few reasons to leave the house. I'd
| like dishes and laundry bots first, please.
| dodslaser wrote:
| You mean dishwashers and washing machines?
| baq wrote:
| Yes, but no. I want a robot to load and unload those.
| dec0dedab0de wrote:
| I have been paying my local laundromat to do my laundry
| for over a decade now. It's probably cheaper than you're
| imagining, and sooo worth it.
| baq wrote:
| my household is 6 people, it isn't uncommon to run 3
| washing machine loads in a day and days without at least
| one are rare. I can imagine the convenience, but at this
| scale it sounds a bit unreasonable.
|
| dishwasher runs at least once a day, at least 80% full,
| every day, unless we're traveling.
| extraduder_ire wrote:
| I think "slop" only refers to the output of generative AI
| systems. bot, crawler, scraper, or spider would be a more
| apt term for software making (excessive) requests to
| collect data.
| johnisgood wrote:
| I used to call it "cURL", but apparently officially it is curl,
| correct?
| cruffle_duffle wrote:
| As in "See-URL"? I've always called it curl but "see url"
| makes a hell of a lot of sense too! I've just never
| considered it and it's one of those things you rarely say out
| loud.
| johnisgood wrote:
| I prefer cURL as well, but according to official sources it
| is curl. :D Not sure how it is pronounced though, I
| pronounce it as "see-url" and/or "see-U-R-L". It might be
| pronounced as "curl" though.
| bdhcuidbebe wrote:
| I'd guess Daniel pronounces it as "kurl", with a hard C
| like in "crust", since he's Swedish.
| devwastaken wrote:
| Ladybird does not have the resources to be a contender to
| current browsers. It's well marketed but has no benefits
| or reason to exist over Chromium. It's also a major
| security risk, as it is designed yet again in demonstrably
| unsafe C++.
| ec109685 wrote:
| There are APIs that Chrome provides that allow servers to
| validate whether a request came from an official Chrome
| browser. Those would detect that this curl isn't really
| Chrome.
|
| It'd be nice if something could support curl's arguments
| but drive an actual headless Chrome browser.
| binarymax wrote:
| I'm interested in learning more about this. Are these APIs
| documented anywhere and are there server side implementation
| examples that you know of?
|
| EDIT: this is the closest I could find.
| https://developers.google.com/chrome/verified-access/overvie...
| ...but it's not generic enough to lead me to the declaration
| you made.
| KTibow wrote:
| I think they confused Chrome and Googlebot.
| bowmessage wrote:
| There's no way this couldn't be replicated by a special build
| of curl.
| darrenf wrote:
| Are you referring to the Web Environment Integrity[0] stuff, or
| something else? 'cos WEI was abandoned in late 2023.
|
| [0] https://github.com/explainers-by-googlers/Web-Environment-
| In...
| do_not_redeem wrote:
| Siblings are being more charitable about this, but I just don't
| think what you're suggesting is even possible.
|
| An HTTP client sends a request. The server sends a response.
| The request and response are made of bytes. Any bytes Chrome
| can send, curl-impersonate could also send.
|
| Chromium is open source. If there was some super secret
| handshake, anyone could copy that code to curl-impersonate. And
| if it's only in closed-source Chrome, someone will disassemble
| it and copy it over anyway.
| gruez wrote:
| >Chromium is open source. If there was some super secret
| handshake, anyone could copy that code to curl-impersonate.
| And if it's only in closed-source Chrome, someone will
| disassemble it and copy it over anyway.
|
| Not if the "super secret handshake" is based on hardware-
| backed attestation.
| do_not_redeem wrote:
| True, but beside the point.
|
| GP claims the API can detect the official chrome browser,
| and the official chrome browser runs fine without
| attestation.
| dist-epoch wrote:
| > someone will disassemble it and copy it over anyway.
|
| Not if Chrome uses homomorphic encryption to sign a
| challenge. It's doable today. But then you could run a
| real Chrome and forward the request to it.
| do_not_redeem wrote:
| No, even homomorphic encryption wouldn't help.
|
| It doesn't matter how complicated the operation is, if you
| have a copy of the Chrome binary, you can observe what CPU
| instructions it uses to sign the challenge, and replicate
| the operations yourself. Proxying to a real Chrome is the
| most blunt approach, but there's nothing stopping you from
| disassembling the binary and copying the code to run in
| your own process, independent of Chrome.
| dist-epoch wrote:
| > you can observe what CPU instructions it uses to sign
| the challenge, and replicate the operations yourself.
|
| No you can't; that's the whole point of homomorphic
| encryption. Ask GPT to explain to you why that's so.
|
| You have no way of knowing the bounds of the code I will
| access from inside the homomorphic code. Depending on
| the challenge I can query parts of the binary and hash
| that in the response. So you will need to replicate the
| whole binary.
|
| Similar techniques are already used today by various
| copy-protection/anti-cheat game protectors. Most of them
| remain unbroken.
| fc417fc802 wrote:
| I don't believe this is correct. Homomorphic encryption
| enables computation on encrypted data without needing to
| decrypt it.
|
| You can't use the result of that computation without
| first decrypting it though. And you can't decrypt it
| without the key. So what you describe regarding memory
| addresses is merely garden variety obfuscation.
|
| Unmasking an obfuscated set of allowable address ranges
| for hashing given an arbitrary binary is certainly a
| difficult problem. However as you point out it is easily
| sidestepped.
|
| You are also mistaken about anti-cheat measures. The ones
| that pose the most difficulty primarily rely on kernel
| mode drivers. Even then, without hardware attestation
| it's "just" an obfuscation effort that raises the bar to
| make breaking it more time consuming.
|
| What you're actually witnessing there is that if a
| sufficient amount of effort is invested in obfuscation
| and those efforts carried out continuously in order to
| regularly change the obfuscation then you can outstrip
| the ability of the other party to keep up with you.
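|
| For the curious, "computation on encrypted data" is easy
| to demo with a toy additively homomorphic scheme such as
| Paillier; a sketch with tiny, utterly insecure parameters
| (demo only):
|
|     from math import gcd
|     import random
|
|     p, q = 17, 19                 # toy primes, insecure
|     n, n2 = p * q, (p * q) ** 2
|     lam = (p - 1) * (q - 1) // gcd(p - 1, q - 1)  # lcm
|     g = n + 1
|
|     def L(x):
|         return (x - 1) // n
|
|     mu = pow(L(pow(g, lam, n2)), -1, n)  # Python 3.8+
|
|     def enc(m):
|         r = random.randrange(1, n)
|         while gcd(r, n) != 1:
|             r = random.randrange(1, n)
|         return (pow(g, m, n2) * pow(r, n, n2)) % n2
|
|     def dec(c):
|         return (L(pow(c, lam, n2)) * mu) % n
|
|     # Multiplying ciphertexts adds the plaintexts; the
|     # party doing the multiplying never sees 42 or 99.
|     print(dec((enc(42) * enc(99)) % n2))  # 141
|
| The catch, as noted above, is that only the key holder can
| decrypt the result: it hides data, not the computation
| being performed.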
| do_not_redeem wrote:
| You're just describing ordinary challenge-response. That
| has nothing to do with homomorphic encryption and there
| are plenty of examples from before homomorphic encryption
| became viable, for example:
| https://www.geoffchappell.com/notes/security/aim/index.htm
|
| Homomorphic encryption hides data, not computation. If
| you've been trying to learn compsci from GPT, you might
| have fallen victim to hallucinations. I'd recommend
| starting from wikipedia instead.
| https://en.wikipedia.org/wiki/Homomorphic_encryption
|
| And btw most games are cracked within a week of release.
| You have way too much faith in buzzwords and way too
| little faith in bored Eastern European teenagers.
| dist-epoch wrote:
| > Homomorphic encryption hides data, not computation
|
| Data is computation.
|
|     x = challenge_byte ^ secret_key
|     if x > 64:
|         y = hash_memory_range()
|     else:
|         y = something_else()
|     return sign(y, secret_key)
| do_not_redeem wrote:
| That snippet has nothing to do with homomorphic
| encryption. It's just the same kind of challenge-response
| AIM and many others were already doing in the 90s.
|
| You seem convinced that homomorphic encryption is some
| kind of magic that prevents someone from observing their
| own hardware, or from running Chrome under a debugger.
| That's just not true. And I suspect we don't share enough
| of a common vocabulary to have a productive discussion,
| so I'll end it here.
| anon6362 wrote:
| Set a UA and any headers and/or cookies with regular cURL
| compiled with HTTP/3. This can be done with wrapper scripts very
| easily. 99.999% of problems solved with no special magic buried
| in an unclean fork.
| andrewmcwatters wrote:
| That's not the point of this fork.
|
| And "unclean fork" is such an unnecessary and unprofessional
| comment.
|
| There's an entire industry of stealth browser technologies out
| there that this falls under.
| psanford wrote:
| That doesn't solve the problem of TLS handshake fingerprinting,
| which is the whole point of this project.
| mmh0000 wrote:
| You should really read the "Why" section of the README
| before jumping to conclusions:
|
|     some web services use the TLS and HTTP handshakes to
|     fingerprint which client is accessing them, and then
|     present different content for different clients. These
|     methods are known as TLS fingerprinting and HTTP/2
|     fingerprinting respectively. Their widespread use has
|     led to the web becoming less open, less private and
|     much more restrictive towards specific web clients
|
|     With the modified curl in this repository, the TLS and
|     HTTP handshakes look exactly like those of a real
|     browser.
|
| For example, this will get you past Cloudflare's bot
| detection.
| 01HNNWZ0MV43FF wrote:
| The README indicates that this fork is compiled with NSS
| (from Firefox) and BoringSSL (from Chromium) to resist
| fingerprinting based on the TLS library. CLI flags won't
| do that.
| bossyTeacher wrote:
| Cool tool, but it shouldn't matter whether the client is a
| browser or not. I feel sad that we need such a tool in the
| real world.
| brutal_chaos_ wrote:
| You may enter our site iff you use software we approve.
| Anything else will be seen as malicious. Papers please!
|
| I, too, am saddened by this gatekeeping. IIUC, custom
| browsers (or user agents) built from scratch will never
| work on Cloudflare sites and the like until the UA has
| enough clout (money, users, etc.) to sway them.
| DrillShopper wrote:
| This was sadly always going to be the outcome of the Internet
| going commercial.
|
| There's too much lost revenue in open things for companies to
| embrace fully open technology anymore.
| jrockway wrote:
| It's kind of the opposite problem as well; huge well-funded
| companies bringing down open source project websites. See
| Xe's journey here: https://xeiaso.net/blog/2025/anubis/
|
| One may posit "maybe these projects should cache stuff so
| page loads aren't actually expensive" but these things are
| best-effort and not the core focus of these projects. You
| install some Git forge or Trac or something and it's Good
| Enough for your contributors to get work done. But you have
| to block the LLM bots because they ignore robots.txt and
| naively ask for the same expensive-to-render page over and
| over again.
|
| The commercial impact is also not to be understated. I
| remember when I worked for a startup with a cloud service.
| It got talked about here, and suddenly every free-for-open-
| source CI provider IP range was signing up for free trials
| in a tight loop. These mechanical users had to be blocked.
| It made me sad, but we wanted people to use our product,
| not mine crypto ;)
| burnished wrote:
| >> Otherwise your users have to see a happy anime girl
| every time they solve a challenge. This is a feature.
|
| I love that human, what a gem
| everfrustrated wrote:
| Wait until you hear many antivirus/endpoint software block
| "recent" domain names from being loaded. According to them
| new domains are only used by evil people and should be
| blocked.
| jimt1234 wrote:
| About six months ago I went to a government auction site that
| _required_ Internet Explorer. Yes, Internet Explorer. The site
| was active, too; the auction data was up-to-date. I added
| a user-agent extension in Chrome, switched the UA to IE,
| retried, and it worked; all functionality on the site was
| fine. So yeah, I was both sad and annoyed. My guess is
| this government office paid for a website 25 years ago and
| it hasn't been updated since.
| IMSAI8080 wrote:
| Yeah it's probably an ancient web site. This was commonplace
| back in the day when Internet Explorer had 90%+ market share.
| Lazy web devs couldn't be bothered to support other browsers
| (or didn't know how) so just added a message demanding you
| use IE as opposed to fixing the problems with the site.
| jorvi wrote:
| In South Korea, ActiveX is still required for many things
| like banking and government stuff. So they're stuck with both
| IE and the gaping security hole in it that is ActiveX.
| asddubs wrote:
| is this still true? I know this was the case in the past,
| but even in 2025?
| kijin wrote:
| Not really. You can access any Korean bank or government
| website using Chrome, and they actually recommend Chrome
| these days.
|
| They still want to install a bunch of programs on your
| computer, though. It's more or less the same stuff that
| used to be written as ActiveX extensions, but rewritten
| using modern browser APIs. :(
| VladVladikoff wrote:
| Wait a sec... if the TLS handshakes look different, would
| it be possible to have an nginx-level filter for traffic
| that claims to be a web browser (e.g. a Chrome user agent)
| yet really is a Python/PHP script? Because this would
| account for the vast majority of malicious bot traffic,
| and I would love to just block it.
| gruez wrote:
| That's basically what security vendors like Cloudflare do,
| except with even more fingerprinting, like a javascript
| challenge that checks the js interpreter/DOM.
| walrus01 wrote:
| JS to check user agent things like screen window dimensions
| as well, which legit browsers will have and bots will also
| present but with a more uniform and predictable set of x and
| y dimensions per set of source IPs. Lots of possibilities for
| js endpoint fingerprinting.
| Fripplebubby wrote:
| I also present a uniform and predictable set of x and y
| dimensions per source IPs as a human user who maximizes my
| browser window
| gruez wrote:
| Maximizing reduces the variations, but there's still
| quite a bit of variation because of different display
| resolution + scaling settings + OS configuration (eg.
| short or tall taskbars).
| walrus01 wrote:
| Or settings like auto-hide MacOS dock vs not auto hide,
| affecting the vertical size of the browser window.
| aaron42net wrote:
| Cloudflare uses JA3 and now JA4 TLS fingerprints, which are
| hashes of various TLS handshake parameters.
| https://github.com/FoxIO-LLC/ja4/blob/main/technical_details...
| has more details on how that works, and they do offer an Nginx
| module: https://github.com/FoxIO-LLC/ja4-nginx-module
| immibis wrote:
| Yes, and sites are doing this and it absolutely sucks because
| it's not reliable and blocks everyone who isn't using the
| latest Chrome on the latest Windows. Please don't whitelist TLS
| fingerprints unless you're actually under attack right now.
| fc417fc802 wrote:
| If you're going to whitelist (or block at all really) please
| simply redirect all rejected connections to a proof of work
| scheme. At least that way things continue to work with only
| mild inconvenience.
| jrochkind1 wrote:
| I am _very_ curious if the current wave of mystery
| distributed (AI?) bots will just run JavaScript and be
| able to get past proof of work too....
|
| Based on the fact that they are requesting the same
| _absolutely useless and duplicative_ pages (like every
| possible combination of query params even if it does not
| lead to unique content) from me _hundreds of times per
| url_, and are able to distribute so much that I'm only
| getting 1-5 requests per day from each IP...
|
| ...cost does not seem to be a concern for them? Maybe they
| won't actually mind ~5 seconds of CPU on a proof of work
| either? They are really a mystery to me.
|
| I currently am using Cloudflare Turnstile, which
| incorporates proof of work but also various other signals,
| which is working, but I know it does have false positives.
| I am working on implementing a simpler nothing-but-JS
| proof of work (SHA-512-based), and am going to switch that
| in; if it works, great (because I don't want to keep out
| the false positives!), but if it doesn't, back to
| Turnstile.
|
| The mystery distributed idiot bots were too much. (Scaling
| up resources -- they just scaled up their bot rates
| too!!!) I don't mind people scraping if they do it
| respectfully and reasonably; that's not what's been going
| on, and it's an internet-wide phenomenon of the past year.
| RKFADU_UOFCCLEL wrote:
| Blocking a hacking attack is not even a thing: attackers
| just change IP address each time they learn a new fact
| about how your system works, and progress smoothly without
| interruption until they exfiltrate your data. The same
| goes for scrapers, the only difference being that there is
| no vulnerability to fix that will stop them.
| jrochkind1 wrote:
| Well, I think that's exactly what the OP is meant to keep
| you from doing.
| ryao wrote:
| Did they also set IP_TTL to set the TTL value to match the
| platform being impersonated?
|
| If not, then fingerprinting could still be done to some extent at
| the IP layer. If the TTL value in the IP layer is below 64, it is
| obvious this is either not running on modern Windows or is
| running on a modern Windows machine that has had its default TTL
| changed, since by default the TTL of packets on modern Windows
| starts at 128 while most other platforms start it at 64.
| Since the other platforms do not have issues communicating
| over the internet, IP packets from modern Windows will
| almost always be seen by the remote end with TTLs at or
| above 64 (likely just above).
|
| That said, it would be difficult to fingerprint at the IP layer,
| although it is not impossible.
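|
| A sketch of that inference: since common OSes pick their
| initial TTL from a handful of values, a server can round
| an observed TTL up to the nearest one (the sample observed
| values are made up):
|
|     # Common initial TTLs: 32 (Win9x), 64 (Linux/macOS/BSD),
|     # 128 (modern Windows), 255 (OpenSolaris derivatives).
|     INITIAL_TTLS = (32, 64, 128, 255)
|
|     def guess_initial_ttl(observed: int) -> int:
|         """Round a received packet's TTL up to the nearest
|         well-known start; the difference is the hop count."""
|         return min(t for t in INITIAL_TTLS if t >= observed)
|
|     for seen in (52, 113, 247):
|         init = guess_initial_ttl(seen)
|         print(f"TTL {seen}: started {init}, ~{init - seen} hops")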
| xrisk wrote:
| Wouldn't the TTL value of received packets depend on network
| conditions? Can you recover the client's value from the server?
| ralferoo wrote:
| The argument is that if many (maybe the majority) of
| systems are sending packets with a TTL of 64 and they don't
| experience problems on the internet, then it stands to reason
| that almost everywhere on the internet is reachable in less
| than 64 hops (personally, I'd be amazed if any routes are
| actually as high as 32 hops).
|
| If everywhere is reachable in under 64 hops, then packets
| sent from systems that use a TTL of 128 will arrive at the
| destination with a TTL still over 64 (or else they'd have
| been discarded for all the other systems already).
| ryao wrote:
| Windows 9x used a TTL of 32. I vaguely recall hearing that
| it caused problems in extremely exotic cases, but that
| could have been misinformation. I imagine that >99.999% of
| the time, 32 is enough. This makes fingerprinting via TTL
| to distinguish between those who set it at 32, 64, 128 and
| 255 (OpenSolaris and derivatives) viable. That said, almost
| nobody uses Windows 9x or OpenSolaris derivatives on the
| internet these days, so I used values from systems that
| they do use for my argument that fingerprinting via TTL is
| possible.
| gruez wrote:
| >That said, it would be difficult to fingerprint at the IP
| layer, although it is not impossible.
|
| Only if you're using PaaS/IaaS providers that don't give
| you low-level access to the TCP/IP stack. If you're
| running your own servers, it's trivial to fingerprint all
| manner of TCP/IP properties.
|
| https://en.wikipedia.org/wiki/TCP/IP_stack_fingerprinting
| ryao wrote:
| I meant it is difficult relative to fingerprinting TLS and
| HTTP. The information is not exported by the Berkeley
| socket API unless you use raw sockets and implement your
| own userland TCP stack.
| sneak wrote:
| Couldn't you just monitor the inbound traffic and associate
| the packets to the connections? Doing your own TCP seems
| silly.
| gruez wrote:
| Yeah, some sort of packet mirroring setup (eg. in
| iptables or at the switch level) + packet capture tool
| should be enough. Then you just need to join the data
| from the packet capture program/machine with your load
| balancer, using src ip + port + time.
| fc417fc802 wrote:
| What is the reasoning behind TTL counting down instead of up,
| anyway? Wouldn't we generally expect those routing the traffic
| to determine if and how to do so?
| sadjad wrote:
| The primary purpose of TTL is to prevent packets from looping
| endlessly during routing. If a packet gets stuck in a loop,
| its TTL will eventually reach zero, and then it will be
| dropped.
| fc417fc802 wrote:
| That doesn't answer my question. If it counted up then it
| would be up to each hop to set its own policy. Things
| wouldn't loop endlessly in that scenario either.
| burnished wrote:
| This is a wild guess but: I am under the impression that
| the early internet was built somewhat naively so I guess
| that the sender sets it because they know best how long
| it stays relevant for/when it makes sense to restart or
| fail rather than wait.
| knome wrote:
| It does make traceroute, where each packet is fired with
| one more available step than the last, feasible, whereas
| 'up' wouldn't. Of course, then we'd just start with max-
| hops and walk the number down I suppose. I still expect
| it would be inconvenient during debugging for various
| devices to have various ceilings.
| ryao wrote:
| Then random internet routers could break internet traffic
| by setting it really low and the user could not do a
| thing about it. They technically still can by discarding
| all traffic whose value is less than some value, but they
| don't. The idea that they should set their own policy
| could fundamentally break network traffic flows if it
| ever became practiced.
| ryao wrote:
| If your doctor says you have only 128 days to live, you count
| down, not up. TTL is time to live, which is the same thing.
| therealcamino wrote:
| To allow the sender to set the TTL, right? Without adding
| another field to the packet header.
|
| If you count up from zero, then you'd also have to include in
| every packet how high it can go, so that a router has enough
| info to decide if the packet is still live. Otherwise every
| connection in the network would have to share the same fixed
| TTL, or obey the TTL set in whatever random routers it goes
| through. If you count down, you're always checking against
| zero.
| jruohonen wrote:
| The notion of real-world TLS/HTTP fingerprinting was somewhat new
| to me, and it looks interesting in theory, but I wonder what the
| build's use case really is? I mean you have the heavy-handed
| JavaScript running everywhere now.
| jamal-kumar wrote:
| This tool is pretty sweet in little bash scripts combo'd
| up with GNU parallel on red-team engagements, for mapping
| HTTPS endpoints within scoped address ranges that will
| only respond to proper browsers, or with the SNI stuff in
| order. Been finding it super sweet for that. It can do all
| the normal curl switches, like -H for header spoofing.
| davidsojevic wrote:
| There's a fork of this that has some great improvements on
| top of the original, and it is also actively maintained:
| https://github.com/lexiforest/curl-impersonate
|
| There's also Python bindings for the fork for anyone who uses
| Python: https://github.com/lexiforest/curl_cffi
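|
| Typical usage of the Python bindings, for anyone curious
| (the `impersonate` target names follow the project's
| README, e.g. "chrome"):
|
|     # pip install curl_cffi
|     from curl_cffi import requests
|
|     # `impersonate` selects which browser's TLS/HTTP2
|     # handshake to mimic.
|     r = requests.get("https://tls.browserleaks.com/json",
|                      impersonate="chrome")
|     print(r.json())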
| nyanpasu64 wrote:
| I suppose it does make sense that a "make curl look like a
| browser" program would get sponsored by "bypass bot detection"
| services...
| ImHereToVote wrote:
| Easy. Just make a small fragment shader to produce a token in
| your client. No bot is going to waste GPU resources to
| compile your shader.
| kelsey978126 wrote:
| Why do people even think this? Bots almost always just use
| headful instrumented browsers now. If a human sitting at a
| keyboard can load the content, so can a bot.
| simpaticoder wrote:
| Security measures never prevent all abuse. They raise the
| cost of abuse above an acceptable threshold. Many things
| work like this. Cleaning doesn't eliminate dirt, it
| dilutes the dirt below an acceptable threshold. Same for
| "repairing" and "defects", and some other pairs of things
| that escape me atm.
| abofh wrote:
| That's the same argument as CAPTCHAs - as far as I know
| there are no bots protesting them making their lives
| harder, but as a human, my life is much harder than it
| needs to be because things need me to prove I'm a human.
|
| Clean for data ingestion usually means complicated for
| data creation - optimizing for the advertisers has
| material cash value downstream, but customers are
| upstream, and making it harder is material too.
| ImHereToVote wrote:
| What is so hard about running a fragment shader after the
| site has loaded?
| abofh wrote:
| I have to assume /s, but lacking that -- Why can't you
| just allow `curl`? You need a human for advertising
| dollars or a poor mechanism of rate limiting. I want to
| use your service. If you're buying me a fragment shader,
| I guess that's fine, but I'm feeding it to the dogs, not
| plugging your rando hardware into my web browser.
| ImHereToVote wrote:
| We are talking about Curl bots here. How is what you are
| saying relevant?
| cAtte_ wrote:
| no, nyanpasu64's comment extended the discussion to
| general bot detection
| zffr wrote:
| Can't a bot just collect a few real tokens and then send
| those instead of trying to run the shader?
| ImHereToVote wrote:
| How do you automate that? Just generate a new token for
| each day.
| gruez wrote:
| Can't they use a software renderer like swiftshader? You
| don't need to pass in an actual gpu through virtio or
| whatever.
| ImHereToVote wrote:
| Maybe you can call a WebGL extension that isn't
| supported. Or better yet have a couple of overdraws of
| quads. Their bot will handle it, but it will throttle
| their CPU like gangbusters.
| gruez wrote:
| Sounds like a PoW system with extra steps?
| illegally wrote:
| There's also a module for fully integrating this with the
| Python requests library: https://github.com/el1s7/curl-adapter
| RKFADU_UOFCCLEL wrote:
| All these "advanced" technologies that change faster than I can
| turn my neck, to make a simple request that looks like it was
| one of the "certified" big 3 web browsers, which will
| ironically tax the server less than a certified browser. Is
| this the nightmare dystopia I was warned about in the 90's? I
| wonder if anyone here can name the one company that is
| responsible for this despite positioning themselves as a good
| guy open source / hacker community contributor.
| userbinator wrote:
| I'm always ambivalent about things like this showing up here. On
| one hand, it's good to let others know that there is still that
| bit of rebelliousness and independence alive amongst the
| population. On the other hand, much like other "freedom is
| insecurity" projects, attracting unwanted attention may make it
| worse for those who rely on them.
|
| Writing a browser is hard, and the incumbents are continually
| making it harder.
| jolmg wrote:
| Your comment makes it sound like a browser being
| fingerprintable is a desired property by browser developers.
| It's just something that happens on its own from different
| people doing things differently. I don't see this as being
| about rebelliousness. Software being fingerprintable erodes
| privacy and software diversity.
| gkbrk wrote:
| Not all browsers, but Chrome certainly desires to be
| fingerprintable. They even try to cryptographically prove
| that the current browser is an unmodified Chrome with Web
| Environment Integrity [1].
|
| Doesn't get more fingerprintable than that. They provide an
| un-falsifiable certificate that "the current browser is an
| unmodified Chrome build, running on an unmodified Android
| phone with secure boot".
|
| If they didn't want to be fingerprintable, they could just
| not do that and spend all the engineering time and money
| on something else.
|
| [1]: https://en.wikipedia.org/wiki/Web_Environment_Integrity
| matt-p wrote:
| I do kind of yearn for the simpler days when, if a website
| didn't mind bots, it allowed them, and if it did, it
| blocked your user agent.
| andrethegiant wrote:
| Back then websites weren't so resource-intensive. The
| negative backlash towards bots is kind of a side effect of
| how demanding expectations of web experiences have become.
| yard2010 wrote:
| Now I'm waiting for the MCP version of this.. :)
| andrethegiant wrote:
| https://github.com/puremd/puremd-mcp handles this, probably
| some other MCP servers out there that handle this too
| doctor_radium wrote:
| Kudos to the coder and the poster. I'm involved in a browser
| project that runs on OpenSSL, and figured I'd have to dig through
| WireShark myself at some point to figure this stuff out. Well, I
| may still need to, but now have many points of reference. If the
| most common use of OpenSSL is Python, then in the age of
| Cloudflare, a Firefox TLS spoofing option isn't just a good idea,
| it's a necessity.
| INTPenis wrote:
| Only three patches and shell wrappers; this should get
| Daniel coding. IMHO this should definitely be in mainline
| curl.
| lcfcjs6 wrote:
| I've been using Puppeteer to query and read responses from
| deepseek.com. It works really well, but I have to use a
| stealth mode and a "headed" version to make it think it's
| a person.
| GNOMES wrote:
| I had to do something like this with Ansible's get_url
| module once.
|
| I was having issues getting the module to download an
| installer from a vendor's site.
|
| I played with curl/wget, but was running into the same
| issue, while it worked from a browser.
|
| I ended up getting both curl and get_url to work by
| passing the same headers my browser sent, such as
| User-Agent, encoding, etc.
| ck2 wrote:
| Good luck getting past Imperva.
|
| If you thought the Cloudflare challenge could be bad,
| Imperva doesn't even want most humans through.
| 1vuio0pswjnm7 wrote:
| "For these reasons, some web services use the TLS and HTTP
| handshakes to fingerprint which client is accessing them, and
| then present different content for different clients."
|
| Examples: [missing]
___________________________________________________________________
(page generated 2025-04-04 23:01 UTC)