[HN Gopher] Curl-impersonate: Special build of curl that can imp...
       ___________________________________________________________________
        
       Curl-impersonate: Special build of curl that can impersonate the
       major browsers
        
       Author : mmh0000
       Score  : 226 points
       Date   : 2025-04-03 15:24 UTC (7 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | pvg wrote:
        | Show HN at the time https://news.ycombinator.com/item?id=30378562
        
         | croemer wrote:
         | Back then (2022) it was Firefox only
        
       | jchw wrote:
       | I'm rooting for Ladybird to gain traction in the future.
       | Currently, it is using cURL proper for networking. That is
       | probably going to have some challenges (I think cURL is still
       | limited in some ways, e.g. I don't think it can do WebSockets
       | over h2 yet) but on the other hand, having a rising browser
       | engine might eventually remove this avenue for fingerprinting
       | since legitimate traffic will have the same fingerprint as stock
       | cURL.
        
         | rhdunn wrote:
         | It would be good to see Ladybird's cURL usage improve cURL
         | itself, such as the WebSocket over h2 example you mention. It
         | is also a good test of cURL to see and identify what
         | functionality cURL is missing w.r.t. real-world browser
         | workflows.
        
         | eesmith wrote:
         | I'm hoping this means Ladybird might support ftp URLs.
        
           | navanchauhan wrote:
           | and even the Gopher protocol!
        
         | nonrandomstring wrote:
         | When I spoke to these guys [0] we touched on those quirks and
         | foibles that make a signature (including TCP stack stuff beyond
         | control of any userspace app).
         | 
         | I love this curl, but I worry that if a component takes on the
         | role of deception in order to "keep up" it accumulates a legacy
         | of hard to maintain "compatibility" baggage.
         | 
         | Ideally it should just say... "hey I'm curl, let me in"
         | 
         | The problem of course lies with a server that is picky about
         | dress codes, and that problem in turn is caused by crooks
         | sneaking in disguise, so it's rather a circular chicken and egg
         | thing.
         | 
         | [0] https://cybershow.uk/episodes.php?id=39
        
           | immibis wrote:
           | What should instead happen is that Chrome should stop sending
           | as much of a fingerprint, so that sites won't be able to
           | fingerprint. That won't happen, since it's against Google's
           | interests.
        
             | gruez wrote:
             | This is a fundamental misunderstanding of how TLS
             | fingerprinting works. The "fingerprint" isn't from chrome
             | sending a "fingerprint: [random uuid]" attribute in every
             | TLS negotiation. It's derived from various properties of
             | the TLS stack, like what ciphers it can accept. You can't
             | make "stop sending as much of a fingerprint", without every
             | browser agreeing on the same TLS stack. It's already
             | minimal as it is, because there's basically no aspect of
             | the TLS stack that users can configure, and chrome bundles
             | its own, so you'd expect every chrome user to have the same
             | TLS fingerprint. It's only really useful to distinguish
             | "fake" chrome users (eg. curl with custom header set, or
             | firefox users with user agent spoofer) from "real" chrome
             | users.
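For illustration, a JA3-style TLS fingerprint (one widely used scheme) is just a hash over ClientHello properties like the TLS version, offered cipher suites, and extension list; the sketch below follows the JA3 recipe, but the numeric IDs are made-up examples, not values captured from a real browser:

```python
import hashlib

# JA3-style fingerprint: MD5 over "version,ciphers,extensions,curves,point_formats",
# each list joined with dashes. The numeric IDs below are illustrative only.
def ja3(version, ciphers, extensions, curves, point_formats):
    fields = [
        str(version),
        "-".join(map(str, ciphers)),
        "-".join(map(str, extensions)),
        "-".join(map(str, curves)),
        "-".join(map(str, point_formats)),
    ]
    return hashlib.md5(",".join(fields).encode()).hexdigest()

# Two clients offering the same ciphers in a different order hash differently,
# which is why a stock curl and Chrome get distinct fingerprints.
a = ja3(771, [4865, 4866, 4867], [0, 23, 65281], [29, 23], [0])
b = ja3(771, [4867, 4866, 4865], [0, 23, 65281], [29, 23], [0])
print(a != b)  # True: ordering alone changes the fingerprint
```

This is also why every user of the same browser build shares one fingerprint: the hash input is fixed by the bundled TLS stack, not by anything the user configures.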
        
               | dochtman wrote:
               | Part of the fingerprint is stuff like the ordering of
               | extensions, which Chrome could easily do but AFAIK
               | doesn't.
               | 
               | (AIUI Google's Play Store is one of the biggest TLS
               | fingerprinting culprits.)
        
               | gruez wrote:
                | What's the advantage of randomizing the order, when all
                | chrome users already have the same order? Practically
                | speaking there are a bazillion other ways to fingerprint
                | Chrome besides TLS cipher ordering, so it's not worth
                | adding random mitigations like this.
        
               | shiomiru wrote:
               | Chrome has randomized its ClientHello extension order for
               | two years now.[0]
               | 
               | The companies to blame here are solely the ones employing
               | these fingerprinting techniques, and those relying on
               | services of these companies (which is a worryingly large
               | chunk of the web). For example, after the Chrome change,
               | Cloudflare just switched to a fingerprinter that doesn't
               | check the order.[1]
               | 
               | [0]: https://chromestatus.com/feature/5124606246518784
               | 
               | [1]: https://blog.cloudflare.com/ja4-signals/
        
               | nonrandomstring wrote:
               | > blame here are solely the ones employing these
               | fingerprinting techniques,
               | 
               | Sure. And it's a tragedy. But when you look at the bot
               | situation and the sheer magnitude of resource abuse out
               | there, you have to see it from the other side.
               | 
               | FWIW the conversation mentioned above, we acknowledged
               | that and moved on to talk about _behavioural_
               | fingerprinting and why it makes sense not to focus on the
               | browser /agent alone but what gets done with it.
        
               | NavinF wrote:
                | Last time I saw someone complaining about scrapers, they
                | were talking about 100GiB/month. That's about 300 kbps:
                | less than $1/month in IP transit and ~$0 in compute.
                | Personally I've never noticed bots show up on a resource
                | graph. As long as you don't block them, they won't bother
                | using more than a few IPs, and they'll back off when
                | they're throttled.
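The arithmetic in the comment above roughly checks out; a quick sanity check, assuming a 30-day month:

```python
# Sanity check: 100 GiB/month expressed as an average bitrate.
GIB = 2**30                      # bytes in one gibibyte
bytes_per_month = 100 * GIB      # the quoted scraper traffic
seconds_per_month = 30 * 86400   # assuming a 30-day month

bits_per_second = bytes_per_month * 8 / seconds_per_month
print(f"{bits_per_second / 1000:.0f} kbps")  # roughly 331 kbps
```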
        
               | marcus0x62 wrote:
               | For some sites, things are a lot worse. See, for example,
               | Jonathan Corbet's report[0].
               | 
               | 0 - https://social.kernel.org/notice/AqJkUigsjad3gQc664
        
               | fc417fc802 wrote:
               | > The companies to blame here are solely the ones
               | employing these fingerprinting techniques,
               | 
               | Let's not go blaming vulnerabilities on those exploiting
               | them. Exploitation is _also_ bad but being exploitable is
               | a problem in and of itself.
        
           | thaumasiotes wrote:
           | > Ideally it should just say... "hey I'm curl, let me in"
           | 
           | What? Ideally it should just say "GET /path/to/page".
           | 
           | Sending a user agent is a bad idea. That shouldn't be
           | happening at all, from any source.
        
       | ec109685 wrote:
        | There are APIs that Chrome provides that allow servers to
        | validate whether a request came from an official Chrome
        | browser. That would detect that this curl isn't really Chrome.
       | 
       | It'd be nice if something could support curl's arguments but
       | drive an actual headless chrome browser.
        
         | binarymax wrote:
         | I'm interested in learning more about this. Are these APIs
         | documented anywhere and are there server side implementation
         | examples that you know of?
         | 
         | EDIT: this is the closest I could find.
         | https://developers.google.com/chrome/verified-access/overvie...
         | ...but it's not generic enough to lead me to the declaration
         | you made.
        
           | KTibow wrote:
           | I think they confused Chrome and Googlebot.
        
         | bowmessage wrote:
         | There's no way this couldn't be replicated by a special build
         | of curl.
        
         | darrenf wrote:
         | Are you referring to the Web Environment Integrity[0] stuff, or
         | something else? 'cos WEI was abandoned in late 2023.
         | 
         | [0] https://github.com/explainers-by-googlers/Web-Environment-
         | In...
        
         | do_not_redeem wrote:
         | Siblings are being more charitable about this, but I just don't
         | think what you're suggesting is even possible.
         | 
         | An HTTP client sends a request. The server sends a response.
         | The request and response are made of bytes. Any bytes Chrome
         | can send, curl-impersonate could also send.
         | 
         | Chromium is open source. If there was some super secret
         | handshake, anyone could copy that code to curl-impersonate. And
         | if it's only in closed-source Chrome, someone will disassemble
         | it and copy it over anyway.
        
           | gruez wrote:
           | >Chromium is open source. If there was some super secret
           | handshake, anyone could copy that code to curl-impersonate.
           | And if it's only in closed-source Chrome, someone will
           | disassemble it and copy it over anyway.
           | 
           | Not if the "super secret handshake" is based on hardware-
           | backed attestation.
        
             | do_not_redeem wrote:
             | True, but beside the point.
             | 
             | GP claims the API can detect the official chrome browser,
             | and the official chrome browser runs fine without
             | attestation.
        
           | dist-epoch wrote:
           | > someone will disassemble it and copy it over anyway.
           | 
            | Not if Chrome uses homomorphic encryption to sign a
            | challenge. It's doable today. But then you could run a real
            | Chrome and forward the request to it.
        
             | do_not_redeem wrote:
             | No, even homomorphic encryption wouldn't help.
             | 
             | It doesn't matter how complicated the operation is, if you
             | have a copy of the Chrome binary, you can observe what CPU
             | instructions it uses to sign the challenge, and replicate
             | the operations yourself. Proxying to a real Chrome is the
             | most blunt approach, but there's nothing stopping you from
             | disassembling the binary and copying the code to run in
             | your own process, independent of Chrome.
        
               | dist-epoch wrote:
               | > you can observe what CPU instructions it uses to sign
               | the challenge, and replicate the operations yourself.
               | 
                | No you can't, that's the whole point of homomorphic
                | encryption. Ask GPT to explain why that's so.
               | 
               | You have no way of knowing the bounds of the code I will
               | access from the inside the homomorphic code. Depending on
               | the challenge I can query parts of the binary and hash
               | that in the response. So you will need to replicate the
               | whole binary.
               | 
               | Similar techniques are already used today by various
               | copy-protection/anti-cheat game protectors. Most of them
               | remain unbroken.
        
               | fc417fc802 wrote:
               | I don't believe this is correct. Homomorphic encryption
               | enables computation on encrypted data without needing to
               | decrypt it.
               | 
               | You can't use the result of that computation without
               | first decrypting it though. And you can't decrypt it
               | without the key. So what you describe regarding memory
               | addresses is merely garden variety obfuscation.
               | 
               | Unmasking an obfuscated set of allowable address ranges
               | for hashing given an arbitrary binary is certainly a
               | difficult problem. However as you point out it is easily
               | sidestepped.
               | 
               | You are also mistaken about anti-cheat measures. The ones
               | that pose the most difficulty primarily rely on kernel
               | mode drivers. Even then, without hardware attestation
               | it's "just" an obfuscation effort that raises the bar to
               | make breaking it more time consuming.
               | 
               | What you're actually witnessing there is that if a
               | sufficient amount of effort is invested in obfuscation
               | and those efforts carried out continuously in order to
               | regularly change the obfuscation then you can outstrip
               | the ability of the other party to keep up with you.
        
       | anon6362 wrote:
       | Set a UA and any headers and/or cookies with regular cURL
       | compiled with HTTP/3. This can be done with wrapper scripts very
       | easily. 99.999% of problems solved with no special magic buried
       | in an unclean fork.
        
         | andrewmcwatters wrote:
         | That's not the point of this fork.
         | 
         | And "unclean fork" is such an unnecessary and unprofessional
         | comment.
         | 
         | There's an entire industry of stealth browser technologies out
         | there that this falls under.
        
         | psanford wrote:
         | That doesn't solve the problem of TLS handshake fingerprinting,
         | which is the whole point of this project.
        
         | mmh0000 wrote:
         | You should really read the "Why" section of the README before
         | jumping to conclusions:
         | 
          | ```
          | some web services use the TLS and HTTP handshakes to
          | fingerprint which client is accessing them, and then present
          | different content for different clients. These methods are
          | known as TLS fingerprinting and HTTP/2 fingerprinting
          | respectively. Their widespread use has led to the web becoming
          | less open, less private and much more restrictive towards
          | specific web clients
          | 
          | With the modified curl in this repository, the TLS and HTTP
          | handshakes look exactly like those of a real browser.
          | ```
         | 
         | For example, this will get you past Cloudflare's bot detection.
        
         | 01HNNWZ0MV43FF wrote:
         | The README indicates that this fork is compiled with nss (from
         | Firefox) and BoringSSL (from Chromium) to resist fingerprinting
         | based on the TLS lib. CLI flags won't do that.
        
       | bossyTeacher wrote:
       | Cool tool but it shouldn't matter whether the client is a browser
       | or not. I feel sad that we need such a tool in the real world
        
         | brutal_chaos_ wrote:
         | You may enter our site iff you use software we approve.
         | Anything else will be seen as malicious. Papers please!
         | 
         | I, too, am saddened by this gatekeeping. IIUC custom browsers
         | (or user-agent) from scratch will never work on cloudflare
         | sites and the like until the UA has enough clout (money, users,
         | etc) to sway them.
        
           | DrillShopper wrote:
           | This was sadly always going to be the outcome of the Internet
           | going commercial.
           | 
           | There's too much lost revenue in open things for companies to
           | embrace fully open technology anymore.
        
             | jrockway wrote:
             | It's kind of the opposite problem as well; huge well-funded
             | companies bringing down open source project websites. See
             | Xe's journey here: https://xeiaso.net/blog/2025/anubis/
             | 
             | One may posit "maybe these projects should cache stuff so
             | page loads aren't actually expensive" but these things are
             | best-effort and not the core focus of these projects. You
             | install some Git forge or Trac or something and it's Good
             | Enough for your contributors to get work done. But you have
             | to block the LLM bots because they ignore robots.txt and
             | naively ask for the same expensive-to-render page over and
             | over again.
             | 
             | The commercial impact is also not to be understated. I
             | remember when I worked for a startup with a cloud service.
             | It got talked about here, and suddenly every free-for-open-
             | source CI provider IP range was signing up for free trials
             | in a tight loop. These mechanical users had to be blocked.
             | It made me sad, but we wanted people to use our product,
             | not mine crypto ;)
        
         | jimt1234 wrote:
         | About six months ago I went to a government auction site that
         | _required_ Internet Explorer. Yes, Internet Explorer. The site
         | was active, too; the auction data was up-to-date. I added a
         | user-agent extension in Chrome, switched to IE, retried and it
         | worked; all functionality on the site was fine. So yeah, I was
        | both sad and annoyed. My guess is this government office paid
        | for a website 25 years ago and it hasn't been updated since.
        
           | IMSAI8080 wrote:
           | Yeah it's probably an ancient web site. This was commonplace
           | back in the day when Internet Explorer had 90%+ market share.
           | Lazy web devs couldn't be bothered to support other browsers
           | (or didn't know how) so just added a message demanding you
           | use IE as opposed to fixing the problems with the site.
        
           | jorvi wrote:
           | In South Korea, ActiveX is still required for many things
           | like banking and government stuff. So they're stuck with both
           | IE and the gaping security hole in it that is ActiveX.
        
             | pixl97 wrote:
             | SK: "Why fix a problem when we're going extinct in 3
             | generations anyway"
        
       | VladVladikoff wrote:
       | Wait a sec... if the TLS handshakes look different, would it be
       | possible to have an nginx level filter for traffic that claims to
       | be a web browser (eg chrome user agent), yet really is a
       | python/php script? Because this would account for the vast
       | majority of malicious bot traffic, and I would love to just block
       | it.
        
         | gruez wrote:
          | That's basically what security vendors like Cloudflare do,
          | except with even more fingerprinting, like a javascript
          | challenge that checks the js interpreter/DOM.
        
           | walrus01 wrote:
            | JS can also check user-agent properties like screen and
            | window dimensions. Legit browsers will vary in these, while
            | bots will present a more uniform and predictable set of x
            | and y dimensions per set of source IPs. Lots of
            | possibilities for js endpoint fingerprinting.
        
         | aaron42net wrote:
         | Cloudflare uses JA3 and now JA4 TLS fingerprints, which are
         | hashes of various TLS handshake parameters.
         | https://github.com/FoxIO-LLC/ja4/blob/main/technical_details...
         | has more details on how that works, and they do offer an Nginx
         | module: https://github.com/FoxIO-LLC/ja4-nginx-module
        
         | immibis wrote:
         | Yes, and sites are doing this and it absolutely sucks because
         | it's not reliable and blocks everyone who isn't using the
         | latest Chrome on the latest Windows. Please don't whitelist TLS
         | fingerprints unless you're actually under attack right now.
        
           | fc417fc802 wrote:
           | If you're going to whitelist (or block at all really) please
           | simply redirect all rejected connections to a proof of work
           | scheme. At least that way things continue to work with only
           | mild inconvenience.
        
         | jrochkind1 wrote:
         | Well, I think that's what OP is meant to avoid you doing,
         | exactly.
        
       | ryao wrote:
        | Did they also set IP_TTL so the TTL value matches the platform
        | being impersonated?
       | 
       | If not, then fingerprinting could still be done to some extent at
       | the IP layer. If the TTL value in the IP layer is below 64, it is
       | obvious this is either not running on modern Windows or is
       | running on a modern Windows machine that has had its default TTL
       | changed, since by default the TTL of packets on modern Windows
       | starts at 128 while most other platforms start it at 64. Since
       | the other platforms do not have issues communicating over the
        | internet, IP packets from modern Windows will always be seen
       | by the remote end with TTLs at or above 64 (likely just above).
       | 
       | That said, it would be difficult to fingerprint at the IP layer,
       | although it is not impossible.
        
         | xrisk wrote:
         | Wouldn't the TTL value of received packets depend on network
         | conditions? Can you recover the client's value from the server?
        
           | ralferoo wrote:
            | The argument is that if many (maybe the majority) of
            | systems are sending packets with a TTL of 64 and they don't
            | experience problems on the internet, then it stands to reason
            | that almost everywhere on the internet is reachable in less
            | than 64 hops (personally, I'd be amazed if any routes are
            | actually as long as 32 hops).
           | 
           | If everywhere is reachable in under 64 hops, then packets
           | sent from systems that use a TTL of 128 will arrive at the
           | destination with a TTL still over 64 (or else they'd have
           | been discarded for all the other systems already).
        
         | gruez wrote:
         | >That said, it would be difficult to fingerprint at the IP
         | layer, although it is not impossible.
         | 
          | Only if you're using PaaS/IaaS providers that don't give you
          | low-level access to the TCP/IP stack. If you're running your own
         | servers it's trivial to fingerprint all manner of TCP/IP
         | properties.
         | 
         | https://en.wikipedia.org/wiki/TCP/IP_stack_fingerprinting
        
         | fc417fc802 wrote:
         | What is the reasoning behind TTL counting down instead of up,
         | anyway? Wouldn't we generally expect those routing the traffic
         | to determine if and how to do so?
        
           | sadjad wrote:
           | The primary purpose of TTL is to prevent packets from looping
           | endlessly during routing. If a packet gets stuck in a loop,
           | its TTL will eventually reach zero, and then it will be
           | dropped.
        
             | fc417fc802 wrote:
             | That doesn't answer my question. If it counted up then it
             | would be up to each hop to set its own policy. Things
             | wouldn't loop endlessly in that scenario either.
        
       | jruohonen wrote:
        | The notion of real-world TLS/HTTP fingerprinting was somewhat new
        | to me, and it looks interesting in theory, but I wonder what the
        | build's use case really is, given the heavy-handed JavaScript
        | checks running everywhere now.
        
       | jamal-kumar wrote:
        | This tool is pretty sweet in little bash scripts combo'd up with
        | GNU parallel on red team engagements, for mapping https endpoints
        | within whatever scoped address ranges that will only respond to
        | proper browsers, or only with the SNI stuff in order. Been
        | finding it super sweet for that. It can do all the normal curl
        | switches, like -H for header spoofing.
        
       ___________________________________________________________________
       (page generated 2025-04-03 23:00 UTC)