[HN Gopher] Curl-impersonate: Special build of curl that can imp...
___________________________________________________________________
Curl-impersonate: Special build of curl that can impersonate the
major browsers
Author : mmh0000
Score : 226 points
Date : 2025-04-03 15:24 UTC (7 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| pvg wrote:
| Showhn at the time https://news.ycombinator.com/item?id=30378562
| croemer wrote:
| Back then (2022) it was Firefox only
| jchw wrote:
| I'm rooting for Ladybird to gain traction in the future.
| Currently, it is using cURL proper for networking. That is
| probably going to have some challenges (I think cURL is still
| limited in some ways, e.g. I don't think it can do WebSockets
| over h2 yet) but on the other hand, having a rising browser
| engine might eventually remove this avenue for fingerprinting
| since legitimate traffic will have the same fingerprint as stock
| cURL.
| rhdunn wrote:
| It would be good to see Ladybird's cURL usage improve cURL
| itself, such as the WebSocket over h2 example you mention. It
| is also a good test of cURL to see and identify what
| functionality cURL is missing w.r.t. real-world browser
| workflows.
| eesmith wrote:
| I'm hoping this means Ladybird might support ftp URLs.
| navanchauhan wrote:
| and even the Gopher protocol!
| nonrandomstring wrote:
| When I spoke to these guys [0] we touched on those quirks and
| foibles that make a signature (including TCP stack stuff beyond
| control of any userspace app).
|
| I love this curl, but I worry that if a component takes on the
| role of deception in order to "keep up" it accumulates a legacy
| of hard to maintain "compatibility" baggage.
|
| Ideally it should just say... "hey I'm curl, let me in"
|
| The problem of course lies with a server that is picky about
| dress codes, and that problem in turn is caused by crooks
| sneaking in disguise, so it's rather a circular chicken and egg
| thing.
|
| [0] https://cybershow.uk/episodes.php?id=39
| immibis wrote:
| What should instead happen is that Chrome should stop sending
| as much of a fingerprint, so that sites won't be able to
| fingerprint. That won't happen, since it's against Google's
| interests.
| gruez wrote:
| This is a fundamental misunderstanding of how TLS
| fingerprinting works. The "fingerprint" isn't from Chrome
| sending a "fingerprint: [random uuid]" attribute in every
| TLS negotiation. It's derived from various properties of
| the TLS stack, like which ciphers it can accept. You can't
| just "stop sending as much of a fingerprint" without every
| browser agreeing on the same TLS stack. It's already
| minimal as it is: there's basically no aspect of the TLS
| stack that users can configure, and Chrome bundles its own,
| so you'd expect every Chrome user to have the same TLS
| fingerprint. It's only really useful for distinguishing
| "fake" Chrome users (e.g. curl with custom headers set, or
| Firefox users with a user-agent spoofer) from "real" Chrome
| users.
| dochtman wrote:
| Part of the fingerprint is stuff like the ordering of
| extensions, which Chrome could easily do but AFAIK
| doesn't.
|
| (AIUI Google's Play Store is one of the biggest TLS
| fingerprinting culprits.)
| gruez wrote:
| What's the advantage of randomizing the order when all
| Chrome users already have the same order? Practically
| speaking, there are a bazillion ways to fingerprint Chrome
| besides TLS cipher ordering, so it's not worth adding
| random mitigations like this.
| shiomiru wrote:
| Chrome has randomized its ClientHello extension order for
| two years now.[0]
|
| The companies to blame here are solely the ones employing
| these fingerprinting techniques, and those relying on
| services of these companies (which is a worryingly large
| chunk of the web). For example, after the Chrome change,
| Cloudflare just switched to a fingerprinter that doesn't
| check the order.[1]
|
| [0]: https://chromestatus.com/feature/5124606246518784
|
| [1]: https://blog.cloudflare.com/ja4-signals/
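|
| As a minimal sketch of why that works, assume a JA4-style
| fingerprinter that hashes the sorted extension list rather
| than the order on the wire (the extension IDs below are
| illustrative, not a real capture):
|
| ```
| import hashlib
|
| def order_insensitive_fp(extensions: list[int]) -> str:
|     # Sorting first means Chrome's per-connection shuffle of
|     # ClientHello extensions no longer changes the hash.
|     canon = ",".join(str(e) for e in sorted(extensions))
|     return hashlib.sha256(canon.encode()).hexdigest()[:12]
|
| # Two connections from the same client, shuffled order:
| print(order_insensitive_fp([0, 23, 65281, 10, 11]))
| print(order_insensitive_fp([10, 0, 11, 65281, 23]))  # same hash
| ```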
| nonrandomstring wrote:
| > blame here are solely the ones employing these
| fingerprinting techniques,
|
| Sure. And it's a tragedy. But when you look at the bot
| situation and the sheer magnitude of resource abuse out
| there, you have to see it from the other side.
|
| FWIW, in the conversation mentioned above we acknowledged
| that and moved on to talk about _behavioural_
| fingerprinting, and why it makes sense not to focus on the
| browser/agent alone but on what gets done with it.
| NavinF wrote:
| Last time I saw someone complaining about scrapers, they
| were talking about 100 GiB/month. That's about 300 kbps:
| less than $1/month in IP transit and ~$0 in compute.
| Personally I've never noticed bots show up on a resource
| graph. As long as you don't block them, they won't bother
| using more than a few IPs, and they'll back off when
| they're throttled.
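|
| Back-of-the-envelope (a sketch, assuming a 30-day month):
|
| ```
| # 100 GiB/month expressed as a sustained bitrate
| gib = 100 * 2**30          # bytes in 100 GiB
| secs = 30 * 24 * 3600      # seconds in a 30-day month
| kbps = gib * 8 / secs / 1000
| print(f"{kbps:.0f} kbps")  # ~331 kbps
| ```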
| marcus0x62 wrote:
| For some sites, things are a lot worse. See, for example,
| Jonathan Corbet's report[0].
|
| 0 - https://social.kernel.org/notice/AqJkUigsjad3gQc664
| fc417fc802 wrote:
| > The companies to blame here are solely the ones
| employing these fingerprinting techniques,
|
| Let's not go blaming vulnerabilities on those exploiting
| them. Exploitation is _also_ bad but being exploitable is
| a problem in and of itself.
| thaumasiotes wrote:
| > Ideally it should just say... "hey I'm curl, let me in"
|
| What? Ideally it should just say "GET /path/to/page".
|
| Sending a user agent is a bad idea. That shouldn't be
| happening at all, from any source.
| ec109685 wrote:
| There are APIs that Chrome provides that allow servers to
| validate whether a request came from an official Chrome
| browser. That would detect that this curl isn't really Chrome.
|
| It'd be nice if something could support curl's arguments but
| drive an actual headless chrome browser.
| binarymax wrote:
| I'm interested in learning more about this. Are these APIs
| documented anywhere and are there server side implementation
| examples that you know of?
|
| EDIT: this is the closest I could find.
| https://developers.google.com/chrome/verified-access/overvie...
| ...but it's not generic enough to lead me to the declaration
| you made.
| KTibow wrote:
| I think they confused Chrome and Googlebot.
| bowmessage wrote:
| There's no way this couldn't be replicated by a special build
| of curl.
| darrenf wrote:
| Are you referring to the Web Environment Integrity[0] stuff, or
| something else? 'cos WEI was abandoned in late 2023.
|
| [0] https://github.com/explainers-by-googlers/Web-Environment-
| In...
| do_not_redeem wrote:
| Siblings are being more charitable about this, but I just don't
| think what you're suggesting is even possible.
|
| An HTTP client sends a request. The server sends a response.
| The request and response are made of bytes. Any bytes Chrome
| can send, curl-impersonate could also send.
|
| Chromium is open source. If there was some super secret
| handshake, anyone could copy that code to curl-impersonate. And
| if it's only in closed-source Chrome, someone will disassemble
| it and copy it over anyway.
| gruez wrote:
| >Chromium is open source. If there was some super secret
| handshake, anyone could copy that code to curl-impersonate.
| And if it's only in closed-source Chrome, someone will
| disassemble it and copy it over anyway.
|
| Not if the "super secret handshake" is based on hardware-
| backed attestation.
| do_not_redeem wrote:
| True, but beside the point.
|
| GP claims the API can detect the official chrome browser,
| and the official chrome browser runs fine without
| attestation.
| dist-epoch wrote:
| > someone will disassemble it and copy it over anyway.
|
| Not if Chrome uses homomorphic encryption to sign a
| challenge. It's doable today. But then you could run a real
| Chrome and forward the request to it.
| do_not_redeem wrote:
| No, even homomorphic encryption wouldn't help.
|
| It doesn't matter how complicated the operation is, if you
| have a copy of the Chrome binary, you can observe what CPU
| instructions it uses to sign the challenge, and replicate
| the operations yourself. Proxying to a real Chrome is the
| most blunt approach, but there's nothing stopping you from
| disassembling the binary and copying the code to run in
| your own process, independent of Chrome.
| dist-epoch wrote:
| > you can observe what CPU instructions it uses to sign
| the challenge, and replicate the operations yourself.
|
| No you can't; that's the whole point of homomorphic
| encryption. Ask GPT to explain to you why that's so.
|
| You have no way of knowing the bounds of the code I will
| access from inside the homomorphically encrypted code.
| Depending on the challenge, I can query parts of the
| binary and hash them into the response. So you would need
| to replicate the whole binary.
|
| Similar techniques are already used today by various
| copy-protection/anti-cheat game protectors. Most of them
| remain unbroken.
| fc417fc802 wrote:
| I don't believe this is correct. Homomorphic encryption
| enables computation on encrypted data without needing to
| decrypt it.
|
| You can't use the result of that computation without
| first decrypting it though. And you can't decrypt it
| without the key. So what you describe regarding memory
| addresses is merely garden variety obfuscation.
|
| Unmasking an obfuscated set of allowable address ranges
| for hashing given an arbitrary binary is certainly a
| difficult problem. However as you point out it is easily
| sidestepped.
|
| You are also mistaken about anti-cheat measures. The ones
| that pose the most difficulty primarily rely on kernel
| mode drivers. Even then, without hardware attestation
| it's "just" an obfuscation effort that raises the bar to
| make breaking it more time consuming.
|
| What you're actually witnessing there is that if a
| sufficient amount of effort is invested in obfuscation,
| and those efforts are carried out continuously in order to
| regularly change the obfuscation, then you can outstrip
| the ability of the other party to keep up with you.
| anon6362 wrote:
| Set a UA and any headers and/or cookies with regular cURL
| compiled with HTTP/3. This can be done with wrapper scripts very
| easily. 99.999% of problems solved with no special magic buried
| in an unclean fork.
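|
| A sketch of that wrapper idea with stock curl (-A, -H and
| --http3 are real curl flags; --http3 needs a build with
| HTTP/3 support):
|
| ```
| import subprocess
|
| UA = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
|       "AppleWebKit/537.36 (KHTML, like Gecko) "
|       "Chrome/123.0.0.0 Safari/537.36")
|
| def curl_as_browser(url: str) -> str:
|     # Spoofs the headers only; the TLS ClientHello still
|     # looks like curl's TLS library, not Chrome's.
|     out = subprocess.run(
|         ["curl", "-sS", "--http3", "-A", UA,
|          "-H", "Accept-Language: en-US,en;q=0.9", url],
|         capture_output=True, text=True, check=True)
|     return out.stdout
| ```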
| andrewmcwatters wrote:
| That's not the point of this fork.
|
| And "unclean fork" is such an unnecessary and unprofessional
| comment.
|
| There's an entire industry of stealth browser technologies out
| there that this falls under.
| psanford wrote:
| That doesn't solve the problem of TLS handshake fingerprinting,
| which is the whole point of this project.
| mmh0000 wrote:
| You should really read the "Why" section of the README before
| jumping to conclusions:
|
| ```
| some web services use the TLS and HTTP handshakes to
| fingerprint which client is accessing them, and then present
| different content for different clients. These methods are
| known as TLS fingerprinting and HTTP/2 fingerprinting
| respectively. Their widespread use has led to the web becoming
| less open, less private and much more restrictive towards
| specific web clients
|
| With the modified curl in this repository, the TLS and HTTP
| handshakes look exactly like those of a real browser.
| ```
|
| For example, this will get you past Cloudflare's bot detection.
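|
| Usage is a one-liner; the repo ships wrapper scripts (names
| like curl_chrome116 vary by release) that set all the
| impersonation options, e.g. from Python:
|
| ```
| import subprocess
|
| # Fetch a page with the Chrome-impersonating build; the TLS
| # and HTTP/2 handshakes match the real Chrome it mimics.
| page = subprocess.run(
|     ["curl_chrome116", "-sS", "https://example.com"],
|     capture_output=True, text=True, check=True).stdout
| ```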
| 01HNNWZ0MV43FF wrote:
| The README indicates that this fork is compiled with NSS
| (from Firefox) and BoringSSL (from Chromium) to resist
| fingerprinting based on the TLS library. CLI flags won't do
| that.
| bossyTeacher wrote:
| Cool tool, but it shouldn't matter whether the client is a
| browser or not. I feel sad that we need such a tool in the
| real world.
| brutal_chaos_ wrote:
| You may enter our site iff you use software we approve.
| Anything else will be seen as malicious. Papers please!
|
| I, too, am saddened by this gatekeeping. IIUC, custom browsers
| (or user agents) built from scratch will never work on
| Cloudflare sites and the like until the UA has enough clout
| (money, users, etc.) to sway them.
| DrillShopper wrote:
| This was sadly always going to be the outcome of the Internet
| going commercial.
|
| There's too much lost revenue in open things for companies to
| embrace fully open technology anymore.
| jrockway wrote:
| It's kind of the opposite problem as well; huge well-funded
| companies bringing down open source project websites. See
| Xe's journey here: https://xeiaso.net/blog/2025/anubis/
|
| One may posit "maybe these projects should cache stuff so
| page loads aren't actually expensive" but these things are
| best-effort and not the core focus of these projects. You
| install some Git forge or Trac or something and it's Good
| Enough for your contributors to get work done. But you have
| to block the LLM bots because they ignore robots.txt and
| naively ask for the same expensive-to-render page over and
| over again.
|
| The commercial impact is also not to be understated. I
| remember when I worked for a startup with a cloud service.
| It got talked about here, and suddenly every free-for-open-
| source CI provider IP range was signing up for free trials
| in a tight loop. These mechanical users had to be blocked.
| It made me sad, but we wanted people to use our product,
| not mine crypto ;)
| jimt1234 wrote:
| About six months ago I went to a government auction site that
| _required_ Internet Explorer. Yes, Internet Explorer. The site
| was active, too; the auction data was up-to-date. I added a
| user-agent extension in Chrome, switched the UA to IE,
| retried, and it worked; all functionality on the site was
| fine. So yeah, I was both sad and annoyed. My guess is this
| government office paid for a website 25 years ago and it
| hasn't been updated since.
| IMSAI8080 wrote:
| Yeah it's probably an ancient web site. This was commonplace
| back in the day when Internet Explorer had 90%+ market share.
| Lazy web devs couldn't be bothered to support other browsers
| (or didn't know how) so just added a message demanding you
| use IE as opposed to fixing the problems with the site.
| jorvi wrote:
| In South Korea, ActiveX is still required for many things
| like banking and government stuff. So they're stuck with both
| IE and the gaping security hole in it that is ActiveX.
| pixl97 wrote:
| SK: "Why fix a problem when we're going extinct in 3
| generations anyway"
| VladVladikoff wrote:
| Wait a sec... if the TLS handshakes look different, would it be
| possible to have an nginx-level filter for traffic that claims
| to be a web browser (e.g. a Chrome user agent) yet really is a
| Python/PHP script? Because this would account for the vast
| majority of malicious bot traffic, and I would love to just
| block it.
| gruez wrote:
| That's basically what security vendors like Cloudflare do,
| except with even more fingerprinting, like a JavaScript
| challenge that checks the JS interpreter/DOM.
| walrus01 wrote:
| JS can also check user-agent things like screen and window
| dimensions, which legit browsers will have. Bots will
| present them too, but with a more uniform and predictable
| set of x and y dimensions per set of source IPs. Lots of
| possibilities for JS endpoint fingerprinting.
| aaron42net wrote:
| Cloudflare uses JA3 and now JA4 TLS fingerprints, which are
| hashes of various TLS handshake parameters.
| https://github.com/FoxIO-LLC/ja4/blob/main/technical_details...
| has more details on how that works, and they do offer an Nginx
| module: https://github.com/FoxIO-LLC/ja4-nginx-module
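|
| For reference, JA3 is just an MD5 over a canonical string of
| ClientHello fields; a minimal sketch (the field values below
| are illustrative, not a real capture):
|
| ```
| import hashlib
|
| def ja3(version, ciphers, extensions, curves, point_formats):
|     # JA3 string: Version,Ciphers,Extensions,Curves,PointFormats
|     # with each list joined by '-'.
|     parts = [str(version)] + [
|         "-".join(str(x) for x in field)
|         for field in (ciphers, extensions, curves, point_formats)]
|     return hashlib.md5(",".join(parts).encode()).hexdigest()
|
| print(ja3(771, [4865, 4866], [0, 10, 11], [29, 23], [0]))
| ```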
| immibis wrote:
| Yes, and sites are doing this and it absolutely sucks because
| it's not reliable and blocks everyone who isn't using the
| latest Chrome on the latest Windows. Please don't whitelist TLS
| fingerprints unless you're actually under attack right now.
| fc417fc802 wrote:
| If you're going to whitelist (or block at all really) please
| simply redirect all rejected connections to a proof of work
| scheme. At least that way things continue to work with only
| mild inconvenience.
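|
| A hashcash-style sketch of such a scheme (parameters made
| up; real deployments differ in detail):
|
| ```
| import hashlib, itertools
|
| def solve(challenge: str, bits: int = 20) -> int:
|     # Find a nonce whose hash has `bits` leading zero bits:
|     # cheap for one pageview, costly at scraper scale.
|     for nonce in itertools.count():
|         h = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
|         if int.from_bytes(h, "big") >> (256 - bits) == 0:
|             return nonce
|
| def verify(challenge: str, nonce: int, bits: int = 20) -> bool:
|     h = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
|     return int.from_bytes(h, "big") >> (256 - bits) == 0
| ```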
| jrochkind1 wrote:
| Well, I think that's exactly what the OP is meant to stop
| you from doing.
| ryao wrote:
| Did they also set IP_TTL so the TTL value matches the
| platform being impersonated?
|
| If not, then fingerprinting could still be done to some extent at
| the IP layer. If the TTL value in the IP layer is below 64, it is
| obvious this is either not running on modern Windows or is
| running on a modern Windows machine that has had its default TTL
| changed, since by default the TTL of packets on modern Windows
| starts at 128 while most other platforms start it at 64. Since
| the other platforms do not have issues communicating over the
| internet, IP packets from modern Windows will always be seen
| by the remote end with TTLs at or above 64 (likely just above).
|
| That said, it would be difficult to fingerprint at the IP layer,
| although it is not impossible.
| xrisk wrote:
| Wouldn't the TTL value of received packets depend on network
| conditions? Can you recover the client's value from the server?
| ralferoo wrote:
| The argument is that if many (maybe the majority) of
| systems are sending packets with a TTL of 64 and they don't
| experience problems on the internet, then it stands to reason
| that almost everywhere on the internet is reachable in fewer
| than 64 hops (personally, I'd be amazed if any routes are
| actually as long as 32 hops).
|
| If everywhere is reachable in under 64 hops, then packets
| sent from systems that use a TTL of 128 will arrive at the
| destination with a TTL still over 64 (or else they'd have
| been discarded for all the other systems already).
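|
| That inference is easy to sketch, assuming initial TTLs come
| from the common defaults (64 for Linux/macOS, 128 for
| Windows, 255 for some network gear):
|
| ```
| def guess_initial_ttl(observed: int) -> int:
|     # Round the observed TTL up to the nearest common default;
|     # e.g. arriving TTL 116 implies 128 minus 12 hops, i.e. a
|     # Windows-like sender.
|     return next(d for d in (64, 128, 255) if observed <= d)
|
| print(guess_initial_ttl(52))   # 64  -> Linux/macOS-like
| print(guess_initial_ttl(116))  # 128 -> Windows-like
| ```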
| gruez wrote:
| >That said, it would be difficult to fingerprint at the IP
| layer, although it is not impossible.
|
| Only if you're using PaaS/IaaS providers that don't give you
| low-level access to the TCP/IP stack. If you're running your
| own servers, it's trivial to fingerprint all manner of TCP/IP
| properties.
|
| https://en.wikipedia.org/wiki/TCP/IP_stack_fingerprinting
| fc417fc802 wrote:
| What is the reasoning behind TTL counting down instead of up,
| anyway? Wouldn't we generally expect those routing the traffic
| to determine if and how to do so?
| sadjad wrote:
| The primary purpose of TTL is to prevent packets from looping
| endlessly during routing. If a packet gets stuck in a loop,
| its TTL will eventually reach zero, and then it will be
| dropped.
| fc417fc802 wrote:
| That doesn't answer my question. If it counted up then it
| would be up to each hop to set its own policy. Things
| wouldn't loop endlessly in that scenario either.
| jruohonen wrote:
| The notion of real-world TLS/HTTP fingerprinting was somewhat
| new to me, and it looks interesting in theory, but I wonder
| what this build's use case really is, given that you have
| heavy-handed JavaScript checks running everywhere now.
| jamal-kumar wrote:
| This tool is pretty sweet in little bash scripts combo'd up
| with GNU parallel on red team engagements, for mapping HTTPS
| endpoints within scoped address ranges that will only respond
| to proper browsers, or that need the SNI stuff in order. Been
| finding it super sweet for that. It can also take all the
| normal curl switches, like -H for header spoofing.
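|
| A rough Python equivalent of that pattern (curl_chrome116 is
| one of the repo's wrapper scripts; swap in whatever your
| build ships, and adjust the range to your scope):
|
| ```
| import subprocess
| from concurrent.futures import ThreadPoolExecutor
|
| HOSTS = [f"https://10.0.0.{i}" for i in range(1, 255)]
|
| def probe(url: str) -> tuple[str, bool]:
|     try:
|         # -k: scoped internal ranges often use self-signed certs
|         r = subprocess.run(
|             ["curl_chrome116", "-sk", "-o", "/dev/null",
|              "-w", "%{http_code}", url],
|             capture_output=True, text=True, timeout=10)
|         return url, r.stdout.strip() not in ("", "000")
|     except subprocess.TimeoutExpired:
|         return url, False
|
| with ThreadPoolExecutor(max_workers=32) as ex:
|     for url, alive in ex.map(probe, HOSTS):
|         if alive:
|             print(url)
| ```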
___________________________________________________________________
(page generated 2025-04-03 23:00 UTC)