[HN Gopher] Curl-impersonate: Special build of curl that can impersonate the major browsers
___________________________________________________________________
Curl-impersonate: Special build of curl that can impersonate the
major browsers
Author : mmh0000
Score : 499 points
Date   : 2025-04-03 15:24 UTC (1 day ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| pvg wrote:
| Showhn at the time https://news.ycombinator.com/item?id=30378562
| croemer wrote:
| Back then (2022) it was Firefox only
| jchw wrote:
| I'm rooting for Ladybird to gain traction in the future.
| Currently, it is using cURL proper for networking. That is
| probably going to have some challenges (I think cURL is still
| limited in some ways, e.g. I don't think it can do WebSockets
| over h2 yet) but on the other hand, having a rising browser
| engine might eventually remove this avenue for fingerprinting
| since legitimate traffic will have the same fingerprint as stock
| cURL.
| rhdunn wrote:
| It would be good to see Ladybird's cURL usage improve cURL
| itself, such as the WebSocket over h2 example you mention. It
| is also a good test of cURL to see and identify what
| functionality cURL is missing w.r.t. real-world browser
| workflows.
| eesmith wrote:
| I'm hoping this means Ladybird might support ftp URLs.
| navanchauhan wrote:
| and even the Gopher protocol!
| nonrandomstring wrote:
| When I spoke to these guys [0] we touched on those quirks and
| foibles that make a signature (including TCP stack stuff beyond
| control of any userspace app).
|
| I love this curl, but I worry that if a component takes on the
| role of deception in order to "keep up" it accumulates a legacy
| of hard to maintain "compatibility" baggage.
|
| Ideally it should just say... "hey I'm curl, let me in"
|
| The problem of course lies with a server that is picky about
| dress codes, and that problem in turn is caused by crooks
| sneaking in disguise, so it's rather a circular chicken and egg
| thing.
|
| [0] https://cybershow.uk/episodes.php?id=39
| immibis wrote:
| What should instead happen is that Chrome should stop sending
| as much of a fingerprint, so that sites won't be able to
| fingerprint. That won't happen, since it's against Google's
| interests.
| gruez wrote:
| This is a fundamental misunderstanding of how TLS
| fingerprinting works. The "fingerprint" isn't from chrome
| sending a "fingerprint: [random uuid]" attribute in every
| TLS negotiation. It's derived from various properties of
| the TLS stack, like what ciphers it can accept. You can't
| make "stop sending as much of a fingerprint", without every
| browser agreeing on the same TLS stack. It's already
| minimal as it is, because there's basically no aspect of
| the TLS stack that users can configure, and chrome bundles
| its own, so you'd expect every chrome user to have the same
| TLS fingerprint. It's only really useful to distinguish
| "fake" chrome users (eg. curl with custom header set, or
| firefox users with user agent spoofer) from "real" chrome
| users.
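|
| A rough illustration of that derivation: JA3, one widely
| used scheme, concatenates the ClientHello fields and hashes
| them with MD5. A minimal sketch in Python (the field values
| below are made up, not a real Chrome capture):
|
|     import hashlib
|
|     # ClientHello properties as seen by the server: TLS
|     # version, cipher suites, extensions, curves and point
|     # formats, each in the order the client sent them.
|     tls_version = "771"                  # TLS 1.2 on the wire
|     ciphers = [4865, 4866, 4867, 49195]  # offered suite IDs
|     extensions = [0, 23, 65281, 10, 11]  # client's order
|     curves = [29, 23, 24]                # supported groups
|     point_formats = [0]
|
|     # JA3 joins each list with '-', the five fields with ',',
|     # and hashes the resulting string with MD5.
|     ja3 = ",".join([
|         tls_version,
|         "-".join(map(str, ciphers)),
|         "-".join(map(str, extensions)),
|         "-".join(map(str, curves)),
|         "-".join(map(str, point_formats)),
|     ])
|     print(hashlib.md5(ja3.encode()).hexdigest())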
| dochtman wrote:
| Part of the fingerprint is stuff like the ordering of
| extensions, which Chrome could easily do but AFAIK
| doesn't.
|
| (AIUI Google's Play Store is one of the biggest TLS
| fingerprinting culprits.)
| gruez wrote:
| What's the advantage of randomizing the order, when all
| chrome users already have the same order? Practically
| speaking there's a bazillion ways to fingerprint Chrome
| besides TLS cipher ordering, that it's not worth adding
| random mitigations like this.
| shiomiru wrote:
| Chrome has randomized its ClientHello extension order for
| two years now.[0]
|
| The companies to blame here are solely the ones employing
| these fingerprinting techniques, and those relying on
| services of these companies (which is a worryingly large
| chunk of the web). For example, after the Chrome change,
| Cloudflare just switched to a fingerprinter that doesn't
| check the order.[1]
|
| [0]: https://chromestatus.com/feature/5124606246518784
|
| [1]: https://blog.cloudflare.com/ja4-signals/
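|
| That randomization is also why hashing a _sorted_ extension
| list, as the JA4 family reportedly does, still works: the
| fingerprint becomes order-invariant. A toy illustration in
| Python (extension IDs made up):
|
|     import hashlib
|
|     def fp(ext_ids, order_sensitive):
|         ids = ext_ids if order_sensitive else sorted(ext_ids)
|         s = "-".join(map(str, ids))
|         return hashlib.sha256(s.encode()).hexdigest()[:12]
|
|     a = [0, 23, 65281, 10, 11]  # one randomized ClientHello
|     b = [10, 0, 11, 65281, 23]  # same client, shuffled order
|
|     print(fp(a, True) == fp(b, True))    # False: order-based
|     print(fp(a, False) == fp(b, False))  # True: sorted, stable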
| nonrandomstring wrote:
| > blame here are solely the ones employing these
| fingerprinting techniques,
|
| Sure. And it's a tragedy. But when you look at the bot
| situation and the sheer magnitude of resource abuse out
| there, you have to see it from the other side.
|
| FWIW, in the conversation mentioned above, we acknowledged
| that and moved on to talk about _behavioural_
| fingerprinting and why it makes sense not to focus on the
| browser/agent alone but on what gets done with it.
| NavinF wrote:
| Last time I saw someone complaining about scrapers, they
| were talking about 100GiB/month. That's 300kbps. Less
| than $1/month in IP transit and ~$0 in compute.
| Personally I've never noticed bots show up on a resource
| graph. As long as you don't block them, they won't bother
| using more than a few IPs, and they'll back off when
| they're throttled.
| marcus0x62 wrote:
| For some sites, things are a lot worse. See, for example,
| Jonathan Corbet's report[0].
|
| 0 - https://social.kernel.org/notice/AqJkUigsjad3gQc664
| lmz wrote:
| How can you say it's $0 in compute without knowing if the
| data returned required any computation?
| nonrandomstring wrote:
| Didn't rachelbythebay post recently that her blog was
| being swamped? I've heard that from a few self-hosting
| bloggers now. And Wikipedia has recently said more than
| half of its traffic is now bots. Are you claiming this
| isn't a real problem?
| fc417fc802 wrote:
| > The companies to blame here are solely the ones
| employing these fingerprinting techniques,
|
| Let's not go blaming vulnerabilities on those exploiting
| them. Exploitation is _also_ bad but being exploitable is
| a problem in and of itself.
| shiomiru wrote:
| > Let's not go blaming vulnerabilities on those
| exploiting them. Exploitation is also bad but being
| exploitable is a problem in and of itself.
|
| There's "vulnerabilities" and there's "inherent
| properties of a complex protocol that is used to transfer
| data securely". One of the latter is that metadata may
| differ from client to client for various reasons, inside
| the bounds accepted in the standard. If you discriminate
| based on such metadata, you have effectively invented a
| new proprietary protocol that certain existing browsers
| just so happen to implement.
|
| It's like the UA string, but instead of just copying a
| single HTTP header, new browsers now have to reverse
| engineer the network stack of existing ones to get an
| identical user experience.
| fc417fc802 wrote:
| I get that. I don't condone the behavior of those doing
| the fingerprinting. But what I'm saying is that the fact
| that it is possible to fingerprint should in pretty much
| all cases be viewed as a sort of vulnerability.
|
| It isn't necessarily a critical vulnerability. But it is
| a problem on _some_ level nonetheless. To the extent
| possible you should not be leaking information that you
| did not intend to share.
|
| A protocol that can be fingerprinted is similar to a
| water pipe with a pinhole leak. It still works, it isn't
| (necessarily) catastrophic, but it definitely would be
| better if it wasn't leaking.
| Jubijub wrote:
| I'm sorry, but your comment shows you never had to fight
| this problem at scale. The challenge is not small-time
| crawlers; the challenge is blocking large / dedicated
| actors. The problem is simple: if there is more than X
| volume of traffic per <aggregation criteria>, block it.
| Problem: most aggregation criteria are trivially
| spoofable, or very cheap to change:
|
| - IP: with IPv6 it is not an issue to rotate your IP often
|
| - UA: changing this is scraping 101
|
| - SSL fingerprint: easy to use the same one as everyone
| else
|
| - IP stack fingerprint: also easy to use a common one
|
| - request / session tokens: it's cheap to create a new
| session
|
| You can force login, but then you have a spam account
| creation challenge, with the same issues as above, and
| depending on your infra this can become heavy.
|
| Add to this that the minute you use a signal for
| detection, you "burn" it as adversaries will avoid using
| it, and you lose measurement thus the ability to know if
| you are fixing the problem at all.
|
| I worked on this kind of problem for a FAANG service,
| whoever claims it's easy clearly never had to deal with
| motivated adversaries
| RKFADU_UOFCCLEL wrote:
| What? Just fix the ciphers to a list of what's known to
| work + some safety margin. Each user needing some
| different specific cipher (like a cipher for horses, and
| one for dogs) is not a thing.
| gruez wrote:
| >Just fix the ciphers to a list of what's known to work +
| some safety margin.
|
| That's already the case. The trouble is that NSS (what
| firefox uses) doesn't support the same cipher suites as
| boringssl (what chrome uses?).
| thaumasiotes wrote:
| > Ideally it should just say... "hey I'm curl, let me in"
|
| What? Ideally it should just say "GET /path/to/page".
|
| Sending a user agent is a bad idea. That shouldn't be
| happening at all, from any source.
| Tor3 wrote:
| Since the first browser appeared I've always thought that
| sending a user agent ID was a really bad idea. It breaks
| the fundamental idea of the web protocol: that it's the
| server's responsibility to provide data and it's the
| client's responsibility to present it to the user. The
| server does not need to know anything about the client.
| Including the user agent in this whole thing was a huge
| mistake, as it allowed web site designers to code for
| specific quirks in browsers. I can to some extent accept a
| capability list from the client, but I'm not so sure even
| that is necessary.
| nonrandomstring wrote:
| Absolutely, yes! A protocol should not be tied to client
| details. Where did "User Agent" strings even come from?
| darrenf wrote:
| They're in the HTTP/1.0 spec.
| https://www.rfc-editor.org/rfc/rfc1945#section-10.15
|
|     10.15 User-Agent
|
|     The User-Agent request-header field contains
|     information about the user agent originating the
|     request. This is for statistical purposes, the
|     tracing of protocol violations, and automated
|     recognition of user agents for the sake of tailoring
|     responses to avoid particular user agent limitations.
| userbinator wrote:
| _but on the other hand, having a rising browser engine might
| eventually remove this avenue for fingerprinting_
|
| If what I've seen from Cloudflare et al. is any indication,
| it's the exact opposite --- the amount of fingerprinting and
| "exploitation" of implementation-defined behaviour has
| _increased_ significantly in the past few months, likely in an
| attempt to kill off other browser engines; the incumbents do
| not like competition at all.
|
| The enemy has been trying to spin it as "AI bots DDoSing" but
| one wonders how much of that was their own doing...
| hansvm wrote:
| Hold up, one of those things is not like the other. Are we
| really blaming webmasters for 100x increases in costs from a
| huge wave of poorly written and maliciously aggressive bots?
| refulgentis wrote:
| > Are we really blaming...
|
| No, they're discussing increased fingerprinting / browser
| profiling recently and how it affects low-market-share
| browsers.
| hansvm wrote:
| I saw that, but I'm still not sure how this fits in:
|
| > The enemy has been trying to spin it as "AI bots
| DDoSing" but one wonders how much of that was their own
| doing...
|
| I'm reading that as `enemy == fingerprinters`, `that ==
| AI bots DDoSing`, and `their own == webmasters, hosting
| providers, and CDNs (i.e., the fingerprinters)`, which
| sounds pretty straightforwardly like the fingerprinters
| are responsible for the DDoSing they're receiving.
|
| That interpretation doesn't seem to match the rest of the
| post though. Do you happen to have a better one?
| userbinator wrote:
| "their own" = CloudFlare and/or those who have vested
| interests in closing up the Internet.
| jillyboel wrote:
| Your costs only went up 100x if you built your site poorly
| hansvm wrote:
| I'll bite. How do you serve 100x the traffic without 100x
| the costs? It costs something like 1e-10 dollars to serve
| a recipe page with a few photos, for example. If you
| serve it 100x more times, how does that not scale up?
| jillyboel wrote:
| It might scale up but if you're anywhere near efficient
| you're way overprovisioned to begin with. The compute
| cost should be minuscule due to caching and bandwidth is
| cheap if you're not with one of the big clouds. As an
| example, according to dang HN runs on a single server and
| yet many websites that get posted _to_ HN, and thus
| receive a fraction of the traffic, go down due to the
| load.
| SoftTalker wrote:
| It's entirely deliberate. CloudFlare could certainly
| distinguish low-volume but legit web browsers from bots, as
| much as they can distinguish chrome/edge/safari/firefox from
| bots. That is if they cared to.
| cyanydeez wrote:
| I don't think they're doing this to kill off browser
| engines; they're trying to sift browsers into "user" and
| "AI slop" so they can prioritize users.
|
| This is entirely a web crawler 2.0 apocalypse.
| nicman23 wrote:
| man i just want a bot to buy groceries for me
| baq wrote:
| That's one of the few reasons to leave the house. I'd
| like dishes and laundry bots first, please.
| dodslaser wrote:
| You mean dishwashers and washing machines?
| baq wrote:
| Yes, but no. I want a robot to load and unload those.
| dec0dedab0de wrote:
| I have been paying my local laundromat to do my laundry
| for over a decade now. It's probably cheaper than you're
| imagining, and sooo worth it.
| baq wrote:
| my household is 6 people, it isn't uncommon to run 3
| washing machine loads in a day and days without at least
| one are rare. I can imagine the convenience, but at this
| scale it sounds a bit unreasonable.
|
| dishwasher runs at least once a day, at least 80% full,
| every day, unless we're traveling.
| extraduder_ire wrote:
| I think "slop" only refers to the output of generative AI
| systems. bot, crawler, scraper, or spider would be a more
| apt term for software making (excessive) requests to
| collect data.
| johnisgood wrote:
| I used to call it "cURL", but apparently officially it is curl,
| correct?
| cruffle_duffle wrote:
| As in "See-URL"? I've always called it curl but "see url"
| makes a hell of a lot of sense too! I've just never
| considered it and it's one of those things you rarely say out
| loud.
| johnisgood wrote:
| I prefer cURL as well, but according to official sources it
| is curl. :D Not sure how it is pronounced though, I
| pronounce it as "see-url" and/or "see-U-R-L". It might be
| pronounced as "curl" though.
| bdhcuidbebe wrote:
| I'd guess Daniel pronounces it as "kurl", with a hard C
| like in "crust", since he's Swedish.
| devwastaken wrote:
| Ladybird does not have the resources to be a contender to
| current browsers. It's well marketed but has no benefits
| or reason to exist over Chromium. It's also a major
| security risk, as it is designed yet again in demonstrably
| unsafe C++.
| ec109685 wrote:
| There are APIs that Chrome provides that allow servers to
| validate whether a request came from an official Chrome
| browser. Those would detect that this curl isn't really
| Chrome.
|
| It'd be nice if something could support curl's arguments
| but drive an actual headless Chrome browser.
| binarymax wrote:
| I'm interested in learning more about this. Are these APIs
| documented anywhere and are there server side implementation
| examples that you know of?
|
| EDIT: this is the closest I could find.
| https://developers.google.com/chrome/verified-access/overvie...
| ...but it's not generic enough to lead me to the declaration
| you made.
| KTibow wrote:
| I think they confused Chrome and Googlebot.
| bowmessage wrote:
| There's no way this couldn't be replicated by a special build
| of curl.
| darrenf wrote:
| Are you referring to the Web Environment Integrity[0] stuff, or
| something else? 'cos WEI was abandoned in late 2023.
|
| [0] https://github.com/explainers-by-googlers/Web-Environment-
| In...
| do_not_redeem wrote:
| Siblings are being more charitable about this, but I just don't
| think what you're suggesting is even possible.
|
| An HTTP client sends a request. The server sends a response.
| The request and response are made of bytes. Any bytes Chrome
| can send, curl-impersonate could also send.
|
| Chromium is open source. If there was some super secret
| handshake, anyone could copy that code to curl-impersonate. And
| if it's only in closed-source Chrome, someone will disassemble
| it and copy it over anyway.
| gruez wrote:
| >Chromium is open source. If there was some super secret
| handshake, anyone could copy that code to curl-impersonate.
| And if it's only in closed-source Chrome, someone will
| disassemble it and copy it over anyway.
|
| Not if the "super secret handshake" is based on hardware-
| backed attestation.
| do_not_redeem wrote:
| True, but beside the point.
|
| GP claims the API can detect the official chrome browser,
| and the official chrome browser runs fine without
| attestation.
| dist-epoch wrote:
| > someone will disassemble it and copy it over anyway.
|
| Not if Chrome uses homomorphic encryption to sign a
| challenge. It's doable today. But then you could run a
| real Chrome and forward the request to it.
| do_not_redeem wrote:
| No, even homomorphic encryption wouldn't help.
|
| It doesn't matter how complicated the operation is, if you
| have a copy of the Chrome binary, you can observe what CPU
| instructions it uses to sign the challenge, and replicate
| the operations yourself. Proxying to a real Chrome is the
| most blunt approach, but there's nothing stopping you from
| disassembling the binary and copying the code to run in
| your own process, independent of Chrome.
| dist-epoch wrote:
| > you can observe what CPU instructions it uses to sign
| the challenge, and replicate the operations yourself.
|
| No you can't; that's the whole point of homomorphic
| encryption. Ask GPT to explain to you why that's so.
|
| You have no way of knowing the bounds of the code I will
| access from inside the homomorphic code. Depending on
| the challenge I can query parts of the binary and hash
| that in the response. So you will need to replicate the
| whole binary.
|
| Similar techniques are already used today by various
| copy-protection/anti-cheat game protectors. Most of them
| remain unbroken.
| fc417fc802 wrote:
| I don't believe this is correct. Homomorphic encryption
| enables computation on encrypted data without needing to
| decrypt it.
|
| You can't use the result of that computation without
| first decrypting it though. And you can't decrypt it
| without the key. So what you describe regarding memory
| addresses is merely garden variety obfuscation.
|
| Unmasking an obfuscated set of allowable address ranges
| for hashing given an arbitrary binary is certainly a
| difficult problem. However as you point out it is easily
| sidestepped.
|
| You are also mistaken about anti-cheat measures. The ones
| that pose the most difficulty primarily rely on kernel
| mode drivers. Even then, without hardware attestation
| it's "just" an obfuscation effort that raises the bar to
| make breaking it more time consuming.
|
| What you're actually witnessing there is that if a
| sufficient amount of effort is invested in obfuscation
| and those efforts carried out continuously in order to
| regularly change the obfuscation then you can outstrip
| the ability of the other party to keep up with you.
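|
| For the curious, "computation on encrypted data" is easy
| to demo with a toy additively homomorphic scheme such as
| Paillier; a sketch with tiny, utterly insecure parameters
| (demo only):
|
|     from math import gcd
|     import random
|
|     p, q = 17, 19                 # toy primes, insecure
|     n, n2 = p * q, (p * q) ** 2
|     lam = (p - 1) * (q - 1) // gcd(p - 1, q - 1)  # lcm
|     g = n + 1
|
|     def L(x):
|         return (x - 1) // n
|
|     mu = pow(L(pow(g, lam, n2)), -1, n)  # Python 3.8+
|
|     def enc(m):
|         r = random.randrange(1, n)
|         while gcd(r, n) != 1:
|             r = random.randrange(1, n)
|         return (pow(g, m, n2) * pow(r, n, n2)) % n2
|
|     def dec(c):
|         return (L(pow(c, lam, n2)) * mu) % n
|
|     # Multiplying ciphertexts adds the plaintexts; the
|     # party doing the multiplying never sees 42 or 99.
|     print(dec((enc(42) * enc(99)) % n2))  # 141
|
| The catch, as noted above, is that only the key holder can
| decrypt the result: it hides data, not the computation
| being performed.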
| do_not_redeem wrote:
| You're just describing ordinary challenge-response. That
| has nothing to do with homomorphic encryption and there
| are plenty of examples from before homomorphic encryption
| became viable, for example:
| https://www.geoffchappell.com/notes/security/aim/index.htm
|
| Homomorphic encryption hides data, not computation. If
| you've been trying to learn compsci from GPT, you might
| have fallen victim to hallucinations. I'd recommend
| starting from wikipedia instead.
| https://en.wikipedia.org/wiki/Homomorphic_encryption
|
| And btw most games are cracked within a week of release.
| You have way too much faith in buzzwords and way too
| little faith in bored Eastern European teenagers.
| dist-epoch wrote:
| > Homomorphic encryption hides data, not computation
|
| Data is computation.
|
|     x = challenge_byte ^ secret_key
|     if x > 64:
|         y = hash_memory_range()
|     else:
|         y = something_else()
|     return sign(y, secret_key)
| do_not_redeem wrote:
| That snippet has nothing to do with homomorphic
| encryption. It's just the same kind of challenge-response
| AIM and many others were already doing in the 90s.
|
| You seem convinced that homomorphic encryption is some
| kind of magic that prevents someone from observing their
| own hardware, or from running Chrome under a debugger.
| That's just not true. And I suspect we don't share enough
| of a common vocabulary to have a productive discussion,
| so I'll end it here.
| anon6362 wrote:
| Set a UA and any headers and/or cookies with regular cURL
| compiled with HTTP/3. This can be done with wrapper scripts very
| easily. 99.999% of problems solved with no special magic buried
| in an unclean fork.
| andrewmcwatters wrote:
| That's not the point of this fork.
|
| And "unclean fork" is such an unnecessary and unprofessional
| comment.
|
| There's an entire industry of stealth browser technologies out
| there that this falls under.
| psanford wrote:
| That doesn't solve the problem of TLS handshake fingerprinting,
| which is the whole point of this project.
| mmh0000 wrote:
| You should really read the "Why" section of the README
| before jumping to conclusions:
|
|     some web services use the TLS and HTTP handshakes to
|     fingerprint which client is accessing them, and then
|     present different content for different clients. These
|     methods are known as TLS fingerprinting and HTTP/2
|     fingerprinting respectively. Their widespread use has
|     led to the web becoming less open, less private and
|     much more restrictive towards specific web clients
|
|     With the modified curl in this repository, the TLS and
|     HTTP handshakes look exactly like those of a real
|     browser.
|
| For example, this will get you past Cloudflare's bot
| detection.
| 01HNNWZ0MV43FF wrote:
| The README indicates that this fork is compiled with NSS
| (from Firefox) and BoringSSL (from Chromium) to resist
| fingerprinting based on the TLS library. CLI flags won't
| do that.
| bossyTeacher wrote:
| Cool tool, but it shouldn't matter whether the client is a
| browser or not. I feel sad that we need such a tool in the
| real world.
| brutal_chaos_ wrote:
| You may enter our site iff you use software we approve.
| Anything else will be seen as malicious. Papers please!
|
| I, too, am saddened by this gatekeeping. IIUC, custom
| browsers (or user agents) built from scratch will never
| work on Cloudflare sites and the like until the UA has
| enough clout (money, users, etc.) to sway them.
| DrillShopper wrote:
| This was sadly always going to be the outcome of the Internet
| going commercial.
|
| There's too much lost revenue in open things for companies to
| embrace fully open technology anymore.
| jrockway wrote:
| It's kind of the opposite problem as well; huge well-funded
| companies bringing down open source project websites. See
| Xe's journey here: https://xeiaso.net/blog/2025/anubis/
|
| One may posit "maybe these projects should cache stuff so
| page loads aren't actually expensive" but these things are
| best-effort and not the core focus of these projects. You
| install some Git forge or Trac or something and it's Good
| Enough for your contributors to get work done. But you have
| to block the LLM bots because they ignore robots.txt and
| naively ask for the same expensive-to-render page over and
| over again.
|
| The commercial impact is also not to be understated. I
| remember when I worked for a startup with a cloud service.
| It got talked about here, and suddenly every free-for-open-
| source CI provider IP range was signing up for free trials
| in a tight loop. These mechanical users had to be blocked.
| It made me sad, but we wanted people to use our product,
| not mine crypto ;)
| burnished wrote:
| >> Otherwise your users have to see a happy anime girl
| every time they solve a challenge. This is a feature.
|
| I love that human, what a gem
| everfrustrated wrote:
| Wait until you hear many antivirus/endpoint software block
| "recent" domain names from being loaded. According to them
| new domains are only used by evil people and should be
| blocked.
| jimt1234 wrote:
| About six months ago I went to a government auction site that
| _required_ Internet Explorer. Yes, Internet Explorer. The site
| was active, too; the auction data was up-to-date. I added
| a user-agent extension in Chrome, switched the UA to IE,
| retried, and it worked; all functionality on the site was
| fine. So yeah, I was both sad and annoyed. My guess is
| this government office paid for a website 25 years ago and
| it hasn't been updated since.
| IMSAI8080 wrote:
| Yeah it's probably an ancient web site. This was commonplace
| back in the day when Internet Explorer had 90%+ market share.
| Lazy web devs couldn't be bothered to support other browsers
| (or didn't know how) so just added a message demanding you
| use IE as opposed to fixing the problems with the site.
| jorvi wrote:
| In South Korea, ActiveX is still required for many things
| like banking and government stuff. So they're stuck with both
| IE and the gaping security hole in it that is ActiveX.
| asddubs wrote:
| is this still true? I know this was the case in the past,
| but even in 2025?
| kijin wrote:
| Not really. You can access any Korean bank or government
| website using Chrome, and they actually recommend Chrome
| these days.
|
| They still want to install a bunch of programs on your
| computer, though. It's more or less the same stuff that
| used to be written as ActiveX extensions, but rewritten
| using modern browser APIs. :(
| VladVladikoff wrote:
| Wait a sec... if the TLS handshakes look different, would
| it be possible to have an nginx-level filter for traffic
| that claims to be a web browser (e.g. a Chrome user agent)
| yet really is a Python/PHP script? Because this would
| account for the vast majority of malicious bot traffic,
| and I would love to just block it.
| gruez wrote:
| That's basically what security vendors like Cloudflare do,
| except with even more fingerprinting, like a javascript
| challenge that checks the js interpreter/DOM.
| walrus01 wrote:
| JS to check user agent things like screen window dimensions
| as well, which legit browsers will have and bots will also
| present but with a more uniform and predictable set of x and
| y dimensions per set of source IPs. Lots of possibilities for
| js endpoint fingerprinting.
| Fripplebubby wrote:
| I also present a uniform and predictable set of x and y
| dimensions per source IPs as a human user who maximizes my
| browser window
| gruez wrote:
| Maximizing reduces the variations, but there's still
| quite a bit of variation because of different display
| resolution + scaling settings + OS configuration (eg.
| short or tall taskbars).
| walrus01 wrote:
| Or settings like auto-hide MacOS dock vs not auto hide,
| affecting the vertical size of the browser window.
| aaron42net wrote:
| Cloudflare uses JA3 and now JA4 TLS fingerprints, which are
| hashes of various TLS handshake parameters.
| https://github.com/FoxIO-LLC/ja4/blob/main/technical_details...
| has more details on how that works, and they do offer an Nginx
| module: https://github.com/FoxIO-LLC/ja4-nginx-module
| immibis wrote:
| Yes, and sites are doing this and it absolutely sucks because
| it's not reliable and blocks everyone who isn't using the
| latest Chrome on the latest Windows. Please don't whitelist TLS
| fingerprints unless you're actually under attack right now.
| fc417fc802 wrote:
| If you're going to whitelist (or block at all really) please
| simply redirect all rejected connections to a proof of work
| scheme. At least that way things continue to work with only
| mild inconvenience.
| jrochkind1 wrote:
| I am _very_ curious if the current wave of mystery
| distributed (AI?) bots will just run JavaScript and be
| able to get past proof of work too....
|
| Based on the fact that they are requesting the same
| _absolutely useless and duplicative_ pages (like every
| possible combination of query params even if it does not
| lead to unique content) from me _hundreds of times per
| url_, and are able to distribute so much that I'm only
| getting 1-5 requests per day from each IP...
|
| ...cost does not seem to be a concern for them? Maybe they
| won't actually mind ~5 seconds of CPU on a proof of work
| either? They are really a mystery to me.
|
| I currently am using Cloudflare Turnstile, which
| incorporates proof of work but also various other signals,
| which is working, but I know it does have false positives.
| I am working on implementing a simpler nothing-but-JS
| proof of work (SHA-512-based), and am going to switch that
| in; if it works, great (because I don't want to keep out
| the false positives!), but if it doesn't, back to
| Turnstile.
|
| The mystery distributed idiot bots were too much. (Scaling
| up resources -- they just scaled up their bot rates
| too!!!) I don't mind people scraping if they do it
| respectfully and reasonably; that's not what's been going
| on, and it's an internet-wide phenomenon of the past year.
| RKFADU_UOFCCLEL wrote:
| Blocking a hacking attack is not even a thing: attackers
| just change IP address each time they learn a new fact
| about how your system works, and progress smoothly without
| interruption until they exfiltrate your data. The same
| goes for scrapers, the only difference being that there is
| no vulnerability to fix that will stop them.
| jrochkind1 wrote:
| Well, I think that's exactly what the OP is meant to keep
| you from doing.
| ryao wrote:
| Did they also set IP_TTL to set the TTL value to match the
| platform being impersonated?
|
| If not, then fingerprinting could still be done to some extent at
| the IP layer. If the TTL value in the IP layer is below 64, it is
| obvious this is either not running on modern Windows or is
| running on a modern Windows machine that has had its default TTL
| changed, since by default the TTL of packets on modern Windows
| starts at 128 while most other platforms start it at 64.
| Since the other platforms do not have issues communicating
| over the internet, IP packets from modern Windows will
| almost always be seen by the remote end with TTLs at or
| above 64 (likely just above).
|
| That said, it would be difficult to fingerprint at the IP layer,
| although it is not impossible.
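|
| A sketch of that inference: since common OSes pick their
| initial TTL from a handful of values, a server can round
| an observed TTL up to the nearest one (the sample observed
| values are made up):
|
|     # Common initial TTLs: 32 (Win9x), 64 (Linux/macOS/BSD),
|     # 128 (modern Windows), 255 (OpenSolaris derivatives).
|     INITIAL_TTLS = (32, 64, 128, 255)
|
|     def guess_initial_ttl(observed: int) -> int:
|         """Round a received packet's TTL up to the nearest
|         well-known start; the difference is the hop count."""
|         return min(t for t in INITIAL_TTLS if t >= observed)
|
|     for seen in (52, 113, 247):
|         init = guess_initial_ttl(seen)
|         print(f"TTL {seen}: started {init}, ~{init - seen} hops")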
| xrisk wrote:
| Wouldn't the TTL value of received packets depend on network
| conditions? Can you recover the client's value from the server?
| ralferoo wrote:
| The argument is that if many (maybe the majority) of
| systems are sending packets with a TTL of 64 and they don't
| experience problems on the internet, then it stands to reason
| that almost everywhere on the internet is reachable in less
| than 64 hops (personally, I'd be amazed if any routes are
| actually as high as 32 hops).
|
| If everywhere is reachable in under 64 hops, then packets
| sent from systems that use a TTL of 128 will arrive at the
| destination with a TTL still over 64 (or else they'd have
| been discarded for all the other systems already).
| ryao wrote:
| Windows 9x used a TTL of 32. I vaguely recall hearing that
| it caused problems in extremely exotic cases, but that
| could have been misinformation. I imagine that >99.999% of
| the time, 32 is enough. This makes fingerprinting via TTL
| to distinguish between those who set it at 32, 64, 128 and
| 255 (OpenSolaris and derivatives) viable. That said, almost
| nobody uses Windows 9x or OpenSolaris derivatives on the
| internet these days, so I used values from systems that
| they do use for my argument that fingerprinting via TTL is
| possible.
| gruez wrote:
| >That said, it would be difficult to fingerprint at the IP
| layer, although it is not impossible.
|
| Only if you're using PaaS/IaaS providers that don't give
| you low-level access to the TCP/IP stack. If you're
| running your own servers, it's trivial to fingerprint all
| manner of TCP/IP properties.
|
| https://en.wikipedia.org/wiki/TCP/IP_stack_fingerprinting
| ryao wrote:
| I meant it is difficult relative to fingerprinting TLS and
| HTTP. The information is not exported by the Berkeley
| socket API unless you use raw sockets and implement your
| own userland TCP stack.
| sneak wrote:
| Couldn't you just monitor the inbound traffic and associate
| the packets to the connections? Doing your own TCP seems
| silly.
| gruez wrote:
| Yeah, some sort of packet mirroring setup (eg. in
| iptables or at the switch level) + packet capture tool
| should be enough. Then you just need to join the data
| from the packet capture program/machine with your load
| balancer, using src ip + port + time.
| fc417fc802 wrote:
| What is the reasoning behind TTL counting down instead of up,
| anyway? Wouldn't we generally expect those routing the traffic
| to determine if and how to do so?
| sadjad wrote:
| The primary purpose of TTL is to prevent packets from looping
| endlessly during routing. If a packet gets stuck in a loop,
| its TTL will eventually reach zero, and then it will be
| dropped.
| fc417fc802 wrote:
| That doesn't answer my question. If it counted up then it
| would be up to each hop to set its own policy. Things
| wouldn't loop endlessly in that scenario either.
| burnished wrote:
| This is a wild guess but: I am under the impression that
| the early internet was built somewhat naively so I guess
| that the sender sets it because they know best how long
| it stays relevant for/when it makes sense to restart or
| fail rather than wait.
| knome wrote:
| It does make traceroute, where each packet is fired with
| one more available step than the last, feasible, whereas
| 'up' wouldn't. Of course, then we'd just start with max-
| hops and walk the number down I suppose. I still expect
| it would be inconvenient during debugging for various
| devices to have various ceilings.
| ryao wrote:
| Then random internet routers could break internet traffic
| by setting it really low and the user could not do a
| thing about it. They technically still can by discarding
| all traffic whose value is less than some value, but they
| don't. The idea that they should set their own policy
| could fundamentally break network traffic flows if it
| ever became practiced.
| ryao wrote:
| If your doctor says you have only 128 days to live, you count
| down, not up. TTL is time to live, which is the same thing.
| therealcamino wrote:
| To allow the sender to set the TTL, right? Without adding
| another field to the packet header.
|
| If you count up from zero, then you'd also have to include in
| every packet how high it can go, so that a router has enough
| info to decide if the packet is still live. Otherwise every
| connection in the network would have to share the same fixed
| TTL, or obey the TTL set in whatever random routers it goes
| through. If you count down, you're always checking against
| zero.
| jruohonen wrote:
| The notion of real-world TLS/HTTP fingerprinting was somewhat new
| to me, and it looks interesting in theory, but I wonder what the
| build's use case really is? I mean you have the heavy-handed
| JavaScript running everywhere now.
| jamal-kumar wrote:
| This tool is pretty sweet in little bash scripts combo'd
| up with GNU parallel on red-team engagements, for mapping
| HTTPS endpoints within scoped address ranges that will
| only respond to proper browsers, or with the SNI stuff in
| order. Been finding it super sweet for that. It can do all
| the normal curl switches, like -H for header spoofing.
| davidsojevic wrote:
| There's a fork of this that has some great improvements on
| top of the original, and it is also actively maintained:
| https://github.com/lexiforest/curl-impersonate
|
| There's also Python bindings for the fork for anyone who uses
| Python: https://github.com/lexiforest/curl_cffi
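|
| Typical usage of the Python bindings, for anyone curious
| (the `impersonate` target names follow the project's
| README, e.g. "chrome"):
|
|     # pip install curl_cffi
|     from curl_cffi import requests
|
|     # `impersonate` selects which browser's TLS/HTTP2
|     # handshake to mimic.
|     r = requests.get("https://tls.browserleaks.com/json",
|                      impersonate="chrome")
|     print(r.json())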
| nyanpasu64 wrote:
| I suppose it does make sense that a "make curl look like a
| browser" program would get sponsored by "bypass bot detection"
| services...
| ImHereToVote wrote:
| Easy. Just make a small fragment shader to produce a token in
| your client. No bot is going to waste GPU resources to
| compile your shader.
| kelsey978126 wrote:
| Why do people even think this? Bots almost always just use
| headful instrumented browsers now. If a human sitting at a
| keyboard can load the content, so can a bot.
| simpaticoder wrote:
| Security measures never prevent all abuse. They raise the
| cost of abuse above an acceptable threshold. Many things
| work like this. Cleaning doesn't eliminate dirt, it
| dilutes the dirt below an acceptable threshold. Same for
| "repairing" and "defects", and some other pairs of things
| that escape me atm.
| abofh wrote:
| That's the same argument as CAPTCHAs - as far as I know
| there are no bots protesting them making their lives
| harder, but as a human, my life is much harder than it
| needs to be because things need me to prove I'm a human.
|
| Clean for data ingestion usually means complicated for
| data creation - optimizing for the advertisers has
| material cash value downstream, but customers are
| upstream, and making it harder is material too.
| ImHereToVote wrote:
| What is so hard about running a fragment shader after the
| site has loaded?
| abofh wrote:
| I have to assume /s, but lacking that -- Why can't you
| just allow `curl`? You need a human for advertising
| dollars or a poor mechanism of rate limiting. I want to
| use your service. If you're buying me a fragment shader,
| I guess that's fine, but I'm feeding it to the dogs, not
| plugging your rando hardware into my web browser.
| ImHereToVote wrote:
| We are talking about Curl bots here. How is what you are
| saying relevant?
| cAtte_ wrote:
| no, nyanpasu64's comment extended the discussion to
| general bot detection
| zffr wrote:
| Can't a bot just collect a few real tokens and then send
| those instead of trying to run the shader?
| ImHereToVote wrote:
| How do you automate that? Just generate a new token for
| each day.
| gruez wrote:
| Can't they use a software renderer like swiftshader? You
| don't need to pass in an actual gpu through virtio or
| whatever.
| ImHereToVote wrote:
| Maybe you can call a WebGL extension that isn't
| supported. Or better yet have a couple of overdraws of
| quads. Their bot will handle it, but it will throttle
| their CPU like gangbusters.
| gruez wrote:
| Sounds like a PoW system with extra steps?
| illegally wrote:
| There's also a module for fully integrating this with the
| Python requests library: https://github.com/el1s7/curl-adapter
| RKFADU_UOFCCLEL wrote:
| All these "advanced" technologies that change faster than I can
| turn my neck, to make a simple request that looks like it was
| one of the "certified" big 3 web browsers, which will
| ironically tax the server less than a certified browser. Is
| this the nightmare dystopia I was warned about in the 90's? I
| wonder if anyone here can name the one company that is
| responsible for this despite positioning themselves as a good
| guy open source / hacker community contributor.
| userbinator wrote:
| I'm always ambivalent about things like this showing up here. On
| one hand, it's good to let others know that there is still that
| bit of rebelliousness and independence alive amongst the
| population. On the other hand, much like other "freedom is
| insecurity" projects, attracting unwanted attention may make it
| worse for those who rely on them.
|
| Writing a browser is hard, and the incumbents are continually
| making it harder.
| jolmg wrote:
| Your comment makes it sound like a browser being
| fingerprintable is a desired property by browser developers.
| It's just something that happens on its own from different
| people doing things differently. I don't see this as being
| about rebelliousness. Software being fingerprintable erodes
| privacy and software diversity.
| gkbrk wrote:
| Not all browsers, but Chrome certainly desires to be
| fingerprintable. They even try to cryptographically prove
| that the current browser is an unmodified Chrome with Web
| Environment Integrity [1].
|
| Doesn't get more fingerprintable than that. They provide an
| un-falsifiable certificate that "the current browser is an
| unmodified Chrome build, running on an unmodified Android
| phone with secure boot".
|
| If they didn't want to be fingerprintable, they could just
| not do that and spend all the engineering time and money
| on something else.
|
| [1]: https://en.wikipedia.org/wiki/Web_Environment_Integrity
| matt-p wrote:
| I do kind of yearn for the simpler days when, if a website
| didn't mind bots, it allowed them, and if it did, it
| blocked your user agent.
| andrethegiant wrote:
| Back then websites weren't so resource-intensive. The
| negative backlash towards bots is kind of a side effect of
| how demanding expectations of web experiences have become.
| yard2010 wrote:
| Now I'm waiting for the MCP version of this.. :)
| andrethegiant wrote:
| https://github.com/puremd/puremd-mcp handles this, probably
| some other MCP servers out there that handle this too
| doctor_radium wrote:
| Kudos to the coder and the poster. I'm involved in a browser
| project that runs on OpenSSL, and figured I'd have to dig through
| WireShark myself at some point to figure this stuff out. Well, I
| may still need to, but now have many points of reference. If the
| most common use of OpenSSL is Python, then in the age of
| Cloudflare, a Firefox TLS spoofing option isn't just a good idea,
| it's a necessity.
| INTPenis wrote:
| Only three patches and shell wrappers; this should get
| Daniel coding. IMHO this should definitely be in mainline
| curl.
| lcfcjs6 wrote:
| I've been using Puppeteer to query and read responses from
| deepseek.com. It works really well, but I have to use a
| stealth mode and a "headed" version to make it think it's
| a person.
| GNOMES wrote:
| I had to do something like this with Ansible's get_url
| module once.
|
| I was having issues getting the module to download an
| installer from a vendor's site.
|
| I played with curl/wget, but was running into the same
| issue, while it worked from a browser.
|
| I ended up getting both curl and get_url to work by
| passing the same headers my browser sent, such as
| User-Agent, encoding, etc.
| ck2 wrote:
| Good luck getting past Imperva.
|
| If you thought the Cloudflare challenge could be bad,
| Imperva doesn't even want most humans through.
| 1vuio0pswjnm7 wrote:
| "For these reasons, some web services use the TLS and HTTP
| handshakes to fingerprint which client is accessing them, and
| then present different content for different clients."
|
| Examples: [missing]
___________________________________________________________________
(page generated 2025-04-04 23:01 UTC)