[HN Gopher] Implementing fast TCP fingerprinting with eBPF
___________________________________________________________________
Author : halb
Score : 64 points
Date : 2025-06-29 10:59 UTC (12 hours ago)
(HTM) web link (halb.it)
| OutOfHere wrote:
| More useless and harmful anti-bot nonsense, probably with many
| false detections, when a simple and neutral rate limit returning
| HTTP 429 does the job.
| halb wrote:
| I guess the blame is on me for providing only very brief
| context on the topic, which makes it sound like this is just
| an anti-scraping solution.
|
| Fingerprinting solutions of this kind are widely used, and
| their goal is not to directly detect or block bots, especially
| harmless scrapers. They just provide an additional data point
| that can be used to track patterns in website traffic and,
| eventually, to block fraud or automated attacks: that kind of
| bot.
| OutOfHere wrote:
| If it's making a legitimate request, it's not an automated
| attack. If it's exceeding its usage quota, that's a simple
| problem that doesn't require eBPF.
| halb wrote:
| What kind of websites do you have in mind when I talk about
| fraud patterns? Not everything is a static website, and I
| absolutely agree with you on that point: if your static
| website is struggling under the load of a scraper, there is
| something deeply wrong with your architecture. We live in
| wonderful times; Nginx on my 2015 laptop can gracefully
| handle 10k requests per second before I even activate
| rate limiting.
|
| Unfortunately, there are bad people out there, and they know
| how to write code. Take a look at popular websites like
| TikTok, Amazon, or Facebook. They are inundated by fraudulent
| requests whose goal is to use their services in ways that are
| harmful to others, or straight up illegal, from spam to
| money laundering. On social media, bots impersonate people
| in an attempt to influence public discourse and undermine
| democracies.
| Retr0id wrote:
| This is an overly simplistic view that does not reflect
| reality in 2025.
| OutOfHere wrote:
| The simple reality is that if you don't want to put
| something online, then don't put it online. If something
| should be behind locked doors, then put it behind locked
| doors. Don't do the dance of promising to have something
| online, then blocking legitimate users when they request it.
| That's basically what a lot of "spam blockers" do: they
| block a ton of legitimate use as well.
| konsalexee wrote:
| Sure, but it's a nice exploration of layer 4 detection.
| aorth wrote:
| Why is it useless and harmful? Many of us, without massive
| budgets or engineering teams, are struggling to keep services
| up due to incredible load from scrapers in recent years. We do
| use rate limiting, but scrapers circumvent it with residential
| proxies and brute force. I often see concurrent requests from
| hundreds or thousands of IPs in _one_ data center. Who do these
| people think they are?
| OutOfHere wrote:
| It is harmful because innocent users routinely get caught in
| your dragnet. And why even have a public website if the goal
| is not to serve it?
|
| What is the actual problem with serving users? You mentioned
| incredible load. I would stop using inefficient PHP or
| JavaScript or Ruby for web servers. I would use Go or Rust or
| a comparable efficient server with native concurrency.
| Survival always requires adaptation.
|
| How do you know that the alleged proxies belong to the same
| scrapers? I would look carefully at the IP chain in the
| X-Forwarded-For (XFF) header to decide which subnets to
| rate-limit, based on which addresses appear in it.
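The XFF approach suggested above can be sketched as follows. This is a minimal sketch, assuming your own reverse proxies append to X-Forwarded-For and that you know their address ranges; the function names and the /24 (IPv4) and /48 (IPv6) grouping are illustrative assumptions, not anything from the article. It walks the header right to left, skips your trusted proxies, and counts requests per enclosing subnet.

```python
import ipaddress
from collections import Counter
from typing import Iterable, List, Optional, Tuple

# Sketch only: these helper names are hypothetical, not a real library API.


def client_ip_from_xff(xff: str, trusted_cidrs: List[str]) -> Optional[str]:
    """Walk X-Forwarded-For right to left, skipping our own trusted proxies.

    The rightmost untrusted entry is the address our outermost proxy actually
    saw; entries further left were supplied by the client and may be spoofed.
    """
    trusted = [ipaddress.ip_network(c) for c in trusted_cidrs]
    hops = [p.strip() for p in xff.split(",") if p.strip()]
    for raw in reversed(hops):
        try:
            addr = ipaddress.ip_address(raw)
        except ValueError:
            return None  # malformed entry: don't trust anything to its left
        if not any(addr in net for net in trusted):
            return str(addr)
    return None  # every hop was one of our own proxies


def subnet_key(ip: str, v4_prefix: int = 24, v6_prefix: int = 48) -> str:
    """Collapse an address to its enclosing subnet so limits apply per network."""
    addr = ipaddress.ip_address(ip)
    prefix = v4_prefix if addr.version == 4 else v6_prefix
    return str(ipaddress.ip_network(f"{addr}/{prefix}", strict=False))


def busiest_subnets(xff_headers: Iterable[str], trusted_cidrs: List[str],
                    top: int = 5) -> List[Tuple[str, int]]:
    """Count requests per client subnet and return the heaviest ones."""
    counts: Counter = Counter()
    for xff in xff_headers:
        ip = client_ip_from_xff(xff, trusted_cidrs)
        if ip is not None:
            counts[subnet_key(ip)] += 1
    return counts.most_common(top)
```

The subnets returned by `busiest_subnets` are candidates for a per-subnet limit. Note that this only reveals the address the first trusted proxy saw; a residential proxy's exit IP still looks like an ordinary home connection.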
|
| Another way is to require authentication for expensive
| endpoints.
| immibis wrote:
| Residential proxy users are paying on the order of $5 per
| gigabyte, so send them really big files once detected. Or
| "click here to load the page properly" followed by a trickle
| of garbage data.
| OutOfHere wrote:
| There is no real way to confidently tell if someone is using
| a residential proxy.
| immibis wrote:
| Once you spot a specific pattern you can detect that
| pattern.
| ghotli wrote:
| I downvoted you due to the way you're communicating in this
| thread. Be kind, rewind. Perhaps review the guidelines here,
| since your account is only a little over a year old.
|
| I found this article useful and insightful. I don't have a bot
| problem at present; I have an adjacent problem and found this
| context useful for an ongoing investigation.
| DamonHD wrote:
| Almost nothing pays attention to 429s, at least not in a good
| way, including big-name sites. I've written a whole paper about
| it...
| noident wrote:
| Who cares if they pay attention to 429s? Your load balancer
| is giving them the boot, and your expensive backend resources
| aren't being wasted. They can make requests until the cows
| come home; they're not getting anything until they slow down.
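The behavior noident describes, where a client keeps getting rejected until it slows down, is what a token-bucket limiter provides. This is a minimal in-process sketch, assuming one bucket per client key and caller-supplied timestamps. `TokenBucket`, `RateLimiter`, and the rate/burst values are hypothetical, not taken from the article or any specific load balancer.

```python
from typing import Dict, Optional

# Sketch only: TokenBucket and RateLimiter are hypothetical names, not a real library.


class TokenBucket:
    """Refills `rate` tokens per second and holds at most `burst` tokens."""

    def __init__(self, rate: float, burst: float) -> None:
        self.rate = rate
        self.burst = burst
        self.tokens = burst  # start full so an initial burst is allowed
        self.last: Optional[float] = None

    def allow(self, now: float) -> bool:
        if self.last is not None:
            # Refill in proportion to elapsed time, capped at the bucket size.
            elapsed = max(0.0, now - self.last)
            self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0  # spend one token on this request
            return True
        return False


class RateLimiter:
    """One bucket per client key (an IP, a subnet, or a fingerprint)."""

    def __init__(self, rate: float, burst: float) -> None:
        self.rate = rate
        self.burst = burst
        self.buckets: Dict[str, TokenBucket] = {}

    def status(self, key: str, now: float) -> int:
        """HTTP status to return: 200 if allowed, 429 Too Many Requests if not."""
        bucket = self.buckets.get(key)
        if bucket is None:
            bucket = self.buckets[key] = TokenBucket(self.rate, self.burst)
        return 200 if bucket.allow(now) else 429


limiter = RateLimiter(rate=1.0, burst=2.0)
print([limiter.status("198.51.100.7", t) for t in (0.0, 0.0, 0.0, 1.0)])  # [200, 200, 429, 200]
```

In a real server, `now` would come from `time.monotonic()`. The choice of key matters: keying by IP alone runs into the shared-address problem raised in the reply that follows.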
| ranger_danger wrote:
| If you're rate-limiting by IP, well... some entire
| countries have only a handful of (or just one) externally
| visible IPs.
| ranger_danger wrote:
| As a rule, strong feelings about issues do not emerge from deep
| understanding. -Sloman and Fernbach
| b0a04gl wrote:
| Why does fingerprinting always happen right at connection
| start? It usually gives clean metadata during the TCP SYN, but
| what about components like static proxies, load balancers, or
| mobile networks? All of these can shift stack behavior
| midstream, which could make this activity itself obsolete.
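To make "clean metadata during the TCP SYN" concrete: SYN-based fingerprints (p0f is the classic example) typically look at the initial TTL, the advertised window, and the order and values of TCP options, which differ between operating system network stacks. The following is a minimal userspace sketch that parses a raw TCP header and builds a signature string. It is not the article's eBPF implementation; the `ttl:window:options:mss:wscale` format and the function names are hypothetical. The TTL lives in the IP header, so it is passed in separately.

```python
import struct
from typing import List, Optional, Tuple

# Sketch only: the signature format and function names are hypothetical.
SYN = 0x02
ACK = 0x10


def parse_syn_options(tcp: bytes) -> Tuple[int, List[int], Optional[int], Optional[int]]:
    """Return (window, option kinds in order, MSS, window scale) from a TCP header."""
    if len(tcp) < 20:
        raise ValueError("truncated TCP header")
    window = struct.unpack("!H", tcp[14:16])[0]
    header_len = (tcp[12] >> 4) * 4  # data offset is counted in 32-bit words
    opts = tcp[20:header_len]
    kinds: List[int] = []
    mss: Optional[int] = None
    wscale: Optional[int] = None
    i = 0
    while i < len(opts):
        kind = opts[i]
        kinds.append(kind)
        if kind == 0:  # End of Option List
            break
        if kind == 1:  # NOP: a single byte with no length field
            i += 1
            continue
        if i + 1 >= len(opts):
            break  # malformed: length byte missing
        length = opts[i + 1]
        if length < 2 or i + length > len(opts):
            break  # malformed length: stop parsing
        if kind == 2 and length == 4:  # Maximum Segment Size
            mss = struct.unpack("!H", opts[i + 2:i + 4])[0]
        elif kind == 3 and length == 3:  # Window Scale shift count
            wscale = opts[i + 2]
        i += length
    return window, kinds, mss, wscale


def syn_fingerprint(tcp: bytes, ttl: int) -> str:
    """Build a signature string from a client SYN (SYN set, ACK clear)."""
    flags = tcp[13]
    if not (flags & SYN) or (flags & ACK):
        raise ValueError("not a client SYN")
    window, kinds, mss, wscale = parse_syn_options(tcp)
    return f"{ttl}:{window}:{'-'.join(map(str, kinds))}:{mss}:{wscale}"
```

This also illustrates the concern above: a middlebox that clamps MSS or rewrites options changes the signature, so what you fingerprint may be the last proxy's stack rather than the client's.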
| halb wrote:
| This is a good point. I guess that if you have the luxury of
| controlling the front end of the web application, you can
| implement a system that polls the server routinely. Over time,
| this will give you a clearer picture. You'll notice that most
| real-world fingerprinting systems run partly on the JavaScript
| side, which enables all sorts of tricks.
| 10000truths wrote:
| One of the biggest use cases for fingerprinting is as a way to
| reject requests from bot traffic, as mentioned in the article.
| That accept/reject decision should be made as early in the
| session lifecycle as possible to minimize resource impact and
| prevent exfiltration of data. You're right that TCP flags don't
| provide as much signal, as the TCP stack is mostly handled by
| the OS and middleboxes. A better source of fingerprinting info
| is in the TLS handshake - it has a lot more configurability,
| and is strongly correlated with the user agent.
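The TLS-handshake suggestion above can be made concrete with JA3, one widely used TLS client fingerprint (JA4 is a newer alternative). JA3 takes the decimal values of the ClientHello's version, cipher suites, extensions, supported groups (elliptic curves), and EC point formats. Values within a field are joined with `-`, fields are joined with `,`, GREASE values are dropped, and the resulting string is MD5-hashed. This is a minimal sketch that assumes the ClientHello has already been parsed into integer lists (parsing is omitted); the function names are hypothetical.

```python
import hashlib
from typing import Iterable, Sequence

# Sketch only: function names are hypothetical; ClientHello parsing is omitted.


def is_grease(value: int) -> bool:
    """GREASE values (RFC 8701): both bytes equal and of the form 0x?a (0x0a0a ... 0xfafa)."""
    return (value & 0x0F0F) == 0x0A0A and (value >> 8) == (value & 0xFF)


def _join(values: Iterable[int]) -> str:
    # Decimal values joined by '-', with GREASE values dropped as JA3 specifies.
    return "-".join(str(v) for v in values if not is_grease(v))


def ja3_string(version: int, ciphers: Sequence[int], extensions: Sequence[int],
               curves: Sequence[int], point_formats: Sequence[int]) -> str:
    """Fields separated by ',' in JA3 order."""
    return ",".join([str(version), _join(ciphers), _join(extensions),
                     _join(curves), _join(point_formats)])


def ja3_hash(version: int, ciphers: Sequence[int], extensions: Sequence[int],
             curves: Sequence[int], point_formats: Sequence[int]) -> str:
    """JA3 fingerprint: MD5 hex digest of the JA3 string."""
    s = ja3_string(version, ciphers, extensions, curves, point_formats)
    return hashlib.md5(s.encode()).hexdigest()
```

For example, a ClientHello with version 771 (0x0303), ciphers `[0x0a0a, 4865, 4866]`, extensions `[0x1a1a, 0, 23, 65281]`, curves `[29, 23, 24]`, and point formats `[0]` yields the string `771,4865-4866,0-23-65281,29-23-24,0`, with both GREASE entries removed.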
| benreesman wrote:
| I have work reasons for needing to learn a lot about kernel-level
| networking primitives (it turns out tcpdump and eBPF are
| compatible with almost anything; no "but boss, foobar is only
| compatible with bizbazz 7 or above!").
|
| So when an LLM vendor that shall remain nameless had a model
| start misidentifying itself while the website was complaining
| about load... I decided to get to the bottom of it.
|
| eBPF cuts through TLS obfuscation like a bunker buster
| through a ventilation shaft, or was it... well, you know what I
| mean.
___________________________________________________________________
(page generated 2025-06-29 23:00 UTC)