[HN Gopher] Implementing fast TCP fingerprinting with eBPF
       ___________________________________________________________________
        
       Implementing fast TCP fingerprinting with eBPF
        
       Author : halb
       Score  : 64 points
       Date   : 2025-06-29 10:59 UTC (12 hours ago)
        
 (HTM) web link (halb.it)
 (TXT) w3m dump (halb.it)
        
       | OutOfHere wrote:
       | More useless and harmful anti-bot nonsense, probably with many
       | false detections, when a simple and neutral rate-limiting 429
       | does the job.
        
         | halb wrote:
         | I guess the blame is on me here for providing only a very brief
         | context on the topic, which makes it sound like this is just
         | an anti-scraping solution.
         | 
         | These kinds of fingerprinting solutions are widely used, and
         | their goal is not to directly detect or block bots, especially
         | harmless scrapers. They just provide an additional data point
         | that can be used to track patterns in website traffic and
         | eventually block fraud or automated attacks: that kind of
         | bot.
        
           | OutOfHere wrote:
           | If it's making a legitimate request, it's not an automated
           | attack. If it's exceeding its usage quota, that's a simple
           | problem that doesn't require eBPF.
        
             | halb wrote:
             | What kind of websites do you have in mind when I talk about
             | fraud patterns? Not everything is a static website, and on
             | static sites I absolutely agree with you: if your static
             | website is struggling under the load of a scraper, there is
             | something deeply wrong with your architecture. We live in
             | wonderful times: Nginx on my 2015 laptop can gracefully
             | handle 10k requests per second before I even activate rate
             | limiting.
             | 
             | Unfortunately there are bad people out there, and they know
             | how to write code. Take a look at popular websites like
             | TikTok, Amazon, or Facebook. They are inundated by fraudulent
             | requests whose goal is to use their services in a way that
             | is harmful to others, or straight-up illegal, from spam to
             | money laundering. On social media, bots impersonate people
             | in an attempt to influence public discourse and undermine
             | democracies.
        
             | Retr0id wrote:
             | This is an overly simplistic view that does not reflect
             | reality in 2025.
        
               | OutOfHere wrote:
               | The simple reality is that if you don't want to put
               | something online, then don't put it online. If something
               | should be behind locked doors, then put it behind locked
               | doors. Don't do the dance of promising to have something
               | online, then stop legitimate users when they request it.
               | That's basically what a lot of "spam blockers" do -- they
               | block a ton of legitimate use as well.
        
         | konsalexee wrote:
         | Sure, but it's a nice exploration of layer 4 detection.
        
         | aorth wrote:
         | Why is it useless and harmful? Many of us are struggling--
         | without massive budgets or engineering teams--to keep services
         | up due to incredible load from scrapers in recent years. We do
         | use rate limiting, but scrapers circumvent it with residential
         | proxies and brute force. I often see concurrent requests from
         | hundreds or thousands of IPs in _one_ data center. Who do these
         | people think they are?
        
           | OutOfHere wrote:
           | It is harmful because innocent users routinely get caught in
           | your dragnet. And why even have a public website if the goal
           | is not to serve it?
           | 
           | What is the actual problem with serving users? You mentioned
           | incredible load. I would stop using inefficient PHP or
           | JavaScript or Ruby for web servers. I would use Go or Rust or
           | a comparable efficient server with native concurrency.
           | Survival always requires adaptation.
           | 
           | How do you know that the alleged proxies belong to the same
           | scrapers? I would look carefully at the IP chain in the
           | X-Forwarded-For (XFF) header to decide which subnets to
           | rate-limit.
           | 
           | Another way is to require authentication for expensive
           | endpoints.
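
A minimal sketch of the XFF-based approach described above, assuming the rightmost hops of the header were appended by proxies you operate. The function name `client_subnet`, the `trusted_proxies` parameter, and the /24 grouping are hypothetical choices for illustration, not something stated in the thread. Hops to the left of your own proxies are supplied by the client and can be forged, so the sketch walks the chain from right to left and stops at the first address it does not trust.

```python
import ipaddress


def client_subnet(xff, trusted_proxies, prefix=24):
    """Return the subnet of the rightmost untrusted hop in an XFF header.

    xff             -- raw X-Forwarded-For value, e.g. "a, b, c"
    trusted_proxies -- CIDR strings for proxies we operate (assumption)
    prefix          -- prefix length used to group clients (assumption)
    """
    hops = [h.strip() for h in xff.split(",") if h.strip()]
    trusted = [ipaddress.ip_network(n) for n in trusted_proxies]
    # Walk right to left: the rightmost entries were added by our own
    # infrastructure, anything further left is client-controlled.
    for hop in reversed(hops):
        addr = ipaddress.ip_address(hop)  # raises ValueError on garbage
        if not any(addr in net for net in trusted):
            return ipaddress.ip_network(f"{addr}/{prefix}", strict=False)
    return None  # every hop was one of our own proxies
```

The returned network can then serve as the rate-limiting key in place of a single IP address.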
        
           | immibis wrote:
           | Residential proxy users are paying on the order of $5 per
           | gigabyte, so send them really big files once detected. Or
           | "click here to load the page properly" followed by a trickle
           | of garbage data.
        
             | OutOfHere wrote:
             | There is no real way to confidently tell if someone is using
             | a residential proxy.
        
               | immibis wrote:
               | Once you spot a specific pattern you can detect that
               | pattern.
        
         | ghotli wrote:
         | I downvoted you due to the way you're communicating in this
         | thread. Be kind, rewind. Review the guidelines here perhaps
         | since your account is only a little over a year old.
         | 
         | I found this article useful and insightful. I don't have a bot
         | problem at present, but I have an adjacent problem and found
         | this context useful for an ongoing investigation.
        
         | DamonHD wrote:
         | Almost nothing pays attention to 429s, at least not in a good
         | way, including big-name sites. I've written a whole paper about
         | it...
        
           | noident wrote:
           | Who cares if they pay attention to 429s? Your load balancer
           | is giving them the boot, and your expensive backend resources
           | aren't being wasted. They can make requests until the cows
           | come home; they're not getting anything until they slow down.
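
A minimal sketch of the behaviour described above, assuming a token-bucket limiter in front of the backend. The class name `RateLimiter`, its parameters, and the choice of token bucket are hypothetical; the thread does not say which algorithm any particular load balancer uses. The limiter answers 429 by itself, so the backend is never touched while a client is over its budget.

```python
class RateLimiter:
    """Token bucket per client key; returns an HTTP status code."""

    def __init__(self, rate, burst):
        self.rate = rate      # tokens refilled per second
        self.burst = burst    # maximum bucket size
        self.buckets = {}     # client key -> (tokens, last_seen)

    def status_for(self, key, now):
        # New clients start with a full bucket.
        tokens, last = self.buckets.get(key, (self.burst, now))
        # Refill in proportion to elapsed time, capped at the burst size.
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        if tokens >= 1.0:
            self.buckets[key] = (tokens - 1.0, now)
            return 200  # forward the request to the backend
        self.buckets[key] = (tokens, now)
        return 429      # rejected at the edge, backend untouched
```

The `key` does not have to be a single IP address; a subnet or an authenticated account ID works the same way.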
        
             | ranger_danger wrote:
             | If you're rate-limiting by IP, well... some entire
             | countries have only a handful (or one) externally visible
             | IP.
        
         | ranger_danger wrote:
         | As a rule, strong feelings about issues do not emerge from deep
         | understanding. -Sloman and Fernbach
        
       | b0a04gl wrote:
       | Why does fingerprinting always happen right at connection start?
       | The TCP SYN usually gives clean metadata, but what about
       | components like static proxies, load balancers, or mobile
       | networks? All of these can shift stack behavior midstream, which
       | could make the whole exercise obsolete.
        
         | halb wrote:
         | This is a good point. I guess that if you have the luxury of
         | controlling the front-end side of the web application you can
         | implement a system that polls the server routinely. Over time
         | this will give you a clearer picture. Note that most
         | real-world fingerprinting systems run in part on the JavaScript
         | side, which enables all sorts of tricks.
        
         | 10000truths wrote:
         | One of the biggest use cases for fingerprinting is as a way to
         | reject requests from bot traffic, as mentioned in the article.
         | That accept/reject decision should be made as early in the
         | session lifecycle as possible to minimize resource impact and
         | prevent exfiltration of data. You're right that TCP flags don't
         | provide as much signal, as the TCP stack is mostly handled by
         | the OS and middleboxes. A better source of fingerprinting info
         | is in the TLS handshake - it has a lot more configurability,
         | and is strongly correlated with the user agent.
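
A minimal sketch of what a SYN-based fingerprint can look like, assuming the raw TCP header bytes are already available in user space. The function name `syn_fingerprint` and the output layout `window:option-kinds:mss:wscale` are hypothetical and are not the format used in the linked article. The sketch reads the window size and the order of TCP options, which is the kind of signal an OS network stack leaks at connection start.

```python
import struct


def syn_fingerprint(tcp_header: bytes) -> str:
    """Summarise a TCP SYN header as 'window:kinds:mss:wscale'."""
    # Fixed 20-byte TCP header: ports, seq, ack, offset+flags, window,
    # checksum, urgent pointer (all in network byte order).
    (_src, _dst, _seq, _ack, off_flags, window, _csum, _urg) = struct.unpack(
        "!HHIIHHHH", tcp_header[:20]
    )
    header_len = (off_flags >> 12) * 4  # data offset is in 32-bit words
    options = tcp_header[20:header_len]

    kinds, mss, wscale = [], 0, 0
    i = 0
    while i < len(options):
        kind = options[i]
        if kind == 0:          # End of Option List
            break
        if kind == 1:          # NOP is a single byte with no length
            kinds.append(kind)
            i += 1
            continue
        if i + 1 >= len(options):
            break              # truncated option
        length = options[i + 1]
        if length < 2 or i + length > len(options):
            break              # malformed option
        kinds.append(kind)
        if kind == 2 and length == 4:      # Maximum Segment Size
            mss = struct.unpack("!H", options[i + 2:i + 4])[0]
        elif kind == 3 and length == 3:    # Window Scale
            wscale = options[i + 2]
        i += length
    return f"{window}:{'-'.join(map(str, kinds))}:{mss}:{wscale}"
```

Because middleboxes can rewrite some of these fields, a string like this is best treated as one data point among several, not as an identifier.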
        
       | benreesman wrote:
       | I have work reasons for needing to learn a lot about kernel-level
       | networking primitives (it turns out tcpdump and eBPF are
       | compatible with almost anything, no "but boss, foobar is only
       | compatible with bizbazz 7 or above!").
       | 
       | So when an LLM vendor that shall remain nameless had a model
       | start misidentifying itself while the website was complaining
       | about load... I decided to get to the bottom of it.
       | 
       | eBPF cuts through TLS obfuscation like a bunker buster bomb
       | through a ventilation shaft, or was it... well, you know what I
       | mean.
        
       ___________________________________________________________________
       (page generated 2025-06-29 23:00 UTC)