[HN Gopher] Reverse Engineering Vercel's BotID
       ___________________________________________________________________
        
       Reverse Engineering Vercel's BotID
        
       Author : hazebooth
       Score  : 84 points
       Date   : 2025-06-30 12:19 UTC (10 hours ago)
        
 (HTM) web link (www.nullpt.rs)
 (TXT) w3m dump (www.nullpt.rs)
        
       | codedokode wrote:
       | Note that the bot detection script uses WebGL to obtain the GPU
       | name. I assume this (fingerprinting) is the most popular use of
       | WebGL. It's sad that independent browsers like Firefox don't
       | supply fake values.
        
         | nullpt_rs wrote:
         | Sadly, spoofing the GPU vendor & renderer can be an even bigger
         | red flag, since they can hash the image rendered to the canvas
         | and compare it against a database of collected fingerprints[0].
         | 
         | [0]: https://research.google/pubs/picasso-lightweight-device-clas...
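A minimal sketch of the consistency check nullpt_rs describes, in Python, assuming the server receives the raw RGBA bytes of a rendered test canvas alongside the self-reported renderer string. `KNOWN_RENDERS`, the helper functions, and the "GPU-A"/"GPU-B" labels are hypothetical; this is not BotID's or Picasso's actual design, just an illustration of why a spoofed name can stand out.

```python
import hashlib
from typing import Dict, Optional, Set

# Hypothetical database: hash of a rendered test canvas -> renderer strings
# previously observed producing exactly that image.
KNOWN_RENDERS: Dict[str, Set[str]] = {}


def render_hash(pixels: bytes) -> str:
    """Exact hash of the canvas's raw RGBA bytes (real systems may be fuzzier)."""
    return hashlib.sha256(pixels).hexdigest()


def record(pixels: bytes, renderer: str) -> None:
    """Store a (render, renderer) pair collected from known-good traffic."""
    KNOWN_RENDERS.setdefault(render_hash(pixels), set()).add(renderer)


def renderer_is_consistent(pixels: bytes, claimed_renderer: str) -> Optional[bool]:
    """True/False when the render has been seen before, None when it hasn't."""
    seen = KNOWN_RENDERS.get(render_hash(pixels))
    if seen is None:
        return None  # never-seen render: no verdict from this check alone
    return claimed_renderer in seen


# Made-up pixel data standing in for one GPU's output.
gpu_a_pixels = bytes([12, 34, 56, 255] * 16)
record(gpu_a_pixels, "GPU-A")

print(renderer_is_consistent(gpu_a_pixels, "GPU-A"))  # True
print(renderer_is_consistent(gpu_a_pixels, "GPU-B"))  # False: name doesn't match the render
print(renderer_is_consistent(bytes(64), "GPU-B"))  # None: unknown render
```

Spoofing only the string makes things worse: the pixels still come from the real GPU, so a forged name yields a pairing the database has never associated with that image.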
        
           | reaperducer wrote:
           | Until a major player gets on board. Then it works.
           | 
           | Apple does this by sending an imposter user agent from Safari
           | on iPads.
           | 
           | If only that were extended to iPhones, too, and combined with
           | rotating or randomized user agents.
        
             | nerdsniper wrote:
             | Apple does it because they don't have a vested financial
             | interest in internet-wide tracking.
             | 
             | Google does.
             | 
             | And while Mozilla does too because the vast majority of
             | their funding comes from Google, it's more pertinent that
             | they don't have the market share to pull this off. Firefox
             | would just stop working on major websites if they did this.
        
           | andrewmcwatters wrote:
           | It's funny that trying to click on the Google Scholar link
           | there falsely identifies me as a bot.
        
         | grishka wrote:
         | IMO the use of <canvas> needs to be behind a permission prompt,
         | the same as e.g. geolocation or WebRTC. Few websites _actually
         | need_ canvas/WebGL for legitimate purposes.
        
       | ATechGuy wrote:
       | > At the moment, it seems Basic mode is so basic that it allows
       | everything to pass as human. That'll likely change as they gather
       | more telemetry to better identify what a bot signal looks like.
       | 
       | So they are basically collecting telemetry in the name of a "free
       | basic anti-bot" solution.
        
         | cchance wrote:
         | free basic anti-bot solution that literally NEVER BLOCKS A BOT,
         | like what the actual fuck
        
       | b0a04gl wrote:
       | why is bot detection even happening at render time instead of
       | request time? why can't it tell you're a bot from your headers,
       | UA, IP, TLS fingerprint? imo that turns it into surveillance:
       | 'you're a bot? ok, instead of just telling you to go away, let's
       | fingerprint your GPU and assign you a behavioral risk score
       | anyway'
        
         | n2d4 wrote:
         | It's really hard to detect it at request time. It's practically
         | trivial for an attacker to fake headers to resemble a real
         | browser.
        
           | indrora wrote:
           | Anubis does it pretty decently.
        
           | baby_souffle wrote:
           | You absolutely have options at request time. Arguably, some
           | of the things you can only do at request time are part of a
           | full and complete mitigation strategy.
           | 
           | You can fingerprint the originating TCP stack with some
           | degree of confidence. If the request looks like it came from
           | a Linux server but the user agent says Windows, that's a
           | signal.
           | 
           | Likewise, the IP address making the request has geographic
           | information associated with it. If my IP address says I'm in
           | Romania but my browser is asking for the English language
           | version of the page... That's a signal.
           | 
           | Similar to basic IP/Geo, you can do DNS and STUN based
           | profiling, too. This helps you catch people that are behind
           | proxies or VPNs.
           | 
           | To blur the line, you can use JavaScript to measure request
           | timing. Proxies that are going to tamper with the request to
           | hide its origins or change its fingerprint will add a
           | measurable latency.
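A rough sketch, in Python, of combining two of the request-time mismatches baby_souffle lists (TCP-stack OS vs. User-Agent OS, IP country vs. Accept-Language) into a score. It assumes an upstream passive TCP/IP fingerprinter (in the style of p0f) and an IP geolocation lookup have already filled in `tcp_os_guess` and `ip_country`; `RequestInfo`, `EXPECTED_LANGS`, and the point values are hypothetical. Each check is treated as a weak signal, so the sketch sums points rather than blocking on any single one.

```python
from dataclasses import dataclass

# Hypothetical: languages we'd "expect" for an IP-geolocated country.
EXPECTED_LANGS = {"RO": {"ro"}, "US": {"en"}, "DE": {"de"}}

# Hypothetical point values; a real system would tune these on labeled traffic.
POINTS = {"os_mismatch": 3, "lang_mismatch": 1}


@dataclass
class RequestInfo:
    tcp_os_guess: str     # assumed output of a passive TCP/IP fingerprinter
    user_agent: str       # raw User-Agent header
    ip_country: str       # assumed output of an IP geolocation lookup
    accept_language: str  # raw Accept-Language header


def ua_os(user_agent: str) -> str:
    """Very rough OS family claimed by a User-Agent string."""
    ua = user_agent.lower()
    if "windows" in ua:
        return "windows"
    if "linux" in ua or "android" in ua:
        return "linux"  # Android runs a Linux TCP stack
    if "mac os x" in ua:
        return "macos"  # also matches iOS UAs; fine for a rough check
    return "unknown"


def primary_language(accept_language: str) -> str:
    """First language in Accept-Language, e.g. 'en-US,en;q=0.9' -> 'en'."""
    first = accept_language.split(",")[0].split(";")[0].strip()
    return first.split("-")[0].lower()


def request_signals(req: RequestInfo) -> dict[str, bool]:
    """Evaluate each weak mismatch signal independently."""
    claimed = ua_os(req.user_agent)
    expected = EXPECTED_LANGS.get(req.ip_country)
    return {
        "os_mismatch": claimed != "unknown" and claimed != req.tcp_os_guess,
        "lang_mismatch": expected is not None
        and primary_language(req.accept_language) not in expected,
    }


def risk_score(req: RequestInfo) -> int:
    """Sum the points of every signal that fired."""
    return sum(POINTS[name] for name, fired in request_signals(req).items() if fired)


req = RequestInfo(
    tcp_os_guess="linux",  # TCP stack looks like a Linux server...
    user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64)",  # ...UA claims Windows
    ip_country="RO",
    accept_language="en-US,en;q=0.9",
)
print(request_signals(req))  # {'os_mismatch': True, 'lang_mismatch': True}
print(risk_score(req))  # 4
```

The language check alone would flag VPN users and English speakers living abroad, which is why it carries the fewest points here.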
        
             | n2d4 wrote:
             | None of these are conclusive by any means. The IP address
             | check you mentioned would mark anyone using a VPN, or
             | English speakers living abroad. Modern bot detection
             | combines lots of heuristics like these together, and being
             | able to run JavaScript in the browser (at render-time) adds
             | a lot more data that can be used to make a better
             | prediction.
        
             | cAtte_ wrote:
             | > If my IP address says I'm in Romania but my browser is
             | asking for the English language version of the page...
             | That's a signal.
             | 
             | jesus christ don't give them ideas. it's annoying enough to
             | have my country's language forced on me (i prefer english)
             | when there's a perfectly good http header for that. now
             | _blocking_ me based on this?!
        
       ___________________________________________________________________
       (page generated 2025-06-30 23:01 UTC)