[HN Gopher] Show HN: An open source access logs analytics script...
       ___________________________________________________________________
        
       Show HN: An open source access logs analytics script to block bot
       attacks
        
       This is a small PoC Python project that analyzes web server access
       logs to classify and dynamically block bad bots, such as L7
       (application-level) DDoS bots, web scrapers and so on.  We'd be
       happy to gather initial feedback on usability and features,
       especially from people who have had good or bad experiences with
       bots.

       *Requirements*

       The analyzer relies on 3 Tempesta FW specific features, which you
       can still get with other HTTP servers or accelerators:

       1. JA5 client fingerprinting
       (https://tempesta-tech.com/knowledge-base/Traffic-Filtering-b...).
       This is HTTP- and TLS-layer fingerprinting, similar to JA4
       (https://blog.foxio.io/ja4%2B-network-fingerprinting) and JA3
       fingerprints. The latter is also available in Envoy
       (https://www.envoyproxy.io/docs/envoy/latest/api-v3/extension...)
       or as an Nginx module (https://github.com/fooinha/nginx-ssl-ja3),
       so check the documentation for your web server.

       2. Access logs are written directly to the ClickHouse analytics
       database, which can consume large data batches and quickly run
       analytic queries. For web proxies other than Tempesta FW, you
       typically need to build a custom pipeline to load access logs into
       ClickHouse. Such pipelines aren't that rare, though.

       3. The ability to block web clients by IP or JA5 hashes. IP
       blocking is probably available in any HTTP proxy.
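
       As an illustration of the custom pipeline mentioned in requirement
       2, here is a minimal sketch that parses access log lines into rows
       and groups them into large batches. It is not part of the project.
       It assumes the Common Log Format written by Nginx or Apache by
       default; the row field names and the batch size are hypothetical,
       and the actual insert into ClickHouse is left to whichever client
       library you use.

```python
import re
from datetime import datetime

# Common Log Format prefix (assumption: your proxy logs in this format).
CLF = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<uri>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_line(line):
    """Turn one access log line into a row dict, or None if it does not match."""
    m = CLF.match(line)
    if m is None:
        return None
    return {
        # Hypothetical column names; adapt them to your own table schema.
        "timestamp": datetime.strptime(m["ts"], "%d/%b/%Y:%H:%M:%S %z"),
        "ip": m["ip"],
        "method": m["method"],
        "uri": m["uri"],
        "status": int(m["status"]),
        "bytes": 0 if m["size"] == "-" else int(m["size"]),
    }


def batches(lines, size=10000):
    """Group parsed rows into batches of up to `size`, skipping unparsable lines."""
    batch = []
    for line in lines:
        row = parse_line(line)
        if row is None:
            continue
        batch.append(row)
        if len(batch) >= size:
            yield batch
            batch = []
    if batch:
        yield batch
```

       Each yielded batch can then be handed to a single bulk insert,
       which is the access pattern an analytics database handles best.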

       *How does it work*

       This is a daemon, which:

       1. Learns normal traffic profiles: means and standard deviations
       for client requests per second, error responses, bytes per second
       and so on. It also remembers client IPs and fingerprints.

       2. Switches to data model search mode when it sees a spike in the
       z-score (https://en.wikipedia.org/wiki/Standard_score) of the
       traffic characteristics, or when it is triggered manually.

       3. Picks a candidate model. For example, the first model could be
       the top 100 JA5 HTTP hashes that produce the most error responses
       per second (typical for password crackers). Or it could be the top
       1000 IP addresses generating the most requests per second (L7
       DDoS). Next, this model is verified.

       4. Repeats the query over a sufficiently long period in the past to
       see whether a huge fraction of the clients appears in both query
       results. If so, the model is bad and we go back to the previous
       step to try another one. If not, we have (likely) found a
       representative query.

       5. Transfers the IP addresses or JA5 hashes from the query results
       into the web proxy blocking configuration and reloads the proxy
       configuration (on the fly).
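
       The trigger in step 2 and the verification in step 4 can be
       sketched roughly as follows. This is a simplified illustration,
       not the daemon's actual code: the function names, the z-score
       threshold of 3.0 and the 0.5 overlap limit are all assumptions.

```python
import math
from statistics import mean, stdev


def z_score(value, history):
    """Standard score of `value` against a learned baseline (needs >= 2 samples)."""
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # Perfectly flat baseline: any deviation is infinitely unusual.
        return 0.0 if value == mu else math.copysign(math.inf, value - mu)
    return (value - mu) / sigma


def is_spike(value, history, threshold=3.0):
    """Step 2: trigger when the metric is far above its normal profile.

    The threshold is a hypothetical default, not taken from the project.
    """
    return z_score(value, history) > threshold


def model_is_representative(current, past, max_overlap=0.5):
    """Step 4: compare the model's result now with the same query in the past.

    `current` and `past` are collections of client IPs or fingerprint hashes.
    If a large fraction of the current clients also showed up in the past,
    the model mostly matches normal clients and is rejected.
    `max_overlap` is a hypothetical cut-off for "a huge fraction".
    """
    current, past = set(current), set(past)
    if not current:
        return False
    overlap = len(current & past) / len(current)
    return overlap < max_overlap
```

       For example, with a requests-per-second baseline of
       [100, 110, 90, 105, 95], a reading of 400 counts as a spike while
       104 does not. A model whose results share most of their clients
       with the historical results is discarded, and the next candidate
       is tried.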
        
       Author : krizhanovsky
       Score  : 13 points
       Date   : 2025-10-14 19:15 UTC (3 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       ___________________________________________________________________
       (page generated 2025-10-14 23:00 UTC)