[HN Gopher] Show HN: Logparser - Alternative to GoAccess Written...
       ___________________________________________________________________
        
       Show HN: Logparser - Alternative to GoAccess Written in Python
        
       Author : lcnmrn
       Score  : 52 points
       Date   : 2021-09-23 12:05 UTC (10 hours ago)
        
       | simonw wrote:
       | Suggestion: take the time to package this up for PyPI as
       | something people can install using "pip install" (or "pipx
       | install").
       | 
       | This is hard the first time you do it, but worth learning because
       | it's a really great way to distribute your Python software.
       | 
       | I'm giving a talk about how to do this at PyGotham next month,
       | but the notes from that talk are already available and may be
       | useful to you: https://github.com/simonw/pygotham-packaging
       | 
       | You may also find this cookiecutter template that I use to build
       | and package Python CLI apps helpful:
       | https://github.com/simonw/click-app
        
         | dwohnitmok wrote:
         | Is there a way to distribute proprietary software with PyPI?
         | Based on the license text it appears the author wishes to keep
         | it proprietary (maybe source-available, but not open source).
        
           | uranusjr wrote:
           | I suppose you actually mean closed source? Because it's
           | trivial to distribute proprietary code on PyPI: just say so
           | in your license.
           | 
           | There is no true "closed source" for pure Python programs, but
           | if obfuscation is close enough, you can choose to only deploy
           | wheels containing pre-compiled pyc files. This is good enough
           | for most situations.
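A minimal sketch of the bytecode-only idea described above, using only the standard library. The module name `secret_logic` and the helper names are hypothetical, and this shows the mechanism rather than a full wheel build. Note that `.pyc` files are tied to the interpreter version that produced them and can be decompiled, so this is obfuscation at best.

```python
import importlib.machinery
import importlib.util
import pathlib
import py_compile
import tempfile


def compile_to_pyc(source_path: pathlib.Path, out_dir: pathlib.Path) -> pathlib.Path:
    """Compile one .py file into a standalone .pyc file inside out_dir."""
    out_dir.mkdir(parents=True, exist_ok=True)
    target = out_dir / (source_path.stem + ".pyc")
    # doraise=True raises py_compile.PyCompileError on a compile error
    # instead of only printing the error to stderr.
    py_compile.compile(str(source_path), cfile=str(target), doraise=True)
    return target


def load_sourceless(name: str, pyc_path: pathlib.Path):
    """Import a module from a .pyc file when no .py source is present."""
    loader = importlib.machinery.SourcelessFileLoader(name, str(pyc_path))
    spec = importlib.util.spec_from_loader(name, loader)
    module = importlib.util.module_from_spec(spec)
    loader.exec_module(module)
    return module


def demo() -> int:
    """Write a tiny module, compile it, delete the source, import the bytecode."""
    with tempfile.TemporaryDirectory() as tmp:
        tmp_path = pathlib.Path(tmp)
        src = tmp_path / "secret_logic.py"  # hypothetical module name
        src.write_text("def answer():\n    return 42\n")
        pyc = compile_to_pyc(src, tmp_path / "dist")
        src.unlink()  # only the bytecode remains on disk
        return load_sourceless("secret_logic", pyc).answer()
```

Calling `demo()` imports the function from the `.pyc` alone, which is the behaviour a pyc-only wheel relies on.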
        
       | vsajip wrote:
       | The license is currently just:
       | 
       |     All rights reserved.
       |     Copyright (c) 2020 Lucian Marin
       | 
       | It was the MIT license at the time of the initial commit and has
       | since been updated to this. So it's not immediately clear whether
       | anyone else can use Logparser. Care to clarify, Lucian?
        
       | css wrote:
       | IMO this could benefit from using `collections.Counter` instead
       | of `defaultdict(set)`.
        
         | masklinn wrote:
         | With a `Counter` you would be counting each access from a given
         | IP as a hit against a category, rather than counting the IP
         | itself.
         | 
         | Currently, if 4 clients each hit one URL and 1 client hits 5,
         | logparser will register 5 records in each category (unless
         | they're classified as bots for browsers and systems). With a
         | Counter, it'd be 9.
         | 
         | Both pieces of information could be made accessible using a
         | `defaultdict(Counter)` but I don't know how useful that would
         | be to the people actually using logparser.
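The difference between the three structures can be sketched with the numbers from the comment above. The category name `"Firefox"` and the IP addresses are made up for illustration. This is not logparser's actual code.

```python
from collections import Counter, defaultdict

# Hypothetical hits as (category, client IP): four clients make one request
# each, and a fifth client makes five requests.
hits = [("Firefox", f"10.0.0.{n}") for n in range(1, 5)]
hits += [("Firefox", "10.0.0.5")] * 5

unique_ips = defaultdict(set)   # category -> set of distinct client IPs
raw_hits = Counter()            # category -> total number of requests
per_ip = defaultdict(Counter)   # category -> client IP -> number of requests

for category, ip in hits:
    unique_ips[category].add(ip)
    raw_hits[category] += 1
    per_ip[category][ip] += 1

unique_count = len(unique_ips["Firefox"])  # distinct clients
hit_count = raw_hits["Firefox"]            # every request counted
```

With `defaultdict(Counter)`, `len(per_ip["Firefox"])` gives the distinct-client figure and `sum(per_ip["Firefox"].values())` gives the total-request figure, so both numbers come from one structure.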
        
       | eatonphil wrote:
       | On a tangent, I've been looking into log parsing for an
       | application I'm building recently.
       | 
       | If you want to support pulling info out of common logs it's
       | pretty simple to pull together a list of regexes for the default
       | log format in each major system. Simple example here:
       | https://github.com/multiprocessio/datastation/blob/master/sh....
       | 
       | I use this in the app to be able to quickly pull info out of
       | access logs for further analysis a la OP's app and GoAccess but
       | in a GUI where you can also do further processing.
       | 
       | Demo video of this here:
       | https://www.youtube.com/watch?v=sCx2mF2jyUQ&t=9s.
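As a rough illustration of the regex-per-format approach, here is a sketch of one pattern for the Apache/nginx "combined" access log layout (which also accepts plain Common Log Format lines). The pattern, the `parse_line` helper, and the sample line are assumptions written for this example, not taken from the linked project. The pattern does not handle escaped quotes inside the request or user-agent fields.

```python
import re

# host ident user [time] "request" status size, optionally followed by
# "referer" "user-agent" (the combined-format extension).
COMBINED = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
    r'(?: "(?P<referer>[^"]*)" "(?P<agent>[^"]*)")?'
)


def parse_line(line):
    """Return a dict of named fields, or None if the line does not match."""
    match = COMBINED.match(line)
    return match.groupdict() if match else None


# Made-up sample line for illustration.
sample = (
    '203.0.113.7 - - [23/Sep/2021:12:05:00 +0000] '
    '"GET /index.html HTTP/1.1" 200 1024 '
    '"https://example.com/" "Mozilla/5.0"'
)
record = parse_line(sample)
```

A list of such compiled patterns, one per log source, can be tried in order until one matches.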
        
         | linuxdude314 wrote:
         | You can find a very comprehensive list of regex patterns by
         | looking at Logstash's grok definitions:
         | 
         | https://github.com/logstash-plugins/logstash-patterns-core/t...
        
       | eu wrote:
       | To be fair, GoAccess does a bit more (it has that websockets live
       | view).
        
         | edoceo wrote:
         | That's not in the parse loop - where comparison is happening.
        
           | joshyi wrote:
           | Still, GoAccess outputs a lot more data, with support for
           | custom logs.
        
       | Svetlitski wrote:
       | Are you certain your benchmarks are correct? The GoAccess FAQ
       | states that it parses over 100,000 lines/second [1]. While this
       | figure depends on the hardware used, this still is _massively_
       | faster than the figure quoted in the README. Benchmarking is
       | quite technical if you want consistent results, so some more
       | information on the benchmarking methodology used here would be
       | much appreciated.
       | 
       | [1] https://goaccess.io/faq#performance
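One way to make a lines-per-second figure easier to compare is to state exactly what is timed. The sketch below is an assumption about a reasonable methodology, not the one used in the README: it times only the parsing call over lines already held in memory (so disk I/O is excluded), repeats the run, and reports the median. The function name `lines_per_second` is hypothetical.

```python
import statistics
import time


def lines_per_second(parse, lines, repeats=5):
    """Return the median throughput of parse() over several timed runs."""
    rates = []
    for _ in range(repeats):
        start = time.perf_counter()
        for line in lines:
            parse(line)
        elapsed = time.perf_counter() - start
        # Guard against a zero reading on very small inputs.
        rates.append(len(lines) / max(elapsed, 1e-9))
    return statistics.median(rates)


# Stand-in parser and synthetic input, purely for illustration.
rate = lines_per_second(str.split, ["a b c"] * 10_000, repeats=3)
```

Reporting the hardware, the input file, and whether output generation is included would make such a number comparable with the GoAccess FAQ figure.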
        
       | makapuf wrote:
       | I'm not sure it's an alternative yet. Functionally, it seems to
       | be missing incremental parsing, live updates, interactive HTML
       | and TUI interfaces, graphs, ...
        
       | patja wrote:
       | Seems like a confusing name given that logparser for IIS log
       | files has been around for a very long time.
        
       | jerf wrote:
       | I am skeptical of those benchmarks. This is written in Python,
       | and, looking at the core loop, yes, it really is Python, not
       | Python wrapped around C or some other acceleration technology.
       | For pure Python to come out appearing to get four times the
       | throughput of a C program is pretty dubious. That would have to
       | be one crappy C program. GoAccess looks like it ought to be far
       | enough along that somebody has at least taken a bit of a crack at
       | optimization, but, perhaps not. C ought to be able to smoke pure
       | Python at this task. (Possibly, you know, _unsafely_, where a
       | crafted referrer may get to arbitrary code execution or
       | something, but still it ought to be _way faster_.)
        
         | [deleted]
        
         | messe wrote:
         | > This is written in Python, and, looking at the core loop,
         | yes, it really is Python, not Python wrapped around C or some
         | other acceleration technology
         | 
         | It seems to use a library, clfparser, to parse Apache Common
         | Log Format logs; internally that uses Python's regex engine,
         | which is written in C.
         | 
         | 6000 lines/s seems incredibly slow to me for a C program parsing
         | a log file. I'm seeing a lot of strstr's, strlen's, strdup's,
         | and strchr's in GoAccess's parse.c, all of which are O(n) per
         | line and, while fine in isolation, could be causing GoAccess to
         | do quite a bit more work per line than just using an optimized
         | regex engine.
        
           | brundolf wrote:
           | I wonder what percentage of real-world C programs are
           | exponentially slower than they could be because of the str
           | functions
        
           | jerf wrote:
           | Thank you. That does sound like something that could resolve
           | my skepticism into concrete facts.
        
       ___________________________________________________________________
       (page generated 2021-09-23 23:01 UTC)