hngopher.com

       [HN Gopher] Feature comparison of ack, ag, Git-grep, grep and ri...
       ___________________________________________________________________
        
       Feature comparison of ack, ag, Git-grep, grep and ripgrep
        
       Author : Amorymeltzer
       Score  : 115 points
       Date   : 2021-10-24 10:44 UTC (12 hours ago)
        
 (HTM) web link (beyondgrep.com)
 (TXT) w3m dump (beyondgrep.com)
        
       | waynesonfire wrote:
       | Apparently,
       | 
       | > Print lines by number
       | 
       | is not supported by GNU grep???
       | 
       | $ grep --version
       | grep (GNU grep) 2.25
       | $ cat world
       | hello
       | $ grep -Hn hello world
       | world:1:hello
        
         | tialaramex wrote:
         | "Print lines by number" is a vague thing to say, particularly
         | since the comparison later includes, "Print specific lines by
         | number".
         | 
         | However, the grep -Hn feature is described in this comparison
         | as, "Prefix the line number to matching lines"
         | 
         | One thing that can help people to compare this sort of tool is
         | to pair _technical_ descriptions like command line parameters
         | with the natural language explanation. If tool foo has a
         | feature you're describing as "Prevent cheesecake" then I have
         | no idea if my tool bar can do that, whereas if you say this is
         | -Xqm then I can read the documentation and discover that I call
         | this "disable refrigerated dessert" and it's -VQb so yes, my
         | tool does this too.
         | 
         | I spent some time recently reading the proposals to fix/extend
         | C++ ranges (P2214), and because this general idea is _very
         | common_ they often discuss Haskell, Rust or even Python. If
         | you're experienced in a language you already know whether it
         | would spell something FlatMap, flat_map, or flatMap, but you
         | might not guess that C++ people would call your filter_map by
         | the name transform_maybe. Likewise, a C++ programmer who has
         | barely dipped their toe in Haskell wouldn't know that Haskell
         | doesn't use the word "transform" in this context. Without being
         | told what it's called you won't find the relevant
         | documentation, let alone be able to try it for yourself and
         | appreciate what it's for.
        
           | petdance wrote:
           | The "print lines by number" was there because earlier
           | versions of ack had a `--line=N` feature, where you could say
           | "ack --line=15-18" and print those four lines. I dropped it
           | because it was hardly any better than using sed.
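           | 
           | A minimal sketch of that sed equivalent, assuming a POSIX
           | shell. print_line_range is a hypothetical helper name, not
           | part of ack or sed:

```shell
# print_line_range FIRST LAST FILE: print lines FIRST..LAST of FILE,
# roughly the job the old "ack --line=15-18" option did.
print_line_range() {
    # -n suppresses sed's default output; "p" prints only the range.
    sed -n "${1},${2}p" "$3"
}

# Demo on a throwaway 20-line file: prints "line 15" through "line 18".
demo=$(mktemp)
i=1
while [ "$i" -le 20 ]; do
    echo "line $i"
    i=$((i + 1))
done > "$demo"
print_line_range 15 18 "$demo"
rm -f "$demo"
```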
           | 
           | If you've got suggestions on improvements, please submit an
           | issue. I'd love to hear them.
           | 
           | https://github.com/beyondgrep/website/issues
        
         | dotancohen wrote:
         | You can update the page with pull requests to this repo:
         | https://github.com/beyondgrep/website
        
         | blablabla123 wrote:
         | Or "Limit length of output lines"... how about
         | $ grep ... | head -n ...
         | 
         | I think this feature comparison misses how grep is supposed to
         | be used. (See also "Pipe output through a pager or other
         | command")
        
           | waynesonfire wrote:
           | I agree. I use grep in conjunction with find and parallel to
           | achieve a number of features not native to grep that are
           | built-in to these other tools.
           | 
           | I prefer tools designed with a "does one thing well" philosophy.
           | It lets me scale my knowledge. I can solve many problems with
           | find and parallel not supported by these grep clones.
        
             | burntsushi wrote:
             | ripgrep degrades just fine to a normal grep tool. And you
             | can use it in `find` pipelines.
             | 
             | > I prefer tools designed with does one thing well
             | 
             | This is pretty unlikely. For example, you probably use a
             | grep tool that will also do recursive directory traversal
             | for you. It probably even has flags for defining filters on
             | that traversal. Why use such a tool when `find` already
             | does recursive directory traversal for you?
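             | 
             | A rough sketch of such a pipeline: `find` picks the files
             | and a plain grep does only the matching. search_c_files
             | and the *.c filter are made up for illustration:

```shell
# search_c_files PATTERN DIR: find selects the files, grep only matches.
# (Hypothetical helper; the *.c filter is just an example.)
search_c_files() {
    # "{} +" passes many file names to each grep invocation.
    # -H prints the file name even when grep receives a single file.
    find "$2" -type f -name '*.c' -exec grep -Hn -e "$1" {} +
}

# Demo in a throwaway directory: only main.c is searched.
demo_dir=$(mktemp -d)
printf 'int main(void) { return 0; }\n' > "$demo_dir/main.c"
printf 'main is mentioned here too\n' > "$demo_dir/notes.txt"
search_c_files 'main' "$demo_dir"
rm -rf "$demo_dir"
```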
        
           | ptomato wrote:
           | that limits the number of output lines, not the length.
        
           | petdance wrote:
           | "grep | head" doesn't limit the length of output lines, but
           | "grep | cut" would.
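           | 
           | To make the difference concrete, a small sketch (both
           | helper names are hypothetical):

```shell
# head caps the NUMBER of lines; cut caps the LENGTH of each line.
limit_line_count()  { head -n "$1"; }
limit_line_length() { cut -c "1-$1"; }

sample() {
    printf 'alpha match one\nbeta match two\ngamma match three\n'
}

# First two matching lines, each at full width.
sample | grep 'match' | limit_line_count 2

# All three matching lines, each truncated to 10 characters.
sample | grep 'match' | limit_line_length 10
```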
           | 
           | However, ack and ripgrep group their default output by file
           | only when it is not piped; if you pipe the output, the
           | grouping is dropped.
           | 
           | The idea of "supposed to be used" is also different for ack
           | than it is for grep. ack is specifically less of a general-
           | use tool than grep. It's meant for searching source code.
           | This is also why I have never said that ack is a replacement
           | for grep.
        
       | petdance wrote:
       | Here's another site of mine y'all may be interested in:
       | https://altbox.dev/
       | 
       | It's a collection of improved shell tools, organized by the tool
       | they supplement.
       | 
       | As with this feature comparison chart, patches and suggestions
       | are welcome: https://github.com/petdance/altbox
        
       | infocollector wrote:
       | When I need a faster grep, I love ugrep
       | (https://github.com/Genivia/ugrep), especially when I am
       | searching compressed logs for debugging.
        
         | petdance wrote:
         | What do you love about ugrep?
        
         | hahfjjjshj wrote:
         | This is preferred over ripgrep? I've not used ugrep before
        
       | dotancohen wrote:
       | I would absolutely love to see examples of each tool's syntax for
       | each use case.
       | 
       | In particular "Show proximity of matches to other matches" would
       | be a huge boon to replace `grep -C 5 foo | grep bar`.
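       | 
       | For reference, a sketch of that two-step idiom wrapped in a
       | function. near_match is a hypothetical name, and -C is a common
       | grep extension rather than a POSIX option:

```shell
# near_match A B FILE: print lines matching B that fall within 5 lines
# of a line matching A. (Hypothetical helper name.)
near_match() {
    # -C 5 keeps five lines of context on each side of every A match;
    # the second grep then filters that window for B.
    grep -C 5 -e "$1" "$3" | grep -e "$2"
}

# Demo: one "bar" sits nine lines before "foo", the other two lines after.
demo=$(mktemp)
{
    echo 'bar: too far away'
    i=1
    while [ "$i" -le 8 ]; do
        echo "filler $i"
        i=$((i + 1))
    done
    echo 'foo: anchor line'
    echo 'filler 9'
    echo 'bar: close enough'
} > "$demo"
near_match foo bar "$demo"   # prints only "bar: close enough"
rm -f "$demo"
```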
        
       | gnfargbl wrote:
       | I occasionally find myself wanting to search a data stream using
       | a large-ish set (a few tens or hundreds of thousands) of regexes.
       | This is very slow with a backtracking engine like PCRE, but ought
       | to be pretty fast with a DFA-based engine like re2.
       | 
       | So far, I have been unsuccessful in finding a grep replacement
       | that can read patterns from a file, and which also uses a DFA
       | engine. Does one exist? From the table, it looks like ripgrep
       | might be suitable. Is it?
        
         | burntsushi wrote:
         | Precisely speaking, no, I don't know of any grep tools that use
         | a DFA engine. However, both ripgrep and GNU grep use a hybrid
         | NFA/DFA engine (also known as a "lazy DFA") for some subset of
         | regexes. I'm not too familiar with all of GNU grep's
         | strategies, but for ripgrep, it will fall back to an NFA
         | engine. (And I don't mean Friedl's bastardization of the term
         | "NFA engine.") For ripgrep, see the --dfa-size-limit flag to
         | try to let it use the hybrid NFA/DFA engine for bigger regexes.
         | Whether it helps or not depends on your situation.
         | 
         | Now, this will do much better than a backtracking engine, but
         | if you get up into the tens of thousands or hundreds of
         | thousands of regexes, it's going to get pretty painful. Finite
         | automata just don't scale that well. At that point, you
         | really start wanting a more specialized solution. Probably the
         | best answer to that that I know of is Hyperscan. And you're in
         | luck; someone maintains a fork of ripgrep with support for
         | Hyperscan: https://sr.ht/~pierrenn/ripgrep/
         | 
         | (A special case is tens of thousands of literal patterns.
         | ripgrep will notice that and should use Aho-Corasick. It
         | doesn't help so much with search time since it's just an NFA or
         | a DFA like with regexes, but the machine itself is constructed
         | much more quickly.)
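         | 
         | The literal-pattern case is easy to try with any POSIX grep,
         | since -F and -f are standard options (ripgrep documents flags
         | with the same names). A small sketch; match_any_literal is a
         | hypothetical helper:

```shell
# match_any_literal PATTERN_FILE: filter stdin, keeping lines that contain
# at least one of the literal strings listed one per line in PATTERN_FILE.
match_any_literal() {
    # -F: patterns are fixed strings, not regexes
    # -f: read the patterns from a file
    grep -F -f "$1"
}

# Demo: two literal patterns applied to a three-line stream.
patterns=$(mktemp)
printf 'error\ntimeout\n' > "$patterns"
printf 'all good\nfatal error\nrequest timeout\n' | match_any_literal "$patterns"
rm -f "$patterns"
```

         | Note that an empty line in the pattern file matches every
         | input line, so it is worth stripping blanks first.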
        
           | gnfargbl wrote:
           | What an incredibly helpful reply. Thank you!
           | 
           | It sounds like either plain ripgrep, or ripgrep+hyperscan, is
           | pretty much exactly what I'm looking for. Next time I have
           | this problem, I'll certainly be reaching for it.
        
           | vlovich123 wrote:
           | Out of curiosity, what prevents the hyperscan support from
           | being mainlined?
        
             | burntsushi wrote:
             | Too much of a weighty dependency and too much of a niche
             | IMO. For example, the last time I tried to build Hyperscan,
             | I failed and gave up after 15 minutes of trying.
        
         | jemfinch wrote:
         | I've written this code (in C++) for an employer. RE2 scaled
         | fine to hundreds of thousands of regexes. You'll want to use
         | RE2::Set, which compiles multiple regexes into a single DFA,
         | and probably the "Filter" functionality (whose name I don't
         | precisely remember and am too lazy to look up) which uses an
         | Aho-Corasick tree to subset the potential matches. One thing
         | you'll have to watch out for is RE2's maximum DFA size; if
         | compilation of your RE2::Set fails, just split your set of
         | regexes in half and compile again.
         | 
         | You could probably do some fun optimizations by grouping the
         | regexes which depend on the same literals into their own sets,
         | but I never needed to.
        
           | burntsushi wrote:
           | This is basically what ripgrep will do for you automatically.
           | (ripgrep uses Rust's regex engine, which is a descendant of
           | RE2.) But when you get up into hundreds of thousands of
           | regexes, the NFA (and the resulting DFA) get _really_ big.
           | And things generally don't scale that well. Here's a good
           | example:
           | http://web.archive.org/web/20210302010420/https://01.org/hyp...
           | 
           | The problem is that for a big enough NFA, you'll wind up
           | spending most of your search doing powerset construction to
           | build the DFA.
           | 
           | > One thing you'll have to watch out for is RE2's maximum DFA
           | size
           | 
           | You can configure this in ripgrep with the --dfa-size-limit
           | flag. (See also --regex-size-limit.)
        
       | spinax wrote:
       | Some of these line items are really obtuse and in some cases just
       | not right, "Don't search in binary files" and "Treat binary files
       | as if they were text" for example caught my eye - GNU grep has
       | the `--binary-files` option which supports both of these
       | features. Others like "can pipe output to a pager" seem like a
       | half-hearted attempt to give a +1 to a specific tool while
       | ignoring that you can... pipe the output of _any_ of them
       | using... a pipe.
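       | 
       | A quick sketch of the two `--binary-files` modes, assuming GNU
       | grep (the option is not POSIX). Both helper names are
       | hypothetical:

```shell
# GNU grep only: count matching lines under the two --binary-files modes.
count_skipping_binary() {
    # without-match: a binary file is assumed not to match (same as -I).
    # grep exits 1 when nothing matches, hence the "|| true".
    grep -c --binary-files=without-match -e "$1" "$2" || true
}
count_as_text() {
    # text: search the file as if it were text (same as -a).
    grep -c --binary-files=text -e "$1" "$2" || true
}

demo=$(mktemp)
printf 'foo\000bar\n' > "$demo"     # the NUL byte makes grep treat it as binary
count_skipping_binary foo "$demo"   # file is skipped
count_as_text foo "$demo"           # file is searched
rm -f "$demo"
```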
        
         | opencl wrote:
         | If you follow the "If you have updates to the chart, please
         | submit as a GitHub issue." link, you can see that there are a
         | few dozen open issues and the page was last updated 2 years
         | ago.
        
           | spinax wrote:
           | Ahhh, I did not (my bad) - I don't track grep features, not a
           | hobby :) this chart could have been more truthy 2 years ago
           | for all I know. Thanks.
        
             | petdance wrote:
             | If there are things you think are confusing or inaccurate,
             | please do make GitHub issues for them. Thanks.
        
           | loeg wrote:
           | Maybe we need a (2019) in the title.
        
             | petdance wrote:
             | Or maybe I just need to move it up in my priority stack.
        
       | TacticalCoder wrote:
       | I use ripgrep, and more typically ripgrep from within Emacs, thanks
       | to "counsel-rg". I configured counsel-rg (as suggested) to not
       | display very long matching lines (for Emacs doesn't like lines
       | that are too long).
       | 
       | It is really _very_ fast.
        
         | beermonster wrote:
         | Doom emacs uses it if it's installed
        
       | mi_lk wrote:
       | Nice. It would be better if there were examples of each feature
       | to show how each tool's CLI flags map to the others'.
        
         | petdance wrote:
         | That's how I had it at first.
         | 
         | I originally had it as a "phrasebook" of how to do the same
         | thing in the different tools, but it was really ugly and took
         | up a lot of horizontal space, and I figured it was more useful
         | as a chart of yes/no. Also, there were cases where two tools
         | had pretty much the same feature, but not exactly, so just
         | listing flags didn't make sense.
         | 
         | I've still got a lot of the data of the switches in the JSON
         | file that I build the chart from:
         | https://github.com/beyondgrep/website/blob/dev/features.json
         | 
         | If you've got ideas on how to bring back the phrasebook format,
         | either integrated into this page, or as a separate standalone
         | page, I'd love to hear them. Maybe the phrasebook isn't best
         | done as a table like this, for example. Open a ticket in GitHub
         | and let me know your thoughts.
        
       | oweiler wrote:
       | ripgrep supports the most common features while still being much
       | faster than ack.
       | 
       | So it's no comparison for me.
        
         | loeg wrote:
         | Ag (the_silver_searcher) has performance closer to ripgrep and
         | a similar feature set. But it's tough to beat ripgrep both for
         | performance and reliability. Rust is great in this application.
        
       | asicsp wrote:
       | I wonder how long the documentation for GNU grep will continue
       | to say this:
       | 
       | > _PCRE support is here to stay, but consider this option
       | experimental when combined with the -z (--null-data) option, and
       | note that 'grep -P' may warn of unimplemented features._
       | 
       | I did come across a few issues mentioned with -z on
       | unix.stackexchange a few years back but they have been fixed as
       | far as I know.
        
       | jll29 wrote:
       | The next step is _Extended_ regular expressions that characterise
       | Regular Relations (RR) and that specify Finite State Transducers
       | (FSTs). See:
       | 
       | http://users.itk.ppke.hu/~sikbo/nytech/gyak/05_morfo/xfst/bo...
       | 
       | Advances:
       | 
       | - symmetry input:output (reversible)
       | 
       | - readable/maintainable expressions due to _naming_ of
       | sub-expressions
       | 
       | Implementations:
       | 
       | - Xerox XRCE XFST/lexc/twolc compilers
       | 
       | - FOMA - https://fomafst.github.io/
        
       | john-tells-all wrote:
       | I _adore_ Ripgrep and use it dozens of times a day, and have for
       | years. It's extremely fast, does the right thing most of the
       | time, and has a useful featureset.
       | 
       | Ack is also nice, I've used that quite a bit too. It has the
       | advantage of being in Perl, so if you're on a "secure" computer
       | (no compiler), you can still use a fast + featureful search tool.
        
         | petdance wrote:
         | I'm glad you appreciate that you can install ack anywhere. It's
         | exactly for that use case that I have kept ack able to be a
         | single text file download. Also, it only requires Perl 5.10.1,
         | so it's OK if you're using an old Perl.
        
         | sva_ wrote:
         | I feel the same way about ripgrep, and also fzf - especially
         | both in combination. I only started using them both a few
         | months ago, yet it feels like a fundamental way of doing
         | computing.
        
           | bloopernova wrote:
           | fzf is wonderful. I feel like we're only scratching the
           | surface of its utility. Using it with git add is so fast,
           | just "ga" (an alias), hit tab (for each file) then enter.
        
             | hahfjjjshj wrote:
             | I find interactive git add more useful most of the time,
             | 'git add -i' then select what's needed from the menu
        
               | oconnor663 wrote:
               | Figuring out how to integrate _that_ with FZF would be
               | really something. Being able to easily go up and down the
               | list, and visualize the whole thing, would make things a
               | lot smoother.
        
           | hahfjjjshj wrote:
           | Any examples of using them in combination? I've only recently
           | started using ripgrep despite being aware of it for a while.
        
             | petdance wrote:
             | There are a ton of articles out there about it:
             | 
             | https://duckduckgo.com/?q=ripgrep+%2B+fzf
        
       ___________________________________________________________________
       (page generated 2021-10-24 23:01 UTC)