[HN Gopher] Feature comparison of ack, ag, Git-grep, grep and ri...
___________________________________________________________________
Feature comparison of ack, ag, Git-grep, grep and ripgrep
Author : Amorymeltzer
Score : 115 points
Date : 2021-10-24 10:44 UTC (12 hours ago)
(HTM) web link (beyondgrep.com)
(TXT) w3m dump (beyondgrep.com)
| [deleted]
| waynesonfire wrote:
| Apparently,
|
| > Print lines by number
|
| is not supported by GNU grep???
|
|     $ grep --version
|     grep (GNU grep) 2.25
|     $ cat world
|     hello
|     $ grep -Hn hello world
|     world:1:hello
| tialaramex wrote:
| "Print lines by number" is a vague thing to say, particularly
| since the comparison later includes, "Print specific lines by
| number".
|
| However, the grep -Hn feature is described in this comparison
| as, "Prefix the line number to matching lines"
|
| One thing that can help people to compare this sort of tool is
| to pair _technical_ descriptions like command line parameters
| with the natural language explanation. If tool foo has a
| feature you're describing as "Prevent cheesecake" then I have
| no idea if my tool bar can do that, whereas if you say this is
| -Xqm then I can read the documentation and discover that I call
| this "disable refrigerated dessert" and it's -VQb so yes, my
| tool does this too.
|
| I spent some time recently reading the proposals to fix/extend
| C++ ranges P2214 - and because this general idea is _very
| common_ they often discuss Haskell, Rust or even Python. If
| you're experienced in a language you already know whether it would
| spell something FlatMap, flat_map, or flatMap but you might not
| guess that C++ people would call your filter_map by the name
| transform_maybe, or as a C++ programmer who has barely dipped
| their toe in Haskell you wouldn't know that Haskell doesn't use
| the word "transform" in this context and without being told
| what it's called you won't find the relevant documentation let
| alone be able to try it for yourself and appreciate what it's
| for.
| petdance wrote:
| The "print lines by number" was there because earlier
| versions of ack had a `--line=N` feature, where you could say
| "ack --line=15-18" and print those four lines. I dropped it
| because it was hardly any better than using sed.
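
The sed equivalent mentioned above can be sketched as follows. This is only an illustration: the file name and its contents are made up, and the line range is the one from the comment.

```shell
# Build a 20-line sample file (hypothetical data for illustration).
tmpdir=$(mktemp -d)
i=1
while [ "$i" -le 20 ]; do
    echo "line $i"
    i=$((i + 1))
done > "$tmpdir/sample.txt"

# Print only lines 15 through 18, roughly what `ack --line=15-18` used to do.
selected=$(sed -n '15,18p' "$tmpdir/sample.txt")
printf '%s\n' "$selected"

rm -rf "$tmpdir"
```

Here `-n` suppresses sed's default printing, so only the addressed range is emitted.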
|
| If you've got suggestions on improvements, please submit an
| issue. I'd love to hear them.
|
| https://github.com/beyondgrep/website/issues
| dotancohen wrote:
| You can update the page with pull requests to this repo:
| https://github.com/beyondgrep/website
| blablabla123 wrote:
| Or "Limit length of output lines"... how about
| $ grep ... | head -n ...
|
| I think this feature comparison misses how grep is supposed to
| be used. (See also "Pipe output through a pager or other
| command")
| waynesonfire wrote:
| I agree. I use grep in conjunction with find and parallel to
| achieve a number of features not native to grep that are
| built-in to these other tools.
|
| I prefer tools designed with does one thing well philosophy.
| It lets me scale my knowledge. I can solve many problems with
| find and parallel not supported by these grep clones.
| burntsushi wrote:
| ripgrep degrades just fine to a normal grep tool. And you
| can use it in `find` pipelines.
|
| > I prefer tools designed with does one thing well
|
| This is pretty unlikely. For example, you probably use a
| grep tool that will also do recursive directory traversal
| for you. It probably even has flags for defining filters on
| that traversal. Why use such a tool when `find` already
| does recursive directory traversal for you?
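
The two divisions of labour being contrasted here can be sketched side by side. The directory layout and file contents below are hypothetical, and `-r` and `--include` are assumed to be available as they are in GNU grep.

```shell
tmpdir=$(mktemp -d)
mkdir -p "$tmpdir/src/sub"
echo 'needle in txt' > "$tmpdir/src/a.txt"
echo 'needle in log' > "$tmpdir/src/sub/b.log"
echo 'needle in txt2' > "$tmpdir/src/sub/c.txt"

# Let find do the traversal and filtering; grep only searches the files it is handed.
via_find=$(cd "$tmpdir" && find src -type f -name '*.txt' -exec grep -Hn 'needle' {} + | sort)

# Let grep do the traversal and filtering itself.
via_grep=$(cd "$tmpdir" && grep -rn --include='*.txt' 'needle' src | sort)

printf '%s\n' "$via_find"
rm -rf "$tmpdir"
```

Both pipelines should report the same two `.txt` matches; the difference is only in which tool walks the directory tree.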
| ptomato wrote:
| that limits the number of output lines, not the length.
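
The distinction is easy to see in a small sketch: `head` caps how many lines come out, while a tool such as `cut` caps how wide each line is. The log file and the 20-character width are made-up values for illustration.

```shell
tmpdir=$(mktemp -d)
# Three matching lines of different lengths (hypothetical sample data).
printf '%s\n' \
    'error: short' \
    'error: a considerably longer line that keeps going' \
    'error: medium length line' > "$tmpdir/log.txt"

# head limits HOW MANY lines are printed ...
first_two=$(grep 'error' "$tmpdir/log.txt" | head -n 2)

# ... while cut limits HOW LONG each line is (here: the first 20 characters).
truncated=$(grep 'error' "$tmpdir/log.txt" | cut -c1-20)

printf '%s\n---\n%s\n' "$first_two" "$truncated"
rm -rf "$tmpdir"
```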
| petdance wrote:
| "grep | head" doesn't limit the length of output lines, but
| "grep | cut" would.
|
| However, ack and ripgrep's default unpiped output is grouped
| by file, and if you pipe the output, it doesn't do the
| grouped output.
|
| The idea of "supposed to be used" is also different for ack
| than it is for grep. ack is specifically less of a general-
| use tool than grep. It's meant for searching source code.
| This is also why I have never said that ack is a replacement
| for grep.
| petdance wrote:
| Here's another site of mine y'all may be interested in:
| https://altbox.dev/
|
| It's a collection of improved shell tools, organized by the tool
| they supplement.
|
| As with this feature comparison chart, patches and suggestions
| are welcome: https://github.com/petdance/altbox
| infocollector wrote:
| When I need a faster grep, I love ugrep
| (https://github.com/Genivia/ugrep - especially when I am
| searching compressed logs for debugging).
| petdance wrote:
| What do you love about ugrep?
| hahfjjjshj wrote:
| This is preferred over ripgrep? I've not used ugrep before
| dotancohen wrote:
| I would absolutely love to see examples of each tool's syntax for
| each use case.
|
| In particular "Show proximity of matches to other matches" would
| be a huge boon to replace `grep -C 5 foo | grep bar`.
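
For readers unfamiliar with the idiom being replaced, here is a minimal sketch of what `grep -C 5 foo | grep bar` does. The file contents are hypothetical and chosen so that one "bar" sits near "foo" and another sits far away.

```shell
tmpdir=$(mktemp -d)
# Hypothetical file: "bar" appears near "foo" on line 3 and far from it on line 20.
awk 'BEGIN {
    for (i = 1; i <= 20; i++) {
        if (i == 2) print "foo here"
        else if (i == 3 || i == 20) print "bar on line " i
        else print "filler " i
    }
}' > "$tmpdir/notes.txt"

# Keep 5 lines of context around each "foo", then keep only the "bar" lines in that window.
near=$(grep -C 5 'foo' "$tmpdir/notes.txt" | grep 'bar')
printf '%s\n' "$near"
rm -rf "$tmpdir"
```

Only the "bar" within five lines of "foo" survives the second grep; the one on line 20 falls outside the context window.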
| gnfargbl wrote:
| I occasionally find myself wanting to search a data stream using
| a large-ish set (a few tens or hundreds of thousands) of regexes.
| This is very slow with a backtracking engine like PCRE, but ought
| to be pretty fast with a DFA-based engine like re2.
|
| So far, I have been unsuccessful in finding a grep replacement
| that can read patterns from a file, and which also uses a DFA
| engine. Does one exist? From the table, it looks like ripgrep
| might be suitable. Is it?
| burntsushi wrote:
| Precisely speaking, no, I don't know of any grep tools that use
| a DFA engine. However, both ripgrep and GNU grep use a hybrid
| NFA/DFA engine (also known as a "lazy DFA") for some subset of
| regexes. I'm not too familiar with all of GNU grep's
| strategies, but for ripgrep, it will fall back to an NFA
| engine. (And I don't mean Friedl's bastardization of the term
| "NFA engine.") For ripgrep, see the --dfa-size-limit flag to
| try to let it use the hybrid NFA/DFA engine for bigger regexes.
| Whether it helps or not depends on your situation.
|
| Now, this will do much better than a backtracking engine, but
| if you get up into the tens of thousands or hundreds of
| thousands of regexes, it's going to get pretty painful. Finite
| automata just don't scale that well. At that point, you
| really start wanting a more specialized solution. Probably the
| best answer to that that I know of is Hyperscan. And you're in
| luck; someone maintains a fork of ripgrep with support for
| Hyperscan: https://sr.ht/~pierrenn/ripgrep/
|
| (A special case is tens of thousands of literal patterns.
| ripgrep will notice that and should use Aho-Corasick. It
| doesn't help so much with search time since it's just a NFA or
| a DFA like with regexes, but the machine itself is constructed
| much more quickly.)
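
The literal-pattern special case can be sketched with a pattern file. This is an illustration under assumptions: the file names and contents are made up, plain grep stands in for the search so the example runs without ripgrep, and the ripgrep line runs only if `rg` happens to be installed.

```shell
tmpdir=$(mktemp -d)
# Hypothetical pattern list: one literal per line.
printf '%s\n' 'alpha' 'gam.ma' > "$tmpdir/patterns.txt"
printf '%s\n' 'has alpha' 'has beta' 'has gamXma' 'has gam.ma' > "$tmpdir/data.txt"

# -f reads patterns from a file; -F treats each one as a literal string, not a regex.
matches=$(grep -F -f "$tmpdir/patterns.txt" "$tmpdir/data.txt")
printf '%s\n' "$matches"

# ripgrep accepts the same two flags; run it only if it is available.
if command -v rg >/dev/null 2>&1; then
    rg -F -f "$tmpdir/patterns.txt" "$tmpdir/data.txt"
fi
rm -rf "$tmpdir"
```

Because of `-F`, the dot in `gam.ma` is not a wildcard, so `gamXma` does not match. Dropping `-F` turns the file into a list of regexes, which is the case where the size limits discussed above start to matter.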
| gnfargbl wrote:
| What an incredibly helpful reply. Thank you!
|
| It sounds like either plain ripgrep, or ripgrep+hyperscan, is
| pretty much exactly what I'm looking for. Next time I have
| this problem, I'll certainly be reaching for it.
| vlovich123 wrote:
| Out of curiosity, what prevents the hyperscan support from
| being mainlined?
| burntsushi wrote:
| Too much of a weighty dependency and too much of a niche
| IMO. For example, the last time I tried to build Hyperscan,
| I failed and gave up after 15 minutes of trying.
| jemfinch wrote:
| I've written this code (in C++) for an employer. RE2 scaled
| fine to hundreds of thousands of regexes. You'll want to use
| RE2::Set, which compiles multiple regexes into a single DFA,
| and probably the "Filter" functionality (whose name I don't
| precisely remember and am too lazy to look up) which uses an
| Aho-Corasick tree to subset the potential matches. One thing
| you'll have to watch out for is RE2's maximum DFA size; if
| compilation of your RE2::Set fails, just split your set of
| regexes in half and compile again.
|
| You could probably do some fun optimizations by grouping the
| regexes which depend on the same literals into their own sets,
| but I never needed to.
| burntsushi wrote:
| This is basically what ripgrep will do for you automatically.
| (ripgrep uses Rust's regex engine, which is a descendant of
| RE2.) But when you get up into hundreds of thousands of
| regexes, the NFA (and the resulting DFA) get _really_ big.
| And things generally don't scale that well. Here's a good
| example:
| http://web.archive.org/web/20210302010420/https://01.org/hyp...
|
| The problem is that for a big enough NFA, you'll wind up
| spending most of your search doing powerset construction to
| build the DFA.
|
| > One thing you'll have to watch out for is RE2's maximum DFA
| size
|
| You can configure this in ripgrep with the --dfa-size-limit
| flag. (See also --regex-size-limit.)
| spinax wrote:
| Some of these line items are really obtuse and in some cases just
| not right, "Don't search in binary files" and "Treat binary files
| as if they were text" for example caught my eye - GNU grep has
| the `--binary-files` option which supports both of these
| features. Others like "can pipe output to a pager" seem like a
| half-hearted attempt to give a +1 to a specific tool while
| ignoring that you can... pipe the output of _any_ of them
| using... a pipe.
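
The two `--binary-files` modes referred to here can be sketched as follows, assuming GNU grep. The file names and contents are hypothetical; a NUL byte is used to make one file look binary.

```shell
tmpdir=$(mktemp -d)
printf 'foo in text\n' > "$tmpdir/plain.txt"
# A NUL byte makes grep classify this file as binary.
printf 'foo\000in binary\n' > "$tmpdir/blob.dat"

# Skip binary files entirely (short form: -I).
skipped=$(cd "$tmpdir" && grep -l --binary-files=without-match 'foo' plain.txt blob.dat)

# Treat binary files as if they were text (short form: -a).
as_text=$(cd "$tmpdir" && grep -l --binary-files=text 'foo' plain.txt blob.dat)

printf 'without-match:\n%s\n' "$skipped"
printf 'text:\n%s\n' "$as_text"
rm -rf "$tmpdir"
```

With `without-match` only the plain text file is listed; with `text` both files are.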
| opencl wrote:
| If you follow the "If you have updates to the chart, please
| submit as a GitHub issue." link, you can see that there are a
| few dozen open issues and the page was last updated 2 years
| ago.
| spinax wrote:
| Ahhh, I did not (my bad) - I don't track grep features, not a
| hobby :) this chart could have been more truthy 2 years ago
| for all I know. Thanks.
| petdance wrote:
| If there are things you think are confusing or inaccurate,
| please do make GitHub issues for them. Thanks.
| loeg wrote:
| Maybe we need a (2019) in the title.
| petdance wrote:
| Or maybe I just need to move it up in my priority stack.
| TacticalCoder wrote:
| I use ripgrep and more typically ripgrep from within Emacs, thanks
| to "counsel-rg". I configured counsel-rg (as suggested) to not
| display very long matching lines (for Emacs doesn't like lines
| that are too long).
|
| It is really _very_ fast.
| beermonster wrote:
| Doom emacs uses it if it's installed
| mi_lk wrote:
| Nice, would be better if there are examples of each feature to
| show how their cli flags map to others
| petdance wrote:
| That's how I had it at first.
|
| I originally had it as a "phrasebook" of how to do the same
| thing in the different tools, but it was really ugly and took
| up a lot of horizontal space, and I figured it was more useful
| as a chart of yes/no. Also, there were cases where two tools
| had pretty much the same feature, but not exactly, so just
| listing flags didn't make sense.
|
| I've still got a lot of the data of the switches in the JSON
| file that I build the chart from.
| https://github.com/beyondgrep/website/blob/dev/features.json If
| you've got ideas on how to bring back the phrasebook format,
| either integrated into this page, or as a separate standalone
| page, I'd love to hear them. Maybe the phrasebook isn't best
| done as a table like this, for example. Open a ticket in GitHub
| and let me know your thoughts.
| oweiler wrote:
| ripgrep supports the most common features while still being much
| faster than ack.
|
| So it's no comparison for me.
| loeg wrote:
| Ag (the_silver_searcher) has performance closer to ripgrep and
| a similar feature set. But it's tough to beat ripgrep both for
| performance and reliability. Rust is great in this application.
| asicsp wrote:
| I wonder how long the documentation for GNU grep will continue
| to say this:
|
| > _PCRE support is here to stay, but consider this option
| experimental when combined with the -z (--null-data) option, and
| note that 'grep -P' may warn of unimplemented features._
|
| I did come across a few issues mentioned with -z on
| unix.stackexchange a few years back but they have been fixed as
| far as I know.
| jll29 wrote:
| The next step is _Extended_ regular expressions that characterise
| Regular Relations (RR) and that specify Finite State Transducers
| (FSTs). See:
|
| http://users.itk.ppke.hu/~sikbo/nytech/gyak/05_morfo/xfst/bo...
|
| Advances:
|
| - symmetry input:output (reversible)
|
| - readable/maintainable expressions due to _naming_ of sub-
| expression
|
| Implementations:
|
| - Xerox XRCE XFST/lexc/twolc compilers
|
| - FOMA - https://fomafst.github.io/
| john-tells-all wrote:
| I _adore_ Ripgrep and use it dozens of times a day, and have for
| years. It's extremely fast, does the right thing most of the
| time, and has a useful featureset.
|
| Ack is also nice, I've used that quite a bit too. It has the
| advantage of being in Perl, so if you're on a "secure" computer
| (no compiler), you can still use a fast + featureful search tool.
| petdance wrote:
| I'm glad you appreciate that you can install ack anywhere. It's
| exactly for that use case that I have kept ack able to be a
| single text file download. Also, it only requires Perl 5.10.1,
| so it's OK if you're using an old Perl.
| sva_ wrote:
| I feel the same way about ripgrep, and also fzf - especially
| both in combination. I only started using them both a few
| months ago, yet it feels like a fundamental way of doing
| computing.
| bloopernova wrote:
| fzf is wonderful. I feel like we're only scratching the
| surface of its utility. Using it with git add is so fast,
| just "ga" (an alias), hit tab (for each file) then enter.
| hahfjjjshj wrote:
| I find interactive git add more useful most of the time,
| 'git add -i' then select what's needed from the menu
| oconnor663 wrote:
| Figuring out how to integrate _that_ with FZF would be
| really something. Being able to easily go up and down the
| list, and visualize the whole thing, would make things a
| lot smoother.
| hahfjjjshj wrote:
| Any examples of using them in combination? I've only recently
| started using ripgrep despite being aware of it for a while.
| petdance wrote:
| There are a ton of articles out there about it:
|
| https://duckduckgo.com/?q=ripgrep+%2B+fzf
___________________________________________________________________
(page generated 2021-10-24 23:01 UTC)