[HN Gopher] Replace std:find_if in 80% of the cases
       ___________________________________________________________________
        
       Replace std:find_if in 80% of the cases
        
       Author : todsacerdoti
       Score  : 32 points
       Date   : 2021-10-01 18:27 UTC (4 hours ago)
        
 (HTM) web link (www.sandordargo.com)
 (TXT) w3m dump (www.sandordargo.com)
        
       | [deleted]
        
       | quietbritishjim wrote:
       | I disagree with the article's conclusion. The evidence in it
       | proves that it's better _not_ to use the specialised functions
       | (like any_of).
       | 
       | It's immediately clear to me what each of the find_if snippets
       | does, but I haven't heard of the other functions - and the
       | evidence from the article is that the vast majority of other
       | programmers haven't either (in fact that's out of the subset that
       | bother to use find_if, which I would say means even less of
       | general coders). Sure, it's fairly obvious what they do from the
       | name, but I'd still bother to check their signature in case
       | there's a subtlety. In other words, there may be fewer lines of
       | code, but it's still more cognitive overhead.
       | 
       | In fact I'd go one step further and say it's better to use raw
       | primitives, like for loops (range based where appropriate), where
       | they don't need too much extra code.
        
         | leetcrew wrote:
         | it seems like your argument boils down to "I prefer a for loop
         | because I'm more familiar with it". if that's true of you and
         | most of the people you work with, that's certainly a valid
         | argument. the opposite happens to be true on my team. we
         | understand and prefer the functions from the algorithms header
         | in most cases.
         | 
         | one argument in favor of functions like any_of is that, once
         | you get familiar with them, they provide additional
         | documentation of intent. I'll concede this is a pretty minor
         | benefit; most uses of any_of replace a pretty simple loop.
        
       | yellow_lead wrote:
       | > So I went through all our usages of find_if and I found that it
       | was only used in a proper way in about 20% of all the cases.
       | 
       | > This means that the Pareto principle applies here too. In 80%
       | of the cases, std::find_if should not have been used.
       | 
       | I don't think this is the pareto principle. Its a rule of
       | probabilities, i.e 1-X.
        
         | tomjakubowski wrote:
         | The Pareto principle can be stated thus:                   When
         | the percentage sign's         89 minus nine         That's
         | Pareto
        
         | kragen wrote:
         | Agreed. 80% of the `find_if` cases require 20% of the what,
         | exactly? The functions? But he's using four times as many
         | functions now.
        
       | toxik wrote:
       | Marginally more readable, but is it performing better?
        
         | secondcoming wrote:
         | No, the codegen is the same
         | 
         | https://godbolt.org/z/eebE9eYP6
        
         | twoodfin wrote:
         | Indeed, outside of corner cases where iterators are expensive
         | to copy or compare, I'd be surprised if there were any
         | measurable advantage beyond clarity.
        
           | dataflow wrote:
           | [edit: whoops, never mind]
        
             | kragen wrote:
             | There's no difference in how many times the predicate is
             | called or with which elements.
        
       | felixguendling wrote:
       | When reading the title I thought that it is about complexity of
       | linear search vs. better alternatives like hash sets/maps, or at
       | least sorted data structures (binary search).
       | 
       | But if having linear complexity is fine,
       | std::find_if/any_of/none_of/all_of/etc. are of course fine and
       | you should prefer the version that's most expressive.
        
         | kragen wrote:
         | Of course; as Larry Wall said, doing a sequential search over
         | the keys of a hash table is like clubbing someone to death with
         | a loaded Uzi.
        
           | mywittyname wrote:
           | It's safer, more reliable, and easier to clean up after?
        
       | [deleted]
        
       | malkia wrote:
       | Wondering if something like this can catch these -
       | https://twitter.com/_fel1x/status/1443613096820543492
        
       | sillysaurusx wrote:
       | Is "using namespace std" still considered heresy? I rather liked
       | it. It'd make a lot of these examples a little more readable.
        
         | eska wrote:
         | Definitely don't use it in headers, because it pollutes the
         | namespace of files including it. In .cpp files it's more
         | subjective. You can also resort to only using it in functions,
         | to limit it's scope to code where you make heavy use of std.
        
         | riskneutral wrote:
         | I don't like it, clarity is better in C++. Things are
         | complicated enough as it is.
        
           | kragen wrote:
           | How about `using std::find_if`?
        
             | dataflow wrote:
             | Unqualified calls change the behavior. You end up doing ADL
             | instead of a normal lookup and could potentially end up
             | calling a different function than the one you intended
             | entirely. Sometimes that's intentional (as with swap), but
             | often it's a bug.
        
               | sillysaurusx wrote:
               | I've been a C++ dev for longer than I care to admit. The
               | idea that you could call the wrong function after putting
               | "using std::string" at the top of your .cpp file doesn't
               | reflect my experience.
               | 
               | You might have a point if people are putting that in
               | header files. But even in header files, "using
               | std::string" is often a standard thing to do in e.g. game
               | engines. It's far easier to change out your string
               | implementation if you're writing "string" instead of
               | "std::string".
        
               | dataflow wrote:
               | > using std::string
               | 
               | The question was about find_if, which is a function, not
               | string.
        
         | [deleted]
        
       | kragen wrote:
       | Reading these I'm struck by the non-orthogonality and "mental
       | compilation" on display here. In Python instead of
       | return numbers.end()                !=
       | std::find_if(numbers.begin(), numbers.end(),
       | [](int number) { return number % 2 == 1; });
       | 
       | I would write                   return any(number % 2 == 1 for
       | number in numbers)
       | 
       | which, similarly, stops examining numbers once it finds an odd
       | one, but works by transforming the sequence of numbers into a
       | lazy sequence of booleans, then applying the generic boolean-
       | sequence-reducer `any` to it. I think this more clearly says what
       | is desired (the C++ version strikes me as something a compiler
       | might output given the Python version, but in this case you had
       | to run the compiler in your mind), and additionally is a better
       | orthogonal decomposition of the problem than computing a first-
       | hit iterator from a sequence of numbers and a number-to-boolean
       | transformation, then comparing it against the end of the
       | sequence.
       | 
       | Sandor's suggested replacement solves the first problem of
       | clearly saying what is desired:                   return
       | std::any_of(numbers.begin(), numbers.end(),
       | [](int number) { return number % 2 == 1; });
       | 
       | But the penalty is even worse orthogonality!
       | 
       | Because of Python's better orthogonality, you can factor out the
       | lazy boolean sequence generator into a function of its own if
       | that's a useful thing to do:                   def
       | oddnesses(numbers):             return (number % 2 == 1 for
       | number in numbers)              return any(oddnesses(numbers))
       | 
       | In this particular case the extra degree of composability you get
       | from it is not very useful, but then again, neither is the
       | original code. You could do things like this with that sequence
       | with equal plausibility to the original code:
       | all(oddnesses(numbers))  # addressed by all_of in OP
       | zip(numbers, oddnesses(numbers))         sum(oddnesses(numbers))
       | # punning booleans as ints         sum(map(operator.mul, numbers,
       | oddnesses(numbers)))
       | 
       | Of course, Python is not really a viable alternative to C++ in
       | most of the places where people are using C++, mostly because it
       | uses 40 times as much CPU and energy and 10 times as much memory,
       | but also because large Python codebases get unmaintainable.
       | 
       | But we could wish for a language with the efficiency and in-the-
       | large maintainability of C++ and the clarity and generality of
       | Python.
        
         | eska wrote:
         | Are you aware of ranges?
         | std::ranges::any_of(numbers, [](auto i){ return i % 2 == 0; }))
         | 
         | There's also stuff like this for orthogonality
         | https://en.cppreference.com/w/cpp/ranges/transform_view
        
           | kragen wrote:
           | Yes, which is why I complained about how nonorthogonal the
           | C++ was instead of just how verbose :)
        
             | eska wrote:
             | See my edit. You're probably interested in views, with
             | which you can do stuff similar to lazy functional languages
             | v | std::views::reverse | std::views::drop(2);
             | 
             | Here's an intro
             | https://hannes.hauswedell.net/post/2019/11/30/range_intro/
        
               | blt wrote:
               | Sadly the C++20 ranges library does not include any
               | facility to create views by writing Python-like generator
               | loops (coroutine loops with "yield"). Creating new views
               | is about as difficult as creating iterators before - i.e.
               | too difficult.
        
         | jrimbault wrote:
         | Because I love Python, and Rust is applicable is many places
         | where C++ is, here's a Rust version :                   //
         | assuming `numbers` is an iterator         numbers.any(|i| i % 2
         | == 1)
         | 
         | Rust's iterators generally compile down very well, and I think
         | this exactly as expressive as Python. And I also wished C#'s
         | Enumerable were as performant as Rust's iterators.
        
           | Doctor_Fegg wrote:
           | That's basically borrowed off Ruby. (Which isn't a bad
           | thing.)
        
             | kragen wrote:
             | It's not that simple.
             | 
             | Ruby's Enumerable is a concrete class; Rust's
             | std::iter::Iterator is a _trait_ , which has some aspects
             | in common with Ruby classes and other aspects in common
             | with Python protocols such as the Python iterator protocol.
             | std::iter::Iterator is (primarily, though not in this case)
             | an _external_ iterator like Python iterators, STL
             | iterators, and ranges in C++20 or D, not an _internal_
             | iterator like (the original version of) Ruby Enumerable.
             | Rust 's type-checking strategy is more like ML than like
             | Ruby, Python, C++, or D, while its compilation strategy is
             | more like C++'s or D's compilation strategy than those of
             | Ruby, Python, or ML.
        
           | mbrubeck wrote:
           | And if you write the equivalent to the `find_if` version in
           | Rust, then the standard linter Clippy will automatically
           | suggest the "good" version:                   warning: called
           | `is_some()` after searching an `Iterator` with `find`
           | --> src/lib.rs:2:9           |         2 | numbers.find(|i| i
           | % 2 == 1).is_some()           |
           | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ help: use `any()` instead:
           | `any(|i| i % 2 == 1)`           |
        
           | kragen wrote:
           | IMHO it still suffers from less orthogonality; it's precisely
           | equivalent to the C++ std::ranges::any_of version, though
           | pleasantly less verbose.
           | 
           | I think I'm going to have to get serious about Rust. I had a
           | dismaying epiphany writing
           | https://news.ycombinator.com/item?id=28660097 the other day:
           | like Perl, Python _just isn 't saving me very much effort_
           | except for quick hacks, maybe a factor of 2-4 over C (if we
           | discount CVEs, heh). It's not the order-of-magnitude
           | development speed improvement I was hoping for, and now that
           | I've been using it for 20 years I know it never will be.
           | Python starts to buckle under the strain of large projects,
           | they totally fucked up Unicode handling to an epic degree
           | never seen in any other programming language, and it
           | typically costs you orders of magnitude in performance.
           | 
           | And--this is the worst of all--at best a library written in
           | Python is only usable from Python and maaybe the JVM
           | languages, and you can probably call it from a shell script.
           | (I don't know, maybe the CIL languages too.) By contrast, if
           | I write a library in Rust or C or C++ I can use it in Rust,
           | C, C++, Python, Perl, Ruby, LuaJIT, Julia, PHP, node.js,
           | Scheme, Common Lisp, OCaml, Golang, the JVM languages, the
           | CIL, and maybe even Arduino. So the effort is greater
           | (though, as you point out in this case, not necessarily
           | _much_ greater, though you didn 't mention compile time,
           | which sucks pretty bad for both Rust and C++) but the benefit
           | is _much_ greater.
           | 
           | So I think from now on I'll try to start writing prototypes
           | in Python or OCaml but, if the prototype seems worthwhile,
           | writing the Real Thing in Rust. Which is gonna be pretty
           | rough since I don't know Rust. :)
        
         | jackewiehose wrote:
         | in C++ I would write                   for (int i: numbers)
         | if (i % 2 == 1)             return true;         return false;
         | 
         | find_if() etc. is fine if you already have the right predicate
         | function at hand but it becomes quite ridiculous with lambda
         | expressions.
        
           | kragen wrote:
           | I probably would too, and composability would suffer further.
           | The concern you raise about lambda expressions is basically
           | one of syntax rather than semantics. Syntax is important, but
           | as an unrepentant user and occasional implementor of Lisps
           | and Forths, I am in no position to criticize C++ :)
        
           | secondcoming wrote:
           | This interestingly generates shorter code as the compiler
           | doesn't attempt any loop unrolling
           | 
           | https://godbolt.org/z/eebE9eYP6
        
         | tomsmeding wrote:
         | > But we could wish for a language with the efficiency and in-
         | the-large maintainability of C++ and the clarity and generality
         | of Python.
         | 
         | What about Haskell? Maintainability yes; efficiency mostly. It
         | has a GC that can be a pain for high-memory scenarios if you
         | don't spend a lot of effort optimising, but on the whole
         | Haskell is surprisingly fast for its expressivity.
         | 
         | Any odd? `any odd numbers`. Or equivalently, `or (map odd
         | numbers)`, which is compositional in that `map odd numbers` is
         | just a list again -- no distinction between lazy and strict
         | lists in Haskell. Your other four examples are `and (map odd
         | numbers)`, `zip numbers (map odd numbers)`, `sum (map (fromEnum
         | . odd) numbers)`, `sum (zipWith (*) numbers (map (fromEnum .
         | odd) numbers))`. Note I used `map odd` instead of defining
         | `oddnesses = map odd` and using `oddnesses`, because the
         | definition is shorter than the name. :)
         | 
         | This changes somewhat when you use arrays instead of lists:
         | Haskell's lists are in terms of behaviour mostly "generators
         | that are implicitly stored as a lazy linked list if necessary".
         | So if you want to store stuff more compactly, or you want to
         | index into the list (as opposed to looping over it), you need
         | arrays. But even for arrays the story doesn't change much!
         | Using the 'vector' library, actually all of the above examples
         | work just fine, with no unnecessary intermediate vectors
         | because of some intelligent fusion rules, if you put `V.`
         | before the list-related function names.
        
           | kragen wrote:
           | Haskell is #1 for generality but #0 for clarity, though I
           | have to admit that some of my Python examples aren't exactly
           | shining exemplars either. But if I write code in Haskell I
           | can only call it from Haskell (which wasn't something I
           | mentioned in my original comment, but is important).
        
         | secondcoming wrote:
         | > Because of Python's better orthogonality, you can factor out
         | the lazy boolean sequence generator into a function of its own
         | if that's a useful thing to do:
         | 
         | You can do the same in C++:                   static bool
         | isEven(int x) { return x % 2 == 0; }              bool
         | foo(const std::vector<int>& v)         {             return
         | v.end()                 != std::find_if(v.begin(), v.end(),
         | isEven);         }
        
           | kragen wrote:
           | Not the same; no sequence.
        
       | mywittyname wrote:
       | Shouldn't this be titled, "use std::find_if where appropriate"?
       | 
       | The whole point of this is to find every element in the list
       | which matches the lambda. Which is why it returns an iterator:
       | you're supposed to use that value, increment it, then call
       | std::find_if again with it as the first parameter.
       | 
       | Granted, the documentation is failing here, since it doesn't
       | provide an example of this use case.                   for(auto i
       | = begin(v), e = end(v); find_if(i, e, lambda_func) != e; ++i) {
       | cout << "Found a match: " << i         }
       | 
       | If you don't need a list of matching elements, but instead want
       | to know if there was any match, then use a different function
       | designed for that specific use.
        
       | josephcsible wrote:
       | tl;dr: if the only thing you do with the result of find_if or
       | find_if_not is compare it to .end(), then you should use all_of,
       | any_of, or none_of instead.
        
       ___________________________________________________________________
       (page generated 2021-10-01 23:02 UTC)