[HN Gopher] Power of small optimizations
       ___________________________________________________________________
        
       Power of small optimizations
        
       Author : stacyz
       Score  : 136 points
       Date   : 2024-02-09 18:40 UTC (1 days ago)
        
 (HTM) web link (maksimkita.com)
 (TXT) w3m dump (maksimkita.com)
        
       | optimalsolver wrote:
       | This guy took one look at some scientists' code and instantly got
       | a 14000x speedup:
       | 
       | http://james.hiebert.name/blog/work/2015/09/14/CS-FTW.html
        
         | darylteo wrote:
         | With a bit more digging, but very similar
         | https://nee.lv/2021/02/28/How-I-cut-GTA-Online-loading-times...
        
         | eru wrote:
         | Interesting article!
         | 
         | > Furthermore, there is literally no way to tell whether your
         | program will ever actually terminate without actually executing
         | it. And there are many, many, many, ways to write a program
         | which make it "slow" to execute. "Slow" being... like really
         | slow.
         | 
         | Nitpick: there are some languages where you can tell whether
         | your program will terminate without running it. There are even
         | languages where you can tell whether your program will be slow.
         | 
         | And there are even more specialised languages where any program
         | will be 'fast'.
         | 
         | Obviously, these languages aren't Turing complete. It's an
         | interesting exercise to explore the design space to find
         | languages that are both useful and avoid being Turing complete.
         | 
         | Agda is an example of a language that allows you to run
         | approximately any practical computation, but it's not Turing
         | complete. Oversimplified: in Agda a program is only considered
          | valid by the compiler if it comes with a computer-checkable
         | proof of halting on all inputs.
        
           | towawy wrote:
           | You can also construct programs in Turing complete languages
           | to always halt (e.g. by bounding the number of iterations of
           | all loops, usually with the tradeoff of limited input size).
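            | 
            | A minimal sketch of that idea (my own toy example, not from
            | the article): an iterative computation with a hard iteration
            | cap, so the loop provably halts even if it never converges.
            | 
            |     #include <cmath>
            |     #include <optional>
            |     
            |     // Newton's method for sqrt with a fixed iteration budget.
            |     // The bound guarantees termination; the price is that we
            |     // may have to answer "don't know".
            |     std::optional<double> bounded_sqrt(double x) {
            |         const int MAX_ITERS = 100;   // hard bound => always halts
            |         double guess = x > 1.0 ? x : 1.0;
            |         for (int i = 0; i < MAX_ITERS; ++i) {
            |             double next = 0.5 * (guess + x / guess);
            |             if (std::fabs(next - guess) < 1e-12)
            |                 return next;         // converged
            |             guess = next;
            |         }
            |         return std::nullopt;         // budget exhausted
            |     }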
        
             | eru wrote:
             | Yes, you can program in a subset of your language that is
             | not Turing complete.
        
             | The_Colonel wrote:
              | You'd also need to forbid (even indirect) recursion, i.e.
              | one function cannot appear in a call stack twice.
        
               | zmgsabst wrote:
                | I think you can allow finite recursion, e.g. with a
                | maximum stack depth.
        
               | trealira wrote:
               | Isn't that overly restrictive? Some programs are
               | guaranteed to terminate even with mutual recursion.
               | 
                | For example, these (obviously inefficient) functions
                | always terminate for all values of x: each recursive call
                | decreases x, and 0, the minimum possible input, is a base
                | case for both.
                | 
                |     bool odd(unsigned int x);   /* forward declaration */
                | 
                |     bool even(unsigned int x)
                |     {
                |         return (x == 0) ? true : odd(x - 1);
                |     }
                | 
                |     bool odd(unsigned int x)
                |     {
                |         return (x == 0) ? false : even(x - 1);
                |     }
        
         | brabel wrote:
         | I remember when I was in COMPSCI-101 or something like that and
         | was introduced to sorting, and after coming up with a basic
         | bubble sort algorithm, thinking that there was no way that
         | could be improved because it looked self-evident that you
         | needed to do all the work the algorithm was doing or you
         | couldn't be sure the list was really sorted. Then, of course, I
         | was introduced to multiple better algorithms like merge sort
         | and quick sort which showed just how wrong I was...
         | 
          | Perhaps, for the "scientist" writing that initial code, their
          | algorithm also seemed like the best that could be done, because
          | they just don't know much about algorithms and techniques for
          | making them faster?! But once you learn about things like
          | divide-and-conquer, work avoidance, etc., it becomes pretty
          | much self-evident when your algorithm sucks... it's really not
          | that hard - definitely not for people whose job is stuff like
          | climate science and who have a very good understanding of
          | maths.
         | 
         | Maybe by teaching scientists the basic "smart" algorithms and
         | just a little about algorithm analysis (Big O notation),
         | massive computation improvements could be had.
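          | 
          | As a toy C++ sketch of the kind of win I mean (my example, not
          | the article's): the same membership test, written the "obvious"
          | way and the "smart" way.
          | 
          |     #include <algorithm>
          |     #include <cstddef>
          |     #include <unordered_set>
          |     #include <vector>
          |     
          |     // Linear scan per query: O(Q * R) total.
          |     std::size_t hits_slow(const std::vector<int>& queries,
          |                           const std::vector<int>& reference) {
          |         std::size_t hits = 0;
          |         for (int q : queries)
          |             if (std::find(reference.begin(), reference.end(), q)
          |                     != reference.end())
          |                 ++hits;
          |         return hits;
          |     }
          |     
          |     // Build a hash set once: O(Q + R) total. For Q = R = 100,000
          |     // that's roughly 10^10 vs 10^5 basic operations.
          |     std::size_t hits_fast(const std::vector<int>& queries,
          |                           const std::vector<int>& reference) {
          |         std::unordered_set<int> lookup(reference.begin(),
          |                                        reference.end());
          |         std::size_t hits = 0;
          |         for (int q : queries)
          |             if (lookup.count(q))
          |                 ++hits;
          |         return hits;
          |     }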
        
           | munch117 wrote:
           | For you, me and James Hiebert, this sort of optimisation is
           | exhilarating. Getting a half-hour job down to milliseconds?
           | Awesome!
           | 
           | But for the scientists that wrote the original code, maybe
           | not. Maybe they think of this sort of thing as drudge work,
           | something that doesn't really hold their interest. Their fun
           | is in designing the mathematical concept, and turning it into
           | code is just a chore.
           | 
           | So yeah, we could teach scientists. But even better would be
           | if we could provide scientists with tools that are just
           | naturally fast when expressing problems on their own terms.
        
             | zmgsabst wrote:
             | Also, spending three hours optimizing away thirty minutes
             | you only run four times is a net negative of one hour.
             | 
              | You have to optimize the system as a whole.
        
             | brabel wrote:
             | > Their fun is in designing the mathematical concept, and
             | turning it into code is just a chore.
             | 
             | It's not about fun. Sometimes it can be the difference
             | between something being even possible or not. In this case,
             | the author said they ran this algorithm hundreds of times.
             | So changing it from 30 mins to 0.1 second makes things that
             | were impossible before, possible. I don't find it fun at
             | all to optimise, but I can see where I may need to in order
             | to make things better, or possible at all... what I am
             | suggesting is that anyone writing code, scientist or not,
              | needs to be aware of this - and know when they just MUST
             | optimise.
        
             | onos wrote:
             | As an ex scientist, I think basic algorithms theory should
             | be incorporated into scientific computing classes. I took a
             | few of these but none of the concepts from this area was
             | covered. I remember well discovering some code of mine was
             | slowing down with system size and finally realizing it was
             | because "append" was creating a new array each step... had
             | no clue that would happen. Was enthralled by the online
             | algorithms course when I finally discovered it - hash
             | tables!!!
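              | 
              | In C++ terms (a toy sketch, not the language I was using),
              | the trap looks like this:
              | 
              |     #include <cstddef>
              |     #include <vector>
              |     
              |     // Quadratic: every "append" copies the whole array.
              |     std::vector<double> build_slow(std::size_t n) {
              |         std::vector<double> data;
              |         for (std::size_t i = 0; i < n; ++i) {
              |             std::vector<double> bigger(data);   // O(i) copy
              |             bigger.push_back(double(i));
              |             data = std::move(bigger);           // O(n^2) overall
              |         }
              |         return data;
              |     }
              |     
              |     // Linear: reserve once, or rely on amortized growth.
              |     std::vector<double> build_fast(std::size_t n) {
              |         std::vector<double> data;
              |         data.reserve(n);
              |         for (std::size_t i = 0; i < n; ++i)
              |             data.push_back(double(i));          // amortized O(1)
              |         return data;
              |     }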
        
         | EasyMark wrote:
          | I once took some matlab code and got a 200x+ improvement for a
          | couple of days' work; the code was atrocious. So it really
         | matters to understand your tool. I think the person I took it
         | from was afraid to change anything that worked after it was
         | prototyped.
        
         | NooneAtAll3 wrote:
         | "Someone improved my code by 40,832,277,770%"
         | 
         | https://www.youtube.com/watch?v=c33AZBnRHks
        
           | ykonstant wrote:
           | Thanks for the link, this was very entertaining.
        
         | dgacmu wrote:
         | The irony is that his solution has an unnecessary log(n) term,
         | because it's finding each point independently instead of taking
         | advantage of the fact that the input points are also a dense
          | grid. ;) So it can probably run 2-10x faster than the optimized
          | version, assuming that the function call to binary search is
          | a lot more expensive than filling in the grid, which R
         | can do internally.
         | 
         | (You don't have to binary search each time, you have a known
         | starting point from the previous point. So you binary search
         | once to find the top-left point and the rest is just checking
         | if you moved into the next global grid cell.)
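          | 
          | A 1-D sketch of what I mean (my own toy code; the real problem
          | is a 2-D grid):
          | 
          |     #include <algorithm>
          |     #include <cstddef>
          |     #include <vector>
          |     
          |     // Map each sorted query point to the coarse-grid cell that
          |     // contains it. Binary search once for the first point; after
          |     // that just walk forward, since consecutive points can only
          |     // stay in the same cell or move into a later one.
          |     std::vector<std::size_t> map_to_cells(
          |             const std::vector<double>& cell_edges,
          |             const std::vector<double>& points) {
          |         std::vector<std::size_t> cell(points.size());
          |         if (points.empty()) return cell;
          |         std::size_t j = std::upper_bound(cell_edges.begin(),
          |                                          cell_edges.end(),
          |                                          points[0])
          |                         - cell_edges.begin();
          |         for (std::size_t i = 0; i < points.size(); ++i) {
          |             while (j < cell_edges.size() && cell_edges[j] <= points[i])
          |                 ++j;              // a few steps, not a log(n) search
          |             cell[i] = j;          // points[i] < cell_edges[j]
          |         }
          |         return cell;
          |     }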
        
         | wredue wrote:
          | Yeah. We had an in-house report generator that was epically
          | slow (about 7 minutes to generate a single report).
          | 
          | We frequently have strange issues come in because fucking
          | vendors can't stop making changes, then saying "nobody changed
          | anything that would cause that"!
         | 
         | Anyway. I told my boss I'm going to spend a day optimizing
         | this, and got back "we had people try, it's not really a
         | valuable use of time".
         | 
         | Long story short, I know that processing 10,000 lines of data
         | should absolutely not take 7 minutes and now it doesn't. Now we
         | can run our entire day in 30 seconds. No threads. No async.
         | Half a day of optimization.
         | 
         | I think part of the problem with optimization today is that
         | people are completely clueless about how fast computers are,
         | and thus stay happy with being slow just as a matter of
         | ignorance. The FP community pushing that all optimization is
         | premature optimization doesn't help either.
        
           | chriswarbo wrote:
           | Urgh, reminds me of a production error that was caused by a
           | job running on a timer, assuming that some other job would
            | have finished by then. That led to two obvious questions:
           | 
           | - Why is it on a timer, instead of being triggered when the
           | first job was done?
           | 
           | - Why was that first job taking over an hour to process XML
           | data?
           | 
           | Turns out the second one was accidentally-quadratic, since it
           | was parsing a whole document to process its first record;
           | then re-parsing the whole document to process its second
           | record; etc. and this made it very slow when the XML was
           | reaching several hundred MB.
           | 
           | Unfortunately, the solution we went with was to set the timer
           | to run 20 minutes later...
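            | 
            | The shape of the bug, as a toy C++ sketch (newline-separated
            | records stand in for what were really XML elements):
            | 
            |     #include <algorithm>
            |     #include <cstddef>
            |     #include <string>
            |     #include <vector>
            |     
            |     // Accidentally quadratic: re-scan the blob from the start
            |     // for every record. Assumes every record ends with '\n'.
            |     std::vector<std::string> records_slow(const std::string& blob) {
            |         std::size_t n = std::count(blob.begin(), blob.end(), '\n');
            |         std::vector<std::string> out;
            |         for (std::size_t i = 0; i < n; ++i) {
            |             std::size_t pos = 0;
            |             for (std::size_t skip = 0; skip < i; ++skip)
            |                 pos = blob.find('\n', pos) + 1;  // O(size) per record
            |             out.push_back(blob.substr(pos, blob.find('\n', pos) - pos));
            |         }
            |         return out;
            |     }
            |     
            |     // Linear: walk the blob once.
            |     std::vector<std::string> records_fast(const std::string& blob) {
            |         std::vector<std::string> out;
            |         std::size_t pos = 0;
            |         while (pos < blob.size()) {
            |             std::size_t end = blob.find('\n', pos);
            |             if (end == std::string::npos) end = blob.size();
            |             out.push_back(blob.substr(pos, end - pos));
            |             pos = end + 1;
            |         }
            |         return out;
            |     }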
        
       | wiseowise wrote:
       | Awaiting "something, something premature optimization" person to
       | show up in the comments.
        
         | hinkley wrote:
         | You'd think they would catch on. But then a foolish consistency
         | _is_ the hobgoblin of little minds.
        
         | zogrodea wrote:
         | "In established engineering disciplines a 12% improvement,
         | easily obtained, is never considered marginal; and I believe
         | the same viewpoint should prevail in software engineering." -
         | Also Knuth
        
           | eru wrote:
           | A 12% overall improvement might be worthwhile. But you have
           | to look at the bigger picture: if part A of your project
           | takes 99.9% of the time, and part B takes 0.1% of the time,
            | even a 100% improvement in part B (i.e. making it take 0 time)
           | might not be worthwhile.
        
             | jackthetab wrote:
             | a.k.a. [Amdahl's
             | Law](https://en.wikipedia.org/wiki/Amdahl%27s_law)
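              | 
              | In symbols: overall speedup = 1 / ((1 - p) + p / s), where
              | p is the fraction of runtime being improved and s is the
              | speedup of that fraction. With p = 0.001 (the 0.1% part B
              | above), even s -> infinity gives only 1 / 0.999, about a
              | 1.001x overall improvement.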
        
           | jononor wrote:
           | I agree. But it is critical to ensure that what one is
           | improving is something worthwhile. 12% improvement in compute
           | costs for a project with lots of compute spend = massive win.
           | 12% improvement in compute costs for a hobby project running
           | on a single VM = probably negligible. 12% improvement in
           | function runtime some arbitrary place in the stack = possibly
           | net negative, taking into account developer opportunity
           | costs. It is almost always useful to talk in terms of
            | outcomes, i.e. benefits to customers/users/team/company.
        
           | erk__ wrote:
            | The original quote is not by Knuth, though he made it well
            | known; it is by Tony Hoare.
        
         | cogman10 wrote:
         | Lol, yeah I get this quoted sometimes merely for suggesting the
         | right data structure to use.
         | 
         | "Use a set instead of a list, it's faster". "that's premature
         | optimization!"
        
           | saagarjha wrote:
           | Usually this criticism doesn't even hold water if you put
           | performance aside, since a set will often have nicer
           | ergonomics if you find yourself reaching for it. I see far
           | too much code that is like items.remove((item) => item ==
            | value), and if you're using the proper data structure you
            | don't have to hand-roll your implementations each time,
            | because they are provided by default.
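            | 
            | In C++ terms (a sketch, not the language of the original
            | snippet), the difference reads like this:
            | 
            |     #include <algorithm>
            |     #include <set>
            |     #include <vector>
            |     
            |     // Hand-rolled removal on a vector: written out by hand,
            |     // O(n) per call.
            |     void remove_value(std::vector<int>& items, int value) {
            |         items.erase(std::remove(items.begin(), items.end(), value),
            |                     items.end());
            |     }
            |     
            |     // With the right container the operation already exists,
            |     // says what it means, and is O(log n).
            |     void remove_value(std::set<int>& items, int value) {
            |         items.erase(value);
            |     }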
        
           | _dain_ wrote:
           | Premature pessimization.
        
           | The_Colonel wrote:
           | The real premature optimization is using List instead of Set
           | for guaranteed small sizes, since List is going to outperform
           | Set in those cases.
        
           | neandrake wrote:
           | Except in your example I've had numerous cases where bugs
           | have been introduced because the developer's primary thinking
           | to pick a set was "fast" and "no duplicates", only to later
           | realize that order of items is important. And looking back at
           | their choice, dealing with 5-20 items, using a list and
           | checking contains() not only had no performance impact but
            | better reflects the intent of "no duplicates". Instead the
            | set is left in place, and the dev fixing it usually copies
            | the results into a list and re-sorts it into the order the
            | items had before going into the set...
        
             | cogman10 wrote:
             | I've never seen this example. There are set implementations
             | which maintain order, but in my experience it's fairly rare
             | to both need "no duplicates" and "items inserted maintain
             | order".
             | 
             | It's fairly easy to remove the set and replace it with
             | something else if this does come up.
        
               | jwells89 wrote:
               | Maybe I'm misunderstanding, but isn't it pretty common to
               | e.g. have a list of non-duplicate items on screen which
               | have a specific order they're supposed to be displayed in
               | which can be changed as a result of user actions (like
               | adding new items or editing existing ones)?
        
         | _dain_ wrote:
         | People always seem to think "premature optimization" means
         | "premature optimization _for speed_ ". But there are lots of
         | otherwise-good things that you can prematurely optimize for:
          | - modularity
          | - readability
          | - abstraction
          | - maintainability
          | - scalability
          | - reusability
          | - testability
          | - cost
         | 
         | We've all seen codebases that are over-abstracted in the name
         | of code reuse, or to adhere to this or that software design
         | philosophy, and are nightmares as a result. "You Ain't Gonna
         | Need It" is a restatement of the premature-optimization adage,
         | just for a different metric.
        
           | whstl wrote:
           | And the "premature optimization" excuse is used almost every
           | time someone argues that over-abstraction can cause
           | performance issues.
        
             | hinkley wrote:
             | Or other forms of laziness.
        
           | munch117 wrote:
           | "Premature optimisation" refers to speed or size. Using the
           | phrase to talk about those other things is a stretch. (Except
           | maybe cost.)
           | 
           | You can _design for_ modularity, readability etc., but
            | _optimising for_ them doesn't sound right. Something to do
           | with them not having well-defined, agreed-upon definitions of
           | where the optimum is.
        
             | hinkley wrote:
             | I work with a Principle of Maximum Power guy. He's
             | insufferable. Too smart to browbeat into behaving, and too
             | old to chalk it up to youthful exuberance or lack of
             | experience. He's made big chunks of the project comfortable
             | for him and miserable for everyone else, so of course he'll
             | never leave, and people who know they deserve better do.
        
             | chriswarbo wrote:
             | I think it's fair to talk about optimisation (premature or
             | otherwise) for anything fitting Blum's axioms. The classic
             | ones are number of instructions executed, wall-clock time,
             | memory usage, network requests, user interactions, calls to
             | some slow function (e.g. the comparison function, for
             | sorting algorithms), etc. but I think it's also applicable
             | to fuzzier concepts.
             | 
             | For example, modularity could be modelled as the time/edit-
             | distance/etc. it would take to swap-out N out of C
             | components, where C >> N. If we're defining things
             | centrally and passing them around via dependency-injection
             | then it's O(N) (each component is defined once, so change N
             | of them); if we're hard-coding references everywhere then
             | it's O(N*C) since we'll have to check for references in all
             | of the components, and each one needs to be checked for the
             | N possible swaps.
             | 
             | On the other hand, dependency injection is less optimal
             | than hard-coding for a metric like API size: for N
             | components, the number of references grows as O(N^2) (since
             | new components need to reference existing ones, and
             | existing ones will need to reference them); so APIs using
             | DI will have O(N^2) arguments.
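              | 
              | A minimal sketch of the two styles (toy names):
              | 
              |     #include <string>
              |     
              |     struct Logger { void log(const std::string&) {} };
              |     
              |     // Hard-coded: the choice of Logger is baked into every
              |     // component, so swapping it means editing all of them.
              |     struct ParserHardcoded {
              |         Logger logger;
              |     };
              |     
              |     // Injected: the choice is made once at the composition
              |     // root and passed in. Swapping it is a one-place edit,
              |     // but every constructor grows an argument per dependency.
              |     struct ParserInjected {
              |         Logger& logger;
              |         explicit ParserInjected(Logger& l) : logger(l) {}
              |     };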
        
               | munch117 wrote:
               | Fair enough. You can optimise for lots of things.
               | 
               | But when people warn against "premature optimisation",
               | specifically, they are advising you to give _higher_
               | priority to things like modularity, readability,
               | abstraction and maintainability, and not sacrifice them
               | all for a singular goal, typically speed.
        
           | hinkley wrote:
           | People deal with the cognitive dissonance by calling these
           | optimizations other things, like "architecture". But at the
           | end of the day, if you're changing your architecture for any
           | reason other than correctness, then you're optimizing. And if
           | you're having a hard time doing it, then you probably waited
           | too long.
           | 
           | "The Last Responsible Moment" is subtle. You have to include
           | user benefit (they may not care yet) versus snowballing
           | consequences of delay, versus having more and better
           | information later on.
           | 
           | One of the things that drew me to CI/CD was the notion that
           | you're not going to get better at handling your problems by
           | avoiding them. You can't learn to pick your optimization
            | battles if you never fight them. And you won't discover
            | something I've been saying for almost 25 years now and still
            | haven't heard come out of anyone else's mouth: there are code
            | changes that improve both readability and performance. Making
            | those changes as refactors is hardly ever premature.
        
       | agilob wrote:
       | 10 years ago SQLite made a release dedicated to lots of
       | microoptimisations that resulted in massive speed ups
       | https://sqlite-users.sqlite.narkive.com/CVRvSKBs/50-faster-t...
       | 
       | Don't underestimate accumulation of small improvements.
        
         | andrepd wrote:
         | Hmm, but HN assured me that "premature optimisation is le root
         | of all evil" (nevermind that that's an incomplete quote) :)
        
           | forbiddenlake wrote:
            | What part of SQLite 3.8.7 are you saying is premature?
        
             | trealira wrote:
             | I don't think they were saying that SQLite's optimizations
             | were premature (since clearly they benefited the program as
             | a whole), but that people conflate micro-optimization with
             | premature optimization.
        
           | Bjartr wrote:
            | The fact that optimization can be effective doesn't change
            | the fact that spending time on it too early is often
            | problematic.
        
           | sfn42 wrote:
           | Sadly, HN was not able to help you understand what premature
           | optimization is.
           | 
           | What the SQLite team did was systematically measure the
           | performance of their system, identify potential
           | bottlenecks/inefficiencies, create and measure improvements
           | to these bottlenecks and choose the better option.
           | 
           | This is what we call optimization. Premature optimization is
           | when you have no idea how your system really performs, but
           | spend time trying to make it "better" without having a way to
           | determine whether your change made any difference.
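            | 
            | Even a crude harness gives you that way of determining it (a
            | toy sketch):
            | 
            |     #include <chrono>
            |     
            |     // Time a candidate change against the baseline before
            |     // deciding anything.
            |     template <class F>
            |     double seconds(F&& f) {
            |         auto t0 = std::chrono::steady_clock::now();
            |         f();
            |         auto t1 = std::chrono::steady_clock::now();
            |         return std::chrono::duration<double>(t1 - t0).count();
            |     }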
        
           | moffkalast wrote:
            | Yeah, premature as in when you're designing the first
            | prototype and don't yet have real-life usage data on where
            | the bottleneck is.
        
           | mratsim wrote:
           | Premature slogans are the root of all evil.
        
       | kvemkon wrote:
       | From the article: "SerializationString improve performance"
       | (10.12.2023) https://github.com/ClickHouse/ClickHouse/pull/57717.
        
       | secondcoming wrote:
       | I love this stuff. My time is split 50/50 between my IDE and
       | Compiler Explorer.
       | 
       | Unfortunately not a lot of employers consider this a good use of
       | time.
        
         | quadrature wrote:
          | That's pretty cool, what kind of stuff do you work on?
        
         | Sesse__ wrote:
         | That's the sad side. The flipside is that for the employers
         | that _do_ care, being able to use a profiler and repeatedly
         | identify optimizations is considered some sort of superpower
         | that very few of your coworkers possess. :-)
        
       | maksimur wrote:
       | Maybe it's because I am experimenting with improving my posture
       | through small and gradual optimizations, but I expected the
       | article to be about small optimizations in your life, rather than
       | code.
        
         | zogrodea wrote:
         | Small but consistent improvements in general seem like a great
         | idea! Wishing you the best.
        
       | 1970-01-01 wrote:
       | Look at video codecs if you want examples of small optimizations
        | adding up.
        
       | andsoitis wrote:
       | The deeper truth is more that impact derives from _leverage_.
       | 
       | Leverage might reside in architectural changes, but one can also
        | imagine other dimensions (not just how high up the application
        | stack you sit) that _also_ confer leverage.
        
       | danielovichdk wrote:
        | Am I misunderstanding this, or is this an article about how
        | Chars[].resize works in a library they wrote themselves? Or is
        | it a misunderstanding of how resize works in general?
        | 
        | I am not trying to take anything away from the author, but
        | digging into assembly instead of simply looking up the docs
        | seems like a long road to go for calling it an optimization.
        | 
        | Another thing: why does this even exist? The tests show some
        | SQL executions, but is that real production usage or simply a
        | way to test a possible optimization in a different context?
        | 
        | I am not calling it a premature optimization, because I am not
        | sure whether I am bright enough to tell what is going on.
        | 
        | Please clarify.
        
       | Karellen wrote:
       | > Ratio of speedup(-) or slowdown(+): -1.662x
       | 
       | Wait, what? negative 1.6x?
        
       ___________________________________________________________________
       (page generated 2024-02-10 23:01 UTC)