[HN Gopher] Power of small optimizations
___________________________________________________________________
Power of small optimizations
Author : stacyz
Score : 136 points
Date   : 2024-02-09 18:40 UTC (1 day ago)
(HTM) web link (maksimkita.com)
(TXT) w3m dump (maksimkita.com)
| optimalsolver wrote:
| This guy took one look at some scientists' code and instantly got
| a 14000x speedup:
|
| http://james.hiebert.name/blog/work/2015/09/14/CS-FTW.html
| darylteo wrote:
| Similar, though it took a bit more digging:
| https://nee.lv/2021/02/28/How-I-cut-GTA-Online-loading-times...
| eru wrote:
| Interesting article!
|
| > Furthermore, there is literally no way to tell whether your
| program will ever actually terminate without actually executing
| it. And there are many, many, many, ways to write a program
| which make it "slow" to execute. "Slow" being... like really
| slow.
|
| Nitpick: there are some languages where you can tell whether
| your program will terminate without running it. There are even
| languages where you can tell whether your program will be slow.
|
| And there are even more specialised languages where any program
| will be 'fast'.
|
| Obviously, these languages aren't Turing complete. It's an
| interesting exercise to explore the design space to find
| languages that are both useful and avoid being Turing complete.
|
| Agda is an example of a language that can express
| approximately any practical computation, yet isn't Turing
| complete. Oversimplified: in Agda, a program is only
| considered valid by the compiler if it comes with a
| computer-checkable proof that it halts on all inputs.
| towawy wrote:
| You can also construct programs in Turing complete languages
| that always halt (e.g. by bounding the number of iterations
| of all loops, usually at the cost of limited input size).
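|
| As a minimal sketch (the MAX_ITERS bound is an arbitrary
| choice of mine, not from anywhere): the loop below always
| halts because its trip count is capped up front.
|
|     #include <cstddef>
|
|     constexpr std::size_t MAX_ITERS = 1'000'000;  // hard cap on work
|
|     // Returns the index of `target` in `xs`, or -1. Always
|     // terminates: the loop body runs at most MAX_ITERS times,
|     // regardless of the (possibly larger) input size n.
|     int bounded_find(const int* xs, std::size_t n, int target) {
|         std::size_t limit = n < MAX_ITERS ? n : MAX_ITERS;
|         for (std::size_t i = 0; i < limit; i++) {
|             if (xs[i] == target) return static_cast<int>(i);
|         }
|         return -1;
|     }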
| eru wrote:
| Yes, you can program in a subset of your language that is
| not Turing complete.
| The_Colonel wrote:
| You'd also need to forbid (even indirect) recursion, i.e.
| no function may appear in a call stack twice.
| zmgsabst wrote:
| I think you can allow finite recursion, e.g. with a
| maximum stack depth.
| trealira wrote:
| Isn't that overly restrictive? Some programs are
| guaranteed to terminate even with mutual recursion.
|
| For example, these (obviously inefficient) functions
| always terminate for all values of x, since both
| terminate given an input of 0, and 0 is the minimum
| possible input to these functions.
|
|     bool odd(unsigned int x);  // forward declaration (mutual recursion)
|     bool even(unsigned int x) { return (x == 0) ? true : odd(x - 1); }
|     bool odd(unsigned int x)  { return (x == 0) ? false : even(x - 1); }
| brabel wrote:
| I remember when I was in COMPSCI-101 or something like that
| and was introduced to sorting. After coming up with a basic
| bubble sort algorithm, I thought there was no way it could
| be improved, because it looked self-evident that you needed
| to do all the work the algorithm was doing or you couldn't
| be sure the list was really sorted. Then, of course, I was
| introduced to multiple better algorithms like merge sort and
| quick sort, which showed just how wrong I was...
|
| Perhaps, for the "scientist" writing that initial code, their
| algorithm also seemed like the best that could be done,
| because they just didn't know much about algorithms and the
| techniques that make them faster?! But once you learn about
| things like divide-and-conquer, work avoidance etc., it
| becomes pretty much self-evident when your algorithm sucks...
| it's really not that hard - definitely not for people working
| in fields like climate science, who have a very good
| understanding of maths.
|
| Maybe by teaching scientists the basic "smart" algorithms and
| just a little about algorithm analysis (Big O notation),
| massive computation improvements could be had.
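|
| Something like this little benchmark makes the point concrete
| (a sketch of my own; exact numbers will vary): the same input
| through an O(n^2) bubble sort and the standard library's
| O(n log n) sort.
|
|     #include <algorithm>
|     #include <chrono>
|     #include <cstdio>
|     #include <random>
|     #include <vector>
|
|     // Classic O(n^2) bubble sort, for comparison only.
|     static void bubble_sort(std::vector<int>& v) {
|         for (std::size_t i = 0; i + 1 < v.size(); i++)
|             for (std::size_t j = 0; j + 1 < v.size() - i; j++)
|                 if (v[j] > v[j + 1]) std::swap(v[j], v[j + 1]);
|     }
|
|     int main() {
|         std::mt19937 rng(42);
|         std::vector<int> a(20000);
|         for (int& x : a) x = static_cast<int>(rng());
|         std::vector<int> b = a;  // identical copy for std::sort
|
|         auto t0 = std::chrono::steady_clock::now();
|         bubble_sort(a);
|         auto t1 = std::chrono::steady_clock::now();
|         std::sort(b.begin(), b.end());
|         auto t2 = std::chrono::steady_clock::now();
|
|         using ms = std::chrono::duration<double, std::milli>;
|         std::printf("bubble: %.1f ms, std::sort: %.1f ms\n",
|                     ms(t1 - t0).count(), ms(t2 - t1).count());
|     }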
| munch117 wrote:
| For you, me and James Hiebert, this sort of optimisation is
| exhilarating. Getting a half-hour job down to milliseconds?
| Awesome!
|
| But for the scientists that wrote the original code, maybe
| not. Maybe they think of this sort of thing as drudge work,
| something that doesn't really hold their interest. Their fun
| is in designing the mathematical concept, and turning it into
| code is just a chore.
|
| So yeah, we could teach scientists. But even better would be
| if we could provide scientists with tools that are just
| naturally fast when expressing problems on their own terms.
| zmgsabst wrote:
| Also, spending three hours optimizing away thirty minutes
| you only run four times is a net negative of one hour.
|
| You have to optimize the system as a whole.
| brabel wrote:
| > Their fun is in designing the mathematical concept, and
| turning it into code is just a chore.
|
| It's not about fun. Sometimes it can be the difference
| between something being even possible or not. In this case,
| the author said they ran this algorithm hundreds of times.
| So changing it from 30 mins to 0.1 second makes things that
| were impossible before, possible. I don't find it fun at
| all to optimise, but I can see where I may need to in order
| to make things better, or possible at all... what I am
| suggesting is that anyone writing code, scientist or not,
| needs to be aware of this - and know when they just MUST
| optimise.
| onos wrote:
| As an ex scientist, I think basic algorithms theory should
| be incorporated into scientific computing classes. I took a
| few of these but none of the concepts from this area was
| covered. I remember well discovering some code of mine was
| slowing down with system size and finally realizing it was
| because "append" was creating a new array each step... had
| no clue that would happen. Was enthralled by the online
| algorithms course when I finally discovered it - hash
| tables!!!
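|
| For anyone curious, the shape of that bug in a C++ sketch (my
| transliteration; the original was presumably numpy-style
| append):
|
|     #include <algorithm>
|     #include <vector>
|
|     // Quadratic: every "append" allocates a fresh array and
|     // copies everything over, so n appends cost O(n^2) total.
|     std::vector<int> append_copy(const std::vector<int>& xs, int v) {
|         std::vector<int> out(xs.size() + 1);
|         std::copy(xs.begin(), xs.end(), out.begin());
|         out.back() = v;
|         return out;
|     }
|
|     // Amortized O(1) per append: the vector grows
|     // geometrically, so n push_backs cost O(n) total.
|     void append_in_place(std::vector<int>& xs, int v) {
|         xs.push_back(v);
|     }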
| EasyMark wrote:
| I once took some MATLAB code and got a 200x+ improvement for
| a couple of days' work; the code was atrocious. So it really
| matters to understand your tool. I think the person I took it
| from was afraid to change anything that worked after it was
| prototyped.
| NooneAtAll3 wrote:
| "Someone improved my code by 40,832,277,770%"
|
| https://www.youtube.com/watch?v=c33AZBnRHks
| ykonstant wrote:
| Thanks for the link, this was very entertaining.
| dgacmu wrote:
| The irony is that his solution has an unnecessary log(n)
| factor, because it finds each point independently instead of
| taking advantage of the fact that the input points are also
| a dense grid. ;) So it can probably run 2-10x faster than
| the optimized version, assuming the per-point binary-search
| calls are a lot more expensive than filling in the grid,
| which R can do internally.
|
| (You don't have to binary search each time; you have a known
| starting point from the previous point. So you binary search
| once to find the top-left point and the rest is just checking
| if you moved into the next global grid cell.)
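|
| Roughly this shape, as a sketch of the idea rather than the
| article's actual code: with both the query points and the
| cell edges sorted, a forward-only cursor bins everything in
| O(n + m) instead of O(n log m).
|
|     #include <cstddef>
|     #include <vector>
|
|     // points: ascending query coordinates; edges: ascending
|     // cell boundaries. Returns the cell index of each point.
|     // Assumes all points lie within [edges.front(), edges.back()).
|     std::vector<std::size_t> bin_sorted_points(
|             const std::vector<double>& points,
|             const std::vector<double>& edges) {
|         std::vector<std::size_t> cell(points.size());
|         std::size_t c = 0;  // cursor only ever moves forward
|         for (std::size_t i = 0; i < points.size(); i++) {
|             while (c + 1 < edges.size() && points[i] >= edges[c + 1])
|                 c++;  // stepped into the next cell
|             cell[i] = c;
|         }
|         return cell;
|     }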
| wredue wrote:
| Yeah. We had an in-house report generator that was epically
| slow (about 7 minutes to generate a single report).
|
| We frequently have strange issues come in because fucking
| vendors can't stop making changes, then saying "nobody
| changed anything that would cause that"!
|
| Anyway. I told my boss I'm going to spend a day optimizing
| this, and got back "we had people try, it's not really a
| valuable use of time".
|
| Long story short, I know that processing 10,000 lines of data
| should absolutely not take 7 minutes and now it doesn't. Now we
| can run our entire day in 30 seconds. No threads. No async.
| Half a day of optimization.
|
| I think part of the problem with optimization today is that
| people are completely clueless about how fast computers are,
| and thus stay happy with being slow just as a matter of
| ignorance. The FP community pushing that all optimization is
| premature optimization doesn't help either.
| chriswarbo wrote:
| Urgh, reminds me of a production error that was caused by a
| job running on a timer, assuming that some other job would
| have finished by then. That led to two obvious questions:
|
| - Why is it on a timer, instead of being triggered when the
| first job was done?
|
| - Why was that first job taking over an hour to process XML
| data?
|
| Turns out the second one was accidentally quadratic: it
| parsed the whole document to process its first record, then
| re-parsed the whole document to process its second record,
| etc., which made it very slow once the XML reached several
| hundred MB.
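|
| In sketch form (a toy reconstruction; newline-separated
| "records" stand in for the real XML):
|
|     #include <cstddef>
|     #include <sstream>
|     #include <string>
|     #include <vector>
|
|     // Splitting the whole document costs O(d) in its size d.
|     static std::vector<std::string> parse_all(const std::string& doc) {
|         std::vector<std::string> records;
|         std::istringstream in(doc);
|         for (std::string line; std::getline(in, line); )
|             records.push_back(line);
|         return records;
|     }
|
|     // Accidentally quadratic: re-parses the full document
|     // once per record, for O(n * d) total work.
|     std::vector<std::string> process_slow(const std::string& doc) {
|         std::size_t n = parse_all(doc).size();
|         std::vector<std::string> out;
|         for (std::size_t i = 0; i < n; i++)
|             out.push_back(parse_all(doc)[i]);
|         return out;
|     }
|
|     // Linear: parse once, then work off the in-memory records.
|     std::vector<std::string> process_fast(const std::string& doc) {
|         return parse_all(doc);
|     }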
|
| Unfortunately, the solution we went with was to set the timer
| to run 20 minutes later...
| wiseowise wrote:
| Awaiting "something, something premature optimization" person to
| show up in the comments.
| hinkley wrote:
| You'd think they would catch on. But then a foolish consistency
| _is_ the hobgoblin of little minds.
| zogrodea wrote:
| "In established engineering disciplines a 12% improvement,
| easily obtained, is never considered marginal; and I believe
| the same viewpoint should prevail in software engineering." -
| Also Knuth
| eru wrote:
| A 12% overall improvement might be worthwhile. But you have
| to look at the bigger picture: if part A of your project
| takes 99.9% of the time, and part B takes 0.1% of the time,
| even a 100% improvement in part B (i.e. making it take no
| time at all) might not be worthwhile.
| jackthetab wrote:
| a.k.a. Amdahl's Law:
| https://en.wikipedia.org/wiki/Amdahl%27s_law
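|
| For concreteness, a tiny sketch plugging in the parent's
| numbers (the function is my own, just restating the formula):
|
|     #include <cstdio>
|
|     // Amdahl's law: overall speedup when a fraction p of the
|     // runtime is accelerated by a factor s.
|     static double amdahl(double p, double s) {
|         return 1.0 / ((1.0 - p) + p / s);
|     }
|
|     int main() {
|         // Part B is 0.1% of the runtime; even making it
|         // infinitely fast gains only about 0.1% overall.
|         std::printf("%.4fx\n", amdahl(0.001, 1e12));  // ~1.0010x
|     }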
| jononor wrote:
| I agree. But it is critical to ensure that what one is
| improving is something worthwhile. 12% improvement in compute
| costs for a project with lots of compute spend = massive win.
| 12% improvement in compute costs for a hobby project running
| on a single VM = probably negligible. 12% improvement in
| function runtime some arbitrary place in the stack = possibly
| net negative, taking into account developer opportunity
| costs. It is almost always useful to talk in terms of
| outcomes, i.e. benefits to customers/users/team/company.
| erk__ wrote:
| The original quote is not by Knuth but by Tony Hoare, though
| Knuth made it well known.
| cogman10 wrote:
| Lol, yeah I get this quoted sometimes merely for suggesting the
| right data structure to use.
|
| "Use a set instead of a list, it's faster". "that's premature
| optimization!"
| saagarjha wrote:
| Usually this criticism doesn't even hold water if you put
| performance aside, since a set will often have nicer
| ergonomics if you find yourself reaching for it. I see far
| too much code like items.remove((item) => item == value),
| and if you're using the proper data structure you don't
| have to hand-roll these operations each time, because they
| are provided by default.
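|
| In C++ terms, a sketch of the difference (my own example):
|
|     #include <algorithm>
|     #include <unordered_set>
|     #include <vector>
|
|     // List: removal means hand-rolling the erase-remove
|     // idiom, which is easy to get subtly wrong.
|     void remove_from_list(std::vector<int>& items, int value) {
|         items.erase(std::remove(items.begin(), items.end(), value),
|                     items.end());
|     }
|
|     // Set: the operation is provided by the container and
|     // the call states the intent directly.
|     void remove_from_set(std::unordered_set<int>& items, int value) {
|         items.erase(value);
|     }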
| _dain_ wrote:
| Premature pessimization.
| The_Colonel wrote:
| The real premature optimization is using List instead of Set
| for guaranteed small sizes, since List is going to outperform
| Set in those cases.
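|
| A sketch of the two shapes (worth measuring on your own
| workload before believing either way):
|
|     #include <algorithm>
|     #include <unordered_set>
|     #include <vector>
|
|     // Linear scan over contiguous memory: O(n), but for a
|     // handful of elements the cache locality and absence of
|     // hashing usually win.
|     bool contains_list(const std::vector<int>& xs, int v) {
|         return std::find(xs.begin(), xs.end(), v) != xs.end();
|     }
|
|     // Hash lookup: amortized O(1), but pays for hashing and
|     // pointer chasing on every call.
|     bool contains_set(const std::unordered_set<int>& xs, int v) {
|         return xs.count(v) != 0;
|     }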
| neandrake wrote:
| Except in your example I've seen numerous cases where bugs
| were introduced because the developer's primary reason for
| picking a set was "fast" and "no duplicates", only to later
| realize that the order of items is important. And looking
| back at their choice, when dealing with 5-20 items, using a
| list and checking contains() not only has no performance
| impact but better reflects the intent of "no duplicates".
| Instead the set is left in place, and the dev fixing the bug
| usually copies the results into a list and sorts them back
| into the order they had before going into the set...
| cogman10 wrote:
| I've never seen this example. There are set implementations
| which maintain order, but in my experience it's fairly rare
| to both need "no duplicates" and "items inserted maintain
| order".
|
| It's fairly easy to remove the set and replace it with
| something else if this does come up.
| jwells89 wrote:
| Maybe I'm misunderstanding, but isn't it pretty common to
| e.g. have a list of non-duplicate items on screen which
| have a specific order they're supposed to be displayed in
| which can be changed as a result of user actions (like
| adding new items or editing existing ones)?
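|
| When it does come up, one common shape (just a sketch of my
| own) is a vector that keeps the display order plus a side
| set that rejects duplicates in O(1):
|
|     #include <string>
|     #include <unordered_set>
|     #include <vector>
|
|     class OrderedUniqueList {
|         std::vector<std::string> order_;       // display order
|         std::unordered_set<std::string> seen_; // membership
|     public:
|         // Appends item unless already present; returns
|         // whether it was added.
|         bool add(const std::string& item) {
|             if (!seen_.insert(item).second) return false;
|             order_.push_back(item);
|             return true;
|         }
|         const std::vector<std::string>& items() const {
|             return order_;
|         }
|     };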
| _dain_ wrote:
| People always seem to think "premature optimization" means
| "premature optimization _for speed_". But there are lots of
| otherwise-good things that you can prematurely optimize for:
| - modularity
| - readability
| - abstraction
| - maintainability
| - scalability
| - reusability
| - testability
| - cost
|
| We've all seen codebases that are over-abstracted in the name
| of code reuse, or to adhere to this or that software design
| philosophy, and are nightmares as a result. "You Ain't Gonna
| Need It" is a restatement of the premature-optimization adage,
| just for a different metric.
| whstl wrote:
| And the "premature optimization" excuse is used almost every
| time someone argues that over-abstraction can cause
| performance issues.
| hinkley wrote:
| Or other forms of laziness.
| munch117 wrote:
| "Premature optimisation" refers to speed or size. Using the
| phrase to talk about those other things is a stretch. (Except
| maybe cost.)
|
| You can _design for_ modularity, readability etc., but
| _optimising for_ them doesn't sound right. Something to do
| with them not having well-defined, agreed-upon definitions of
| where the optimum is.
| hinkley wrote:
| I work with a Principle of Maximum Power guy. He's
| insufferable. Too smart to browbeat into behaving, and too
| old to chalk it up to youthful exuberance or lack of
| experience. He's made big chunks of the project comfortable
| for him and miserable for everyone else, so of course he'll
| never leave, and people who know they deserve better do.
| chriswarbo wrote:
| I think it's fair to talk about optimisation (premature or
| otherwise) for anything fitting Blum's axioms. The classic
| ones are number of instructions executed, wall-clock time,
| memory usage, network requests, user interactions, calls to
| some slow function (e.g. the comparison function, for
| sorting algorithms), etc. but I think it's also applicable
| to fuzzier concepts.
|
| For example, modularity could be modelled as the time/edit-
| distance/etc. it would take to swap-out N out of C
| components, where C >> N. If we're defining things
| centrally and passing them around via dependency-injection
| then it's O(N) (each component is defined once, so change N
| of them); if we're hard-coding references everywhere then
| it's O(N*C) since we'll have to check for references in all
| of the components, and each one needs to be checked for the
| N possible swaps.
|
| On the other hand, dependency injection is less optimal
| than hard-coding for a metric like API size: for N
| components, the number of references grows as O(N^2) (since
| new components need to reference existing ones, and
| existing ones will need to reference them); so APIs using
| DI will have O(N^2) arguments.
| munch117 wrote:
| Fair enough. You can optimise for lots of things.
|
| But when people warn against "premature optimisation",
| specifically, they are advising you to give _higher_
| priority to things like modularity, readability,
| abstraction and maintainability, and not sacrifice them
| all for a singular goal, typically speed.
| hinkley wrote:
| People deal with the cognitive dissonance by calling these
| optimizations other things, like "architecture". But at the
| end of the day, if you're changing your architecture for any
| reason other than correctness, then you're optimizing. And if
| you're having a hard time doing it, then you probably waited
| too long.
|
| "The Last Responsible Moment" is subtle. You have to include
| user benefit (they may not care yet) versus snowballing
| consequences of delay, versus having more and better
| information later on.
|
| One of the things that drew me to CI/CD was the notion that
| you're not going to get better at handling your problems by
| avoiding them. You can't learn to pick your optimization
| battles if you never fight them. And you won't learn
| something I've been saying for almost 25 years now and still
| haven't heard come out of anyone else's mouth: there are
| code changes that improve both readability and performance.
| Making those changes as refactors is hardly ever premature.
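|
| One example of the kind of change I mean (a sketch of my
| own): the second version reads better *and* stops re-copying
| the accumulated string on every iteration.
|
|     #include <string>
|     #include <vector>
|
|     std::string join_slow(const std::vector<std::string>& parts) {
|         std::string out;
|         for (const auto& p : parts)
|             out = out + p + ",";  // fresh temporary each time: O(n^2)
|         return out;
|     }
|
|     std::string join_fast(const std::vector<std::string>& parts) {
|         std::string out;
|         for (const auto& p : parts) {
|             out += p;             // appends in place: amortized O(n)
|             out += ',';
|         }
|         return out;
|     }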
| agilob wrote:
| 10 years ago SQLite made a release dedicated to lots of
| micro-optimisations that resulted in massive speed-ups:
| https://sqlite-users.sqlite.narkive.com/CVRvSKBs/50-faster-t...
|
| Don't underestimate accumulation of small improvements.
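|
| Back-of-the-envelope (the per-change figure here is my own
| assumption, not from the release notes): small wins compound
| multiplicatively, so a few hundred of them go a long way.
|
|     #include <cmath>
|     #include <cstdio>
|
|     int main() {
|         // ~400 independent 0.1% improvements compound to
|         // roughly a 1.5x overall speedup.
|         std::printf("%.2fx\n", std::pow(1.001, 400));  // ~1.49x
|     }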
| andrepd wrote:
| Hmm, but HN assured me that "premature optimisation is le root
| of all evil" (nevermind that that's an incomplete quote) :)
| forbiddenlake wrote:
| What part of SQLite 3.8.7 are you saying is premature?
| trealira wrote:
| I don't think they were saying that SQLite's optimizations
| were premature (since clearly they benefited the program as
| a whole), but that people conflate micro-optimization with
| premature optimization.
| Bjartr wrote:
| The fact that optimization can be effective doesn't change
| the fact that spending time on it too early is often
| problematic.
| sfn42 wrote:
| Sadly, HN was not able to help you understand what premature
| optimization is.
|
| What the SQLite team did was systematically measure the
| performance of their system, identify potential
| bottlenecks/inefficiencies, create and measure improvements
| to these bottlenecks and choose the better option.
|
| This is what we call optimization. Premature optimization is
| when you have no idea how your system really performs, but
| spend time trying to make it "better" without having a way to
| determine whether your change made any difference.
| moffkalast wrote:
| Yeah, premature as in when you're designing the first
| prototype and don't yet have real-life usage data on where
| the bottleneck is.
| mratsim wrote:
| Premature slogans are the root of all evil.
| kvemkon wrote:
| From the article: "SerializationString improve performance"
| (10.12.2023) https://github.com/ClickHouse/ClickHouse/pull/57717.
| secondcoming wrote:
| I love this stuff. My time is split 50/50 between my IDE and
| Compiler Explorer.
|
| Unfortunately not a lot of employers consider this a good use of
| time.
| quadrature wrote:
| That's pretty cool, what kind of stuff do you work on?
| Sesse__ wrote:
| That's the sad side. The flipside is that for the employers
| that _do_ care, being able to use a profiler and repeatedly
| identify optimizations is considered some sort of superpower
| that very few of your coworkers possess. :-)
| maksimur wrote:
| Maybe it's because I am experimenting with improving my posture
| through small and gradual optimizations, but I expected the
| article to be about small optimizations in your life, rather than
| code.
| zogrodea wrote:
| Small but consistent improvements in general seem like a great
| idea! Wishing you the best.
| 1970-01-01 wrote:
| Look at video codecs if you want examples of small
| optimizations adding up.
| andsoitis wrote:
| The deeper truth is that impact derives from _leverage_.
|
| Leverage might reside in architectural changes, but one can
| also imagine other dimensions (not just altitude in the
| application stack) that _also_ confer leverage.
| danielovichdk wrote:
| Am I misunderstanding this, or is this an article about how
| Chars[].resize works in a library they wrote themselves? Or
| is it a misunderstanding of how resize works in general?
|
| I am not trying to take anything away from the author but digging
| into assembly instead of simply looking up the docs seems like a
| long road to go for calling it an optimization.
|
| Another thing: why does this even exist? The tests show some
| SQL executions, but are those real production executions or
| simply a test of a possible optimization in a different
| context?
|
| I am not calling it a premature optimization because I am
| not sure whether I am bright enough to tell what is going on.
|
| Please clarify
| Karellen wrote:
| > Ratio of speedup(-) or slowdown(+): -1.662x
|
| Wait, what? negative 1.6x?
___________________________________________________________________
(page generated 2024-02-10 23:01 UTC)