[HN Gopher] Making Code Faster
___________________________________________________________________
Making Code Faster
Author : zdw
Score : 82 points
Date : 2022-06-12 16:16 UTC (1 day ago)
(HTM) web link (www.tbray.org)
(TXT) w3m dump (www.tbray.org)
| mathiasrw wrote:
| You can't make code faster. You can only make it do fewer things.
| dr-detroit wrote:
| 1vuio0pswjnm7 wrote:
| Can anyone state the JSON parsing problem more concisely. For
| example,
|
| 1. Here is the input, e.g.,
| https://data.sfgov.org/api/views/acdm-wktn/rows.json?accessT...
|
| 2. Here is the desired output, e.g., a sample showing what the
| output is supposed to look like
| lumost wrote:
| I'm honestly skeptical that
|
| > Make it work, then make it right, then make it fast.
|
| applies to core language choice. I used to hear a lot about
| rewriting interpreted langs in something faster... but the
| reality is that a team that's just spent a year making an app
| in Python isn't going to pivot to writing Java, Go, or Rust
| one day. The new team isn't going to be building something
| new and exciting; they are going to be on a tight timeline to
| deliver a port of something which already exists.
| [deleted]
| secondcoming wrote:
| Maybe. I mainly use Python as a more sophisticated bash script,
| or to prototype something I'll port to a performant language
| later.
| welder wrote:
| Just did this, rewrote a deployed cli from Python[1] to Go[2].
| It delivered on all the expectations. Less CPU & RAM usage,
| less dependence on system stuff like openssl, more diverse
| platform support.
|
| [1] https://github.com/wakatime/legacy-python-cli
|
| [2] https://github.com/wakatime/wakatime-cli
| ChrisMarshallNY wrote:
| In my case, the choice is made for me (I write Apple native, so
| Swift, it is...).
|
| I have found that his advice on using profilers was very
| important. I _thought_ I knew where it would be slow, but the
| profiler usually stuck its tongue out at me and laughed.
|
| When I ran a C++ shop, optimization was a black art. Things
| like keeping code and pipelines in low-level caches could have
| 100X impact on performance, so we would do things like optimize
| to avoid breaking cache. This often resulted in things like
| copying and pasting lines of code to run sequentially, instead
| of in a loop, or as a subroutine/method.
|
| It's difficult. There's usually some "low-hanging fruit" that
| will give _awesome_ speedups; then it gets difficult.
| mrfox321 wrote:
| Do your tricks still apply to modern c++ compilers?
| moonchild wrote:
| I can't speak for C++, but I recently rewrote an optimised
| C routine in assembly and obtained a 2-4x speedup. I might
| have gotten partway there with careful tuning of the source,
| the way the parent suggests, but I could not have gotten all
| the way.
| favorited wrote:
| [Not GP] Modern optimizers are great, but writing cache-
| efficient code is still up to the programmer.
| jandrewrogers wrote:
| Modern C++ compilers are a mixed bag when it comes to
| optimization: brilliant in some areas and inexplicably
| obtuse in others, requiring the programmer to be quite
| explicit. This has improved significantly with time, but I
| am still sometimes surprised by the code the compiler has
| difficulty optimizing, or by optimizations that are only
| applied if the code is written in very narrow ways. As a
| practical matter, compilers have a limited ability to reason
| over large areas of code the way a programmer can, but even
| very local optimizations are sometimes missed.
|
| This is why it is frequently helpful to look at the code
| generated by the C++ compiler. It gives you insight into
| the kinds of optimizations the compiler can see and which
| ones it can't, so you can focus on the ones it struggles
| with. This knowledge becomes out-of-date on the scale of
| years, so I periodically re-check what I think I know about
| what the compiler can optimize.
|
| For some things, like vectorization, the compiler's
| optimizer almost never produces a good result for non-
| trivial code and you'll have to do it yourself.
| orangepurple wrote:
| I recommend using Godbolt (https://godbolt.org/) to view
| the assembly output of your compiled language (C++, etc)
| jandrewrogers wrote:
| I should add that some types of optimizations (cache
| efficiency being a big one) are outside the scope of the
| compiler, because the relevant choices are implicitly part
| of the code specification that the compiler needs to
| faithfully reproduce, e.g. data structure layout is required
| to be a certain way for interoperability reasons.
| colechristensen wrote:
| I don't know, I've been places where select pieces of
| infrastructure were being rewritten from language X to Y for
| performance reasons, and it seemed to be going just fine. It
| wasn't "we're rewriting everything now" but finding performance
| bottlenecks and fixing them by rewriting pieces in the new
| language.
|
| It works if you have lots of things communicating over APIs.
| anothernewdude wrote:
| My team literally do this all the time. Python for everything,
| Rust for the places where it doesn't keep up.
| edflsafoiewq wrote:
| IME fast software doesn't actually use this "waterfall" model.
| There needs to be feedback from performance considerations into
| semantics.
| ayberk wrote:
| This definitely doesn't apply to developer tools in general
| (and sometimes also to infrastructure).
|
| Anyone who had the "pleasure" of using Amazon's internal tools
| can talk about how the "Make it work ASAP" attitude has worked out
| for their internal tooling :) LPT anyone?
| djmips wrote:
| premature optimization is the root of all evil.
|
| "I approve too! But... Sometimes it just has to be fast, and
| sometimes that means the performance has to be designed in.
| Nelson agrees and has smart things to say on the subject."
|
| - so... you're saying premature optimization isn't the root of
| all evil. Maybe it's time to retire this tired 'conventional
| wisdom'
| [deleted]
| runevault wrote:
| I hate "premature optimization" so much.
|
| The only optimization type I understand that with is very
| isolated but convoluted fixes that buy performance at the cost
| of readability/etc. Intelligent architecture that is fast, so
| long as it does not completely obfuscate the intent of the
| code, is not premature.
| jaywalk wrote:
| If you already know that something has to be fast, then it's
| not really _premature_ optimization.
| trashtester wrote:
| I would say, in most cases it pays to make the overall design
| of the code in a way that enables fast execution. This
| includes selection of languages and libraries.
|
| Loops or recursion with many iterations (high N) and a high
| big-O order can also be avoided during the design stage, if
| you KNOW the N and the O-order at design time. (And don't
| worry about the O-order for low-N problems; that is
| premature optimization most of the time.)
|
| On the other hand, tweaking every single statement or test in
| ways that save 1-2 cycles is rarely worth it. Often you end
| up with obfuscated code that may even be harder for the
| compiler to optimize. This is the 97% of the code where
| premature optimization makes things harder.
|
| Some programmers may turn this on its head. They don't
| understand the implications of the design choices, but try to
| make up for that by employing all sorts of tricks (that may
| or may not provide some small benefit) in the main body of
| the code.
| bee_rider wrote:
| I don't think this follows. If premature optimization is the
| root of all evil, essentially, we're saying "for all evil,
| there exists a premature optimization which leads to it." If
| evil, then sourced in a premature optimization.
|
| Even if we parse his "But..." as saying, "there exists some
| premature optimization which is not the root of an evil"
| (ignoring probably valid quibbles about whether the
| optimizations he's talking about truly are premature), this
| doesn't contradict the original statement.
|
| In fact, "the root of all evil" seems to be an expression which
| invites us to commit the fallacy of denying the antecedent --
| if premature optimization, then evil, in this case -- because
| it is almost always used to indicate that the first thing is
| bad.
| dahart wrote:
| > you're saying premature optimization isn't the root of all
| evil. Maybe it's time to retire this tired 'conventional
| wisdom'
|
| Funny you mention it. I have a bit of a habit now of reminding
| people what the remainder of Knuth's quote actually was.
|
| "We should forget about small efficiencies, say about 97% of
| the time: premature optimization is the root of all evil. Yet
| we should not pass up our opportunities in that critical 3%."
|
| The irony of walking away with the impression that Knuth was
| saying not to do optimization is that his point was the exact
| opposite: he was emphasizing the word _premature_, and then
| saying we absolutely should optimize, after profiling.
|
| This all agrees completely with the article's takeaways: build
| it right first, then _measure_, and then optimize the stuff
| that you can see is the slowest.
| astrange wrote:
| The 97% does have its own performance opportunities; rather
| than making it faster you want to stop it from getting in the
| way of the important stuff, by reducing code size or tendency
| to stomp on all the caches or things like that.
|
| Anyone optimizing via microbenchmarks or wall time
| exclusively isn't going to see this.
| dralley wrote:
| Not to mention the paragraph that immediately follows the more
| famous one, in Knuth's essay:
|
| >> Yet we should not pass up our opportunities in that critical
| 3%. A good programmer will not be lulled into complacency by
| such reasoning, he will be wise to look carefully at the
| critical code; but only after that code has been identified.
| doodpants wrote:
| No, they're saying that not all optimization is premature. Or
| that upfront performance considerations in the design are not
| necessarily a case of premature optimization.
| huachimingo wrote:
| Which one is faster? (C code.) Return whether abs(num) > x.
|
| /* logical comparison */
| int greater_abs(int num, int x){ return (num > x) || (num+x < 0); }
|
| /* squared approach */
| int greater_abs2(int num, int x){ return num*num > x; }
|
| See for yourself, with (and without) optimizations:
| https://godbolt.org/
|
| What would happen if x is a compile-time constant?
| dahart wrote:
| Math & logic are rarely the bottleneck compared to memory &
| allocation, right? Does Godbolt assume x86? Does the answer
| change depending on whether you're using an AMD or NVIDIA GPU,
| or an Apple, ARM or Intel processor? Does it depend on which
| instruction pipelines are full or stalled from the surrounding
| code, e.g., logic vs math? Hard to say if one of these will
| always be better. There are also other alternatives, e.g.
| bitmasking, that might generate fewer instructions... maybe
| "abs(num) > x" will beat both of those examples?
| masklinn wrote:
| > Does Godbolt assume x86?
|
| Godbolt uses whatever compilers, targets, and optimisations
| you ask it to.
|
| It is, in fact, a very useful tool for comparing different
| compilers, architectures, and optimization settings.
| astrange wrote:
| Most questions like these have no answer because if any of the
| parameters is known (which it usually is) it'll get folded away
| to nothing.
| WalterGR wrote:
| No idea.
|
| How frequently am I calling `abs`?
| [deleted]
| jjice wrote:
| If you write math heavy code, probably a lot more than if
| you're writing web apps. Depends on what kind of software you
| write.
| WalterGR wrote:
| Got it.
|
| Well if that were the case, I'd use a profiler to see if
| spending time on optimizing 'abs' would realistically be
| worth it.
| throwaway744678 wrote:
| I don't know which one is faster, but I know that one is not
| correct (squared approach).
| saghm wrote:
| Wouldn't the second one also potentially be incorrect due to
| overflow?
| pjscott wrote:
| Yes. Suppose that both numbers are positive, that x>num,
| and that x+num is bigger than INT_MAX. In that case we hit
| signed integer overflow, which is undefined behavior. If
| signed integer overflow happens to wrap around, which it
| might, then the result could be negative and the function
| would return the wrong result. Or anything else could
| happen; undefined behavior is undefined.
|
| In practice, just writing "abs(num) > x" gives quite good
| machine code, and it does so without introducing hard-to-
| see bugs.
| [deleted]
| zasdffaa wrote:
| Depends. In the first it will depend on the branch predictor
| which will depend on the relative expected magnitudes of num
| and x
|
| In the 2nd, which I assume should be
| { return num*num > x*x; }
|
| then it depends on the micro-arch, as it's one basic block so
| no branches and assuming a deep pipeline on x64, one multiplier
| (pipelined), probably this is faster for 'random-ish' num and
| x.
| [deleted]
| dhosek wrote:
| Your squared approach is wrong: greater_abs2(3, 4) returns true
| but should return false.
| [deleted]
| jbverschoor wrote:
| > CPU time is always cheaper than an engineer's time.
|
| I hate this quote, but less than the "Memory is cheap" mantra.
|
| For "CPU time", if it's a critical path for something with a lot
| of users and/or where performance is key, the engineer's time is
| just a fraction.
| [deleted]
| [deleted]
| bagels wrote:
| The critical factor is scale. Thousands of servers replaced
| with hundreds by more efficient code can be worthwhile.
| djmips wrote:
| That's not the only factor. Sometimes it's weight or power or
| a fixed system like embedded, console or a particular product
| that's not easily upgradable.
| trashtester wrote:
| You don't need thousands of servers to make it worthwhile. If
| you have code that runs constantly on 11 servers (possibly in
| k8s) that each cost $1k/month in AWS, and you spend a few
| weeks optimizing it down to needing 1 server, those weeks
| just generated the equivalent of a YEARLY $120k revenue
| stream.
|
| If you require an ROI of 5 years, no interest included, you
| just created $600k in value over a few weeks.
| jcalabro wrote:
| Yeah, for sure. One thing I think about often is: if you're
| writing a compiler and it's slow (say it takes 10s per
| build), and you have thousands of engineers running it 100x
| per day, that starts to add up quickly. If you could get it
| down to 1 second, then you'd save a lot of actual engineer
| time, just not your own.
|
| Scale is key here, but fast software is always a much better
| user experience than slow stuff as well. With the compiler
| example, if it takes 1s as opposed to 1h, then users can
| iterate much more quickly and get a lot of flexibility.
| [deleted]
| TimPC wrote:
| This means working at companies with massive scale can be far
| better because you get to make code performant and optimize
| things rather than just focus on getting features out the
| door.
| jbverschoor wrote:
| Yup. It should, and in general it is. Look at operating
| systems, compared to <insert random app>.
|
| Somehow programmers care about the big O notation, but not
| when it's about other people's time.
| jacobolus wrote:
| Moreover, when some operation gets sped up by orders of
| magnitude, it can be used for new things that you'd never
| consider when it was relatively more expensive.
|
| Something that used to be precomputed offline can be done in
| real time. Something that used to work on small samples can
| be applied to the whole data set. Something that used to be
| coarsely approximated can be computed precisely. Something that
| used to require large clusters of machines can be handled on
| customers' client devices. Something that used to be only an
| end in itself can be used as a building block for a higher-
| level computation. Etc.
| exyi wrote:
| Yea. CPU time might be cheap but if someone is waiting for that
| CPU, you are now wasting someone's time.
| secondcoming wrote:
| It explains why the modern web experience is typically so
| crappy.
| saagarjha wrote:
| Indeed. At scale quotes like these are generally put on the
| backburner and the performance team will deliver gains that
| justify staffing the team, and then some. At work (though I'm
| not directly involved in this) there's even a little table that
| gives you a rough idea of what kind of win you need to save the
| company the equivalent of an engineer. If you do it in the
| right spot it's not really even that much (though, of course,
| finding a win there is probably going to be very difficult).
| runevault wrote:
| The part where he talks about the quality of benchmarking tools
| reminded me: a library worth mentioning for the .NET crowd is
| BenchmarkDotNet. Not only can it tell you the time for a
| given test to run (after doing warmups to mitigate various
| problems with running tests cold), it also has options to
| see how the GC performed at every generation level. Is this
| code getting promoted to later generations, or is it staying
| ephemeral, which is cheaper to allocate and destroy? Or can
| you avoid heap allocations entirely?
|
| Edit: Oh, and I should mention: if you want method-level
| tracking on a run similar to some of his screenshots, a
| ReSharper license comes with dotTrace, I believe, which
| gives you that sort of tracking.
| GordonS wrote:
| dotTrace is fantastic, really essential for performance profiling.
| Likewise, dotMemory is really good when trying to reduce or
| understand memory usage (tho dotTrace does have some memory
| tooling too). I've been happily paying for a JetBrains Ultimate
| personal license for a few years now.
|
| There are very few companies that I'm really rooting for, but
| JetBrains is absolutely one.
| runevault wrote:
| I have a resharper ultimate license through work and a full
| Jetbrains ultimate at home (I switched to Rider for
| C#/F#/Unity dev in the past 6 months and really liking it,
| along with CLion for the times I'm writing rust).
|
| One time at work I dug up something that removed 75% of the
| runtime of an application: it turned out taking the length
| of an image was actually a method call even though it looked
| like a simple property, so I cached it at the start of
| processing each image instead of calling it on every loop
| iteration. It was insane how much faster the code became. I
| tracked that down with dotTrace.
|
| And yeah dotMemory is also fantastic, I've dug up some GNARLY
| memory usage with it. Probably should have mentioned it since
| I was bringing up the memory portion of BenchmarkdotNet.
| cube2222 wrote:
| Go's built-in performance analysis tooling is so excellent.
|
| The profiler, which can do CPU, memory, goroutines, blocking,
| etc. and can display all of that as a graph or a flame graph, as
| well as `go tool trace` which gives you a full execution trace,
| including lots of details about the work the GC does. All that
| with interactive local web-based viewers.
|
| Performance optimization is always so fun with it.
| timbray wrote:
| Yep, but you know, I can't stand that web-based viewer, it's
| got this hair trigger zoom and if I breathe in the direction of
| the mouse the graph goes hurtling in or out. I used to look at
| the PDFs but now I just stay in GoLand, which gives you
| everything you need.
| henning wrote:
| > If you're writing Ruby code to turn your blog drafts into HTML,
| it doesn't much matter if republishing takes a few extra seconds
|
| Unless your software is intended to reload blog posts live like
| Hugo/Jekyll/other static site generators and it takes ~5,000 ms
| on an i9 machine when it could take 100 ms if different languages
| and different implementation choices were made. This is the story
| of modern software: "I don't care how much time and computing
| power I waste. It's not the core app of some FAANG giant, so I'm
| not going to bother, ever."
| dwrodri wrote:
| I agree to the extent that people fail to realize how they
| could be missing out on opportunities to innovate or corner a
| market when they leave performance on the table. Quite often,
| new products become possible when a basic task like rendering
| HTML goes from taking 10 seconds to 10ms. I think you can paint
| the problem in broad strokes from either side of the "how
| important is performance?" argument.
|
| From a "get the bills paid" point-of-view, any good project
| manager also has to know when to tell an engineer to focus on
| getting the product shipped instead of chasing that next 5% in
| throughput/latency reduction/overhead. I've seen my fair share
| of programmers (including myself) refuse to ship a project
| because of the pursuit of some "undesirable" latency, and not
| finish more important features.
|
| For tasks like video streaming, Automation software (CI
| pipelines to robotics), video games, professional tools for
| content creation (DAWs, video editing, Blender, etc.)
| performance is the feature, but then your product is helping
| them get the bills paid faster. Medical apparatus(es?) and
| guidance software on autonomous vehicles are examples of where
| latency is a life-or-death situation.
|
| I think everyone would benefit from playing with OpenRTOS, or
| writing some code that deals with video/audio where there are
| hard deadlines on latency. But I'm never gonna hold some
| weekend-project static site generator in Ruby to the same
| standard as macOS.
| makapuf wrote:
| Agreed with what you said, but even for Blender, performance
| is important but the feature are Free software, good modeler,
| correct, feature-packed, good looking renderer AND
| performance.
| Jtsummers wrote:
| Probably why he had the next paragraph:
|
| > But for code that's on the critical path of a service back-
| end running on a big sharded fleet with lots of zeroes in the
| "per second" numbers, it makes sense to think hard about
| performance at design time.
|
| Your scenario falls under that category.
| _gabe_ wrote:
| > Unless your software is intended to reload blog posts live
| like Hugo/Jekyll/other static site generators
|
| _This_ falls into this?
|
| > But for code that's on the critical path of a service back-
| end running on a big sharded fleet with lots of zeroes in the
| "per second" numbers
|
| A static site generator is a simple utility that should take
| 100ms at most on a modern crappy laptop. It's not some
| backend service that's in a critical path, but it's also not
| a difficult engineering task. Parse input, produce output.
| This isn't something that should take seconds, which I think
| is what the OP was getting at. But because of the language
| choices made, and the millions of lines of bloated code, it
| does take seconds.
| Jtsummers wrote:
| Yes. Maybe not directly, but more like it versus a one-off
| "I just made a new blog post and can wait 10 seconds for it
| to render and deploy." If you're reloading the blog posts
| live then you're on the critical path, not a one-off
| anymore. So you need to think about performance.
|
| In the former case, the performance really doesn't matter.
| If you've got a personal blog, does it matter if your
| update takes 10 seconds or 1 second? Probably not; if it
| does, it's the most important blog in the world.
|
| If you've got customers with many blogs and need to take
| their updates and render them, then the performance matters
| because it's shifted from 1 to many (hundreds? thousands?).
| And now that 10 second delay is a big issue, you're either
| using a fleet of servers to handle the load or some
| customers don't see updates for days (oops).
| paganel wrote:
| It was the "bad" untyped languages like PHP, Python and Ruby
| (Perl was too complicated for us, mere mortals) that saved the
| web from becoming a Microsoft monopoly, or, more likely, an
| oligopoly between the same MS, probably Sun, probably IBM or
| some such. I'm talking about most of the 2000s decade. True,
| the web has become sort of an oligopoly right now, but at least
| that was not caused by the programming languages and web
| frameworks that power it.
|
| What I'm trying to say is that those languages that everyone is
| quick to judge right now have given us 10, maybe 15 years of
| extra "life", a period when most of us have "made" our careers
| and, well, most of our money (those who have managed to make
| that money, that is). We wouldn't have had (what basically are)
| web companies worth tens, if not hundreds of billions of
| dollars, if the web had still meant relying on Struts or on
| whatever it is Microsoft was putting forward as a web framework
| in the mid-2000s. We wouldn't have had engineers taking home TC
| worth 500-600k and then complaining that Python or Ruby are not
| what the world needs.
| rentfree-media wrote:
| But you might have a viable visual page builder. It's
| honestly a tough choice at times...
| j-james wrote:
| BlueGriffon is pretty good.
| dahfizz wrote:
| I wonder how heavily other fields sacrifice in the name of
| "ease of implementation".
|
| Could we have houses that are 100x stronger and longer lasting
| if we allowed a few extra weeks of construction time? Could we
| 10x battery capacity with a slightly more sophisticated
| manufacturing process?
|
| I don't think many developers nowadays understand how fast
| computers are, nor how much needless bloat is in their
| software.
| ajmurmann wrote:
| For the economics of that comparison to make sense, we should
| also include material inputs to the construction. It doesn't
| matter to the cost of the final product whether it was
| materials or labor that was saved. Software is the special
| case where cost almost entirely comes down to labor. So given
| that, we are of course constantly compromising quality. We
| realize that less though because buildings are more
| standardized.
| adamdusty wrote:
| I work in III/V semiconductors. The product development goes
| like: identify need, design, attempt a proof of concept, DoEs
| to determine manufacturing costs/viability, repeat until you
| have a statistically controlled viable process. There is
| essentially no room for technical creativity like there is in
| software. If we spent an extra year on our worst performing
| product (we've done this), we would get at best 5%
| improvement with iffy reproducibility.
|
| I dont know about construction or batteries.
| TimPC wrote:
| The big issue in other fields mostly isn't manufacturing
| design or construction process. The main bottleneck is that
| with physical goods and long distances travelled you're
| subject to high shipping costs, so people cut all sorts of
| corners to make things less bulky and lighter. Cheap plastics
| are far lighter than wood or metals, for instance, so we see
| more plastics get used.
| astrange wrote:
| You could have a lot more good enough and much cheaper houses
| in the US, but we banned manufactured/prefab houses due to
| lobbying from the "build everything individually out of wood"
| lobby, require huge setbacks due to the front lawn lobby, and
| various other things like overly wide roads because of out of
| date fire codes.
| agumonkey wrote:
| AWS recently blogged about the price benefits of using more
| performant compiled languages on their servers; it's coming.
___________________________________________________________________
(page generated 2022-06-13 23:00 UTC)