[HN Gopher] The most copied StackOverflow snippet of all time is...
___________________________________________________________________
The most copied StackOverflow snippet of all time is flawed (2019)
Author : vinnyglennon
Score : 128 points
Date : 2021-06-16 21:27 UTC (1 hours ago)
(HTM) web link (programming.guide)
(TXT) w3m dump (programming.guide)
| bsaul wrote:
| Seems that most comments here missed the end of the article,
| where he points to the "production ready" version of the
| solution, that is indeed very close to the original one,
| including a while loop.
| t0astbread wrote:
| It's especially ironic given that this is about a StackOverflow
| code snippet that many people probably also copied without
| reading.
| eutectic wrote:
| This is why it's a good idea to have a real integer type.
| enriquto wrote:
| Isn't it impossible? Integers go arbitrarily large but
| computers don't.
| asdf3243245q wrote:
| Computers also go arbitrarily large. Not infinite, but
| arbitrarily large.
|
| A real number type could be bounded by the amount of RAM you
| have.
| jrockway wrote:
| > Sebastian then reached out to me to straighten it out, which I
| did: I had not yet started at Oracle when that commit was merged,
| and I did not contribute that patch. Jokes on Oracle. Shortly
| after, an issue was filed and the code was removed.
|
| Good thing it wasn't a range check function. I hear those are
| expensive.
| dokem wrote:
| Something about this comes off as amateurish. The obsession with
| minimization. Just use a switch statement. Now where is the bug
| going to hide? The solution doesn't need to generalize, there is
| only a small handful of different solutions. Just break them all
| out. It's more maintainable and readable and requires less
| thinking.
| stefan_ wrote:
| Why are you writing code? The question was for a static method in
| Apache Commons, not your "I'm so clever" implementation. Think
| the reading comprehension is flawed.
|
| (Of course, this static method exists in Apache Commons, going
| back at least 20 years. But the fellow "code golfers" of the
| author voted someone to the first answer who similarly had the
| irresistible urge to _try to be very clever_. It's a scourge on
| StackOverflow.)
| [deleted]
| unwind wrote:
| I must admit I smiled at seeing that I edited the question, back
| in the day. :) Can't say I remember the question, and didn't know
| it has that epic feature of being the most-copied. Cool!
| mweberxyz wrote:
| Say what you want about the stability of the npm ecosystem, but
| if this were JS, a new SemVer patch release could be cut, and it
| would be fixed in thousands of code bases essentially instantly.
| beermonster wrote:
| > I wrote almost a decade ago was found to be the most copied
| snippet on Stack Overflow. Ironically it happens to be buggy.
|
| I don't find it ironic, I find it quite normal that even small
| snippets of code contain bugs (given the daily review requests I
| receive).
|
| I think when copying code from StackOverflow, what's more
| important is understanding what the code does, and why, rather
| than pasting it verbatim into your production code.
|
| I also find on StackExchange et al. that quite often the most
| upvoted answer is the one that 'fixes it' for 'most people' yet the
| correct answer is down at number 3 or 4. Again, understanding the
| answer and why it applies, helps give you the context to
| understand if this is _actually_ the solution to _your_ problem
| or just treats the symptom.
| megalodon wrote:
| One of the best tips I have gotten from the internet is to
| never copy and paste code you have not written yourself. Even
| rewriting it verbatim makes you think about what it is you are
| actually copying.
|
| It's a pretty neat rule to have in mind.
| AceJohnny2 wrote:
| > _Key Takeaways:_
|
| > _[...]_
|
| > _Floating-point arithmetic is hard._
|
| I have successfully avoided FP code for most of my career. At
| this point, I consider the domain sophisticated enough to be an
| independent skill on someone's resume.
| user3939382 wrote:
| There are libraries that offer more appropriate ways of dealing
| with it, but last time I ran into a FP-related bug (something
| to do with parsing xlsx into MySQL) I fixed it quickly by
| converting everything to strings and doing some unholy
| procedure on them. It worked but it wasn't my proudest moment
| as a programmer.
| tasty_freeze wrote:
| The thing that jumped out at me, as I've seen the same kind of
| thing on the job, is the assumption that, e.g., log(1000)/log(10)
| is _exactly_ 3. Does the standard guarantee that dividing the
| rounded approximation of one transcendental number by the rounded
| approximation of a related transcendental number will give 3.0
| and not 2.999999999?
| remram wrote:
| Yeah that seems like a serious flaw to me too. On my Python:
| >>> math.log(1000)/math.log(10)
| 2.9999999999999996
| >>> int(math.log(1000)/math.log(10))
| 2
|
| But I don't know about the guarantees provided in the
| JavaScript standard (or more importantly those offered by
| actual browsers).
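One way to check how often the float quotient in remram's session disagrees with the true answer is to compare it against an exact integer computation. The sketch below is only an illustration: the helper names are hypothetical, and which powers of ten (if any) mismatch depends on the platform's math library, so no output is claimed.

```python
import math


def floor_log10_exact(n: int) -> int:
    """Exact floor(log10(n)) for a positive integer, via its decimal digit count."""
    if n <= 0:
        raise ValueError("n must be positive")
    return len(str(n)) - 1


def floor_log10_float(n: int) -> int:
    """The float-based shortcut from the thread; may be off by one near powers of 10."""
    return int(math.log(n) / math.log(10))


def mismatches(max_exponent: int) -> list:
    """Return the powers of ten (10**0 .. 10**max_exponent) where the two disagree."""
    return [
        10**k
        for k in range(max_exponent + 1)
        if floor_log10_float(10**k) != floor_log10_exact(10**k)
    ]
```

Calling `mismatches(15)` lists the powers of ten for which truncating the float quotient gives the wrong exponent on the machine at hand.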
| danellis wrote:
| > almost no branches
|
| I wonder whether the author is suggesting that (potentially) nine
| branches is a small number, or they overlooked ternary
| expressions and function calls and are just counting the if
| statement.
| axiosgunnar wrote:
| So it's not flawed (it does compute the correct result).
|
| The author just thinks a completely unreadable (but supposedly
| faster) variant using logarithms is "better" than the simple loop
| used in the original snippet?
|
| Write your code for junior devs in their first week at your
| company, not for academic journals.
| hardwaregeek wrote:
| I think you might have misread the post. His logarithm code
| became the most used snippet and had the bug.
| [deleted]
| [deleted]
| [deleted]
| ascar wrote:
| His code snippet had rounding errors on the boundaries towards
| the next unit.
|
| However he notes:
|
| > FWIW, all 22 answers posted, including the ones using Apache
| Commons and Android libraries, had this bug (or a variation of
| it) at the time of writing this article.
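The boundary problem ascar describes can be reproduced with a few lines. This is a minimal sketch, not the article's code: both function names are hypothetical, SI (base 1000) units are assumed, and the "careful" variant is just one possible way to handle the boundary, by rounding before deciding whether to move to the next unit.

```python
UNITS = ["B", "kB", "MB", "GB", "TB", "PB", "EB"]


def naive_format(byte_count: int) -> str:
    """Pick the unit from the unrounded value, then round when printing."""
    value = float(byte_count)
    index = 0
    while value >= 1000 and index < len(UNITS) - 1:
        value /= 1000
        index += 1
    return f"{value:.1f} {UNITS[index]}"


def careful_format(byte_count: int) -> str:
    """Pick the unit from the value as it will be printed (one decimal place)."""
    value = float(byte_count)
    index = 0
    while index < len(UNITS) - 1 and round(value, 1) >= 1000:
        value /= 1000
        index += 1
    return f"{value:.1f} {UNITS[index]}"


print(naive_format(999_999))    # 1000.0 kB
print(careful_format(999_999))  # 1.0 MB
```

The naive version sees 999.999 as below 1000, stays in kB, and only then rounds up to 1000.0 when formatting.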
| phist_mcgee wrote:
| You should almost _always_ focus on code readability and
| simplicity over inventiveness and cleverness.
|
| Very few people I have encountered have complained about code
| being 'too simple' or 'too readable', but the opposite happens
| on a near daily/weekly basis.
|
| Write comments, use a for loop, avoid global state, keep your
| nesting limited to 2-3 levels, be kind to your junior devs.
| jka wrote:
| There might be an opportunity somewhere around this area to
| combine the versioning, continuous improvement, and dependency
| management of package repositories with the Q&A format of
| StackOverflow.
|
| Something like "cherry pick this answer, with attribution, and
| notifications when flaws and/or improvements are found".
|
| Maybe that's a terrible idea (there's definitely risk involved,
| and the potential to spread and create bad software), but equally
| I don't know why it would be significantly worse than
| unattributed code snippets and trends towards single-function
| libraries.
| fennecfoxen wrote:
| NodeJS did something a lot like this by having packages that
| are just short snippets, but half the ecosystem flipped out
| when someone messed up `leftpad`.
| [deleted]
| DylanSp wrote:
| Not sure if it's quite what you had in mind, but SO is starting
| to address the issue of updating old answers with the Outdated
| Answers Project:
| https://meta.stackoverflow.com/questions/405302/introducing-...
| pkaye wrote:
| Now the new code is unreadable.
| ape4 wrote:
| It's as easy as "KMGTPE"
| penteract wrote:
| This is a bit of a tangent, but while it may be conventional to
| round to the value with the smallest difference, is that
| convention good? In a case such as this where it's fine for the
| precision to vary with magnitude, then I'd argue it makes sense
| to round to the value with the smallest ratio.
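One reading of penteract's suggestion: between two positive candidates, the switch-over point for difference-based rounding is the arithmetic mean, while for ratio-based rounding it is the geometric mean. The sketch below assumes that reading; the function names are hypothetical and the inputs are assumed to satisfy 0 < lo <= x <= hi.

```python
import math


def nearest_by_difference(x: float, lo: float, hi: float) -> float:
    """Conventional rounding: pick the candidate with the smaller absolute difference."""
    return lo if x - lo <= hi - x else hi


def nearest_by_ratio(x: float, lo: float, hi: float) -> float:
    """Ratio-based rounding: pick the candidate with the smaller relative factor."""
    return lo if x / lo <= hi / x else hi


def ratio_threshold(lo: float, hi: float) -> float:
    """Value at which ratio-based rounding switches from lo to hi (geometric mean)."""
    return math.sqrt(lo * hi)


print(nearest_by_difference(1.45, 1, 2))  # 1
print(nearest_by_ratio(1.45, 1, 2))       # 2
```

Between 1 and 2 the two conventions disagree for values between about 1.414 and 1.5, which is where 1.45 falls.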
| bla3 wrote:
| > At the very least, the loop based code could be cleaned up
| significantly.
|
| Seems like the loop based code wasn't so bad after all...
| meetups323 wrote:
| Loop code has the same bug.
| bla3 wrote:
| This is Java, not JavaScript. The exponents table was likely
| of integer type. Then it works.
| spkm wrote:
| This! If I had to choose between the two snippets I would have
| taken the loop based one without a second thought, because of
| its simplicity. The second snippet is what usually happens when
| people try to write "clever" code.
| dataflow wrote:
| The loop by itself isn't entirely clear on what it's doing.
| Stuff like the direction of the > comparison and what to do
| vs. >= and the byteCount / magnitudes[i] at the end really do
| require you to pause & do mental analysis to check
| correctness. I think the real solution here is to define an
| integer log (ilog()?) function based on division and use that
| in the same manner as the log(). That way you only do the
| analysis the first time you write that function, and after
| that you just call the function knowing that it's correct.
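The integer log dataflow proposes could look roughly like this. It is a sketch under assumptions: the name `ilog` comes from the comment, while `unit_prefix`, the `PREFIXES` string and the clamping to the largest prefix are hypothetical additions for illustration.

```python
PREFIXES = "KMGTPE"


def ilog(n: int, base: int) -> int:
    """floor(log_base(n)) for n >= 1 and base >= 2, using integer division only."""
    if n < 1 or base < 2:
        raise ValueError("need n >= 1 and base >= 2")
    exponent = 0
    while n >= base:
        n //= base
        exponent += 1
    return exponent


def unit_prefix(byte_count: int, base: int = 1024) -> str:
    """Return '' for plain bytes, otherwise the prefix letter for the magnitude."""
    if byte_count <= 0:
        return ""
    # Clamp so that very large values still map to the largest known prefix.
    exponent = min(ilog(byte_count, base), len(PREFIXES))
    return "" if exponent == 0 else PREFIXES[exponent - 1]
```

Because only integer division is involved, `ilog(1000, 10)` is 3 and `ilog(999, 10)` is 2 regardless of how the platform rounds floating-point logarithms.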
| twobitshifter wrote:
| Premature optimization strikes again.
| amelius wrote:
| Wouldn't it be cool if you could call stack overflow answers
| directly from your code?
| hardwaregeek wrote:
| Floating point is really really hard to get right, especially if
| you want the numbers to be stable. Which begs the question, why
| the heck does JavaScript, the most used language in the world,
| not have an integer type? Sure, there's BigInt but that's quite
| clunky to use. I know it's virtually impossible to add by now,
| but I'd love an integer type for all my bit twiddling, byte
| munching needs.
| ascar wrote:
| I just feel if you have bit twiddling, byte munching needs
| JavaScript shouldn't be the language of choice. Doing that is a
| rather rare edge case and if you're doing it for performance
| reasons, working in JavaScript is the much bigger performance
| problem.
| colejohnson66 wrote:
| What's wrong with a simple loop (like the one near the top)? Why
| does it _have_ to be branchless? Wouldn't the IO take longer than
| missed branches/pipeline flushes?
|
| Not to mention that the fixed version now has branches as well...
| MauranKilom wrote:
| The irony is that a single log computation is going to take
| longer than the loop. (No idea if implementing a log
| approximation involves loops either.)
| [deleted]
| bottled_poe wrote:
| Sounds like a textbook example of when theory is misaligned
| with reality.
| xxpor wrote:
| the original version had branches too, in fact a majority of
| the lines had them! ? is just shorthand for if.
| enedil wrote:
| This isn't true, this form of conditionals can be compiled
| into cmov type of instructions, which is faster than regular
| jump if condition.
| dataflow wrote:
| > This isn't true, this form of conditionals can be
| compiled into cmov type of instructions, which is faster
| than regular jump if condition.
|
| IIRC cmov is actually quite slow. It's just faster than an
| unpredictable branch. Most branches have predictability so
| you generally don't want a cmov.
| ncann wrote:
| If the if/else is simple the compiler should be able to
| optimize that anyway.
| kmote00 wrote:
| Update title: this is from 2019
| mjevans wrote:
| The author's lookup table is incorrect.
|
| The question being answered clearly wanted base2 engineering
| prefix units, rather than the standard base10 engineering prefix
| units.
|
| suffixes = [ "EB", "PB", "TB", "GB", "MB", "KB", "B" ]
|
| magnitudes = [ 2^60, 2^50, 2^40, 2^30, 2^20, 2^10, 2^0 ] //
| Pseudocode, also 64 bit integers required. (Compilers might
| assume unsigned 32 for int)
| returningfory2 wrote:
| That code snippet is explicitly introduced in the article as
| _not_ the author's.
| asdf3243245q wrote:
| That is not the author's code. That is pseudocode for one of
| the example answers that he is improving on.
|
| The author's code gives an option for the units:
|
| int unit = si ? 1000 : 1024;
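The lookup-table pseudocode quoted by mjevans, combined with a base switch like the `si` flag in the line above, might be made runnable along these lines. This is a sketch, not the author's code: `format_bytes` is a hypothetical name, the suffix list is copied from the pseudocode, and powers are written with `**` because `^` means XOR in Python (as it does in Java).

```python
SUFFIXES = ["EB", "PB", "TB", "GB", "MB", "KB", "B"]


def format_bytes(byte_count: int, si: bool = False) -> str:
    """Format a byte count using the largest magnitude that does not exceed it."""
    if byte_count < 0:
        raise ValueError("byte_count must be non-negative")
    unit = 1000 if si else 1024
    # Magnitudes from unit**6 down to unit**0, matching the order of SUFFIXES.
    magnitudes = [unit**k for k in range(6, -1, -1)]
    for suffix, magnitude in zip(SUFFIXES, magnitudes):
        if byte_count >= magnitude:
            return f"{byte_count / magnitude:.1f} {suffix}"
    return "0 B"


print(format_bytes(1536))           # 1.5 KB
print(format_bytes(1500, si=True))  # 1.5 KB
```

Python integers are arbitrary precision, so the 64-bit concern in the pseudocode comment does not arise here; in Java the magnitudes would need to be `long`. The sketch does nothing special about values that round up to the next unit.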
___________________________________________________________________
(page generated 2021-06-16 23:00 UTC)