[HN Gopher] Timsort (2019)
___________________________________________________________________
Timsort (2019)
Author : akbarnama
Score : 121 points
Date : 2022-07-29 16:00 UTC (7 hours ago)
(HTM) web link (skerritt.blog)
(TXT) w3m dump (skerritt.blog)
| pizlonator wrote:
| "you've never heard of" is a bit presumptuous. I've heard of it a
| lot, nothing but good things.
| hangonhn wrote:
| I get annoyed with titles like that. "Never heard of", "Didn't
| know.", etc. I know people want clicks but it sometimes feel
| normal person to person courtesy aren't valued when the medium
| is the Internet.
|
| It's especially "dangerous" in this case because the people who
| would even care about sorting algorithms are going to contain a
| subset of people who knows a lot about sorting. Those people
| are likely to know about Timsort. It's like going to a heart
| surgery conference and having a presentation about "This one
| heart surgery method you never knew about." Good chance at
| least a few people in the audience would know and may have
| invented it.
| Beldin wrote:
| Then I have an interesting tidbit for you: its sorting wasn't
| 100% correct. Should be fixed now though.
|
| http://www.envisage-project.eu/proving-android-java-and-pyth...
| pizlonator wrote:
| I had heard of that! I just don't count a bug as a bad thing.
| Maybe the opposite. Software that has no bugs usually
| achieves this feat by not having any users. Timsort has tons
| of users, hence tons of attention, hence someone found a bug.
| That's great!
| [deleted]
| lupire wrote:
| dktalks wrote:
| This blog steals content from other sites, original is at
| https://hackernoon.com/timsort-the-fastest-sorting-algorithm....
|
| Unfortunately I cannot downvote.
|
| Please don't link plagiarized content. This guy also linked to
| his own "Big O Notation" in his page, where he says O(n) is
| polynomial
|
| >>>In our shopping list example, in the worst-case of our
| algorithm it prints out every item in the list sequentially.
| Since there are n items in the list, it takes O(n) polynomial
| time to complete the algorithm.
| TameAntelope wrote:
| There are no downvotes on submissions, FWIW.
| rat9988 wrote:
| O(n) is polynomial and it's the same author.
| caslon wrote:
| It's the same author; compare the URL to your link's author's
| last name.
| thebooktocome wrote:
| Well, linear functions are still technically polynomials...
| haha!
| [deleted]
| indigo945 wrote:
| This may be the same author ... Anyway f(n)=n is definitely a
| polynomial and O(n) is a class of algorithms that run in
| polynomial time (i.e. are in P).
| Beldin wrote:
| Interestingly, Timsort used to be broken [1]. They found that by
| trying to prove correctness. I recall Stijn de Gouw remarking
| that they wanted to have their proof artefact evaluated, but
| you'd need a seriously beefy machine to do so.
|
| [1] http://www.envisage-project.eu/proving-android-java-and-
| pyth...
| masklinn wrote:
| Timsort was not broken, the implementations were, which is a
| slightly (but crucially) different issue: the implementations
| don't / didn't uphold the algorithm's invariants at the upper
| edges.
|
| Not only that, but the cpython's implementation's misbehaviour
| effectively couldn't be triggered because you couldn't create
| an array large enough to reach it.
| _old_dude_ wrote:
| There is a follow up, the fix used in Java was not good enough
| [2].
|
| [2] https://bugs.openjdk.org/browse/JDK-8203864
| gavinray wrote:
| I thought it was widely known, IIRC a number of languages have
| adopted it as the stdlib default sorting algorithm for either
| unstable/stable sort (can't remember which is it?)
|
| I'm not big into algorithms but I believe that pdqsort and
| timsort were supposed to be the best two general sorts.
| [deleted]
| [deleted]
| masklinn wrote:
| > either unstable/stable sort (can't remember which is it?)
|
| Stable. It's a hybrid mergesort.
|
| Some of the langages which adopted it after Python are Java,
| Rust, or Swift.
| rufus_foreman wrote:
| Java uses it in Arrays.sort: * <p>The
| implementation was adapted from Tim Peters's list sort for
| Python * (<a href="http://svn.python.org/projects/pyth
| on/trunk/Objects/listsort.txt"> * TimSort</a>). It
| uses techniques from Peter McIlroy's "Optimistic *
| Sorting and Information Theoretic Complexity", in Proceedings
| of the * Fourth Annual ACM-SIAM Symposium on Discrete
| Algorithms, pp 467-474, * January 1993.
| valbaca wrote:
| Java and Python IIRC
| linsomniac wrote:
| Among the previous discussions of timsort:
|
| https://news.ycombinator.com/item?id=21196555
|
| https://news.ycombinator.com/item?id=17436591
|
| https://news.ycombinator.com/item?id=17436591
| g42gregory wrote:
| Apparently a lot of people heard of it. :-) I just love these
| headlines.
| linsomniac wrote:
| Heard of it? I've bought tim an Icelandic hamburger!
| eunoia wrote:
| Story time, please :)
| dang wrote:
| Thanks! Macroexpanded:
|
| _Beating TimSort at Merging_ -
| https://news.ycombinator.com/item?id=27823180 - July 2021 (69
| comments - including
| https://news.ycombinator.com/threads?id=tim-peters - maybe luck
| will strike again...)
|
| _Timsort, the Python sorting algorithm_ -
| https://news.ycombinator.com/item?id=21196555 - Oct 2019 (131
| comments)
|
| _On the Worst-Case Complexity of TimSort_ -
| https://news.ycombinator.com/item?id=17883461 - Aug 2018 (74
| comments)
|
| _Timsort is a sorting algorithm that is efficient for real-
| world data_ - https://news.ycombinator.com/item?id=17436591 -
| July 2018 (77 comments)
|
| _Functional verification with mechanical proofs of TimSort
| [pdf]_ - https://news.ycombinator.com/item?id=9778243 - June
| 2015 (1 comment)
|
| _Timsort_ - https://news.ycombinator.com/item?id=3214527 - Nov
| 2011 (27 comments)
|
| _Visualising Sorting Algorithms: Python 's timsort_ -
| https://news.ycombinator.com/item?id=2092594 - Jan 2011 (3
| comments)
|
| _Java has switched from Mergesort to TimSort_ -
| https://news.ycombinator.com/item?id=752677 - Aug 2009 (13
| comments)
|
| _Visualizing Sorting Algorithms_ -
| https://news.ycombinator.com/item?id=750858 - Aug 2009 (3
| comments)
| CJefferson wrote:
| Closely related is pattern defeating quicksort (
| https://github.com/orlp/pdqsort ), which adapts quicksort to take
| advantage of sorted runs. I've adapted a few quicksorts to
| pdqsort and seen good speedups (as people were often sorting
| partially sorted data)
|
| Basically: Timsort is to mergesort as pdqsort is to quicksort
| masklinn wrote:
| > Timsort is to mergesort as pdqsort is to quicksort
|
| It's the other way around, since timsort is a decade older than
| pdqsort.
| dragonwriter wrote:
| Age has nothing to do with the accuracy of the analogy, which
| says nothing about the temporal relationship between timsort
| and pdqsort.
| BizarroLand wrote:
| My favorite oddball sorting algorithm is StalinSort. It runs in
| (O)1 time, making it probably the fastest sorting algorithm in
| the world.
|
| It iterates through the list, and deletes every item in the
| list that isn't ordered. The remaining list is therefore
| automatically ordered.
| magicalhippo wrote:
| Hadn't heard of that one, and looking it up I stumbled upon
| this[1] delightful collection of esoteric sorting algorithms.
| Perfect for a Friday evening...
|
| [1]: https://towardsdatascience.com/esoteric-sort-algorithms-
| and-...
| jsnk wrote:
| I was hoping to see some benchmark numbers
| varenc wrote:
| Creating a single benchmark seems tricky since Timsort's key
| strength is it performs particularly well on real world data
| where subsequences of the input are often already sorted
| (runs). With random data it should have O(n log n) time like
| any other comparison sort.
|
| So with benchmarking the challenge is coming up with a set of
| real world-like test data that feels representative but isn't
| just excessively playing to Timsort's strengths. Not everyone's
| "real world" data is the same.
| hinkley wrote:
| I'm trying to recall what the expected level of clustering is
| even for properly random inputs.
|
| A few years before Timsort there was someone randomly
| shuffling the input in order to avoid the reverse-order
| corner case that kills so many sort algorithms.
|
| For truly random inputs, about half of the entries should be
| pairs that are partially ordered, and 25% runs of 3, no? And
| if memory serves Timsort also has a fast path for strictly
| decrementing runs as well, where it amortizes some of the
| comparisons done in the scan phase to do a blind reverse()
| call.
| sshine wrote:
| This kind of almost sorted data is easily synthesised with
| any degree of already-sorted-ness.
| varenc wrote:
| Certainly! But then how do you decide what degree of
| already-sorted-ness fairly represents real world data?
| not2b wrote:
| By collecting a lot of real world data?
| lifthrasiir wrote:
| CPython source code contains a separate documentation [1] for a
| detailed description and discussion for the algorithm. TimSort
| optimizes for the number of comparisons (which are expensive in
| Python since each comparison can call a Python function) so all
| numbers are given as such.
|
| [1]
| https://github.com/python/cpython/blob/main/Objects/listsort...
| hinkley wrote:
| Number of comparisons is a particular sore point for me.
| Nobody is sorting integers. This is the goddamned 21st
| century, not 1980's era video games. Just stop.
|
| Show me a sort algorithm that is still efficient with five
| pointer indirections in compare() and I'm satisfied. Show me
| primitive data types and I start to wonder what you're hiding
| (read: lying about).
| elcomet wrote:
| What are you talking about? I sort integers every day
| Rapzid wrote:
| Needs a 2005 in the title.
| tpoacher wrote:
| > built for the real world -- not constructed in academia.
|
| Why would lacking in evidence and rigor be considered a plus?
|
| Does Timsort pride itself of being the homeopathy of sorting
| algorithms?
|
| What a bizzare brag to make.
| masklinn wrote:
| > Why would lacking in evidence and rigor be considered a plus?
|
| The point it's making is about heuristics rather than
| theoretical complexity. Tim Peters provided plenty of evidence
| and rigorous work when he proposed it.
|
| That you interpret it as "lack of evidence and rigor" is really
| about you.
| jamestimmins wrote:
| I actually found this valuable, without being a "brag". The
| best type of sort for real-world problems is build with an
| understanding of real-world datasets. The types of questions
| and research that interest academia, while often valid, are
| distinct from those that concern practical engineers.
| scrozier wrote:
| I had the same thought at first. But I think the point is that
| this sort of heuristically cobbled together sort wouldn't
| typically be the stuff of academic research, brilliant and
| practical though it is.
| threatofrain wrote:
| Also interesting is pdqsort, which will become Go's standard
| sort.
|
| https://news.ycombinator.com/item?id=31106157
|
| https://news.ycombinator.com/item?id=31098822
| 2143 wrote:
| It's well known.
|
| Nevertheless, the article was good.
|
| If the author is reading this, consider dropping the "you've
| never heard of" from the title; you don't want to assume things
| about your reader.
| [deleted]
| lifthrasiir wrote:
| "lesser-known" is probably more appropriate choice, given that
| it is not a kind of algorithm you'd heard from CS curricula.
| scrozier wrote:
| Sometimes a little colorful language makes dry topics a bit
| more fun. I don't think the title was meant to be statistically
| analyzed. :-)
| pclmulqdq wrote:
| I've done this too with blog posts, and it works okay, although
| many readers don't like it. I wanted a snappier way to say "you
| didn't learn about in school," and I think this author did too.
| earthicus wrote:
| Here [1] is a great (short, readable) paper from 2015 that
| explains Timsort and similar stack-merge sorting algorithms. It
| also gives a runtime analysis, and a simplified version of the
| algorithm that they call 'alpha-sort'.
|
| [1] https://hal-upec-upem.archives-
| ouvertes.fr/hal-01212839/file...
___________________________________________________________________
(page generated 2022-07-29 23:00 UTC)