[HN Gopher] Fluxsort: A stable adaptive partitioning comparison ...
___________________________________________________________________
Fluxsort: A stable adaptive partitioning comparison sort
Author : signa11
Score : 58 points
Date : 2021-07-25 12:03 UTC (10 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| mettamage wrote:
| Perhaps others think differently about this but after all the
| interview prep I have done, I can appreciate posts like this a
| lot.
|
| It seems I have gained a new appreciation of a topic! Which is
| something I am grateful for :)
| gigatexal wrote:
| Hmm I kinda want to create a Python package for this.
| Blikkentrekker wrote:
| > _Fluxsort starts out with an analyzer that detects fully sorted
| arrays and sorts reverse order arrays using n comparisons. It
| also obtains a measure of presortedness and switches to quadsort
| if the array is less than 25% random._
|
| This makes me wonder? is it worth it to do a neural a.i. sort by
| simply training it?
|
| I assume it would be hard to prove that the resulting a.i. is
| actually infallible and always produces a sorted output, but
| perhaps it could for practical measures sort faster than most
| current algorithms.
| somethingAlex wrote:
| There's no neural network you could train using supervised
| learning to solve sorting in the general case.
|
| To train a neural network, the training set and test set need
| to come from a common probability distribution but the inputs
| of sorting algorithms can be random. There's no relation
| between what you could possibly train your network on and what
| you'd ask it to do.
|
| Put another way - the input and output of your model would have
| zero correlation. There's nothing to train the model to pick up
| on.
|
| Interesting related work: Graves[0] trained a "neural Turing
| machine" to encode instructions for a basic sorting algorithm.
|
| [0] https://arxiv.org/abs/1410.5401
| stingraycharles wrote:
| You mean using a neural network for sorting? I think it will be
| very expensive, as sorting algorithms usually take low level
| performance optimizations such as cache locality into
| consideration, and neural networks are a poor fit for this type
| of thing.
| HelloNurse wrote:
| The currently en vogue deep and large neural networks are not
| only designed to have an affordable but significant
| evaluation cost (probably not competitive with really cheap
| techniques, like the partitioning and medians of this
| Fluxsort) but also to justify this cost with the recognition
| of fairly subtle patterns: wasted on rough and general worst-
| case guarantees, positively dangerous with arbitrarily
| crafted inputs.
| stingraycharles wrote:
| Do you have an example of a sorting algorithm that uses a
| neural network that's on par with other sorting algorithms
| in terms of performance?
|
| It just seems so counter-intuitive to me that neural
| networks are an efficient way to handle this, but I would
| be happy if proven wrong.
| mlochbaum wrote:
| This is BlockQuicksort with stable partitioning, pumped up with
| selective benchmarks (if I sound irritated about this, it's
| because I've just called out the author[0] on this exact issue).
| I understand that worse performance should be expected for a
| stable sort, but what possible reason is there to leave out
| benchmarks against the most closely related algorithm?
|
| Worse, a stable version of BlockQuicksort is pointless, which is
| to say that it is actually impossible for there to be a practical
| use case for Fluxsort. The branchless partitioning technique is
| relevant only if comparison doesn't depend on branching, which
| means the input array needs to consist machine-native types, and
| in fact Fluxsort limits itself to integers and long double. On
| these inputs stability isn't an observable property! The result
| of Fluxsort will be indistinguishable from that of any other
| quicksort (okay, I guess it will keep negative and positive zeros
| in the original order for floats, but an integer-based comparison
| could also accomplish this). So all you get is a slower sort that
| uses more memory. There _are_ use cases for stable partitioning,
| such as sort-by or arg-sort, but as written Fluxsort can 't do
| these.
|
| The statistical method to decide whether to use quicksort or
| Quadsort--checking the ratio of comparisons between adjacent
| pairs of elements that are smaller versus greater--is really
| interesting. It applies to the mergesort versus quicksort problem
| in general.
|
| The use of the ptx pointer is definitely not novel and I think
| it's a stretch to even describe it as a brain-twister. It's the
| obvious way to do branchless stable partitioning.
|
| This is good work! It's well-written and well-explained, and
| clearly points in the direction of practical applications.
| However, it's presented in a way that suggests that it's a good
| drop-in sorting algorithm, when instead it's more of a research
| effort. Beware!
|
| [0] https://news.ycombinator.com/item?id=27793636
| mlochbaum wrote:
| Okay, with scandum's prodding I've looked into this more deeply
| and need to add much more context. Firstly, I was very much
| wrong and apologies are in order. What I wrote was based on the
| assumption that Fluxsort trades speed for stability. It
| doesn't, apparently: the partitioning seems to be faster and
| _also_ stable. Possibly more so than the benchmarks show. It 's
| a crude measure, but I ran perf top during benchmarks, and
| pdqsort spent significantly more time in its partition than
| Flux did in its own. So the only real drawback is O(n) rather
| than O(log(n)) memory usage. Memory's cheap, although I guess
| some users will decide this isn't acceptable.
|
| A fast partition that _happens_ to be stable is great to have.
| The in-place scheme used by pdqsort has one advantage, which
| that it naturally puts reverse-sorted data in order. Other than
| that it tends to mangle everything. Stable partitioning doesn
| 't do this, and might even be able to draw out order from
| interleaved sequences that a merge sort (or insertion sort)
| could later take advantage of.
|
| BlockQuicksort uses a fast index generation method to produce
| indices for elements that are then swapped. Fluxsort skips all
| this and moves an element to _both_ possible positions
| immediately; it _then_ uses the comparison result to increment
| one partition index allowing the other position to be
| overwritten. It 's the simpler organization of stable
| partitioning that allows this to be fast. There's also some
| cleverness in not immediately moving the second partition to
| put it after the first, but instead partitioning it from where
| it sits. I've looked into stable partitioning, and I know
| branchless algorithms well enough that I could have written
| this. But I didn't!
| scandum wrote:
| I get the impression you wrote your response rather hastily.
|
| Here's a bench of fluxsort vs pdqsort on strings:
| | Name | Items | Type | Best | Average | Compares
| | Samples | Distribution | | --------- | -------- |
| ---- | -------- | -------- | --------- | ------- |
| ---------------- | | fluxsort | 100000 | 64 |
| 0.011804 | 0.012044 | 2003293 | 100 | random string |
| | pdqsort | 100000 | 64 | 0.012423 | 0.012512 | 1859192
| | 100 | random string |
|
| If you can sort strings you can sort tables, where stability
| matters.
|
| Here's a graph of relative performance on 100K 32 bit integers.
|
| https://media.discordapp.net/attachments/737457171377160203/...
|
| As for the ptx pointer's use, I'm not aware of a prior
| implementation.
|
| As for selective benchmarking, that's a selective accusation.
| mlochbaum wrote:
| Gah, I did misunderstand some things. The picture is now
| worse than I realized! EDIT: and yet also better; see
| https://news.ycombinator.com/item?id=27951017 . It's hard to
| get it working right but this _is_ a really fast algorithm.
|
| fluxsort takes a function pointer and _has no way to inline
| it_. The fluxsort.c is included from fluxsort.h (this is not
| standard style by the way; template files should have a .h
| suffix), so the compiler _might_ choose to inline it, and the
| benchmarks use noinline on the comparison functions to avoid
| this.
|
| BlockQuicksort makes no sense without inlined comparisons.
| Yes, pdqsort allows a comparison function in order to be a
| general-purpose sort, but this isn't the case that it
| targets.
|
| Are these benchmarks run with inline comparisons? Seems
| pretty hard to get bench.c to inline anything since test_sort
| takes a comparison function as input (though I'm working on
| it).
| scandum wrote:
| You can uncomment //#define cmp(a,b)
| (*(a) > *(b))
|
| in quadsort.h for primitive inline comparisons.
| acmj wrote:
| My biggest complaint about all scandum's sorting algorithms
| is that it doesn't support guaranteed inlined comparisons.
| Without this feature, the result depends on the compiler
| behavior, making the results hard to interpret. This point
| has been raised many times at reddit/hacker news but the
| dev is strong minded.
| mlochbaum wrote:
| I think I mistook the README's comment about ptx as being
| about the branchless partition logic when it's talking about
| the overall recursive structure (i.e. which arguments are
| passed to flux_partition calls). That could be new although I
| kind of doubt it. The way it uses the two buffers is fairly
| similar to what I've seen called a "ping-pong merge".
| mlochbaum wrote:
| Got the benchmarks working with a hardcoded 4-byte int
| pdqsort. pdqsort is of course faster, but this doesn't tell
| us anything about how good fluxsort really is because it
| doesn't inline the comparison (I've removed __attribute__
| ((noinline)) and comparisons++ as bench.c suggests, which
| improves the speed but doesn't inline it).
| | Name | Items | Type | Best | Average |
| Loops | Samples | Distribution | | --------- |
| -------- | ---- | -------- | -------- | --------- | ------- |
| ---------------- | | qsort | 100000 | 32 |
| 0.013530 | 0.013901 | 1 | 10 | random order
| | | fluxsort | 100000 | 32 | 0.004837 | 0.005009
| | 1 | 10 | random order | |
| pdqsort | 100000 | 32 | 0.003661 | 0.003745 | 1 |
| 10 | random order |
|
| Your graph doesn't show how fast pdqsort orders 4-byte
| integers. It shows how fast it orders them when required to
| use a comparison function, which is an arbitrary restriction
| and not what pdqsort is designed for.
| scandum wrote:
| The graph is with //#define cmp(a,b) ( _(a) > _(b)) in
| quadsort.h uncommented.
|
| Looking forward to your next spin.
| mlochbaum wrote:
| You're right, with that setting fluxsort beats pdqsort
| slightly for either 1e5 or 1e6 random 4-byte ints. I'm
| really sorry; I shouldn't have assumed I knew what your
| benchmark looked like. Although I guess now I'm even more
| confused about why there are no pdqsort benchmarks in the
| repository? Beating it is really impressive!
|
| I'll be running my own timings, but do you know where the
| improvement comes from? Is the base case faster than
| insertion sort, is the partition faster, or both? I
| wasn't expecting a stable partition to beat an unstable
| one because it does twice as much data movement, but it
| wouldn't be the first time an algorithm using more memory
| beats one using less.
| scandum wrote:
| Main reasons for not benching against pdqsort:
|
| 1. stability 2. worse performance on long doubles, and I
| don't know why 3. A variety of hard to explain
| performance differences. 4. pdqsort does better on
| generic data, which can be very important.
|
| So it's tricky to present a fair benchmark when two sorts
| behave very differently.
|
| As to performance advantages of fluxsort:
|
| 1. It has a faster insertion sort. 2. Branchless
| pseudomedian of 15 gives an advantage. 3. Partial loop
| unrolling with: while (ptx + 8 < pte) 4. Data movement
| should be nearly identical, if not better, with the
| recursive calls through the ptx pointer. In the optimal
| case the memcpy only triggers when the partition shrinks
| below 24 elements, and in half of those cases the
| partition will already be in main memory.
|
| So on random you could expect n / 2 extra data movements
| on top of ~ n log n moves.
___________________________________________________________________
(page generated 2021-07-25 23:01 UTC)