[HN Gopher] Sorting networks and their applications (1968) [pdf]
___________________________________________________________________
Sorting networks and their applications (1968) [pdf]
Author : tjalfi
Score : 25 points
Date : 2021-07-02 13:02 UTC (1 days ago)
(HTM) web link (www.cs.kent.edu)
(TXT) w3m dump (www.cs.kent.edu)
| WJW wrote:
| Sorting networks are one of those cool algorithms I learned about
| once and then never used again despite some of the theoretical
| advantages. (This is true about most sorting algorithms FWIW)
|
| Are they used anywhere in modern applications? I can't think of a
| lot of applications that needs sufficient amount of sorting but
| perhaps somewhere in GPUs or NICs?
| justincormack wrote:
| Yes, I think most GPU sorting is based on sorting networks as
| there are no data dependent branches (I mean, if you exclude
| the exchange)
| dragontamer wrote:
| Only within a CUDA block / OpenCL workgroup of ~1024 items
| maximum.
|
| Beyond 1024 "threads", its difficult and inefficient for
| communication to take place.
|
| -------------
|
| GPU Mergepath seems like the coolest algorithm for sorting
| more than 1024 elements in 1024-way parallel.
|
| In practice, I think only the 32-way or 64-way optimal
| sorting network was calculated, and then its all merge-path
| beyond that. (Merge 64 + 64 sorted lists == 128 sorted list)
|
| Mergepath is a very efficient means of parallelizing the
| "Merge" step of mergesort's algorithm. There's a little
| branch divergence: but even with the branch divergence you're
| at O(n + p*log(n)) overall work per merge (assuming constant
| "p", number of processors, where p is much smaller than
| "n"... its effectively a O(n) algorithm and therefore
| asymptotically work efficient with original O(n) mergesort.
| Parallel Bitonic sort fails in this test)
|
| IIRC, CUDA Thrust's Mergepath algorithm is designed to scale
| beyond one SM (so it can compute in parallel larger than 1024
| elements), because the Mergepath algorithm has very little
| communication that gets the job done.
|
| In contrast, an optimal sorting network of large size (ex:
| size 1-million) would require arbitrary communication between
| elements, which is just not how modern computers / GPUs work.
| touisteur wrote:
| Please if you know where to find the _optimal_ 64-way
| sorting network) or even better, selecting network, I 'm
| yugely interested.
| brianpursley wrote:
| I find sorting networks to be fascinating.
|
| Beyond 8 inputs, there isn't a pattern to follow to construct the
| optimal sorting network, and there are only "best known" sorting
| networks (but not proven to be optimal) for networks beyond 11
| inputs.
|
| Unintuitively, the optimal sorting network for, let's say, 16
| inputs is not just an iteration on the optimal sorting network
| for 15 inputs. So you can't just build upon what was optimal for
| a smaller sorting network to get what is optimal for a larger
| sorting network.
|
| I've spent more time than I probably should have several years
| ago experimenting with various genetic algorithms to construct
| sorting networks for 14, 15, and 16 inputs. It was a fun
| challenge and learning experience though.
|
| Here is a little utility I wrote in Python to check whether a
| comparison network is a sorting network and can generate diagrams
| for them in svg format.
|
| https://github.com/brianpursley/sorting-network
___________________________________________________________________
(page generated 2021-07-03 23:02 UTC)