[HN Gopher] I can't believe that I can prove that it can sort
___________________________________________________________________
I can't believe that I can prove that it can sort
Author : Raphael_Amiard
Score : 229 points
Date : 2022-07-04 10:24 UTC (12 hours ago)
(HTM) web link (blog.adacore.com)
(TXT) w3m dump (blog.adacore.com)
| donatj wrote:
| Is this not a standard sorting algorithm? If I'm not mistaken
| this is my coworker's go-to sort when writing custom sorts.
| Jtsummers wrote:
| It manages to be more inefficient than most, and will even
| disorder and then reorder an already sorted input. Bubble sort
| (a very low bar to beat) doesn't even do that. If you've only
| got a small number of items it doesn't matter, unless you're
| sorting that small number of items a large number of times.
| dgan wrote:
| So, _Lionel_ knew how to implement a 3D renderer, but had no
| clue about O(n) complexity or other sorting algorithms? I am
| baffled.
| im3w1l wrote:
| That's self-taught for you. Will know some advanced stuff, and
| miss some basics depending on what they found interesting or
| useful.
| Agingcoder wrote:
| The exact same thing happened to me, and this is how I
| discovered algorithmic complexity. Sorting my triangles took
| forever (I saw it in the profile, taking a whopping 90% of the
| time) and I eventually figured there might be proper sorting
| algorithms out there.
|
| I was at the same time happily churning out assembly code,
| talking to the vga card, writing a dos extender, etc.
|
| You can actually do quite a few things without formal
| education!
| touisteur wrote:
| Well, Hi there, it me. Back then I was 17 and wading through
| software rendering so (light) mesh tessellation, triangle
| projection (twice, for a stereoscopic anaglyph renderer),
| triangle filling, texture mapping, with at most 300 objects,
| all in crappy crashy C (first with VESA, then SDL eased my
| life...) with lots of misunderstandings about pointers.
|
| I was welllll over my head with complex stuff, and sorting
| didn't appear in the profiles, so... I guess you can call that
| profile-guided learning? I had the formal training, later on,
| and even then it didn't stick until I faced the actual
| complexity problem head-on.
|
| I'll never forget that whole weekend with my AMD Athlon 900 at
| 100% CPU sorting a 200,000-word dictionary... It was still not
| finished on Monday. I implemented (a very primitive) insertion
| sort and it was done in less than 2 minutes...
|
| That was my 10000-a-day day https://xkcd.com/1053
| fn_t_b_j wrote:
| OJFord wrote:
| Why is this algorithm at all surprising?
|
| (I'm really not trying to brag - I assume I must be missing
| something, that I'm naive to think it obviously works, and it
| actually works for a different reason.)
|
| In plain English - 'for every element, compare to every element,
| and swap position if necessary' - .. of course that works? It's
| as brute force as you can get?
| Asooka wrote:
| The comparison is not the way you'd expect. Index j goes past
| index i, but the comparison doesn't change. What you propose is
| a condition more like:
|
|     if ((a[i] < a[j]) != (i < j)) swap(a, i, j);
|
| Which obviously works.
|
| The surprising algorithm sorts even though it swaps elements
| that are already ordered.
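For concreteness, here is a minimal Python sketch of the algorithm under discussion, as given in the arXiv paper linked elsewhere in the thread (the function name is just illustrative):

```python
def cant_believe_sort(a):
    """ICan'tBelieveItCanSort: both loops run over the full
    range, and the comparison a[i] < a[j] never changes."""
    n = len(a)
    for i in range(n):
        for j in range(n):
            # Swaps even when i < j and the pair is already
            # in ascending order -- yet the result is ascending.
            if a[i] < a[j]:
                a[i], a[j] = a[j], a[i]
    return a

print(cant_believe_sort([5, 2, 4, 1, 3]))  # → [1, 2, 3, 4, 5]
```

Note the ascending result despite the "backwards" comparison, which is exactly the point of the surprise.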
| OJFord wrote:
| Ok, I think I see why it's a bit weird '1,2,3 when i=1 and
| j=3 it swaps them anyway' sort of thing?
|
| But i-loop comes through 'afterwards', so when i=3 (value now
| 1) and j=1 (3) it sets them straight.
|
| It still seems quite intuitive, but I think I cheated by
| skimming over it initially and thinking it was clear, whereas
| actually I've thought about it more now.
|
| (Not to compare myself to anything like him, I'm no
| mathematician at all, but I'm reminded of an amusing Erdos
| anecdote in which he apparently paused mid-sentence in some
| lecture, left the room, and came back some time later
| continuing 'is _clearly_ [...] ' or similar!)
| YetAnotherNick wrote:
| If the second loop ran from j = i to n, it would be easy to see
| that it sorts in decreasing order. But with j = 1 to n, it
| suddenly sorts in increasing order.
| shp0ngle wrote:
| look again at the comparison direction.
|
| It is opposite!
|
| It works but not as you think it does.
| smusamashah wrote:
| Here is how it looks.
| https://xosh.org/VisualizingSorts/sorting.html#IYZwngdgxgBAZ...
|
| If you compare it with both insertion and bubble sort, you can
| see it looks more like insertion sort than bubble sort.
| bjterry wrote:
| The video from the article
| (https://www.youtube.com/watch?app=desktop&v=bydMm4cJDeU) is
| much better because it highlights the index of the outer loop,
| which is unclear from the cascading visualization there. By
| seeing the indexes it becomes clear that (1) in the area before
| the outer index, every value gets swapped in and out of the
| outer loop location to put them in order, and (2) at the end of
| the first outer loop iteration, the largest element will be at
| the location of the outer loop index, and so everything to the
| right is unchanged in each iteration.
| smusamashah wrote:
| This is nice. I had a way to highlight indexes at one point
| in this tool. Got rid of it when I was trying to add a way to
| visualize any custom algo. I should add it back. It adds more
| value/info to the visualization.
| calmingsolitude wrote:
| And here's a version I wrote that visualizes it on your
| terminal https://github.com/radiantly/simplest-sort
| iib wrote:
| I guess it can be thought of as an unoptimized insertion or
| bubble sort.
|
| I think it is very possible to write this algorithm by mistake
| in intro compsci classes when you try to code a bubble sort by
| heart. I would think TAs may have many such instances in their
| students' homework.
| naniwaduni wrote:
| There's a surprisingly large class of "sorts people
| accidentally write while intending to write a bubble sort".
|
| This one is kind of special, though, since it's somehow
| _more_ offensive to intuition than bubble sort itself.
| mmcgaha wrote:
| I am guilty. I wrote this sort for a gnu screen session menu
| years ago and even named my function bubsort.
| raldi wrote:
| Is the visualization supposed to animate? I can't figure out
| how to make it start.
| Jtsummers wrote:
| There is a "start" button at the end of the list of sorts.
| Avamander wrote:
| If you click it multiple times it's wonderfully chaotic.
| FrozenVoid wrote:
| Combsort is a far more elegant and faster algorithm. I wrote a
| type-generic combsort a while ago here:
| https://github.com/FrozenVoid/combsort.h
|
| (Combsort, like the mentioned algorithm, also consists of two
| loops and a swap.)
| JadeNB wrote:
| I don't think that the goal here is to show a fast and elegant
| sort, but rather to show that a sorting algorithm that seems
| like it can't possibly work actually does. That is, probably
| no-one will learn from this article how to sort better, but
| hopefully people will learn from this article how to formally
| prove things (e.g., about sorting) better.
| touisteur wrote:
| Yes (co-author here) that was exactly the point. Thanks for
| putting it clearly.
| fweimer wrote:
| Isn't the proof incomplete because it does not ensure that the
| result is a permutation of the original array contents? Just
| overwriting the entire array with its first element should still
| satisfy the post-condition as specified, but is obviously not a
| valid sorting implementation.
| touisteur wrote:
| It was touched upon in the post: 'Tip: Don't prove what you
| don't need to prove' (and surrounding text). With a link on
| another proof for another sorting algorithm.
| yannickmoy wrote:
| And there are actually multiple ways to prove that the result
| is a permutation of the input: either by computing a model
| multiset (a.k.a. bag) of the value on entry and exit and
| showing they are equal, or by exhibiting the permutation that
| maps the entry value to the exit value (typically by building
| this permutation in ghost code as the algorithm swaps values),
| etc.
|
| But none of this makes a good read for people not already
| familiar with program proof tools.
| carlsborg wrote:
| Vaguely remember seeing this in the first few chapters somewhere
| of Spirit of C by Mullish and Cooper.
| ermir wrote:
| Isn't the sorting algorithm in question the famous BubbleSort? I
| understand the value of formally proving it works, but why is the
| name mentioned nowhere?
| [deleted]
| justusthane wrote:
| Nope - here's the original paper on the algorithm in question:
| https://arxiv.org/pdf/2110.01111.pdf
| palotasb wrote:
| No, it's not. Previously discussed on HN here:
| https://news.ycombinator.com/item?id=28758106
| (https://arxiv.org/abs/2110.01111)
| smcl wrote:
| It feels similar at first blush but it's not really. In bubble
| sort you compare/swap adjacent elements, and exit if you make a
| pass through the collection without making any changes. Whereas
| this will compare/swap the element at every index to every
| other index, and just exits when it's done performing all those
| comparisons.
| ufo wrote:
| It's actually closer to insertion sort.
| hobo_mark wrote:
| Wow, I wish we had these built-in provers in VHDL (which is
| basically Ada in sheep's clothing).
| freemint wrote:
| They are there. You just have to pay a lot to the right
| companies.
| nabla9 wrote:
| >which is basically Ada in sheep's clothing
|
| Only the syntax is similar.
| Stampo00 wrote:
| I've been watching those mesmerizing YouTube videos visualizing
| sorting algorithms lately. The header of this article uses a
| screen cap from one of them.
|
| Them: So what shows do you watch?
|
| Me: ... It's complicated.
|
| There are a lot of different sorting algorithms. Like, _a lot_,
| a lot.
|
| As I watch them, I try to figure out what they were optimizing
| for. Some only scan in one direction. Some only use the swap
| operation. Some seem to do the minimum number of writes. Some are
| incremental performance improvements over others.
|
| When I see an algorithm like this, I don't assume the person who
| wrote it was an idiot. I assume they were optimizing for
| something that's not obvious to me. Its only modifying operation
| is swap, so maybe that operation is faster than an arbitrary
| insert for whatever system or data structure they're using. There
| are no temporary variables besides loop counters, so maybe
| they're on a memory-constrained environment. There's barely any
| code here, so maybe this is for a microcontroller with precious
| little ROM. Or maybe they're applying this as a binary patch and
| they have a strict ceiling to the number of ops they can fit in.
|
| Or maybe it's just the first sorting algorithm they could think
| of in an environment that doesn't ship with one and the
| performance is adequate for their needs. In that case, it's
| optimized for developer time and productivity. And honestly, it's
| a far more elegant algorithm than my "naive" version would be.
|
| These are all valid reasons to use a "naive" sorting algorithm.
| ki_ wrote:
| a "new sorting algorithm"? Pretty sure it existed 50 years ago.
| fdupress wrote:
| Every time somebody proves that a sorting algorithm sorts, they
| forget to prove that the algorithm doesn't just drop values or
| fill the entire array with 0s (with fixed-length arrays, as
| here).
|
| On paper: "it's just swaps". Formally: "how do I even specify
| this?"
|
| (For every value e of the element type, the original array and
| the final array have the same number of occurrences of e. Show
| that that's transitive, and show it's preserved by swap.)
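The parenthetical above (same number of occurrences of every value) is exactly multiset equality, which is easy to state as a runtime check even when proving it mechanically is hard. A Python sketch; `bogus_sort` is a made-up example of a "sort" that satisfies an ordered-output postcondition but fails the permutation check:

```python
from collections import Counter

def is_permutation(before, after):
    # Same count of occurrences for every value <=> same multiset.
    return Counter(before) == Counter(after)

# Deliberately broken: fills the array with its first element.
# The output is sorted, so an ordering-only postcondition passes.
def bogus_sort(a):
    return [a[0]] * len(a) if a else a

print(is_permutation([3, 1, 2], [1, 2, 3]))              # True
print(is_permutation([3, 1, 2], bogus_sort([3, 1, 2])))  # False
```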
| touisteur wrote:
| Hi, co-author here. We actually touched upon this subject in
| the article: 'Tip: Don't prove what you don't need to prove'
| and surrounding text. Also previous comment by Yannick:
| https://news.ycombinator.com/item?id=31980126
| fdupress wrote:
| Looked for it and missed it.
|
| And thanks for writing this up, by the way. Having people
| less familiar with formal methods try and write up their
| experiences is, I think, much more effective at letting
| people in than having experts try and write tutorials.
| MrYellowP wrote:
| > published at the end of 2021
|
| I'm so confused. That "new" algorithm is just BubbleSort?
| Jtsummers wrote:
| As addressed in other comments in this thread and past threads
| on this sort (when the paper came out), it is _not_ bubble
| sort. Bubble sort only compares adjacent items and swaps them
| if they are in the wrong order. This has the effect of
| "bubbling" the largest value to the top (the way it's often
| written, could also reverse it to push the smallest down
| first).
|
| This sort is like a worse performance version of insertion
| sort. But insertion sort creates a partition (and in doing this
| performs a sort) of the first _i+1_ items, with _i_ increasing
| until the entire sequence is sorted. This one will scan the
| entire sequence every time on the inner loop instead of just a
| subset of it and usually perform more swaps.
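For comparison, minimal Python sketches of the two classic algorithms this one keeps being mistaken for; both sort ascending in place:

```python
def bubble_sort(a):
    # Only adjacent pairs are compared; an early exit fires when
    # a full pass makes no swap (hence no reshuffling of sorted input).
    n = len(a)
    swapped = True
    while swapped:
        swapped = False
        for j in range(n - 1):
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
                swapped = True
    return a

def insertion_sort(a):
    # Grows a sorted prefix a[0..i]; each outer step scans only
    # that prefix, not the whole array, unlike the algorithm above.
    for i in range(1, len(a)):
        x = a[i]
        j = i - 1
        while j >= 0 and a[j] > x:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = x
    return a
```

The structural differences named in the comment are visible here: adjacent-only comparisons with early exit in bubble sort, and a prefix-only inner scan in insertion sort.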
| prionassembly wrote:
| I taught myself this exact algorithm at 8 or 10 by sorting
| playing cards (we established an order for the kind, diamond <
| leaf < spades < heart) in a very very very long plane trip where
| I had neglected to bring comic books or something.
| ufo wrote:
| The comments so far have tended to focus on the proof itself, but
| for me the coolest part of the blog post were the formal methods.
|
| Does anyone here also use SPARK for this sort of thing? Are there
| other formal methods tools you'd use if you had to prove
| something like this?
| im3w1l wrote:
| Yeah, I see this as a very interesting reply to the TLA+ post.
| Might spend my evening diving into Ada and Spark.
|
| Edit: Some things I noticed. The package gnat-12 does not have
| gnatprove. Ada mode for emacs requires a compilation step that
| fails with gnat community edition. With alire there is no
| system gnat so it cannot compile it (quite possible I'm missing
| something). In the end I gave up on using emacs. Gnatstudio
| wouldn't run for me until I realized it needed ncurses. It also
| had some unhandled exceptions for me (raised PROGRAM_ERROR :
| adjust/finalize raised GNATCOLL.VFS.VFS_INVALID_FILE_ERROR:
| gnatcoll-vfs.adb:340), but in the end I managed to get it up
| and running.
|
| Edit2: After playing around with it, I'm extremely impressed
| with what spark can do. I made a function to add 7 to numbers.
| Tried putting a post condition that the return value is bigger
| than input. "Nuh uh, it could overflow". Ok so I add a pre
| condition that numbers must be less than 100. "Uhm, buddy you
| are passing in 200 right there". This is useful stuff for real
| and easy to write too.
| yannickmoy wrote:
| you can ask Alire to install the latest GNAT it built, or to
| use another version installed on your machine, see
| https://alire.ada.dev/transition_from_gnat_community.html
|
| It also explains how to install GNATprove: ``alr with
| gnatprove``
| im3w1l wrote:
| Sorry if it was confusing I kind of jumped between the
| issues I had with various approaches. I did manage to get
| gnatprove through alire through just that command, it was
| the apt gnat that didn't have gnatprove. What I wasn't sure
| how to correctly do with the alire install was:
|
|     cd ~/.emacs.d/elpa/ada-mode-i.j.k
|     ./build.sh
|     ./install.sh
|
| Actually I didn't get that working with any of the options
| I tried I guess.
| layer8 wrote:
| I predict that one day it will be common for this to be a
| builtin language feature.
| [deleted]
| twawaaay wrote:
| I actually used this algorithm a decade ago to implement a log-
| based, transactional database system for an embedded system with
| very low amount of memory and requirement that all memory be
| statically allocated.
|
| To the frustration of the rest of the development team, who
| first called me an idiot (I was new) and then could not make
| quicksort run as fast on inputs that were capped at something
| like 500 items. Apparently, algorithmic complexity isn't
| everything.
| delecti wrote:
| I feel like anyone who was surprised that algorithmic
| complexity isn't everything probably didn't totally understand
| it. The assumptions (like ignoring constants) are straight out
| of calculus limits. That (+10000) on the end doesn't mean
| anything if you're sorting an infinite list, but it means a lot
| if you're sorting 15 (or in your case 500) entries.
| twawaaay wrote:
| Well, it actually is kinda worse (or better, depends how you
| look at it).
|
| It is not necessarily +10000, it can also be something like
| x5000.
|
| Because CPUs really, really, really like working short,
| simple, predictable loops that traverse data in a simple
| pattern and hate when it is interrupted with something like
| dereferencing a pointer or looking up a missing page.
|
| So your super complex and super intelligent algorithm might
| actually be only good on paper but doing more harm to your
| TLB cache, prefetcher, branch predictor, instruction
| pipeline, memory density, etc.
|
| So there is this fun question:
|
| "You are generating k random 64 bit integers. Each time you
| generate the integer, you have to insert it in a sorted
| collection. You implement two versions of the algorithm, one
| with a singly-linked list and one with a flat array. In both
| cases you are not allowed to cheat by using any external
| storage (indexes, caches, fingers, etc.)
|
| The question: in your estimation, both algorithms being
| implemented optimally, what magnitude k needs to be for the
| linked list to start being more efficient than array list."
|
| The fun part of this question is that the answer is: NEVER.
| simiones wrote:
| > The fun part of this question is that the answer is:
| NEVER.
|
| Are you claiming that it is faster in practice to insert an
| element at the beginning of an array with 1G items, than it
| is to insert it at the beginning of a linked list with 1G
| items?
|
| Or that, on average, the time spent traversing the linked
| list (assuming an average number of cache misses) exceeds the
| time taken to move all elements in the array one to the right?
|
| I am certain the first item will not be true, and the
| second may or may not be true depending on many many other
| details (e.g., in a language with a compacting GC, the
| linked list elements will often be stored approximately
| like in the array case, and you may not pay too high a cost
| for cache misses).
| twawaaay wrote:
| > Are you claiming that it is faster in practice to
| insert an element at the beginning of an array with 1G
| items, than it is to insert it at the beginning of a
| linked list with 1G items?
|
| You missed that the insertion point is selected at random and
| must be found as part of the operation. But in practice
| if you know data will be inserted preferentially at the
| wrong end -- you can just easily reverse the data
| structure to make insertions prefer the better end.
|
| > I am certain the first item will not be true
|
| I am used to it. Everybody gets it wrong until they spend
| some time thinking about it, get it explained and/or run
| a real life test.
|
| Hint: think about the cost of iterating over half of the
| linked list versus performing binary search over the
| array AND moving half of it.
|
| The cost of the binary search becomes negligible pretty fast,
| and the cost of moving 8 bytes of memory is always going to
| be lower than the cost of iterating over one entry (16
| bytes) of the linked list. And you have a statistically equal
| number of both, assuming you are selecting values at random
| with an equal chance for each 64-bit integer to be the next
| choice.
|
| The CPU can be moving a lot of bytes at the same time, but it
| has to dereference one pointer before it can perform the
| comparison, and perform the comparison before it can
| dereference the next pointer...
|
| Actual algorithmic complexity of both algorithms is
| exactly the same. But an array is much more efficient
| (instructions per item).
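A rough Python sketch of the experiment described above (hedged: Python's constant factors are nothing like C's, and `Node`, `insert_sorted_list`, and `bench` are made-up names, but the shape of the argument, binary search plus one contiguous move versus O(n) pointer chasing, is the same):

```python
import bisect
import random
import time

class Node:
    __slots__ = ("value", "next")
    def __init__(self, value, next=None):
        self.value = value
        self.next = next

def insert_sorted_list(head, value):
    # Walk the chain to the insertion point: O(n) pointer hops,
    # each a potential cache miss.
    if head is None or value <= head.value:
        return Node(value, head)
    cur = head
    while cur.next is not None and cur.next.value < value:
        cur = cur.next
    cur.next = Node(value, cur.next)
    return head

def bench(k, seed=0):
    rng = random.Random(seed)
    values = [rng.getrandbits(64) for _ in range(k)]

    t0 = time.perf_counter()
    arr = []
    for v in values:
        bisect.insort(arr, v)   # binary search + contiguous shift
    t1 = time.perf_counter()

    head = None
    for v in values:
        head = insert_sorted_list(head, v)
    t2 = time.perf_counter()
    return t1 - t0, t2 - t1    # (array time, linked-list time)
```

In quick CPython runs the array version tends to win by a growing margin as k increases, consistent with the claim above, though the interpreter overhead exaggerates the gap relative to optimized C.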
| kmonsen wrote:
| You are right and this is well known, but not only for the
| reason you state. Cache locality is so much better on an
| array/vector compared to a linked list, and that is really
| important.
| twawaaay wrote:
| Good point. Cache locality is another important reason
| that I should have mentioned. I have not run tests, but
| it may possibly be the most important reason of them all.
|
| In many linked data structures you can mitigate a lot of
| this problem. For example, in the past I have used
| contiguous arrays where I would allocate linked elements,
| then I would try to exploit how the elements are produced
| to get them placed one after another, if possible.
|
| When data is produced randomly you can try to create
| "buckets", where each bucket is backed by an array and
| attempts to keep consecutive elements relatively close to
| each other. You could then try to rewrite those buckets
| in a way that gets amortised. Or you can just straight
| write it as if it was an array of sorted linked list
| elements where on each delete or insert you have to move
| some of the elements, but you only need to reorganise one
| bucket and not others.
|
| All a bit cumbersome and at the end of it you are still
| wasting at least half of the cache on pointers.
| jasonwatkinspdx wrote:
| A fun related thing I first learned about years ago due
| to a Dr Dobbs article: some strided access patterns are
| FAR worse than random access patterns. This is because
| certain strides hit the worst case behavior of limited
| associativity caches, dramatically reducing the effective
| cache size.
| kmonsen wrote:
| Here is a video by Bjarne himself explaining this:
| https://youtu.be/YQs6IC-vgmo
|
| Note there is a use case for linked list,
| inserting/deleting one or few items once in a while. But
| in general you can just use vector for anything that is
| not a leetcode problem.
| jwilk wrote:
| I think you misunderstood the question.
|
| If you insert an element at the beginning, there's no
| list iteration cost.
| twawaaay wrote:
| Please, read the question.
|
| It stipulates that elements are selected at random.
|
| You select k integers at random.
|
| Some of them might be inserted faster into linked list,
| but when you average it over k integers you will still be
| slower because inserting into array will be faster, on
| average.
|
| Just think what happens if the integer is inserted at the
| END of the list (same probability...). You need to slowly
| iterate over the entire linked list, while you quickly land
| at the insertion point with a binary search on an array and
| then pay almost no cost inserting it there.
|
| If you think about it, ON AVERAGE, you have to search
| through half of the list (if you are linked list) or move
| half of the array (if you are an array).
| jwilk wrote:
| The parts of simiones's message you quoted are not about
| the average case. They are about the corner case where an
| item would be added at the beginning, meaning the linked
| list has the biggest advantage.
| twawaaay wrote:
| You don't get to change the rules of the game while the
| game is being played.
|
| The original problem, as stated by me, was:
|
| "You are generating k random 64 bit integers. Each time
| you generate the integer, you have to insert it in a
| sorted collection.
|
| (...)
|
| The question: in your estimation, both algorithms being
| implemented optimally, what magnitude k needs to be for
| the linked list to start being more efficient than array
| list."
|
| The cost is not evaluated for any individual insertion,
| only for the total cost of inserting k integers.
|
| For some of the integers the cost of inserting to linked
| list will be higher, and for some the cost of inserting
| to array will be higher. It does not matter. We do not
| care. We just care about what is the cost of inserting k
| integers thus randomly generated a) into sorted linked
| list and b) into sorted array list.
| [deleted]
| layer8 wrote:
| Well, there is one case where a linked list may be faster:
| when due to memory fragmentation you are unable to allocate
| contiguous _k_ elements (hence efficiency would drop to
| zero) but would still be able to allocate _k_ nodes (or
| _k_ / _n_ nodes with _n_ elements each). It can therefore
| make sense to use a linked list of arrays with some
| sensible limit on the array size (e.g. in the gigabyte
| range). As long as a smaller number of elements is used,
| the linked list will only have one node and degenerate to
| the array solution.
| twawaaay wrote:
| That could happen on a very old operating system (think
| MS-DOS).
|
| On new operating systems process address space is pieced
| together independently from physical space. So you can
| have contiguous, multi-GB array even if you don't have
| contiguous, multi-GB physical region.
|
| As you read/write your virtual address space the CPU
| detects that there is no TLB mapping and interrupts into
| operating system which behind the scenes finds a page,
| returns the mapping and continues as if nothing ever
| happened.
|
| This is why a process nowadays can allocate any amount of
| memory -- more than there is physical memory on the
| machine. It only starts grinding to a halt when you
| actually try to use it all. (This actually is one of my
| interview questions -- Can a process allocate more memory
| than is physically available on the machine? Why does it
| work?)
| texaslonghorn5 wrote:
| If I understand correctly, you would just run the n+1 th outer
| loop to insert the new item each time?
| [deleted]
| dahart wrote:
| You sure it was this algorithm and not Bubble Sort?
|
| > algorithmic complexity isn't everything
|
| Yeah very true. Or at least hopefully everyone knows that
| complexity analysis only applies to large n, and small inputs
| can change everything. In console video games it was very
| common to avoid dynamic allocation and to use bubble sort on
| small arrays. Also extremely common to avoid a sort completely
| and just do a linear search on the (small) array while
| querying, that can end up being much faster than sorting and
| binary searching, especially when the total number of queries
| or the number of queries per frame is also low.
| twawaaay wrote:
| Oh yeah, I did a lot of linear search too, to the same
| consternation of other devs.
|
| The issue was that the device was so low on memory (2MB of
| unified NV and flash for code, files and operating memory)
| that there simply wasn't enough space to hold enough data for
| anything to be a problem for the 20MHz ARM7 controller, as
| long as you did not do anything stupid. 600kB of it was used
| by OpenSSL and a further 300kB by the operating system anyway
| (I am aware of how funny it sounds when OpenSSL takes twice as
| much space as the OS).
| eranation wrote:
| That is pretty awesome if you came up with this exact
| algorithm a decade ago, ( it was published in 2021, but
| apparently was in the wild in 2015[1] and I assume
| discovered earlier, but no one thought it was interesting
| enough to publish). Why didn't you use bubble sort /
| insertion sort? (The algorithm in the paper looks like
| bubble sort at first look but is basically a clever
| insertion sort) what was the benefit of using it this way?
|
| [1] https://cs.stackexchange.com/questions/45348/why-does-
| this-s...
| twawaaay wrote:
| I am not sure how awesome it is. It looks like something
| you can stumble on by accident on a whiteboard in an
| interview and use successfully even if you don't know
| exactly why it works.
|
| Honestly, I always thought it was common knowledge, just
| too simple, and for that reason omitted from books.
|
| People in CS/IT tend to not spend a lot of time on
| algorithms with bad complexity and so I am used to people
| discounting algorithms with useful properties just
| because there is another that is a bit more efficient
| when n goes to infinity.
| yannickmoy wrote:
| For a related eye-opening result, with both practical
| application and interesting research backing it, see this
| talk on "Quicksorts of the 21st century" at Newton
| Institute on Verified Software last year:
| https://www.newton.ac.uk/seminar/30504/
|
| TLDR; practical complexity computation needs to take into
| account things like memory access latency on different
| architectures.
| Izkata wrote:
| Yeah, this sorting algorithm is something you can come up
| with by accident if you do bubble sort wrong. I wouldn't
| be surprised if a lot of beginners came up with this one
| in intro courses. I think I saw it around 2006-2008 when
| I was a TA and one of my students couldn't figure out why
| his array got sorted in the wrong order (don't remember
| for sure though).
|
| For example, from 2016, here's someone not getting the
| difference between the two:
| https://stackoverflow.com/questions/40786409/whats-the-
| diffe...
| jwilk wrote:
| The paper discussed on HN:
|
| https://news.ycombinator.com/item?id=28758106 (318 comments)
| drdrek wrote:
| He starts from the wrong axiom that it's hard to prove and
| builds a lot of nonsense on top of that.
|
| It requires just two induction proofs:
|
| - One, that for I=1, after N comparisons the largest number is
| at position 1 (proven with induction); it's the base case.
| - The other, that for any I=n+1, if we assume that the first n
| slots are ordered, we can treat n+1 as a new array of length
| N-n and solve using the base case proof.
|
| Talking about computer science and not doing the bare minimum
| of computer science is peak software engineering :)
| [deleted]
| [deleted]
| tonmoy wrote:
| Note that the comparison sign is less-than, but the algorithm
| sorts in descending order.
| umanwizard wrote:
| Not true.
|
| Clojure code:
|
|     (defn swap [items i j]
|       (assoc items i (items j) j (items i)))
|
|     (defn weirdsort [items]
|       (let [ct (count items)
|             indices (for [i (range ct)
|                           j (range ct)]
|                       [i j])]
|         (reduce (fn [items [i j]]
|                   (let [ith (nth items i)
|                         jth (nth items j)]
|                     (if (< ith jth)
|                       (swap items i j)
|                       items)))
|                 items indices)))
|
| Then in the repl:
|
|     my-namespace> (def items (shuffle (range 20)))
|     #'my-namespace/items
|     my-namespace> items
|     [19 0 13 16 14 18 15 7 10 17 9 11 6 1 3 4 12 2 5 8]
|     my-namespace> (weirdsort items)
|     [0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19]
| puffoflogic wrote:
| That is an elegant proof... for a totally different algorithm.
| J starts at 1, not at I. Every element of the loop is subject
| to be moved in every single outer iteration. Besides it gets
| the comparison backwards.
| drdrek wrote:
| You are absolutely correct, I was gung-ho and got punished
| with internet shame; below is the correct proof. But the
| point still stands.
|
| Solution is even simpler:
|
| Empty case: after 1 iteration (I=1) the largest number is at
| position 1.
| Base case: after 2 iterations (I=2) the first 2 elements are
| ordered, and the largest number is at position 2.
|
| Assume N case: after N iterations the first N numbers are
| ordered (within the sub-list, not for the entire array) and
| the largest number is at position N.
|
| N+1 case (I=N+1): For every J<N+1 with A[J] >= A[I], nothing
| will happen. From the first J where J<N+1 and A[J] < A[I]
| (denote it x), each cell will swap as A[J] < A[J+1]. At J=N+1
| the largest number will move from A[N] to A[I]. For every
| J>N+1, nothing will happen as the largest number is at A[I].
|
| Not part of the proof, but to make it clear we get:
| - for J<x, A[J] < A[I]
| - for J=x, A[J] = A[I]
| - for J>x, A[J] >= A[I], and the list is ordered for the
| first J elements
| alex_smart wrote:
| > Solution is even simpler:
|
| Press X to doubt.
| umanwizard wrote:
| No, it works, there's just a bit of an unstated step in the
| proof.
|
| After j ranges from 1 to i, it will still be the case that
| the values from 1 to i are sorted. So you can assume that j
| starts at i without disturbing the induction condition.
|
| The reason this is true is: if A[i] >= every element A[j]
| for j < i, then it is clearly true. Otherwise, there is some
| smallest j in 1 to i-1 inclusive such that A[i] < A[j]. Then
| A[i] will swap with that A[j], then it will swap again with
| the next A[j+1] (because the original A[j] was < A[j+1] by
| the induction case), then with the next element, ... swapping
| with every element until j reaches i.
|
| After this is done we have maintained the condition that A[1]
| .. A[i] are sorted after j ranges from 1 to i.
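| (Not from the thread: the invariant described above -- the prefix
| A[1]..A[i] is sorted and A[i] holds the maximum after each outer
| iteration -- can be checked mechanically. A Python sketch;
| `cant_believe_sort` is a made-up name.)

```python
import random

def cant_believe_sort(a):
    # The algorithm under discussion: both loops run over the
    # whole array and the comparison reads "backwards".
    a = list(a)
    n = len(a)
    for i in range(n):
        for j in range(n):
            if a[i] < a[j]:
                a[i], a[j] = a[j], a[i]
        # Invariant from the proof sketch above: after outer
        # iteration i, a[0..i] is sorted and a[i] is the maximum.
        assert a[:i + 1] == sorted(a[:i + 1])
        assert a[i] == max(a)
    return a

xs = [random.randrange(100) for _ in range(50)]
assert cant_believe_sort(xs) == sorted(xs)
```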
| leereeves wrote:
| That is the correct inductive step all by itself.
|
| That's all we need to know: during each iteration after the
| first, the algorithm inserts the ith element into the
| previously sorted list from A[1] to A[i-1], giving a new
| sorted list from A[1] to A[i], and doesn't touch the rest
| because A[i] contains the maximum element.
|
| Then when i=n the whole list is sorted.
| phkahler wrote:
| I thought his goal was to get the prover to prove it without
| understanding it himself.
|
| By realizing the low-indexed portion is always sorted, you've
| already proved the algorithm yourself and the prover is just
| checking for bugs in your logic.
|
| I'm not saying the proof isn't valuable, just that it's not
| magical and actually requires the user to understand the
| majority of the proof already.
| leereeves wrote:
| This algorithm is surprising and interesting because it
| doesn't work at all like it first seems to.
|
| The low-indexed portion is sorted, but isn't guaranteed to
| contain the lowest or the highest _i_ elements of the list
| (except when i=1), and the list is ultimately sorted in
| decreasing, not increasing order. The final sort doesn't
| occur until the last iteration of the outer loop when the
| inequality is reversed (the interesting variable, j, is on
| the right).
|
| Because of that, the proof outline discussed here doesn't
| work.
|
| Consider what happens if the unique smallest element starts
| at position n. It is placed at the start of the list (the
| correct final position) in the final iteration of the outer
| loop (i=n), and not before.
|
| Proof (for simplicity the list is indexed 1 to n):
|
| Let A[n] < A[k] for all k != n in the initial list.
|
| Elements are only swapped when A[i] < A[j]. If i != n and j =
| n, then A[i] > A[n] = A[j], so A[n] is not swapped.
|
| Then, when i = n and j = 1, A[i] = A[n] < A[1] = A[j], so
| A[1] and A[n] are swapped, placing the smallest element in
| position 1 at last.
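| (Not from the thread: the claim that the unique smallest element
| only moves during the final outer iteration can be observed
| directly. A Python sketch; `trace_positions` is a hypothetical
| helper that records the minimum's index after each pass.)

```python
def trace_positions(a):
    # Runs the sort and records the index of the minimum
    # after each outer iteration.
    a = list(a)
    n = len(a)
    positions = []
    for i in range(n):
        for j in range(n):
            if a[i] < a[j]:
                a[i], a[j] = a[j], a[i]
        positions.append(a.index(min(a)))
    return a, positions

# Unique smallest element starts at the last position and stays
# there until the final outer iteration moves it to the front.
a, pos = trace_positions([5, 4, 3, 2, 1])
print(pos)  # [4, 4, 4, 4, 0]
```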
| umanwizard wrote:
| It's not true that it's sorted in decreasing order. See my
| comment here: https://news.ycombinator.com/item?id=31978942
| leereeves wrote:
| Quite right, that was a silly mistake on my part. I did
| say the smallest element was placed first, but wrote "the
| list is ultimately sorted in decreasing, not increasing
| order" backwards.
|
| I meant to contradict the top post: "the largest number
| is at position 1...we can treat n+1 as a new array of
| length N-n and solve using the base case proof", which
| would result in decreasing order (largest first).
| tromp wrote:
| If the (i+1)'st outer loop starts with the first i elements in
| sorted order, then it ends with the first i+1 elements in
| sorted order.
|
| In fact if k of the first i elements are <= A[i+1], then the
| first k iterations of the inner loop will not swap, and the
| next i-k iterations will swap. The latter can be seen to
| maintain order.
|
| The last n-i iterations of the inner loop (where j>=i) do not
| affect this reasoning and can be omitted, as can the first
| iteration of the outer loop.
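| (Not from the thread: the reduction described above can be checked
| empirically. Dropping the first outer iteration and the inner
| iterations with j >= i leaves an insertion sort that produces the
| same result. A Python sketch with made-up names.)

```python
import random

def full_sort(a):
    # The original: i and j both range over the whole array.
    a = list(a)
    n = len(a)
    for i in range(n):
        for j in range(n):
            if a[i] < a[j]:
                a[i], a[j] = a[j], a[i]
    return a

def reduced_sort(a):
    # Omit the first outer iteration and all inner iterations
    # with j >= i; what remains is an insertion sort.
    a = list(a)
    n = len(a)
    for i in range(1, n):
        for j in range(i):
            if a[i] < a[j]:
                a[i], a[j] = a[j], a[i]
    return a

for _ in range(100):
    xs = [random.randrange(50) for _ in range(random.randrange(20))]
    assert full_sort(xs) == reduced_sort(xs) == sorted(xs)
```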
| MattPalmer1086 wrote:
| Interesting. It is a bit counter intuitive at first but not too
| hard to see how it works.
|
| After the first main loop, the first item will be the biggest
| item in the list.
|
| The inner loop, as it always starts from 1 again, will
| gradually replace the bigger items with smaller items, and so
| on.
| JadeNB wrote:
| > After the first main loop, the first item will be the biggest
| item in the list.
|
| > The inner loop, as it always starts from 1 again, will
| > gradually replace the bigger items with smaller items, and
| > so on.
|
| I think the "and so on" is the point. To put it more precisely,
| I think that anyone (with a suitable familiarity with the
| tools) can prove formally that "after the first main loop, the
| first item will be the biggest item in the list"; but I think
| that it might take a bit more work to prove the "and so on" in
| a way that satisfies a formal prover.
|
| (As to the intuition--I can buy that the "and so on" part is
| reasonably intuitive, but only at a level of intuition that can
| also believe wrong things. For example, just always replacing
| bigger items with smaller items doesn't guarantee you'll get a
| sorted list!)
| MattPalmer1086 wrote:
| Sure, formal proof would be a lot harder. I just didn't find
| it quite as surprising that it works as the article implied.
| asrp wrote:
| After each outer loop iteration, A[:i] (the array up to i) is in
| ascending order with the max of A at A[i].
|
| This is true the first iteration since max(A) is eventually
| swapped to A[1]. This is true in subsequent iterations since
| during the ith iteration, it inserts the next element,
| initially at A[i], into A[:i-1], shifting everything up via
| swaps with A[i], so A[:i] is sorted with the max of A moved to
| A[i]. After
| that no more swaps happen in the ith iteration since A[i]
| contains the max of A.
| [deleted]
| roenxi wrote:
| If it looks "obviously wrong" the quick proof that it works ...
| the j loop will move the largest unsorted element in the array
| into a sorted position. Since the i loop executes the j loop n
| times, the array must be sorted (since the n largest elements
| will be in the correct order).
|
| _EDIT_
|
| ^W^W^W
|
| Nope, I'm wrong. I did a worked example on paper. I think the
| devious thing here is the algorithm looks simple at a lazy
| glance.
|
| To sort: 4-3-6-9-1, next round pivot for the _i_ loop in [].
|     [4]-3-6-9-1
|     9-[3]-4-6-1
|     3-9-[4]-6-1
|     3-4-9-[6]-1
|     3-4-6-9-[1]
|     1-3-4-6-9 & sorted
|
| I can see that it sorts everything to the left of a pivot, then
| because it does that n times it comes to a sorted list. A
| reasonable proof will be more complicated than I thought.
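| (Not from the thread: the worked example above can be reproduced
| mechanically. A Python sketch -- `passes` is a made-up name --
| printing the array after each pass of the outer loop; note the
| final pass yields 1-3-4-6-9.)

```python
def passes(a):
    # Returns a snapshot of the array after each outer-loop pass.
    a = list(a)
    n = len(a)
    snapshots = []
    for i in range(n):
        for j in range(n):
            if a[i] < a[j]:
                a[i], a[j] = a[j], a[i]
        snapshots.append(list(a))
    return snapshots

for snap in passes([4, 3, 6, 9, 1]):
    print('-'.join(map(str, snap)))
```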
| phkahler wrote:
| The list on the left (index less than i) is always sorted. The
| ith element is inserted by the j loop and the rest of the list
| is shifted right by one element by repeated swapping with the
| ith position. Nothing to the right changes because the ith
| element is the max of the entire list, which seems to be a red
| herring for analysis.
| blueflow wrote:
| This somewhat looks like bubble-sort minus the early exit.
| MrYellowP wrote:
| It _is_ BubbleSort. At least that's how I learned it in the
| 80s.
| zajio1am wrote:
| It is just quirky insert-sort. In each outer cycle, it
| inserts value originally from the A[i] position to already
| sorted sequence A[1]..A[i-1], while using A[i] as a temporary
| variable to shift higher part of the sorted sequence one
| position up.
| jeffhuys wrote:
| Small sort implementation in ES6:
|     const args = process.argv.slice(2).map(Number);
|     for (let i = 0; i < args.length; i++)
|       for (let j = 0; j < args.length; j++)
|         if (args[i] < args[j])
|           [args[i], args[j]] = [args[j], args[i]];
|     console.log(args.join(' '));
|
| Didn't expect it to be actually that simple, but now that I wrote
| it, it makes complete sense.
|
| To PROVE it, is another thing. Good work.
| [deleted]
| eatonphil wrote:
| This is a fantastic post! It's great to have a concrete example
| of proving a not-trivial algorithm. And somehow this Ada code
| feels more approachable to me than HOL4 and other functional
| proof assistants I've looked at.
|
| While it's been on my list for a while, I'm more curious to try
| out Ada now.
| Jtsummers wrote:
| It'll take you less than a week to get proficient enough in it
| to make something useful, or at least interesting. It's a very
| straightforward language, overall.
|
| https://learn.adacore.com/ - Pretty much the best free source
| until you're actually ready to commit to the language. The
| "Introduction to Ada" course took me maybe a week of 1-2 hours
| a day reading and practicing to go through. There's also a
| SPARK course that takes a bit longer, but is also interactive.
|
| The language reference manuals for 2012 and 202x (which should
| become Ada 2022):
|
| http://www.ada-auth.org/standards/ada12.html
|
| http://www.ada-auth.org/standards/ada2x.html
| touisteur wrote:
| Also, don't hesitate to use Rosetta Code to find useful
| little snippets. Not all perfect but a good jump-start.
| Jtsummers wrote:
| Yep, I wrote a bunch of those RC examples too. It was a
| useful exercise when I decided to learn Ada (and with other
| languages, too).
| touisteur wrote:
| Two of the most interesting things that trickled down during
| the design of Spark2014:
|
| 1- contracts are written in the same language as the code to be
| checked/proved. This made important to add expressive features
| to the language (for all / for some: quantifiers! expression
| functions, if- and case-expressions, and more recently the new
| delta-aggregates notation): these additions make the language
| far more expressive without too much loss of readability.
|
| 2- executable contracts: most contracts can be checked at
| runtime or proved. And the escape hatch (code only present for
| proof) is 'ghost' code which is also a nice addition.
|
| Lots of little nifty additions to the language, from just
| 'thinking in contracts' or 'designing for provability'.
| rationalfaith wrote:
___________________________________________________________________
(page generated 2022-07-04 23:00 UTC)