[HN Gopher] Deepmind Alphadev: Faster sorting algorithms discove...
___________________________________________________________________
Deepmind Alphadev: Faster sorting algorithms discovered using deep
RL
Author : anjneymidha
Score : 539 points
Date : 2023-06-07 15:31 UTC (7 hours ago)
(HTM) web link (www.nature.com)
(TXT) w3m dump (www.nature.com)
| jonbaer wrote:
| "AlphaDev uncovered new sorting algorithms that led to
| improvements in the LLVM libc++ sorting library that were up to
| 70% faster for shorter sequences and about 1.7% faster for
| sequences exceeding 250,000 elements."
| dheera wrote:
| > up to 70% faster
|
| So O(0.3(N log N))? That's still O(N log N)
| dekhn wrote:
| In the real world, we care about runtime as much as, if not
| more than, computational complexity.
| boomanaiden154 wrote:
| Definitely more. There are lots of things that might be
| more optimal in terms of raw complexity but end up having
| abysmal performance due to things like cache locality.
| Anything with pointer chasing usually kills performance.
| xdavidliu wrote:
| for example, matrix multiplication has all sorts of
| algorithms with exponent significantly less than 3, but
| they are nearly all curiosities. In practice, the
| straightforward N^3 algo is used nearly everywhere
| dekhn wrote:
| I had a long discussion with Rob Pike once that ended
| with me saying "you know, I don't think people actually
| use Strassen on supercomputers" (we were discussing how
| somebody had managed to show an N*2.37 algorithm or
| something like that and "it was a new world's record").
| xdavidliu wrote:
| nit: ^, not *. ** for Python, but never *
| caturopath wrote:
| They're not going to find a general sorting algorithm faster
| than O(n log n), but that doesn't mean all O(n log n) sorts
| are created equal.
| iopq wrote:
| That's not true because Thorup's algorithm is O(n log log
| n)
| xdavidliu wrote:
| any sort that only uses comparisons is provably not
| better than n log n. Hence whatever Thorup's algorithm
| is, it doesn't work for arbitrary input, and must assume
| something further, like "all elements under 1000" or
| similar.
| utopcell wrote:
| I think the parent's argument is that, since they only
| evaluated their algorithms on sorting arrays of 32-bit or
| 64-bit integers, it is fair game to compare against
| integer sorting algorithms for which we have better
| complexities than O(n lg n).
| reaperman wrote:
| This seems like a valid addition to iopq's clarifying
| point, in the context of what might still exist to be
| discovered in the set of theoretically possible new
| algorithms, though it doesn't change that dheera set the
| bar much, much higher than most people would agree with.
|
| Thank you.
| xdavidliu wrote:
| but the number of elements in the array is also a 32 or
| 64 bit integer, so you still cannot do better than O(n
| log n) even with fancy radix sort or whatever
| graycat wrote:
| The Gleason bound is n log(n) and says that no sorting
| algorithm based on comparing pairs of keys can be faster.
| Heap sort meets the Gleason bound so is the fastest
| possible in this context. Actually the usual versions of
| quick sort are slower. If the keys are not too long,
| radix sort is O(n) and faster. All this has been well
| known for decades. I explained a little more in another
| post in this thread.
| guyomes wrote:
| > If the keys are not too long, radix sort is O(n) and
| faster.
|
| More precisely, if the key length is w, then radix sort
| is O(w n) operations. In particular, if the n elements
| are distinct integers for example, w is greater than
| log(n).
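| To illustrate the O(w n) behaviour (an illustrative sketch,
| not from the thread), here is an LSD radix sort for
| non-negative fixed-width integers:

```python
def radix_sort(xs, key_bits=32, radix_bits=8):
    # LSD radix sort: one stable bucketing pass per radix_bits-wide
    # digit, so total work is O((w / r) * (n + 2^r)) for key width w.
    mask = (1 << radix_bits) - 1
    for shift in range(0, key_bits, radix_bits):
        buckets = [[] for _ in range(1 << radix_bits)]
        for x in xs:
            buckets[(x >> shift) & mask].append(x)
        xs = [x for bucket in buckets for x in bucket]  # stable merge
    return xs
```

| (Function name and parameters are illustrative; production
| radix sorts use counting arrays rather than Python lists.)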
| CyberDildonics wrote:
| I think you are getting your information mixed up. Here
| is a comparison that shows quicksort running in 1/10th
| the time as heapsort.
|
| https://johnysswlab.com/why-is-quicksort-faster-than-
| heapsor...
| xdavidliu wrote:
| we're talking about asymptotics (i.e. the exponent).
| Things like "1/10 the time" are utterly meaningless here.
| ralfn wrote:
| If the thing you have to sort is within a known domain you
| can definitely beat O(n log n). Just expect crazy memory
| usage.
|
| And the only place that isn't the case is in theory. But
| nobody owns a real Turing machine with infinite tape and
| truly unbounded numbers. It doesn't exist in reality.
|
| You can always divide time by multiplying space with the
| same factor.
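| As a hedged sketch of the "known domain" idea (not from the
| thread): counting sort runs in O(n + k) for integers in a
| known range of size k, trading memory for comparisons:

```python
def counting_sort(xs, domain_size):
    # O(n + k) time, O(k) extra space, for integers in
    # range(domain_size). It never compares two elements, so the
    # O(n log n) comparison-sort lower bound does not apply.
    counts = [0] * domain_size
    for x in xs:
        counts[x] += 1
    out = []
    for value, count in enumerate(counts):
        out.extend([value] * count)
    return out
```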
| xdavidliu wrote:
| by "general sort" your parent comment means "comparison
| sort", and the claim (which has been proved, see CLRS
| intro to algorithms) is that you cannot do it in better
| than O(n log n) comparisons.
| yunohn wrote:
| Parent said: > not going to find a general sorting
| algorithm
|
| You said: > you have to sort is within a known domain you
| can definitely beat
|
| Not sure why you framed your response this way?
| reocha wrote:
| Pretty impressive, I'm looking forward to optimizations in other
| commonly used algorithms.
| mellosouls wrote:
| Broken link. Ok it's there now, maybe a Nature snafu.
|
| News Release:
|
| https://www.deepmind.com/blog/alphadev-discovers-faster-sort...
| [deleted]
| lngnmn2 wrote:
| Not general or universal. Only for pre-trained data and with
| abysmal worst cases.
| caturopath wrote:
| I'm not positive what you mean here. Are you saying the
| discovered algorithm isn't actually good? Didn't LLVM accept
| it?
| pjscott wrote:
| Are you confusing this with another, unrelated paper? Because
| the algorithms they present here do, in fact, sort any sequence
| of 3, 4, or 5 integers in constant time with close to the
| minimal number of instructions and (in the variable-size case)
| with minimal branching. There is no abysmal worst-case time,
| and the execution time for a fixed array length does not depend
| on the data.
|
| You can then plug these into other sorting algorithms (like
| quicksort, mergesort, etc.) to handle sufficiently small
| subsequences fast. Which is exactly how LLVM's libc++ is using
| them right now.
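| A minimal sketch of that pattern (hypothetical names, not
| LLVM's actual implementation): a quicksort that hands
| length-3 subsequences to a fixed compare-exchange network:

```python
def sort3(a):
    # Fixed sorting network: compare-exchange pairs (0,1), (1,2),
    # (0,1) sort any 3 elements with a data-independent sequence
    # of comparisons.
    for x, y in ((0, 1), (1, 2), (0, 1)):
        if a[x] > a[y]:
            a[x], a[y] = a[y], a[x]
    return a

def hybrid_quicksort(a):
    # Dispatch tiny inputs to fixed-size sorts, recurse otherwise.
    if len(a) <= 1:
        return list(a)
    if len(a) == 2:
        return list(a) if a[0] <= a[1] else [a[1], a[0]]
    if len(a) == 3:
        return sort3(list(a))
    pivot = a[len(a) // 2]
    return (hybrid_quicksort([x for x in a if x < pivot])
            + [x for x in a if x == pivot]
            + hybrid_quicksort([x for x in a if x > pivot]))
```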
| EGreg wrote:
| Almost all ML is basically searching through a (latent) space of
| possible answers and indexing the results. So it outperforms
| most known methods. Like when AlphaGo, using MCTS, beat Lee
| Sedol, etc.
|
| I would be interested to know if ML can be used to reverse
| hashes, such as to more quickly solve the numbers that satisfy
| the Bitcoin Hash challenge. Maybe it can even get P and NP closer
| together!
|
| Are there theoretical results on SHA-256 or something that
| preclude finding an ML algorithm that, with a billion parameters,
| can speed up the search for a Bitcoin Hash input?
| jeffbee wrote:
| The sorting seems cool but I am more interested in the varint
| decoder. Does anyone know where the source code of that landed,
| if it did?
| graycat wrote:
| "Faster sorting"?
|
| Heap sort achieves the Gleason bound and, thus, is the fastest
| possible sort by comparing pairs of keys.
|
| For keys not too long, radix sort can be faster.
|
| These facts have been very well known for decades.
|
| There might be more to do in sorting with some different records,
| keys, computer architecture, concerns about caches and _locality
| of reference_ , but otherwise, thankfully, sorting is a well
| solved problem.
| CaptainNegative wrote:
| Heap sort's worst- and average-case running times are both
| provably within a constant factor of optimal (both in
| running time and number of comparisons), but it's either
| likely or straight-up provably not actually optimal in any
| of the four categories.
| CyberDildonics wrote:
| Heap sort gets beat by sorting algorithms with better locality.
| You can't be the fastest while skipping through memory for
| every operation.
| VikingCoder wrote:
| Dumb question -
|
| Where are the machine learning de-compilers?
|
| Given an executable program, give me human-readable code that
| generates the executable exactly. With machine learning guessed
| function, type, variable names...?
|
| I get that it's a very hard problem. But... Does it just not
| work? Or have people not done it yet? Or have I missed it?
| xwdv wrote:
| This is the killer app for me with AI: a way to get non-
| copyrighted code for any program you already have binary code
| for.
| ealexhudson wrote:
| That sounds obviously transformative and on any non-trivial
| scale I can't see why copyright would be avoided.
| reaperman wrote:
| I think you mean "non-transformative", although in this
| context I can see there's a bit of an ambiguity in how
| people would use the word "transform" to mean one thing
| w.r.t copyright and the "opposite" for code (really,
| orthogonal, but boolean evaluation would yield the opposite
| true/false).
|
| In copyright, "transformative" refers to modifying or
| adapting a copyrighted work in a way that creates a new and
| original expression; resulting in a new work with a
| different purpose or meaning.
|
| In terms of code, you'd "just be transforming" the assembly
| code to a systems language of your choice.
| skeaker wrote:
| Personally I think this would be very exciting. Currently there
| are a ton of projects to decompile older games (see the Mario
| 64 and Ocarina of Time de-compilations and subsequent native PC
| ports as an example), but those projects are huge and take
| whole teams years to finish. If you could simply point an AI at
| a project like that and have usable source code in a reasonable
| amount of time, it would be huge for those scenes. You would
| have a sort of de facto open-source moment for just about all
| software. It feels like the stuff of far-off sci-fi.
| VMG wrote:
| You can already do that to some extent with ChatGPT. Paste in
| the assembly and it gives pretty good results
| sl-1 wrote:
| That is a good question!
|
| Training datasets should be pretty trivial for this, all open
| source software that is buildable on the internet could provide
| training sets (source code & compiled code).
|
| But I guess it would have to be trained specifically for every
| architecture.
| pitaj wrote:
| This actually doesn't sound that difficult, given we can
| produce a practically infinite training dataset using a
| compiler with existing programs.
|
| What's more interesting to me would be a model that can
| automatically port a program between languages, such as
| converting a C program to a fairly idiomatic Rust program.
| shpx wrote:
| I (don't) want to see an agent learn to rowhammer the memory
| address storing its reward quantity.
| orlp wrote:
| You'll love this list of examples where the AI successfully
| accomplished reward hacking:
| https://docs.google.com/spreadsheets/d/e/2PACX-1vRPiprOaC3Hs...
| ComputerGuru wrote:
| RL: Reinforcement Learning
| MagicMoonlight wrote:
| Now do it for compression
| cypherpunks01 wrote:
| This was educational because I learned about Sorting Networks.
|
| I didn't hear of them in my CS undergrad algorithms course,
| perhaps because they can be thought of as a special case or
| optimization, rather than a fundamental, generic
| variable-length sort algorithm.
|
| It's a simple concept, and forms the basis of all the sort
| optimizations described here.
|
| https://en.wikipedia.org/wiki/Sorting_network
| Havoc wrote:
| I like the high impact dynamic of this thinking.
|
| If that can be done say OS wide I could see that having an actual
| impact on global energy usage
| gfd wrote:
| Can someone summarize how it achieves "provable correctness"?
| finitestateuni wrote:
| The most interesting part of this paper to me is that they let
| the agent guess how efficient its own solutions were and only
| had the model experimentally verify its guesses in 0.002% of
| cases. This allowed the model to search much faster than another
| program that didn't guess and had to run every program.
| dsign wrote:
| I've been wondering about a similar approach for biomolecular
| simulations, where exact computations are still a hard
| bottleneck. I wonder if something like this could give us a few
| orders of magnitude more speed.
| Zamicol wrote:
| That sounds like intuition.
| Filligree wrote:
| Intuition is the only thing we've figured out how to
| automate. Reason turns out to be higher hanging fruit.
| [deleted]
| the_other wrote:
| Like humans' "slow" and "fast" thinking, then?
| Filligree wrote:
| There's likely a connection. Either way, I like to
| describe AIs like ChatGPT / diffusion models, etc. as
| operating 100% on intuition. It gives people a better
| intuition of their weaknesses...
|
| For GPT you can kind of prompt it to do chain-of-thought
| reasoning, but it doesn't work very well; not if you
| compare it to what humans do.
|
| Once again it seems like what we thought was hard, is
| easy; what we thought was easy and computer-like turns
| out to be hard.
| munchler wrote:
| The Google Bard folks have a blog post today making
| exactly that connection:
| https://blog.google/technology/ai/bard-improved-
| reasoning-go...
| fragmede wrote:
| If you tell ChatGPT "show your work", you get better
| answers
| blackbear_ wrote:
| This is actually quite common for optimizing things in several
| disciplines. You essentially fit a surrogate model (a keyword
| if you want to look up more) to whatever you want to optimize,
| then use the model to guide the procedure, making sure that
| the model is correct every once in a while.
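| A toy sketch of the surrogate idea (entirely illustrative;
| AlphaDev's value heads are learned neural networks, not this
| lookup table):

```python
def surrogate_search(true_cost, candidates, verify_every=4):
    # Keep a cheap per-feature cost estimate and call the
    # expensive true_cost only on promising candidates, plus a
    # periodic ground-truth check every `verify_every` steps.
    model = {}                          # feature -> estimated cost
    best, best_cost = None, float("inf")
    for i, cand in enumerate(candidates):
        feature = len(cand)             # crude feature: candidate size
        guess = model.get(feature, float("inf"))
        if i % verify_every == 0 or guess < best_cost:
            cost = true_cost(cand)      # expensive ground-truth call
            model[feature] = cost       # refresh the surrogate
            if cost < best_cost:
                best, best_cost = cand, cost
    return best, best_cost
```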
| tux3 wrote:
| This is the same sort of "more guesses faster beats smarter
| guesses slower" that made afl-fuzz by far the best at exploring
| large search spaces in program fuzzing
|
| Fast search often beats accurate search. Sometimes adding
| clever heuristics or more complex scoring "works" but slows
| down the search enough that it's an overall loss. Another kind
| of a bitter lesson, perhaps
| lqr wrote:
| But why isn't the proposed method an instance of smart
| guessing? It reduces oracle complexity with heuristics. The
| heuristic is "build a machine learning model of the objective
| function and use it to fake oracle queries most of the time."
| GreedClarifies wrote:
| Deepmind shouldn't be part of a for profit entity. There I said
| it.
|
| They are clearly focused on moving technology forward and helping
| humanity. That's great. However, pulling down 1M+ salaries (for
| L6+ developers) and using hundreds of millions or billions in
| borg resources while adding _nothing_ to the bottom line is not
| in the interest of Google. Not to mention the negative effects on
| productivity as other Googlers attempt to replicate the "publish
| everything to support my brand" strategy of Deepmind.
|
| I know Google is not a normal company in that Larry and Sergey
| have complete control of the company, but sooner or later they
| have to realize that Deepmind needs to be spun off, and further
| that Demis is entirely the wrong person to run Google's AI. He
| doesn't care a whit about Google, it is only a vessel to fund his
| research.
|
| Products. A company is about products and customers not research
| papers. Academia or non-profits are about papers.
| benlivengood wrote:
| I think Google's biggest net positives in the world at this
| point are Waymo and DeepMind. Way better than spending money on
| new chat apps.
|
| Google's history has been having a very lucrative bottom line
| that allows for very beneficial research with absolutely no
| hardships on anyone. Supremely better than lining shareholder
| pockets; shareholders can't think enough quarters ahead for
| long-term human benefit. Salaries are just the way to attract
| and retain quality folks.
| GreedClarifies wrote:
| The billions who use their search everyday would likely
| disagree. The billions who use YouTube per day would likely
| disagree. The billions who use android would likely disagree.
| The hundreds of millions who use gmail, or gsuite would
| disagree.
|
| Products.
| casey2 wrote:
| Products are at best optimizations of yesterday's
| technology; in tech you realistically can't peek into the
| future without spending billions on R&D. Having smart
| people work on any of the things you listed is a (further)
| waste of their time.
| sidibe wrote:
| Both of these contribute a lot to Google's infra by giving it
| new challenges. It's not a one-way street, Google wouldn't know
| what to build towards in their data center/hardware pipeline
| without these really ambitious projects with huge requirements.
| [deleted]
| hexomancer wrote:
| > The confidence intervals are represented as latency +- (lower,
| upper), in which latency corresponds to the fifth percentile of
| latency measurements across 100 different machines. Lower and
| upper refer to the bounds of the 95% confidence interval for this
| percentile.
|
| Does anybody know why they chose the fifth percentile? I
| thought we should always choose the fastest time when measuring
| performance.
| ajuc wrote:
| Usually you discard extreme values to reduce noise, and in fact
| they wrote that's why they did it:
|
| > We then take the fifth percentile as our final measurement,
| because we assume that most noise sources are one-sided (for
| example, cache misses, pre-emptions and so on). During training
| we process the measurements across ten machines for
| computational efficiency.
|
| > I thought we should always choose the fastest time when
| measuring performance.
|
| Depends. For games you usually do something similar to what
| they did: exclude a small percentage of the worst results to
| reduce the influence of noise, then optimize the worst
| remaining case to make the game run consistently and smoothly.
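| That kind of robust measurement can be sketched as follows
| (illustrative; the paper's exact estimator differs):

```python
def fifth_percentile(samples, pct=5):
    # Assume noise (cache misses, pre-emptions) is one-sided, so
    # a low percentile is a steadier latency estimate than the
    # mean, and less flaky than the absolute minimum.
    ordered = sorted(samples)
    index = min(len(ordered) - 1, (len(ordered) * pct) // 100)
    return ordered[index]
```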
| gcr wrote:
| Because they want to make sure that the sorting algorithm works
| well for all possible workloads, not just the most preferable
| ones.
|
| If we measured sorting algorithms by the fastest measurement,
| we might conclude that BubbleSort is the fastest possible sort
| algorithm on some inputs. (Bubblesorting an already-sorted list
| makes at most one comparison per list element)
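| For example (a small illustrative sketch), bubble sort with an
| early-exit flag performs only n - 1 comparisons on
| already-sorted input:

```python
def bubble_sort_counted(a):
    # Returns (sorted copy, number of comparisons). With the
    # early exit, an already-sorted list costs exactly
    # len(a) - 1 comparisons: one clean pass with no swaps.
    a = list(a)
    comparisons = 0
    swapped = True
    while swapped:
        swapped = False
        for i in range(len(a) - 1):
            comparisons += 1
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
                swapped = True
    return a, comparisons
```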
| hexomancer wrote:
| I don't think that's what they meant (or I have
| misunderstood). Running the same algorithm on the same input
| still has variations because of OS/CPU idiosyncrasies. When
| measuring performance we usually run the algorithm on the
| same input multiple times and report the fastest performance.
| air7 wrote:
| Can anyone explain how this worked? as per the paper (which I
| TLDRed): "A single incorrect instruction in the AssemblyGame can
| potentially invalidate the entire algorithm, making exploration
| in this space of games incredibly challenging."
|
| What did it do if it didn't have a useful partial score function?
| How did it avoid brute force?
| blackbear_ wrote:
| This paragraph:
|
| > To better estimate latency, we implemented a dual value
| function setup, whereby AlphaDev has two value function heads:
| one predicting algorithm correctness and the second predicting
| algorithm latency. The latency head is used to directly predict
| the latency of a given program by using the program's actual
| computed latency as a Monte Carlo target for AlphaDev during
| training. This dual-head approach achieved substantially better
| results than the vanilla, single head value function setup when
| optimizing for real latency.
|
| Briefly, they use a neural network to predict whether a given
| sequence of instructions is correct, and how fast it is. Then
| they used this neural network to guide the program generation
| via Monte Carlo tree search [1]. It is this procedure that
| keeps track of the partial score functions at each node.
|
| [1] https://en.wikipedia.org/wiki/Monte_Carlo_tree_search
| usgroup wrote:
| "The fixed sort solutions for sort 3, sort 4 and sort 5
| discovered by AlphaDev have been integrated into the standard
| sort function in the LLVM standard C++ library."
|
| Not exactly what the title would lead you to believe.
| computing wrote:
| *Google DeepMind
| greenflag wrote:
| Does anyone have high level guidance on when (deep) RL is worth
| pursuing for optimization (e.g. optimizing algorithm design)
| rather than other approaches (e.g genetic)?
| jeffbee wrote:
| Start with a planet-scale computer that makes the marginal cost
| of RL be nearly zero, and at the same time spend a lot of money
| on hashing and sorting so the micro-optimization pays off.
| AndrewKemendo wrote:
| Less of a scale problem than a type problem usually in my
| experience.
|
| My rule of thumb: use RL when it's easy to specify a reward
| function but there are infinitely many ways to traverse the
| action space, versus a constrained state and action space
| (small-n solution traversal pathways) with only a few
| possible paths to traverse.
| dontupvoteme wrote:
| From "It can't Code" to publishing articles in Nature.
| whywhywhydude wrote:
| It is astounding that something as well studied as sorting
| still has opportunities for further improvement!
| zzbn00 wrote:
| It is not sorting per se that was improved here, but sorting
| (particularly of short sequences) on modern CPUs, where the
| real complexity lies in predicting what will run quickly on
| those CPUs.
|
| Doing an empirical algorithm search to find which algorithms
| fit well on modern CPUs/memory systems is pretty common, see
| e.g. FFTW, ATLAS, https://halide-lang.org/
| jackmott42 wrote:
| Part of it is because hardware properties are always changing:
| the instructions available, the relative speed of the CPU
| versus memory, and how big, numerous, and fast the various
| caches are, etc.
| GravityLabs wrote:
| I am curious why things can't just get better on a base that
| doesn't change, until the base changes because the
| improvements with the new base are just that much better...
|
| Or is that why hardware properties change so much?
| zerodensity wrote:
| For a time that is what happened. Every year you would get
| a new CPU and that would be faster at pretty much
| everything than the version the year before. But then we
| hit the clockspeed wall and the only way of making faster
| CPUs was to add complexity to the internals of the CPUs. So
| branch prediction, micro code pipelining, larger caches,
| simd instructions more CPU cores ect was the result.
|
| So nowadays a new CPU might not be better at everything
| then the previous version but it will most likely have more
| cache and some internal improvements to
| pipelining/concurrency.
|
| Given this, for newer versions it can be useful to add
| instructions to take advantage of extra pipelining or using
| a different instruction that happen to be faster now.
| dist-epoch wrote:
| What's well studied is theoretical algorithmic sorting.
|
| For practical sorting, for a particular CPU architecture, there
| is still plenty of low hanging fruit:
|
| > Today we're sharing open source code that can sort arrays of
| numbers about ten times as fast as the C++ std::sort, and
| outperforms state of the art architecture-specific algorithms,
| while being portable across all modern CPU architectures
|
| https://opensource.googleblog.com/2022/06/Vectorized%20and%2...
| skellington wrote:
| So clickbait to claim a faster algorithm when all it did was find
| a faster machine optimization for a particular CPU. It's
| fun/cool, but not the leap they seem to be representing.
| mmaunder wrote:
| In 2023 we saw the first glimpse of machines becoming better at
| programming than humans when an AI created a new sorting
| algorithm that was faster than anything humans had yet created.
| hamilyon2 wrote:
| I want this for sql compilation, plan choosing. So much effort is
| wasted on suboptimal database performance and trying to improve
| that. I think even sorting pales in comparison.
| fwlr wrote:
| Some very cool improvements found in already highly optimized
| algorithms.
|
| They found that in a sorting network handling 3 inputs, the AI
| found a way to save an instruction by reducing a "min(A, B, C)"
| operation to just "min(A, B)" by taking advantage of the fact
| that previous operations guaranteed that B <= C. They also found
| that in a sorting network handling 4 inputs, when it is part of a
| "sort 8" network, there are similar guarantees (D >= min(A, C) in
| this case) that can be taken advantage of to remove an
| instruction as well. The compiler may not be able to compete with
| hand-written assembly, but it seems an AI can hand-write assembly
| code that's even better in some cases.
|
| Another improvement is also very cool. They discuss VarSort4,
| which takes a list that may be 2, 3, or 4 elements long, and
| sorts it. The existing algorithm is:
|
      if (len == 4) { sort4 }
      elif (len == 3) { sort3 }
      elif (len == 2) { sort2 }
|
| The AI found an algorithm that looks totally different:
|
      if (len == 2) { sort2 }
      else {
          sort3
          if (len == 4) { sortspecial }
      }
|
| It's pretty wild! It immediately runs Sort3 on something that may
| be either 3 elements or 4 elements long, and only afterwards does
| it check to see how long it really is. If it's 3 elements long,
| we're done; otherwise, run Sort4 - but because (having already
| run Sort3) you know the first three elements are sorted, you can
| use a special and much simpler algorithm to simply put the 4th
| element in the right spot.
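| That control flow can be sketched in Python (an illustrative
| rendering, not the actual assembly; "sortspecial" here is just
| an insertion of the 4th element):

```python
def sort3_network(a):
    # Fixed compare-exchange network on a[0..2]; works even if
    # the list has a 4th, not-yet-sorted element.
    for x, y in ((0, 1), (1, 2), (0, 1)):
        if a[x] > a[y]:
            a[x], a[y] = a[y], a[x]

def varsort4(a):
    # Sort a list of length 2, 3, or 4, mirroring the discovered
    # control flow: run sort3 first, check the length afterwards.
    if len(a) == 2:
        if a[0] > a[1]:
            a[0], a[1] = a[1], a[0]
        return a
    sort3_network(a)              # valid whether len(a) is 3 or 4
    if len(a) == 4:
        # "sortspecial": the first three are sorted, so just
        # insert the 4th element into its place.
        x = a.pop()
        pos = 0
        while pos < 3 and a[pos] < x:
            pos += 1
        a.insert(pos, x)
    return a
```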
|
| Very cool. Improving on core LLVM sorting algorithms that have
| already been heavily hand-optimized by the best in the world is
| definitely a "it's 1997 and Deep Blue defeats the World Champion
| in chess" kind of feeling.
| ComputerGuru wrote:
| Interesting that it took AI to pull this off. I think mutation
| testing would have discovered the first (by virtue of checking
| which code isn't necessary _globally_), though likely not the
| second (which needs a new branch, unless that branch was
| already there?).
| fkarg wrote:
| As I understand it, it kind of _did that_, just in an
| extremely guided way, which is why it produced results in a
| reasonable amount of time (probably)
| thomasahle wrote:
| I'm surprised they only report results up to sort5. It seems at
| that level, you could just iterate through all possible
| programs. AI-generated code seems more interesting once you get
| to 10+ values, where classical methods break down.
| cypherpunks01 wrote:
| I believe that's because above 5 elements, fixed sorting
| networks are no longer used. Introsort takes over and
| dispatches to insertion sort, quicksort, or heap sort as
| appropriate.
|
| Divide and conquer strategies are used for larger sorts, and
| the smaller arrays could include the fixed lengths 3, 4, 5.
| bitshiftfaced wrote:
| The paper notes that they brute forced all solutions for
| sort3 to verify that it was optimal. It said that this wasn't
| possible for 4+.
| bgirard wrote:
| The title implies it found an entirely new algorithmic approach
| to sorting (like quick sort) which would have been a fantastic
| discovery. But it feels a lot like micro-optimizing the control
| flow and codegen.
| ozr wrote:
| I think this is still super interesting. It's something humans
| are unable to do, or at least something few humans can. I very
| much like the pattern of writing basic logic myself and then
| using a coding model to optimize it. It's effectively what we
| do with compilers already; this just makes it better.
| GravityLabs wrote:
| Still a fantastic discovery, because now there are more
| powerful automated algorithm improvement discovery pipelines. I
| wonder what will happen when those improvements are added to
| the processing that found the improvements. :D
| bgirard wrote:
| Yes, I'm not discounting the results. I think it's a very
| interesting approach. I just think the language is important
| here. If you ask a CS class turn in their own quick sort
| implementation, you have n implementations of 1 algorithm,
| not n new algorithms.
| fwlr wrote:
| I'm not sure there is a new algorithmic approach to sorting
| like you're thinking of. From a high level you can divide
| sorting algorithms into roughly two camps, "those that use
| divide-and-conquer", and "those that do not". The divide-and-
| conquer (split into smaller sub-lists, sort the sub-lists, and
| merge them while preserving sort order) algorithms are better.
| From this perspective, algorithms that we think of as quite
| distinct like quicksort and mergesort are not that different -
| quicksort just does the "divide" step of divide-and-conquer in
| a smarter way.
|
| In the end, no matter what divide-and-conquer sorting algorithm
| you pick, you will be doing lots of "small sorts". And it is
| those small sorts that DeepMind has optimized here.
| ozr wrote:
| > I'm not sure there is a new algorithmic approach to sorting
| like you're thinking of. From a high level you can divide
| sorting algorithms into roughly two camps, "those that use
| divide-and-conquer", and "those that do not". ...
|
| I think the sci-fi-but-possibly-real hope is that for sorting
| (among other things), we may have the perspective that there
| isn't any new algorithmic approach available, but an AI finds
| one for us.
| bgirard wrote:
| There's lot of accepted sorting algorithms [1]. I'm sure we
| can come up with novel new algorithms, even if they're not
| optimal. Like Wikipedia mentions, they all fall within some
| small number of higher level categories (eg. Partitioning,
| Merging, Selection, Insertion). I'm still not convinced that
| the optimizations presented in the article amount to the
| discovery of NEW sorting algorithms but merely optimizations
| of existing ones.
|
| [1] https://en.wikipedia.org/wiki/Sorting_algorithm
| nurettin wrote:
| My first intuition, knowing the limitations of generating
| first-order logic sequences, was that they must have somehow
| restricted the sort to a fixed number of elements.
| lumb63 wrote:
| This is really cool. I'll be interested to see if the team can
| produce useful and provably hard cryptographic hash functions
| with this tech. The other interesting application that this
| inspires is use of this tech to optimize the optimization
| algorithms used by compilers. Perhaps we can all benefit from
| optimized optimizers.
| boomanaiden154 wrote:
| There's already been quite a bit of work done on replacing
| compiler heuristics with ML models. Google has productionized
| it with MLGO and there have been quite a few papers/experiments
| on the topic.
| danlark wrote:
| You can see hashing optimizations as well
| https://www.deepmind.com/blog/alphadev-discovers-faster-sort...,
| https://github.com/abseil/abseil-cpp/commit/74eee2aff683cc7d...
|
| I was one of the reviewers who examined in detail what was
| done in both sorting and hashing. Overall it's more about
| assembly: finding missed compiler optimizations and balancing
| between correctness and distribution (for hashing in
| particular).
|
| It was not revolutionary in the sense of finding completely
| new approaches; rather, it converged to something
| incomprehensible for humans but relatively good for
| performance, which proves the point that optimal programs
| are very inhuman.
|
| Note that for instructions in sorting, removing them does not
| always lead to better performance; for example, instructions
| can run in parallel and the effect can be less profound.
| Benchmarks can lie, and the compiler could do something
| different when recompiling the changed sort3 function.
|
| For hashing it was even funnier: very small strings, up to 64
| bits, already used 3 instructions (add some constant ->
| multiply 64x64 -> xor upper/lower). For bigger ones the
| question becomes more complicated; that's why 9-16 bytes was
| a better spot, and there it simplified from 2 multiplications
| to just one and a rotation. Distribution on real workloads
| was good, it almost passed smhasher and we decided it was
| good enough to try out in prod. We did not roll back, as you
| can see from abseil :)
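| The small-string sequence described above can be modelled like
| this (illustrative Python with made-up constants, not the
| actual abseil code):

```python
MASK64 = (1 << 64) - 1

def small_mix(x, c_add=0x9E3779B97F4A7C15, c_mul=0xC2B2AE3D27D4EB4F):
    # add some constant -> 64x64 multiply -> xor upper/lower:
    # the high half of the full 128-bit product is folded into
    # the low half, so every input bit influences the result.
    # Both constants are illustrative, not abseil's.
    x = (x + c_add) & MASK64
    product = x * c_mul               # full 128-bit product in Python
    return ((product >> 64) ^ product) & MASK64
```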
|
| But even given all that, it was fascinating to watch how this
| system was searching and was able to find particular programs can
| be further simplified. Kudos to everyone involved, it's a great
| incremental change that can bring more results in the future.
| gumby wrote:
| Your description makes the approach sound like applying
| circa-1980s simulated annealing, also a form of gradient
| descent. Am I missing something?
| ska wrote:
| This doesn't really sound like SA, which is a principled
| approach to avoid the local min problem in descent algorithms
| and end up with a true global optimization. The general
| result is clean and impractical, the approximations hard to
| analyze.
| CalChris wrote:
| This sounds like a more intelligent version of
| superoptimization. The original Massalin superoptimizer also
| created surprising results for its time, similar to AlphaDev's
| _incomprehensible for humans_. You see the same thing in
| computer chess, which Agadmator calls _disgusting engine
| lines_.
|
| https://courses.cs.washington.edu/courses/cse501/15sp/papers...
| thesz wrote:
| So it works as a superoptimizer of sorts.
| sytelus wrote:
| If more compute is spent than before, but in parallel to reduce
| latency, then wouldn't this increase power? Shouldn't latency
| and power both be objectives?
| 13of40 wrote:
| > something incomprehensible for humans
|
| This might be buggy whip talk, but I wonder if you could take
| the same system and apply it to smaller problems (e.g.
| computing an 8-bit hash) so the novel techniques could be
| identified and used by humans.
| thomasahle wrote:
| I'm disappointed that the hashing is just based on training on
| microbenchmarks and SMHasher, rather than designing a fast
| _provably_ universal hash. Suites like SMHasher are never
| complete. They are just trying to catch the most common
| weaknesses. If you train on the test cases you'll only get an
| algorithm that passes the tests, but people can always find a
| set of values on which you will do badly.
| eklitzke wrote:
| I think you're confusing proving that the hash function is
| collision resistant with the other goal which is hashing
| speed. If you really need a collision resistant hash you need
| to use a cryptographic hash function, but outside of
| cryptographic applications that is rarely the requirement.
| And (huge caveat, this isn't my domain expertise) I'm not
| sure what security properties are really "proven" about
| existing cryptographic hash functions, AFAIK existing
| cryptographic hash functions are considered secure because we
| don't know how to break them, not because of some fundamental
| mathematical property about them.
|
| For the other 99.999% of hashing applications there is a
| balance between collision resistance and hashing latency. For
| example, in a hash table (probably the most common use for a
| non-cryptographic hash function) there is a cost incurred by
| hash collisions because lookups on keys with collisions may
| have to do extra probing. On the other hand, _every_ hash
| table lookup requires doing at least one hash operation,
| regardless of whether or not it collides. So it may make
| sense to have a slightly worse hash function (in the sense
| that it is more likely to have collisions with pathological
| inputs) if it has slightly lower latency. The only way to
| really know what is faster for a real world application is to
| have some kind of benchmark to train against as a loss
| function.
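The tradeoff described above can be made concrete with a toy open-addressing table that counts work: every lookup or insert pays exactly one hash call, and collisions add extra probes on top. This is a hypothetical sketch, not how any particular library implements it:

```python
class ToyTable:
    """Minimal linear-probing hash table that counts work done."""
    def __init__(self, size, hash_fn):
        self.slots = [None] * size
        self.hash_fn = hash_fn
        self.hash_calls = 0  # every operation pays exactly one of these
        self.probes = 0      # collisions add extra probes on top

    def insert(self, key):
        self.hash_calls += 1
        i = self.hash_fn(key) % len(self.slots)
        while self.slots[i] is not None and self.slots[i] != key:
            self.probes += 1              # extra work caused by a collision
            i = (i + 1) % len(self.slots)
        self.slots[i] = key

# A deliberately bad hash sends everything to one bucket:
t = ToyTable(16, lambda k: 0)
for k in range(8):
    t.insert(k)
# 8 hash calls, plus 0 + 1 + ... + 7 = 28 extra probes from collisions
```

A faster hash function lowers the fixed per-operation cost; a better-distributing one lowers the probe count. Which balance wins depends on the workload, hence the benchmark-as-loss-function argument.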
| Someone wrote:
| > I'm not sure what security properties are really "proven"
| about existing cryptographic hash functions
|
| AFAIK, we don't even know whether trapdoor functions
| _exist_.
|
| https://en.wikipedia.org/wiki/Trapdoor_function:
|
| _"As of 2004, the best known trapdoor function (family)
| candidates are the RSA and Rabin families of functions"_
|
| Also note that the 'examples' section starts with:
|
| _"In the following two examples, we always assume it is
| difficult to factorize a large composite number (see
| Integer factorization)."_
| tga_d wrote:
| >If you really need a collision resistant hash you need to
| use a cryptographic hash function, but outside of
| cryptographic applications that is rarely the requirement.
|
| There are reasons to use (strongly) collision resistant
| hashes outside of cryptographic settings. E.g., the default
| Rust hash function, used in hash maps and sets, has strong
| collision resistance, because otherwise you could open up
| applications to DoS attacks (the attacker uses lots of
| inserts with collisions to kill performance of accesses and
| further inserts at those buckets).[0]
|
| >I'm not sure what security properties are really "proven"
| about existing cryptographic hash functions, AFAIK existing
| cryptographic hash functions are considered secure because
| we don't know how to break them, not because of some
| fundamental mathematical property about them.
|
| There are provably secure hash functions[1] (typically
| using the same sort of primitives as public key crypto),
| but they're generally only used when certain properties
| need to be composed, and are often less secure than the
| non-provable ones in practice anyway. This is pretty
| similar to the state of symmetric vs. asymmetric
| cryptography in general: primitives like RSA, DH, etc. have
| much stronger proofs than AES, but algorithms built using
| AES for security are generally viewed as a lot less likely
| to be broken any time soon than algorithms built using
| typical asymmetric primitives for security, even ignoring
| things like quantum advantage.
|
| [0] https://doc.rust-
| lang.org/std/collections/struct.HashMap.htm...
|
| [1] https://en.wikipedia.org/wiki/Security_of_cryptographic
| _hash...
| thomasahle wrote:
| > I think you're confusing proving that the hash function
| is collision resistant with the other goal which is hashing
| speed. If you really need a collision resistant hash you
| need to use a cryptographic hash function.
|
| I wish this misconception would die. There is a great
| theory of algorithmic probabilistic hash functions,
| completely distinct from cryptographic hash functions. If
| you are designing a hash table, or a different algorithm
| using a hash function, you nearly always want the former
| kind.
|
| The idea is that `Pr[h(x) = h(y)]` is small _no matter the
| inputs x and y_. Here the probability is over the random
| seed of h. Lots of good hash functions, like UMASH
| (https://engineering.backtrace.io/2020-08-24-umash-fast-
| enoug...), have this guarantee. Other fast hash functions, like
| MurmurHash, don't.
|
| When a function doesn't have this guarantee, it means I can
| find sets of values x1, x2, ... that will likely collide
| under _any_ or most seeds! Sure, if your inputs are
| basically random, this probably won't happen, but people
| can still use this to DDoS your hash table, or whatever you
| are coding.
|
| Notice again, this has nothing to do with cryptography. It
| is all about probabilistic guarantees. You can't just test
| the hash function on a fixed number of inputs and say it's
| good, since you may just have moved the "bad set" to
| somewhere else.
|
| In this day and age there are super fast algorithmic hash
| functions with guaranteed low expected collisions. It's
| just silly to use one that you can break so easily.
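A concrete feel for the "probability over the random seed" guarantee: here is a minimal sketch of multiply-shift hashing, one of the classic schemes with this property (parameter choices are illustrative):

```python
import random

def make_multiply_shift(bits_out: int = 32):
    """Return a freshly seeded hash from 64-bit ints to bits_out-bit
    ints. For a uniformly random odd multiplier a, multiply-shift
    satisfies Pr[h(x) == h(y)] <= 2 / 2**bits_out for any fixed pair
    x != y -- the probability is over the draw of a, not the inputs."""
    a = random.getrandbits(64) | 1  # random odd 64-bit seed
    mask = (1 << 64) - 1

    def h(x: int) -> int:
        return ((a * x) & mask) >> (64 - bits_out)

    return h

h1 = make_multiply_shift()
h2 = make_multiply_shift()  # an independent seed gives a different function
```

Because the guarantee holds for every fixed input pair, an attacker cannot precompute a colliding set without knowing the seed, which is exactly what a fixed, unseeded function cannot promise.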
| jacquesm wrote:
| Indeed, and this has been the case for quite a while now. You
| can _always_ improve on a general algorithm by taking advantage
| of knowledge of the data, but that never generalizes and
| usually leads to worse performance on other data and/or new
| pathological cases that render the results unusable.
|
| It's an instance of overfitting.
| bee_rider wrote:
| Ship the optimization framework in with the application,
| sample from the user data, and optimize for that? It isn't
| overfitting if you overfit on the data you care about,
| right?
| [deleted]
| scns wrote:
| Sounds like the JVM's recompilation of hot paths to me.
| jacquesm wrote:
| Data tends to change over time, and once a hash function
| is in use you can't really replace it easily without a
| lot of overhead, possibly quite a bit more overhead than
| what you saved in the first place. There are some
| examples of this in the sorting arena too, such as
| 'Timsort', personally I haven't found any that gave a
| substantial boost, but probably there are some cases
| where they do. Unless sorting or hashing (and lookup) are
| the main bottleneck for an application I would spend my
| time on other aspects of it.
| cabalamat wrote:
| > proves the point that optimal programs are very inhuman
|
| Maybe there should be an AI that produces optimally-
| readable/understandable programs? That's what I would want if I
| was adding the output to a codebase.
| whatever1 wrote:
| Optimal routing of delivery vehicles for UPS/FedEx etc. is
| also incomprehensible to the drivers, so the planners often
| intentionally generate suboptimal solutions.
|
| A suboptimal solution that gets implemented is better than an
| optimal one that doesn't.
| make3 wrote:
| that's what LLMs can do with RL from human (or AI) readability
| feedback & instruction tuning + prompting. We will 100% see
| this if GPT-4 doesn't already do it.
| ska wrote:
| I wouldn't classify any of the output I've seen so far as
| "optimally readable/understandable".
|
| Some if it looks pretty ok, especially where it overlaps
| with well established approaches.
| chaxor wrote:
| It can do well with optimization and readability *if you
| ask it specifically for those things*. Especially if you
| have a particular paradigm and algorithm in mind (you
| obviously already should anyway).
|
| This is why these systems are helpful in programming:
| they allow developers to think more about the design
| paradigms, and algorithmic solutions, rather than the
| fine grained code syntax and typing.
|
| My hope (not a prediction unfortunately, but a *hope*) is that
| these systems will make people *better* programmers. This
| _could_ happen by alleviating the requirement of typing out the
| code in a particular way, and allowing more time to really try
| out or think carefully about the correct solution for how to
| make their programs (i.e. multiprocessed, producer-consumer,
| distributing data with IPFS, faster algorithms, etc.)
| m3kw9 wrote:
| It's not that it's written to be obfuscated; the routine/algo
| is just hard to understand even if every line were commented.
| Likely some recursive trick is involved; those are always hard
| to follow.
| hajile wrote:
| To what extent is this simply working around the weirdness of
| x86? Do these improvements apply to something like MIPS, ARM64,
| or RISC-V that have inherently simpler ISAs?
| danlark wrote:
| In this particular case they were universal, but the paper says
| the optimizations were done on x86. One of the ideas was to use
| LLVM IR, but the intuition was that an optimizer running on top
| of another optimizer was unlikely to work properly.
| ndesaulniers wrote:
| > but intuition for optimizer over optimizer was unlikely
| to work properly.
|
| Wut?
| orlp wrote:
| > AlphaDev uncovered new sorting algorithms that led to
| improvements in the LLVM libc++ sorting library that were up to
| 70% faster for shorter sequences and about 1.7% faster for
| sequences exceeding 250,000 elements.
|
| As someone that knows a thing or two about sorting... bullshit.
| No new algorithms were uncovered, and the work here did not
| _lead_ to the claimed improvements.
|
| They found a sequence of assembly that saves... one MOV. That's
| it. And it's not even novel, it's simply an unrolled insertion
| sort on three elements. That their patch for libc++ is 70% faster
| for small inputs is only due to the library not having an
| efficient implementation with a *branchless* sorting network
| beforehand. Those are not novel either, they already exist, made
| by humans.
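For reference, the kind of branchless three-element sorting network being referred to looks like this sketch: three fixed compare-exchanges and no data-dependent branches (on integers, compilers typically lower each min/max pair to cmov-style code):

```python
def sort3(a, b, c):
    """Sorting network for three elements: the comparison sequence
    is fixed in advance, so there are no unpredictable branches."""
    a, b = min(a, b), max(a, b)  # compare-exchange positions (0, 1)
    b, c = min(b, c), max(b, c)  # compare-exchange positions (1, 2)
    a, b = min(a, b), max(a, b)  # compare-exchange positions (0, 1)
    return a, b, c
```

Every permutation of three inputs comes out sorted, e.g. `sort3(3, 1, 2)` returns `(1, 2, 3)`. AlphaDev's contribution was at the assembly level within such a network, not the network itself.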
|
| > By open sourcing our new sorting algorithms in the main C++
| library, millions of developers and companies around the world
| now use it on AI applications across industries from cloud
| computing and online shopping to supply chain management. This is
| the first change to this part of the sorting library in over a
| decade and the first time an algorithm designed through
| reinforcement learning has been added to this library. We see
| this as an important stepping stone for using AI to optimise the
| world's code, one algorithm at a time.
|
| I'm happy for the researchers that the reinforcement learning
| approach worked, and that it gave good code. But the paper and
| surrounding press release is self-aggrandizing in both its
| results and impact. That this is the first change to 'this
| part' of the sorting routine in a decade is also completely
| cherry-picked. For example, I would say that my 2014 report
| (and ignored patch) of the fact that the libc++ sorting routine
| was QUADRATIC (https://bugs.llvm.org/show_bug.cgi?id=20837)
| finally being fixed in late 2021
| (https://reviews.llvm.org/D113413) is quite the notable change.
| If anything it shows that there wasn't a particularly active
| development schedule on the libc++ sorting routine over the
| past decade.
| throwaway106382 wrote:
| [flagged]
| dundarious wrote:
| Yes, I believe what they're doing already exists in the
| literature as "supercompilation", though good to see its
| application under any name.
| choppaface wrote:
| A decade ago Google had people with breadth and context who
| could have adjusted the framing of this work's results (and
| maybe even re-targeted it to something useful). Today, however,
| Google is a mix of hyper-narrow expert ICs and leaders who lack
| domain expertise. Paper in Nature? Sure, let's take it!
| danlark wrote:
| I agree on the sorting front. Removing one cmov is not likely
| to improve much.
| orlp wrote:
| It's not even a cmov. Look at Figure 3 in the paper:
| https://www.nature.com/articles/s41586-023-06004-9
|
| They eliminated a register-register mov.
| scotty79 wrote:
| Shouldn't libc++ be kinda good? Since it's the standard and
| all? Why isn't/wasn't it?
| petschge wrote:
| The LLVM implementation of libc++ is very young compared to
| other implementations (it was started 5 years ago or so), so
| there are still a lot of things missing and a lot of things
| that can be improved.
| bigcheesegs wrote:
| libc++ was open sourced May 11th 2010. 13 years ago.
| https://blog.llvm.org/2010/05/new-libc-c-standard-
| library.ht...
| TaupeRanger wrote:
| This is DeepMind's modus operandi. Every press release is just
| utterly hyperbolic nonsense that doesn't withstand the
| slightest scrutiny. AlphaGo, AlphaFold, AlphaDev... they've
| done literally nothing to improve the human condition, and may
| have actually made things worse. Go players have DECREASED
| their Elo after playing AlphaGo (or just quit the game
| altogether). I would be embarrassed to be associated with
| DeepMind.
| vosper wrote:
| > Go players have DECREASED their ELO after playing AlphaGo
| (or just quit the game altogether)
|
| Can you explain this for someone unfamiliar with the game?
| schaefer wrote:
| Lee Sedol retired in 2019 following his 2016 defeat by AlphaGo
| in a 5-round match. At the start of the match, most people were
| confident an AI could never defeat a top human player at Go. By
| the end of the match, watching (arguably) world champ Sedol
| suffer loss after loss, the story had changed dramatically.
| Sedol fans were championing his single win against the
| unstoppable AI.
|
| We (hacker news) discussed Lee Sedol's retirement here: [1]
|
| To active Go players at the time, AlphaGo and AlphaZero really
| were as shocking as the recent debut of ChatGPT.
|
| [1]: https://news.ycombinator.com/item?id=21649495
| dfan wrote:
| I am familiar with the game and I cannot explain it. Go is
| not typically rated with the Elo system, and the quality of
| top-level human play has increased since 2016.
| 29athrowaway wrote:
| That is correct but in game servers, ranks (e.g.: 1 kyu,
| 1 dan) are computed via Elo-like means.
|
| https://forums.online-go.com/t/how-does-the-rating-
| system-wo...
| nahstra wrote:
| I also play some, and yeah that's incorrect. Also, there
| was recently a bunch of hype about an adversarial
| strategy that could beat AIs w/o deep readout (i.e. only
| using the 'policy network' nonlinear ML stuff and not
| enough actual calculation). Here's a vid of an amateur
| doing it.
|
| https://www.youtube.com/watch?v=H4DvCj4ySKM
|
| EDIT:
|
| Also, if you want to get into it, Michael Redmond's Go TV on
| YouTube has an amazing beginner playlist; watch some of that,
| then maybe blacktoplay.com, and if you like it, play :)
| scotty79 wrote:
| Maybe they just lost, so their Elo went down? Or do losses
| against AI not count?
| rcxdude wrote:
| Either way, Elo depends on the rating of your opponent, so a
| loss against an AI with an accurate (very high) rating is not
| going to cost you much unless you're one of the best players in
| the world (heck, in chess, Magnus Carlsen's Elo would go down
| by literally nothing from losing to Stockfish).
| programmer_dude wrote:
| BTW Elo is not an abbreviation it is a persons name:
| https://en.wikipedia.org/wiki/Arpad_Elo
| blazespin wrote:
| It's a bit surprising how poorly DeepMind has lived up to
| their hype. But they're an OK lab, maybe a bit overly vain.
| TaupeRanger wrote:
| It's probably very fun to be _at_ DeepMind, I just don't think
| I'd want to be a part of the cringey hype machine.
| blazespin wrote:
| I bet it really sucks, tbh. They did all this overpromising
| and now the only way they can deliver is by grifting like
| this. That sounds really stressful to me.
| lurknot wrote:
| ya bet it sucks making $500-800k/year comp getting access
| to the best hardware and google datasets
| blazespin wrote:
| I've seen people who get promoted above their band; they are
| not happy campers.
| jimsimmons wrote:
| Why not. It's not like they have deadlines
| 29athrowaway wrote:
| They still have to publish something in major journals
| and have presence in major conferences.
| jprd wrote:
| a.k.a. - they still care enough to have some semblance of
| shame. There's the rub.
| 29athrowaway wrote:
| At least they've produced tangible value unlike black
| holes of money like the Human Brain project which has
| delivered close to nothing in multiple decades despite
| billions of dollars in investment.
| chaxor wrote:
| This is unbelievably wrong. DeepMind has probably the best
| academic group in RL. The difference between DeepMind and
| OpenAI is that DeepMind favors academic endeavours and novelty
| much more, while completely ignoring commercialization and
| products, whereas OpenAI is the polar opposite: they focus
| almost entirely on products first, and their academic
| endeavours are typically just slight modifications of other
| work, scaled up.
|
| Don't get me wrong, Sutskever (et al.) has done incredibly
| good work previously, but when it comes to the products,
| they're much more engineering and marketing polish than
| scientific endeavours.
|
| Botvinick's work on meta-RL, for example, is an interesting
| direction DeepMind has pursued that few companies interested
| only in engineering would venture towards.
| rcxdude wrote:
| I think this is a bit far in the other direction. Deepmind's
| stuff is often deeply impressive, they just have a tendency
| to exaggerate on top of that.
| jprd wrote:
| +1
|
| There must be a "demonstrate (1) DeepMind #Win per
| <interval>" requirement somewhere that gets the once-over
| from the marketing dept. to meet some MBOs.
| gdy wrote:
| "may have actually made things worse. Go players..."
|
| Go players are using AI to get better at Go.
| TaupeRanger wrote:
| That claim is often made, and never substantiated.
| gdy wrote:
| Just ask any professional Go player.
| digging wrote:
| It's made here in response to a claim that Go players are
| made worse by practicing with AlphaGo, which is also
| unsubstantiated.
| nahstra wrote:
| https://www.amazon.com/How-Play-Way-Explained-
| illustrative/d...
| 29athrowaway wrote:
| Around the time of the AlphaGo challenge and afterwards...
|
| 1) you could see increased activity in go clubs and online go
| servers
|
| 2) the analysis of the games published by Deepmind has
| resulted in interesting "discoveries" (or rediscoveries) and
| changes to what is considered joseki.
|
| 3) many people started analyzing their kifus using AI, to
| find fluctuations in estimated win rate across moves.
|
| So I disagree entirely
| malux85 wrote:
| I thought the same thing - it smacks of desperation at the
| moment; any tiny win is exaggerated.
|
| It's not hard to see why, with the emergence (ha) of OpenAI,
| Midjourney and all of this generative modelling, what has
| DeepMind done? I imagine the execs at Google are asking them
| some very probing questions on their mediocre performance
| over the last 5 years.
| chaxor wrote:
| DeepMind has actually done an enormous amount, but it's been
| in _academia_, not in the commercial product sphere. Just
| because something is not on a little web page available to
| average Joes does not mean there isn't value in it. For
| example, DeepMind's work toward estimating quantum properties
| of materials via density functional theory may not be the best
| toy for your grandma to play around with, but it certainly
| moves academia way further ahead of where it once was.
| malux85 wrote:
| I run atomictessellator.com and have been working on many
| different implementations of density functional theory for the
| last 10 years, as well as working closely with professors at
| Stanford and Oxford on advanced, non-static geometry data
| structures for density functional theory. This is a subject I
| know a LOT about, so I'm glad you brought it up.
|
| DeepMind's work on density functional theory was complete
| rubbish, and everyone in computational chemistry knows it.
| They simply modelled static geometry and overfit their data.
| We wanted this methodology to work (computing DFT is
| expensive), and we did multiple months of rigorous work. The
| reality of the situation is that a bunch of machine learning
| engineers with a glancing amount of chemistry knowledge made
| approximations that were way too naive and announced it as a
| huge success, in their typical fashion.
|
| What they then count on is people not having enough knowledge
| of DFT / quantum property prediction to question their work,
| and making claims like "it certainly does move academia way
| further ahead" - which is total rubbish. In what way? Why
| aren't these models being used in ab initio simulators now?
| The answer is simple: they are not revolutionary; in fact,
| they are not even useful.
| dr_dshiv wrote:
| How so? Legitimately interested.
| blazespin wrote:
| Yeah, this type of grifting is very pro forma for researchers.
| I used to stress about it, but I've realized it just sort of
| goes with the territory.
|
| It's also worth noting that the paper is blindingly obvious and
| everyone started doing this a long time ago but didn't want to
| tip their cards.
|
| And that's the real contribution here - Google is tipping their
| cards. We now have a rough baseline to compare our results
| against.
| ryao wrote:
| It is likely meant to help secure further research funds.
| SethTro wrote:
| LLVM merge: https://reviews.llvm.org/D118029 Benchmark:
| https://bit.ly/3AtesYf
|
| Benchmark seems to be in the range of a 1-5% improvement for 80%
| of sizes.
| andreamichi wrote:
| And for the hashing patch this is the commit to Abseil:
| https://github.com/abseil/abseil-cpp/commit/74eee2aff683cc7d...
___________________________________________________________________
(page generated 2023-06-07 23:00 UTC)