[HN Gopher] NNCP: Lossless Data Compression with Neural Networks...
       ___________________________________________________________________
        
       NNCP: Lossless Data Compression with Neural Networks (2019)
        
       Author : ToJans
       Score  : 372 points
       Date   : 2021-05-22 06:15 UTC (16 hours ago)
        
 (HTM) web link (bellard.org)
 (TXT) w3m dump (bellard.org)
        
       | GistNoesis wrote:
        | I'd like to highlight the Introduction of nncp.pdf, which
        | explains in a very compressed form how it works and encompasses
        | all the magic.
       | 
       | "The lossless data compressor employs the traditional predictive
       | approach: at each time t, the encoder uses the neural network
       | model to compute the probability vector p of the next symbol
       | values t knowing all the preceding symbols s0 up to st-1. The
       | actual symbol value st is encoded using an arithmetic encoder
       | with approximately -log2 ( pst ) bits. Then the model is updated
       | knowing the symbol st. The decoder works symmetrically so there
       | is no need to transmit the model parameters. It implies both
       | encoder and decoder update their model identically.
       | 
       | When no preprocessing is done, st represents the byte at position
       | t. Hence there are Ns= 256 different symbol values from 0 to
       | Ns-1."
       | 
        | For those more familiar with Huffman coding: this uses
        | arithmetic coding, https://en.wikipedia.org/wiki/Arithmetic_coding
       | 
       | It allows adaptive coding (i.e. changing the probabilities of
       | symbols dynamically). Here these probabilities are modelled
       | dynamically using a neural network.
       | 
        | The magic is that the neural network parameters are defined
        | implicitly: you don't need to transmit the neural network
        | parameters, therefore you can use as big a neural network as you
        | want.
       | 
        | During decoding, the network is continuously trained from
        | scratch using the freshly decoded data, in the same way that it
        | was during encoding. The more data you compress, the more you
        | train the internal neural network parameters, and the better the
        | prediction for the next character gets.
       | 
        | You decode 20 bytes, you update the model with this new data,
        | you decode 20 new bytes with the new model, you update the
        | model, and so on. Every 1000000 bytes you can even update your
        | model multiple times using all currently available data to make
        | the network converge faster.
       | 
       | Of course this only works if everything is exactly deterministic,
       | which Fabrice Bellard took great effort in guaranteeing, which is
       | no small engineering feat.
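        | 
        | To make this concrete, here is a minimal sketch of the scheme
        | (in Python, and purely illustrative: a toy adaptive
        | byte-frequency model stands in for the neural network, and exact
        | rational arithmetic stands in for a real arithmetic coder). The
        | point is that the decoder reconstructs the model by replaying
        | the same deterministic updates, so no model is transmitted:
        | 
        |     from fractions import Fraction
        | 
        |     class AdaptiveByteModel:
        |         """Toy stand-in for the neural network: adaptive byte counts.
        |         Encoder and decoder build identical copies of this model,
        |         so its 'parameters' (the counts) are never transmitted."""
        |         def __init__(self):
        |             self.counts = [1] * 256          # Laplace-smoothed counts
        | 
        |         def cum_range(self, symbol):
        |             """Cumulative probability interval [lo, hi) of `symbol`."""
        |             total = sum(self.counts)
        |             lo = sum(self.counts[:symbol])
        |             return Fraction(lo, total), Fraction(lo + self.counts[symbol], total)
        | 
        |         def find(self, point):
        |             """Inverse lookup: which symbol's interval contains `point`?"""
        |             total, acc = sum(self.counts), 0
        |             for s, c in enumerate(self.counts):
        |                 if Fraction(acc, total) <= point < Fraction(acc + c, total):
        |                     return s
        |                 acc += c
        | 
        |         def update(self, symbol):
        |             self.counts[symbol] += 1         # "train" on the symbol just seen
        | 
        |     def encode(data):
        |         model = AdaptiveByteModel()
        |         lo, hi = Fraction(0), Fraction(1)
        |         for b in data:
        |             s_lo, s_hi = model.cum_range(b)
        |             lo, hi = lo + (hi - lo) * s_lo, lo + (hi - lo) * s_hi
        |             model.update(b)                  # update AFTER coding, as in NNCP
        |         return lo, len(data)                 # any point in [lo, hi) works
        | 
        |     def decode(point, n):
        |         model = AdaptiveByteModel()          # same initial state as the encoder
        |         lo, hi = Fraction(0), Fraction(1)
        |         out = bytearray()
        |         for _ in range(n):
        |             s = model.find((point - lo) / (hi - lo))
        |             s_lo, s_hi = model.cum_range(s)
        |             lo, hi = lo + (hi - lo) * s_lo, lo + (hi - lo) * s_hi
        |             out.append(s)
        |             model.update(s)                  # identical update as the encoder
        |         return bytes(out)
        | 
        |     msg = b"abracadabra abracadabra"
        |     code, n = encode(msg)
        |     assert decode(code, n) == msg
        | 
        | NNCP replaces the byte-count model with an LSTM/Transformer
        | whose prediction and training steps are bit-exact on both sides,
        | and presumably uses a practical finite-precision arithmetic
        | coder rather than exact fractions.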
        
         | meiji163 wrote:
          | I was just working on a similar project and didn't think you
          | could decode without sending the weights. Amazing!
        
         | debbiedowner wrote:
         | What is the probability that there are no collisions by the
         | encoder?
        
           | labawi wrote:
           | What do you mean by collisions?
           | 
            | As described, arithmetic coding does a 1:1 mapping between
            | input and output, while the NN deterministically generates a
            | probability distribution for the arithmetic coding.
            | Conceptually, it would work with any deterministic /
            | reproducible function, even a PRF or a running hash, though
            | the point of the NN is to give good distribution estimates
            | for interesting inputs, and thus short output from the
            | arithmetic encoder.
           | 
           | Presumably, there are some limited choices during coding,
           | which then need to be encoded as well.
        
           | pkhuong wrote:
           | 100%.
        
         | threatripper wrote:
         | Is the model actually trained (=updating the weights instead of
         | just the state) during en/decoding?
        
         | rakmos wrote:
          | The concept is rather intriguing in terms of having a
          | compression algorithm that evolves towards an ideal
          | optimization asymptote. I, for one, would be annoyed at the
          | thought that compressing an identical artifact might result in
          | a different compressed size, as it would sow doubt as to
          | whether the inputs were actually the same.
        
           | TD-Linux wrote:
           | As long as the initial state of the network is fixed, the
           | compressed output will be identical every time.
        
       | esjeon wrote:
       | FYI, the compression speed numbers from the paper:
        | gzip = 17 MB/s
        | xz   = 1 MB/s
        | CMIX = 1.6 kB/s
        | NNCP = 2~3 kB/s
        | 
        | (Note that NNCP is using an RTX 3090 here.)
        | 
        | Not only is this unacceptable for hot/warm data, it's also
        | impractical for cold data: at 2~3 kB/s, compressing or
        | decompressing 1 TB takes on the order of 5x10^8 seconds, i.e.
        | 10~16 years.
        
         | vanderZwan wrote:
         | It does make me wonder: if we wrote an LZ77-based algorithm
         | (like gzip and xz) with that time-budget in mind, could we
         | match NNCP?
        
           | user-the-name wrote:
           | No. People are pushing the envelope for these algorithms
           | constantly, as part of the various competitions in data
           | compression, and they don't get anywhere close.
        
           | pornel wrote:
            | I don't know enough about xz, but for gzip I can tell you:
            | definitely not. Even with an infinite amount of computation
            | there's a hard limit to how well gzip can compress, and it's
            | not very good.
           | 
           | For example, let's say you've decoded a stream that so far
           | contained:
           | 
           | 123456
           | 
            | a clever algorithm can guess that "7890" will be next
            | (compression is all about making good guesses about the
            | future data). LZ77 just can't -- it has no way to express
            | such a prediction. This is a general problem for LZ77 with
            | data that has a predictable structure but isn't a literal
            | repetition of previous data.
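            | 
            | A quick way to see the gap (a toy demonstration with Python's
            | zlib, not a claim about NNCP itself): a stream of ascending
            | 32-bit counters is perfectly predictable but contains almost
            | no literal repeats, so deflate does poorly until a trivial
            | "predictor" (delta against the previous record) turns the
            | structure into repetition it can exploit:
            | 
            |     import zlib
            | 
            |     # 64 KiB of 32-bit big-endian counters: each record is the
            |     # previous one plus 1 -- predictable, but not a literal repeat.
            |     data = b"".join(i.to_bytes(4, "big") for i in range(16384))
            | 
            |     # Plain LZ77 + Huffman (what gzip/zlib do).
            |     plain = zlib.compress(data, 9)
            | 
            |     # Trivial prediction step: code each byte as the difference
            |     # from the byte 4 positions earlier.  The residual is almost
            |     # all zeros and ones, which then compresses extremely well.
            |     residual = bytes((data[i] - data[i - 4]) & 0xFF if i >= 4 else data[i]
            |                      for i in range(len(data)))
            |     predicted = zlib.compress(residual, 9)
            | 
            |     # The delta-coded stream compresses far better than the raw one.
            |     print(len(data), len(plain), len(predicted))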
        
         | jopsen wrote:
         | It's still an amazing research result.
         | 
         | Demonstrating a novel compression scheme with a high
         | compression ratio is worthwhile. It might inspire something
         | else.
        
       | fabioyy wrote:
       | Fabrice Bellard is a legend!
        
       | telesilla wrote:
       | Could this address lower file sizes for lossless video and audio?
        
         | riffraff wrote:
         | It seems too slow for real time decompression
        
           | telesilla wrote:
            | Real-time decompression isn't the only requirement. For
            | example, media storage has massive needs now that studio
            | production is moving to the cloud. If a new lossless codec
            | is possible, even one that needs a day to compress or
            | decompress, there are still huge benefits in some
            | applications.
        
         | layoutIfNeeded wrote:
         | No.
        
       | legulere wrote:
       | I often wonder where Fabrice Bellard finds the time for all of
       | his projects
        
         | alecco wrote:
          | If you remove all the time wasters and do a strict 40h of
          | work, you'd be surprised how much time you have left.
          | 
          | Stop wasting time on HN/reddit/watercooling and then
          | compensating by working more hours, and cut the "cord" for
          | all/most entertainment. Instead, read technical stuff. Even if
          | you do it for just a couple of hours a day you'll leave most
          | people behind. Same with programming. Ideally do it in the
          | morning _before_ work. So go to sleep very early.
        
           | ZephyrBlu wrote:
            | For me, I think social media on the whole has been quite a
            | valuable use of time. It feels less like a waste and more
            | like research or learning.
            | 
            | The serendipity of HN, Reddit and Twitter has exposed me to
            | a lot of ideas and information I would otherwise never have
            | known about.
        
             | opsy2 wrote:
             | Maybe it's an explore/exploit thing, social media is good
             | for exploring but to build depth we should switch off the
             | socials and hit the books every other week.
        
             | alecco wrote:
             | Sure. I *try* to limit my HN/Wikipedia/etc usage because
             | it's interesting but it gets me very distracted from my
             | goals. For example, my monkey brain hears a song and I
             | suddenly _need_ to get the lyrics, the story behind it,
             | etc. The monkey brain is very creative at finding ways to
             | avoid doing difficult tasks.
             | 
             | Reddit and Twitter are mostly a waste of time nowadays due
             | to the signal to noise ratio. I dropped my reddit addiction
             | and I'm working on eliminating Twitter.
             | 
             | I wish there were a social network system where people pool
             | and share what they are expected to be doing and help each
             | other. Or at least encourage each other to focus. But this
             | goes against the advertisement paradigm where they want us
             | to be impulsive and stupid. Sad, sad, sad.
        
               | ZephyrBlu wrote:
               | I think that a lot of serendipitous moments created by
               | social media have actually helped me move towards my
               | goals.
               | 
               | Not directly, but by giving me new perspectives and ideas
               | that bounce off my current perspective and ideas to
               | create new ways of thinking about things.
               | 
               | My perspective now is that too much focus is a bad thing
               | because it comes at the expense of broadening your
               | perspective.
               | 
               | Reddit I agree is low signal, but I think Twitter is high
               | signal depending on who you're following.
        
               | alecco wrote:
               | > Twitter is high signal depending on who you're
               | following.
               | 
               | The signal might be good but is it the signal you should
               | be focusing on? There is an amazing amount of interesting
               | stuff being discovered all the time. But is it helping
               | you achieve concrete goals?
               | 
               | With Twitter I see a trap. Say you work on
               | implementing/using <Knowledge Domain X> and follow
               | exclusively people doing similar work. They will also
               | share things outside this domain. Since 2016 almost all
               | science twitter users became highly political. And there
               | is no way to filter this out.
               | 
                | This is a feature for Twitter and the like. They profit
                | by making you emotional and sending you down a rabbit
                | hole of impulsiveness.
        
       | ballenf wrote:
       | Could this work on an M1, if optimized for its ML cores?
        
       | Bulat_Ziganshin wrote:
        | There is a long NNCP thread on the forum dedicated to
        | compression algos:
       | 
       | https://encode.su/threads/3094-NNCP-Lossless-Data-Compressio...
        
       | londons_explore wrote:
       | Fabrice Bellard picks up one more side project, and beats state
       | of the art in text compression by a substantial margin...
        
         | rhaps0dy wrote:
         | It's a very cool achievement (a NN in 200KB, to compress? Wow)
         | but the CMIX compression program performs better in the 1st
         | benchmark, and is competitive in the 2nd.
         | 
         | So he does not beat the SoTA "by a substantial margin". Your
         | comment is overstating the case.
        
           | rndgermandude wrote:
           | True.
           | 
            | Aside from the compression efficiency, I'd like to see other
            | metrics too, like the work it takes to compute the
            | compression (in units of energy, time, or whatever), to get
            | more of a sense of which kinds of workloads something like
            | this would make sense for. These super-efficient compressors
            | often only make sense in long-term archival and compress-
            | once-transmit-often scenarios.
           | 
           | It appears http://www.mattmahoney.net/dc/text.html (that
           | Bellard links to on the NNCP page) has some information:
           | 
           | - NNCP (w/GPU) used 161812 nanoseconds/byte to compress,
           | 158982 nanoseconds/byte to decompress, on an Intel Xeon
           | E3-1230 v6, 3.5 GHz, RTX 3090 GPU, using about 6GB of RAM, to
           | compress enwik9 down to 110,034,293 bytes
           | 
           | - CMIX used 602867 nanoseconds/byte to compress, 601569
           | nanoseconds/byte to decompress, on an Intel Core i7-7700K, 32
           | GB DDR4 using about 25GB of RAM, to compress enwik9 down to
           | 115,714,367 bytes
           | 
           | - gzip -9 used 101 nanoseconds/byte to compress, 17
           | nanoseconds/byte to decompress, on an... Athlon-64 3500+ in
           | 32-bit mode running XP, to compress enwik9 down to
           | 322,591,995 bytes
           | 
           | - 7zip 9.20 used 1031 nanoseconds/byte to compress, 42
           | nanoseconds/byte to decompress, on a Gateway M-7301U laptop
           | with 2.0 GHz dual core Pentium T3200 (1MB L2 cache), 3 GB
           | RAM, Vista SP1, 32 bit, to compress enwik9 down to
           | 227,905,645 bytes
           | 
           | These benchmarks are not directly comparable, as vastly
           | different machines were used, but it still gives a trend at
           | the very least.
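            | 
            | For a rough sense of scale, converting those ns/byte figures
            | into wall-clock time for enwik9 (10^9 bytes) looks like this
            | (a back-of-the-envelope sketch using the numbers above):
            | 
            |     # Rough wall-clock time to process enwik9 (10^9 bytes).
            |     rates_ns_per_byte = {
            |         "NNCP compress": 161812,     # ~45 hours
            |         "CMIX compress": 602867,     # ~167 hours (about a week)
            |         "gzip -9 compress": 101,     # under 2 minutes
            |         "7zip compress": 1031,       # ~17 minutes
            |     }
            |     for name, ns in rates_ns_per_byte.items():
            |         seconds = ns * 1e9 / 1e9     # ns/byte * bytes, then ns -> s
            |         print(f"{name}: {seconds / 3600:.2f} h")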
           | 
            | As expected, these super-efficient algorithms are slow,
            | really slow. Honestly, I'd think that at least today they
            | are too slow to make sense even for long-term archival, and
            | far too slow for compress-once-transmit-often scenarios, as
            | the decompression rate is almost as slow as the compression
            | rate.
           | 
           | Still, it's impressive work, don't get me wrong.
        
       | rogerb wrote:
       | Every time I see a Fabrice Bellard submission here - it's like a
       | little cake that you found in your fridge unexpectedly. Great
       | stuff!
        
         | mercer wrote:
         | It's like checking out the freezer and realizing you bought ice
         | cream a while back.
        
       | etaioinshrdlu wrote:
       | I suspect his neural net library used in this could become quite
       | useful and propagate far and wide if it were open-sourced:
       | https://bellard.org/libnc/
        
         | justinclift wrote:
          | For reference, this is what its License section says:
         | The LibNC library is free to use as a binary shared library.
         | Contact the author if access to its source code is required.
         | 
         | https://bellard.org/libnc/libnc.html#License
        
         | xxs wrote:
          | Compression ratio/speed:
          | Program or model    size: bytes    ratio: bpb    speed: KB/s
          | xz -9               24 865 244     1.99          1020
          | LSTM (small)        20 500 039     1.64          41.7
          | Transformer         18 126 936     1.45          1.79 (!)
          | LSTM (large2)       16 791 077     1.34          2.38
         | 
         | Note the decompression is not faster than compression (unlike
         | xz)
        
           | bckr wrote:
           | Okay. Stupid question. What in the world do the ratio numbers
           | mean? I get that "bpb" means "byte-per-byte". Is it input
           | bytes per output bytes? Or the other way around? And why do
           | some of the ratios go below 1.0? e.g. NNCP v2 (Transformer)
           | _0.914_
        
             | xxs wrote:
              | bpb - it's bits per byte, so 8/bpb is the compression
              | ratio. A bpb below 1.0 (e.g. 0.914) just means better than
              | 8x compression, about 8.75x in that case.
        
             | IcePic wrote:
             | bits per byte?
        
         | londons_explore wrote:
         | It's a lightweight ML library... But I'm not sure if it makes
         | sense for anything with CUDA as a dependency to be lightweight.
        
       | tromp wrote:
       | Compression achieved on the first billion bytes of English
        | Wikipedia is only 0.88 bits per byte (bpb). Even though the
        | decompression program is large, at almost 200KB, that is... wow!
        
         | londons_explore wrote:
         | Note that it is an XML formatted version of Wikipedia, so some
         | of that data is very predictable XML tags.
        
           | IcePic wrote:
           | True, but that would hold true for all common compressors, so
           | beating them is still a huge win.
        
       | smel wrote:
       | Guys are you sure that bellard is human ?
        
       | agustinbena wrote:
       | Is there a church to go and pray God Bellard
        
         | fit2rule wrote:
         | You're in it.
        
       | kzrdude wrote:
       | I hope he's eligible for the Hutter prize for this improvement,
       | should be a pretty nice sum.
        
         | yalok wrote:
         | True story: long time ago, I was once interviewing a guy, and
         | wanted to check his coding skills. Asked him to write some
         | simple C function, doing some simple algorithm. Let the guy use
         | my PC as I watched him (did I say it was very long time ago?).
          | To my surprise, instead of writing C code right away, the guy
          | went into Far (the file manager) on my PC, and in the blink of
          | an eye, using hot keys, found some existing files on my disk,
          | glanced through them, returned to the file editor and wrote
          | that function.
         | 
          | When I asked what that was about, and why he had to dive into
          | my C files, he admitted he didn't care to remember C syntax
          | and just had to look it up quickly to write his function.
         | 
         | It turned out he was mostly writing all of his code in x86
         | assembly, and a pretty complex code. His main project in
         | assembly was a compressor that was faster and more efficient
         | than RAR, which at that time was at the top. I tried it and
         | that was true.
         | 
          | Many years later, the guy went on to win the Hutter prize, and
          | kept improving his own record year after year. I admire his
          | grit and persistence, and how he keeps discovering new ways to
          | improve his algorithms. And I wouldn't be surprised if he
          | still maintains them in pure assembly (he did originally).
         | 
         | I did hire the guy, and enjoyed working with him.
        
           | ZephyrBlu wrote:
           | So this guy was Alexander Rhatushnyak?
        
         | londons_explore wrote:
          | Sadly not, because it only works well on enwik9 but not the
          | smaller enwik8.
          | 
          | I would guess that's down to neural networks' poor performance
          | with small training sets.
        
           | userbinator wrote:
           | According to http://prize.hutter1.net/#prev he might actually
           | be eligible.
        
             | kzrdude wrote:
             | GPUs are excluded, unfortunately.
        
               | amelius wrote:
               | Anything a GPU can do can be done on a CPU, albeit
               | slower.
        
               | kzrdude wrote:
               | Uh.. not so great, this algorithm is already _very_ slow
               | :)
        
       | alecco wrote:
       | This is very interesting but for people who are not from the
        | field, note:
        | 1. The compressors he is comparing to *are* neural networks
        | themselves (PAQ)
        | 2. He is using a rather small corpus, enwik8 (100,000,000
        | bytes), instead of the common enwik9
        | 3. He is not publishing the amount of RAM or hardware he used
       | 
       | http://mattmahoney.net/dc/dce.html (seems down)
       | 
       | https://web.archive.org/web/20210216231934/http://mattmahone...
       | 
       | It seems Bellard is starting to play with these. I bet he will
       | find interesting things and I would love to see him interact in
       | the Russian compression webforum (please don't link). Big names
       | in compression like Yann Collet, Matt Mahoney, and Jarek Duda
       | used to hang out there.
        
         | SahAssar wrote:
         | The linked page has results for both enwik8 and enwik9, right?
        
           | alecco wrote:
           | Don't know how I missed enwik9. Maybe too early for me.
            | ¯\_(ツ)_/¯
        
       | mpfundstein wrote:
       | Bellard is so awesome.
       | 
        | But can someone please explain to me how this guy finds the time
        | to build all those amazing things? There's so much non-trivial
        | stuff on his list. I wonder... Does he have kids? A family? Is
        | he rich? Does he get millions to build quantum computers? I
        | really wanna know :-)
       | 
       | Thanks
        
       | TynanSigg wrote:
       | Oh hey I did something like this a while ago, but much less
       | polished. I basically made an rANS encoder and hooked it up to
       | GPT-2 (via the excellent transformers library from HuggingFace)
        | to see what compression ratio I could get. Bellard's approach
        | seems to use a custom model, which is probably advisable as big
        | language models like the one I used are _very_ slow. I didn't
        | try to optimize much and only ever tested on my little personal
        | laptop with just a CPU, but I only got encoding speeds of ~0.1
        | Kb/s (limited mostly by available memory). However, I was able
        | to get very good compression factors of 6.5x using standard
        | GPT-2 and 7.3x using the medium version.
       | 
       | Here's the code if anyone's interested:
       | https://github.com/trsigg/lm-compression . I haven't looked at it
       | in a while except for updating it today to work with more recent
       | versions of the transformers library, but I think it should still
       | function and is pretty easy to try out with different models. I
       | thought about using something like a hidden Markov model instead
       | which would be much faster (even more so because it would enable
       | using tabled rather than range-based ANS), but I haven't gotten
       | around to it.
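        | 
        | For anyone curious what hooking an entropy coder up to a
        | language model looks like, here is a minimal, purely
        | illustrative sketch (not code from the linked repo) that pulls
        | GPT-2's next-token probabilities from the transformers library
        | and sums the ideal code lengths -log2 p(token). That estimates
        | the size an ideal arithmetic/rANS back end could achieve,
        | without implementing the coder itself:
        | 
        |     import math
        |     import torch
        |     from transformers import GPT2LMHeadModel, GPT2TokenizerFast
        | 
        |     tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
        |     model = GPT2LMHeadModel.from_pretrained("gpt2")
        |     model.eval()
        | 
        |     text = "NNCP is an experiment to build a practical lossless data compressor."
        |     ids = tokenizer(text, return_tensors="pt").input_ids[0]
        | 
        |     bits = 0.0
        |     with torch.no_grad():
        |         for t in range(1, len(ids)):
        |             # Re-runs the prefix each step for clarity; a real coder
        |             # would cache past key/values.
        |             logits = model(ids[:t].unsqueeze(0)).logits[0, -1]
        |             p = torch.softmax(logits, dim=-1)[ids[t]].item()
        |             bits += -math.log2(p)    # ideal code length of the actual token
        | 
        |     print(f"{len(text.encode())} bytes -> ~{bits / 8:.0f} bytes (ideal coding)")
        | 
        | This ignores the cost of the first token and, of course, the
        | pretrained weights themselves.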
        
         | meiji163 wrote:
         | Using a pretrained model is a little bit cheating, you would
         | have to include the size of the weights in the ratio
        
           | TynanSigg wrote:
           | Those two metrics will converge as the size of the text being
           | compressed goes to infinity. It's necessary to include model
           | size for things like the Hutter prize that involve
           | compressing a fixed text (to avoid hard-coding) but isn't
           | usually a useful metric for compression, especially because
           | it will cause the compression ratio to depend on the size of
           | the data being compressed.
           | 
           | Edit: I thought the model in the above paper was pre-trained,
           | but apparently it's only trained on the data as it arrives!
           | That's indeed a very interesting approach, I wouldn't have
           | expected that neural network models would converge quickly
           | enough for it to be useful.
        
         | TynanSigg wrote:
         | After a bit more digging, it seems Bellard also has a version
         | using GPT-2: https://bellard.org/libnc/gpt2tc.html Looks much
         | cleaner and faster than my little winter break project, cool to
         | see this implemented properly!
        
       | amcoastal wrote:
        | This seems like an old-school and narrow way of describing a
        | much larger goal pursued with NNs and autoencoders: taking data
        | and reducing its dimensionality while getting a (near) perfect
        | reconstruction of the original input from the latent-space
        | representation. Reducing (haha, pun) this broad effort to just
        | "data compression" leaves a lot of nuance and use out of sight.
        
       | kdmdmdmmdmd wrote:
       | This is some of the dumbest shit I've ever seen. There's
       | literally no qualified people commenting on this board any
        | longer. If you have any idea what the fuck a neural network is
        | or how compression works you'll know why this is a bad idea. I
        | would elaborate but I'd probably get banned for being negative.
        
       | kdmdmdmmdmd wrote:
       | Lol dumb
        
       | dariosalvi78 wrote:
        | When I was a student at university I thought it might be
        | possible to combine the quantisation used in lossy compression
        | like JPEG with NNs in some way...
        
       | ZephyrBlu wrote:
       | So what's the Weissman score of this baby?
        
         | xxs wrote:
          | Extremely low - it's a very slow algorithm. Think of xz as
          | being slow and taking forever... this is 3 orders of magnitude
          | worse.
        
           | linkdd wrote:
            | In the TV show "Silicon Valley", the Weissman score measures
            | the compression, not the speed.
            | 
            | In reality, the Weissman score does not exist.
            | 
            | I highly recommend this series if you haven't seen it :)
        
             | xxs wrote:
             | >the Weissman score measure the compression, not the speed
             | 
              | This is false, it uses both - hence 'middle out' was
              | suitable for video chat and all kinds of enterprise
              | software (incl. a custom appliance box). Overall - a
              | be-all and end-all compression.
             | 
             | The score is so popular, it has its own wikipedia page[0].
             | _It compares both required time and compression ratio of
             | measured applications, with those of a de facto standard
             | according to the data type._
             | 
             | [0]: https://en.wikipedia.org/wiki/Weissman_score
        
       | merricksb wrote:
       | Original 2019 discussion for those curious:
       | 
       | https://news.ycombinator.com/item?id=19589848
        
       | eps wrote:
       | Compression speed is around 2 KB/sec.
       | 
       | The figure is given in nncp_v2.1.pdf. This is 1000x slower than
       | xz, which in turn is 1000x slower than gzip.
       | 
        | I have always wondered if a compression _service_ could be a
        | viable commercial offering, and with NNCP's ratios and its
        | compression speed, it looks like it could! That's for cases when
        | you need to _really_ compress the heck out of something, e.g.
        | prior to distributing a large static piece of data to a very
        | large number of recipients. I think there was a service like
        | this, used by AAA game developers for distributing patches, but
        | it didn't take off because it had very slow decompression...
        | and, now reading further through the paper, it looks like an
        | issue with NNCP too -
        |     The NNCP v2 decompression speed is similar to its
        |     compression speed.
       | 
       | So it's a no-go for my brilliant and lazy startup idea, but
       | fascinating stuff nonetheless.
        
         | m463 wrote:
         | I think maybe matching the compression speed with the network
         | speed is what you want to do.
         | 
         | In a place with ubiquitous fast broadband, a compromise with
         | regard to decompression speed might be ok.
         | 
         | If you're sending something to mars... 2 kb/sec might make
         | sense.
        
           | xxs wrote:
           | >If you're sending something to mars... 2 kb/sec might make
           | sense.
           | 
            | Keep in mind that 2KB/s is on something like a 3090...
            | massively power hungry even for decompression.
        
           | loeg wrote:
           | > I think maybe matching the compression speed with the
           | network speed is what you want to do.
           | 
           | You might be interested: Zstandard has an adaptive mode that
           | attempts to do this.
        
         | agumonkey wrote:
         | What about decompression speed ?
        
         | xxs wrote:
         | Yup, it isn't a dictionary based compression. Nonetheless
         | that's a fascinating take
        
           | thesz wrote:
           | With preprocessing it _is_ a dictionary based compression. It
           | uses a fixed dictionary to improve on compression.
        
         | wolf550e wrote:
         | This is what the games industry uses in practice. It is
         | optimized for fast decompression.
         | http://www.radgametools.com/oodle.htm
        
         | AtlasBarfed wrote:
         | Hm, there are likely multiple large files that exist that are
         | very common (DLLs, versions of shared libs, etc etc etc) that
         | might be frequently compressed (SCP, stored in SSDs, etc) since
         | disk space isn't free yet last I checked.
         | 
          | Each file is really its own search space in which to find the
          | best theoretical compression ratio, and finding the optimal or
          | near-optimal compression is usually the computationally
          | hardest part.
         | 
          | And it's basically a perfectly cacheable computation if you
          | have some sort of database mapping algorithm + parameters to
          | apply to a given file.
         | 
         | Decompression usually isn't as big a deal for performance.
         | 
          | Anyway, an online database for a service that has actively
          | tried to take a known common file (matched on file size, hash,
          | etc.) down to its best compressed representation would mean we
          | don't have to do it every time.
         | 
          | Cloud providers would presumably love this, since S3 probably
          | has millions of copies of many files and would 1) get to
          | compress the file and 2) get to share its representation
          | across all instances.
        
         | raverbashing wrote:
         | For real
         | 
          | I know shipping speeds are important, but it seems game
          | developers could try harder at delta-compressing their
          | updates, because it's hard to believe a tens-of-GBs download
          | is the best they could do when you already have a Blu-ray disc
          | or something (unless it's new content, of course).
          | 
          | I wonder how high their bandwidth costs are and how quickly a
          | reduction would pay for itself.
        
           | tomalpha wrote:
           | Does DRM contribute to this perhaps?
           | 
           | Given the layers of encryption and obfuscation, and leaving
           | aside the merits or otherwise of DRM in the first place, I
           | can imagine it could be hard to do delta patching. At least,
           | without making it easier to crack the DRM - e.g. you'd
           | presumably have to decrypt the asset that needed patching,
           | apply the delta, and re-encrypt/re-sign it all client side.
        
           | xxs wrote:
           | What would pay for itself is p2p, e.g. bittorrent... alas.
        
             | raverbashing wrote:
             | p2p wouldn't solve the main issues, which are excessive
             | download times and poor user experiences for upgrades.
        
             | lxgr wrote:
             | Blizzard used to use Bittorrent in some of their
             | installers/updaters a while ago.
             | 
             | I'm not sure how viable this is today, or ever was: Many
             | residential connections are heavily asymmetric. Players
             | presumably aren't too excited either by the idea of
             | continuing to seed while they already play online, with
             | shitty ISP-provided routers suffering hundreds of
             | milliseconds of bufferbloat whenever the uplink is
             | saturated.
        
               | xxs wrote:
               | >Blizzard used to use Bittorrent in some of their
               | installers/updaters a while ago.
               | 
                | Yes, WoW used to have a BitTorrent-based updater.
                | 
                | >Many residential connections are heavily asymmetric.
                | 
                | This is a US-centric view, though. For instance, I have
                | 300Mb up/down. At least for most of Europe (and I
                | suppose Asia), it's pretty common not to be restricted
                | in any way.
                | 
                | I could leave a BitTorrent client running non-stop and
                | wouldn't feel it. Like I've said: alas. There is some
                | stigma associated with p2p/torrents.
        
               | dividuum wrote:
               | Not sure where you are in Europe, but in Germany you have
               | to search a bit to find a residential ISP with symmetric
               | bandwidth. Mine is 50/10.
        
               | lxgr wrote:
               | > this is a US centric-view, though
               | 
               | I live in Europe and I have a 1000/0.5 Mbit/s connection.
        
               | labawi wrote:
                | That seems very atypical. As the link capacity ratio
                | goes past 10:1, even for typical downloads, speed starts
                | to be limited by the ability to send requests, or even
                | TCP ACKs.
               | 
               | I've seen 20:1 offers on cable providers, but it's mostly
               | marketing BS, as you're unlikely to get full download
               | speed in most circumstances due to upload limitations.
        
               | cinntaile wrote:
               | This is not the standard at all in Europe, maybe that's
               | the standard in your specific country?
        
               | xxs wrote:
                | I have lived in quite a few countries, and travelled as
                | well. Someone mentioned Germany (and Bavaria) - indeed,
                | it had some of the worst internet a couple of years
                | back. OTOH, Austria was totally fine.
        
               | cinntaile wrote:
                | Your claim was that the lines were symmetric and
                | "unrestricted". Neither is the standard, unfortunately.
        
               | lxgr wrote:
               | "It's fine" is a different statement from "up- and
               | downlink bandwidths are heavily asymmetric", no?
               | 
               | While asymmetric connections make sense for many
               | residential use cases, P2P is not one of them.
        
               | Bayart wrote:
               | Asymmetry is the rule in my country for both copper and
               | fibre lines, although it's less pronounced with fibre.
               | Symmetric lines are a specific business option.
        
               | paavohtl wrote:
               | I have very rarely seen symmetric broadband plans
               | available in Finland. In the past I've had 10/1, 24/1,
                | 100/10 and now I have 1000/100. With ADSL connections,
                | being asymmetric was basically necessary because the
                | bandwidth is so limited - I think most consumers would
                | prefer 24/1 over 12/12. With fiber I think it's more
               | about market segmentation: ISPs want businesses to buy
               | significantly more expensive symmetric plans.
        
               | patrakov wrote:
               | > At least for most of the Europe (and I suppose Asia),
               | it's pretty common not to be restricted in any way.
               | 
               | In the Philippines, 15 Mbps down / 3 Mbps up ADSL with a
               | static public IP is not something that a foreigner could
               | easily buy. Globe Telecom in Cebu asked for evidence that
               | I have sufficient income to pay for this (2600 PHP per
               | month), and their rules only allow documents from the
               | Philippines, so my bank statement was rejected. I had to
               | talk to their higher-level manager to negotiate an
               | exception. And their default router has 2 seconds of
               | bufferbloat, so I bought a TP-Link Archer C7 and
               | installed OpenWRT to compensate.
        
           | cinntaile wrote:
            | Would it even pay for itself? It seems to me like bandwidth
            | is pretty cheap compared to computation costs, so why bother
            | trying to squeeze everything out? Just do some basic
            | compression and you're good to go. Buy a bigger SSD if
            | necessary; with technologies like reBAR, compression just
            | slows down a game anyway.
        
             | literallycancer wrote:
             | As far as cloud providers go, bandwidth is where the lock
             | in is. It's priced to make it too expensive to switch to a
             | competitor.
        
         | kitd wrote:
         | _Compression speed is around 2 KB /sec._
         | 
         | Apologies if this is a noob question, but could you improve it
         | with training?
        
           | esquire_900 wrote:
           | > Apologies if this is a noob question, but could you improve
           | it with training?
           | 
           | Considering all other things like GPU, code quality etc.
           | equal, the speed here mainly depends on the size of the
           | network. Larger networks result in better performance
           | (logarithmic), but also slower speeds (linear). There are
           | some tricks like pruning to create significantly smaller
           | networks with similar-ish performance, but good models are
           | often large.
           | 
            | Unrelated to your question, what nobody mentions is that the
            | decompression side also needs the complete network. With 57M
            | and 187M parameters, those files are going to be quite
            | large, > 100MB for sure. That completely annihilates the
            | performance wins for one-time transfers.
        
             | 317070 wrote:
             | No, as I understand it, the network does not need to be
             | transmitted. The decoder learns on the same stream as the
             | encoder, so you do not need to transmit network parameters.
        
               | ironSkillet wrote:
               | If I compress a large file at location A using this
               | algorithm, and want to send it to location B, how does
               | location B know how to decompress it?
        
               | clavigne wrote:
               | see the (current) top comment
               | 
               | https://news.ycombinator.com/item?id=27244810
               | 
               | it's a very clever use of symmetry.
        
               | vidarh wrote:
               | It's clever, but pretty much "standard" in compression.
               | Earliest I'm aware of that used symmetry in the encoder
               | and decoder to prevent explicitly transferring the
               | parameters this way was LZ78 (Lempel, Ziv; 1978), but
               | there could well be predecessors I'm not aware of.
               | 
               | LZ78 used it "just" to build a dictionary, but the
               | general idea of using symmetry is there.
               | 
               | (A fascinating variant on this is Semantic Dictionary
               | Encoding, by Dr Michael Franz in his doctoral
               | dissertation (1994), that used this symmetry to compress
               | partial ASTs to cache partially generated code for
               | runtime code generation)
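                | 
                | To illustrate that symmetry, here is a minimal
                | LZ78-style sketch (a simplified textbook variant, purely
                | illustrative): the encoder and decoder grow the exact
                | same dictionary from the data itself, so the dictionary
                | is never transmitted -- just as NNCP never transmits the
                | model:
                | 
                |     def lz78_encode(data: bytes):
                |         """Emit (dict_index, next_byte) pairs, building the dictionary as we go."""
                |         dictionary = {b"": 0}            # phrase -> index
                |         out, phrase = [], b""
                |         for b in data:
                |             candidate = phrase + bytes([b])
                |             if candidate in dictionary:
                |                 phrase = candidate       # keep extending the match
                |             else:
                |                 out.append((dictionary[phrase], b))
                |                 dictionary[candidate] = len(dictionary)
                |                 phrase = b""
                |         if phrase:                       # flush a trailing partial phrase
                |             out.append((dictionary[phrase[:-1]], phrase[-1]))
                |         return out
                | 
                |     def lz78_decode(pairs):
                |         """Rebuild the exact same dictionary from the decoded output itself."""
                |         dictionary = {0: b""}            # index -> phrase
                |         out = bytearray()
                |         for index, b in pairs:
                |             phrase = dictionary[index] + bytes([b])
                |             out += phrase
                |             dictionary[len(dictionary)] = phrase
                |         return bytes(out)
                | 
                |     msg = b"to be or not to be, that is the question"
                |     assert lz78_decode(lz78_encode(msg)) == msg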
        
               | e12e wrote:
               | See https://news.ycombinator.com/item?id=27244810
               | 
                | As I understand it, when starting up (at the beginning
                | of the stream) the network does not compress, then
                | guesses the next byte at every turn _deterministically_
                | from the input so far.
                | 
                | Decoding can "read" the first byte, then proceed to
                | learn and guess the second byte, and so on.
        
               | Yoric wrote:
               | Basically the same way your favorite codec knows how to
               | decompress. It builds the predictor (in this case, a NN)
               | from the known input. The predictor then assigns a symbol
               | (or a series of symbols) to the next few bits - higher
               | probabilities need fewer bits (I haven't checked for this
               | specific technique, but there are techniques that don't
               | even require an integer number of bits, e.g. range
               | encoding).
        
       | rurban wrote:
        | paq also has a similar NN encoder, but a much better and much
        | faster one. It's in asm, though, but open source. LibNC is
        | closed.
        
       | ivoras wrote:
        | Did anyone manage to use the provided Windows binary to
        | successfully compress anything? For me, it exits producing a
        | 0-byte "compressed" file.
        
         | cylon13 wrote:
         | That's impressively small!
        
           | drran wrote:
           | Yep, but file metadata still takes some space.
        
         | thesz wrote:
         | It needs AVX-512.
        
       | logicchains wrote:
       | Fabrice Bellard is proof that 10x programmers exist. Or even more
        | than 10x; he single-handedly wrote a software 4G LTE base station
       | (https://bellard.org/lte/), something that would normally take a
       | team of hundreds to develop due to the complexity of the
       | specification.
        
         | [deleted]
        
         | willvarfar wrote:
         | Fabrice is fantastic!
         | 
         | There are other programmers at his level, or with the potential
         | to be at his level, splattered all over the industry.
         | 
         | Unfortunately, they 'waste' their time on their day job stuff,
         | often having to do things badly because technical approaches
         | are prescribed by others etc.
         | 
         | Many of us meet truly great programmers, but so few of them
         | actually demonstrate and express it. They just grin when asked
         | to implement micro services and add some banal features to the
         | website etc.
        
         | vanderZwan wrote:
         | He's pretty much the programmer's John von Neumann of our time,
         | isn't he? Or at least one of the few contenders for that title
        
           | amelius wrote:
           | Von Neumann was more a mathematician/scientist than an
           | engineer.
        
             | vanderZwan wrote:
             | Well yes, but a 10x mathematician/scientist. Plus important
             | enough to the field of computing to be comparable I think
        
               | [deleted]
        
         | yalok wrote:
         | Totally. Plus FFmpeg...
        
           | logicchains wrote:
           | And QEMU...
        
             | sugarkjube wrote:
             | And quickjs
        
               | mobilio wrote:
               | And LZEXE!
        
               | zserge wrote:
               | And tcc/otcc
        
           | mercer wrote:
           | I love how FFmpeg is way down on his homepage. Meanwhile,
           | even non-programmers in my world know and use it.
        
         | Y_Y wrote:
         | I think hero-worship in any field is weird and unhealthy. That
         | said, Bellard is astonishingly talented and prolific and it's
         | hard to look at someone like that and say he's just a mortal
         | who tries his best to write code, just like everyone else.
         | 
         | So what is it that causes this big gulf in productivity? I know
         | that as a programmer I have all sorts of internal and external
         | obstacles to just sitting down and churning out entire
         | projects, let alone something useful with clever and high-
         | quality code. Is a 10x rockstar just somebody who's been able
         | to eliminate most of their obstacles, and has some amount of
         | talent left over to solve problems with?
        
           | slim wrote:
            | Why is hero-worship unhealthy? I thought it was the contrary
        
             | user-the-name wrote:
             | Because it decreases self-confidence and initiative. It
             | makes you think of the things that the "hero" does as
             | exceptional and out of your reach, instead of something you
             | could do as well if you put your mind to it properly.
        
               | zaphirplane wrote:
                | Interestingly enough, seeing powerful role models
                | inspired Schwarzenegger, Bolt, martial artists, poets,
                | and architects; even Tom Cruise inspired Air Force
                | pilots.
        
               | opsy2 wrote:
               | It seems small, but it's a big distinction between hero
               | worship and role model!
               | 
               | 'Hero Worship' is about elevating someone to a
               | categorically higher level. This can lead to thinking you
               | could never be as good, taking everything they say as
               | gospel, defending even their poor opinions, etc.
               | 
               | If we -worship- 10x developers, you can maybe better see
               | how with the emphasis on that word things can get pretty
               | toxic pretty quick. For example, I think many would agree
               | HR disputes should not take talent level as an input, but
               | if one of those people isn't even a mortal...
               | 
               | ---
               | 
               | None of this is necessary to have a role model, or
               | precludes celebrating someone for doing excellent work as
               | is this case.
        
           | foepys wrote:
           | Some people are just very intelligent and can grasp concepts
           | quickly and apply this new knowledge efficiently. To them a
           | person of average intelligence is often like a toddler that
           | is struggling to put the cube into the round hole.
        
             | logicchains wrote:
             | There are billions of people in the world, over seven
             | thousand people with one-in-a-million intelligence, seven
             | million people with one-in-a-thousand intelligence, but
             | nobody else close to Fabrice Bellard in software output.
        
           | Cybiote wrote:
            | The trick is that even a tiny difference in ability can lead
            | to a large difference in results as the number of decision
            | steps involved in a task increases. The expert wastes less
            | time in the search space. This is a combination of good
            | mental models honed from experience and attention to detail.
            | 
            | Something unusual about Bellard is the breadth of his
            | knowledge. There are probably a fair few who can write a
            | fast, deterministic, low-level optimized neural network
            | library. A decent number who can write an autodiff library,
            | but few who can do both. And I daresay that's the hard part.
           | 
           | Given such a library, implementing an LSTM or transformer
           | model and combining it with Adaptive Modeling/incremental
           | learning of data statistics using a neural network to feed an
           | Arithmetic Coder is something that is within the reach of
           | many more.
           | 
           | In summary, a 10x developer is not 10x smarter or 10x more
           | skilled, they just waste less time investigating unproductive
           | paths and are maybe 1% less likely to take wrong turns. This
           | ends up making a huge difference for large complex tasks.
        
             | pansa2 wrote:
             | > _Something unusual about Bellard is the breadth of his
             | knowledge._
             | 
             | One thing I've noticed is that he chooses to apply himself
             | to many different subfields of development, but always
             | using the same technologies e.g. all his work seems to be
             | in either C or JavaScript.
             | 
             | I've only worked in a couple of subfields but have
             | seriously used at least 7 different languages, plus
             | countless libraries. Compared to Bellard I've probably
             | spent (wasted?) a large amount of time trying to choose,
             | and then learn, the "best" tool for each job.
        
               | ZephyrBlu wrote:
               | To be fair, Bellard has the luxury of doing that because
               | he's generally writing things from near scratch and he's
               | not beholden to other people or outside requirements.
        
           | chii wrote:
           | > it's hard to look at someone like that and say he's just a
           | mortal who tries his best to write code, just like everyone
           | else.
           | 
           | would anyone look at Usain Bolt and say that he's mortal and
           | just like everyone else? Why is it so difficult to imagine
           | that there are people who are just so much better than you,
           | that you (an average person) can never catch up?
           | 
           | Most people have no trouble understanding that they cannot
           | reach the level of Usain Bolt. Why should programming be any
           | different from sprinting in this case?
        
             | jbjbjbjb wrote:
             | I would say an innate talent for 'coding' is a relatively
             | minor factor. What is very rare to see is people showing
             | the level of responsibility and grit to get on by
              | themselves and build something. Also, in a business
              | setting it is unheard of for someone to have the
              | opportunity, or even be trusted enough, to do it, not to
              | mention all the other business factors.
        
               | ZephyrBlu wrote:
               | > _I would say an innate talent for 'coding' is a
               | relatively minor factor_
               | 
               | I would argue that innate talent is strongly linked to
               | genetics. In the case of coding, I think intelligence
               | plays a large role.
        
               | jbjbjbjb wrote:
               | It plays a role, sure. There will be significant
               | diminishing returns though. What I'm suggesting is there
               | are other factors at play here that are way more
               | important than a few extra IQ points. Completing a
               | significant project isn't one dimensional.
        
               | Datenstrom wrote:
               | > innate talent is strongly linked to genetics
               | 
               | I do not believe there is much of a spread in
               | intelligence based on genetics. Short of disability, and
               | given the same consistency, training, and opportunities I
               | seriously doubt that anything even remotely close to a
               | 10x gap would be seen.
               | 
                | I am certainly not well versed in the science of the
                | topic, though; biology is certainly my worst subject.
                | But I have heard of no research showing such a divide.
        
               | taneq wrote:
               | Just like Usain Bolt's physical attributes are a minor
               | factor and it's his grit and responsibility that makes
               | him so quick?
        
               | jbjbjbjb wrote:
                | No, it is completely different; the analogy doesn't
                | work. Coding isn't running.
        
             | abiro wrote:
             | Usain Bolt doesn't run 10x faster than your average fit
             | person. It's more like 2x. So it's fair to ask if there are
             | factors other than innate ability at play when observing
             | orders of magnitudes of differences in programmers'
             | outputs.
        
               | nl wrote:
                | The combination of better performance, delivered more
                | consistently over a long period of time, is what makes
                | someone 10x.
               | 
               | If you put Manchester City up against a team consisting
               | of average people who are fit and know how to play soccer
               | it's pretty easy to see Manchester City scoring 10x the
               | goals - at least.
        
               | SirSourdough wrote:
               | Isn't the question here more like putting Man City (elite
               | professionals) up against average professionals? I don't
               | see the 10x goal differential there necessarily. It would
               | be unremarkable for a "10x programmer" to be 10x better
               | than a hobbyist, which is what I see as the equivalent of
               | someone "who is fit and knows how to play soccer".
        
               | ZephyrBlu wrote:
               | The difficulty of tasks generally increases
               | exponentially, not linearly though. 2x in absolute terms
               | definitely translates to 10x or more in terms of skill.
        
             | roenxi wrote:
             | 1) Yes, Bolt is also mortal.
             | 
              | 2) If quantified, the performance gap between me (unfit,
              | untrained) and Bolt will be smaller than the gap between
              | me (a practised programmer) and Bellard.
        
               | itsoktocry wrote:
               | > _If quantified the performance gap between me (unfit,
               | untrained) and Bolt will be smaller than the gap between
               | me (practised programmer) and Bellard._
               | 
               | This is quite the assumption. Bolt is the fastest man
               | we've ever seen. You actually think you're more closely
               | competitive to him than Bellard???
        
               | vlovich123 wrote:
               | It may be stated poorly but what OP is trying to say is
               | that in terms of just output, the relative time
               | difference in the 100m dash is smaller than the relative
               | software output between them.
               | 
                | Obviously that's silly, because the physical result is
                | approached asymptotically. World-class athletes train
                | and compete their whole lives for a fractional
                | difference in result. Clearly the output of the human
                | brain is not constrained by anything as asymptotic as
                | what the human body can physically be trained to
                | accomplish.
        
           | agumonkey wrote:
            | What surprises me about Bellard, besides the excellence, is
            | the breadth. Who could predict his next topic...
           | 
           | Surely his intellectual life is diverse
        
         | sesm wrote:
          | Did he ever have a day job as a programmer? My bet is 'no'.
        
           | av500 wrote:
            | He did; at least, a few years ago a former coworker worked
            | with him in Paris at a French streaming provider or so.
        
       ___________________________________________________________________
       (page generated 2021-05-22 23:01 UTC)