[HN Gopher] Petals: Run 100B+ language models at home bit-torren...
       ___________________________________________________________________
        
       Petals: Run 100B+ language models at home bit-torrent style
        
       Author : antman
       Score  : 530 points
       Date   : 2023-01-02 08:28 UTC (14 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | tjoff wrote:
       | > _Fine-tuning and inference up to 10x faster than offloading_
       | 
       | What is "offloading" in this context?
        
         | taink wrote:
          | It's mentioned in their paper
          | (https://arxiv.org/pdf/2209.01188.pdf):
          | 
          |     Several recent works aim to democratize LLMs by
          |     "offloading" model parameters to slower but cheaper
          |     memory (RAM or SSD), then running them on the accelerator
          |     layer by layer (Pudipeddi et al., 2020; Ren et al., 2021).
          |     This method allows running LLMs with a single low-end
          |     accelerator by loading parameters from RAM just-in-time
          |     for each forward pass. Offloading can be efficient for
          |     processing many tokens in parallel, but it has inherently
          |     high latency: for example, generating one token with
          |     BLOOM-176B takes at least 5.5 seconds for the fastest RAM
          |     offloading setup and 22 seconds for the fastest SSD
          |     offloading. In addition, many computers do not have
          |     enough RAM to offload 175B parameters.
        
           | dpflan wrote:
           | Is a mobile device / edge device a possible participant /
           | source of resources?
        
         | borzunov wrote:
         | Offloading is another popular method for running large LMs when
         | you don't have the GPU memory to fit the entire model. Imagine
         | you have an A100 GPU with 80 GB memory and want to generate
         | text with BLOOM, a 70-block transformer model with ~2.5 GB of
         | weights per block. For each token, offloading will load the
         | first 1/3 of the model (~27 blocks) from RAM/SSD to your GPU
         | memory, run a forward pass through them, then free the memory
          | and load the next third, and so on.
         | 
         | It turns out, Petals is faster than offloading even though it
          | communicates over the Internet (possibly with servers far away
         | from you). That's because Petals only sends NN activations
         | between servers (a small amount of data), while offloading
         | copies hundreds of GB of NN weights to GPU VRAM to generate
         | each new token.
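          | 
          | For a rough sense of scale, here's a back-of-the-envelope
          | sketch in Python (assuming BLOOM's published hidden size of
          | 14336 and fp16 values; the numbers are approximate):
          | 
          |     hidden_size = 14336      # BLOOM-176B hidden dimension
          |     bytes_per_value = 2      # fp16
          | 
          |     # Petals sends roughly one activation vector per token
          |     # between servers:
          |     activation_bytes = hidden_size * bytes_per_value  # ~28 KB
          | 
          |     # Offloading copies (nearly) all weights to the GPU for
          |     # each token:
          |     weight_bytes = 70 * 2.5e9                         # ~175 GB
          | 
          |     print(weight_bytes / activation_bytes)  # ~6 million x more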
        
           | madisonmay wrote:
           | Interestingly it sounds like offloading could be made quite
           | efficient in a batch setting if you primarily care about
           | throughput rather than latency. Though I guess for most
           | current LLM applications latency is quite important.
        
       | Seattle3503 wrote:
       | Could this approach be used on other types of models, such as
        | image models, e.g. Stable Diffusion?
        
         | ShamelessC wrote:
         | There is probably less motivation, as these models are much
          | smaller. An M1 Mac can run inference in under a minute, a GPU
          | in as little as 3-5 seconds. This is supposed to get as much as
          | 20x faster when the distilled Stable Diffusion model is
         | released.
        
       | tommica wrote:
       | What an interesting concept - also makes me wonder how BitTorrent
       | could be used for more de-centralizing of data, while keeping it
       | accessible on-demand.
        
         | _joel wrote:
         | Sounds just like https://ipfs.tech/
        
           | Alifatisk wrote:
            | I've always wanted to download an IPFS node and run it on my
            | PC in the background, but I'm worried it will wear down my
            | hard drives.
        
           | jazzyjackson wrote:
           | rather, ipfs sounds just like bittorrent
        
             | neiman wrote:
              | It has a unique identifier for data (their main feature), a
              | naming system, and some other features that make it quite
              | different from BitTorrent (as far as two p2p data-sharing
              | networks can be different, obviously).
        
               | ShamelessC wrote:
               | Right, but one works and has widespread adoption. The
               | other does not. And they certainly cover similar ground.
        
       | simongray wrote:
        | What a fascinating concept. I guess this won't be useful for any
       | kind of realtime feedback system, though?
        
         | borzunov wrote:
         | A Petals dev here. It is not real-time, but we think the speed
         | of ~1 token/sec may be enough for some interactive apps such as
          | chat bots (especially if you show tokens to the user as they
         | are generated). You can try one at http://chat.petals.ml
         | (heads-up: it may be laggy right now due to lots of HN users
         | trying out the system).
         | 
         | Of course, you could do better if you have enough high-end GPUs
         | to host the entire model yourself (3x A100 or 8x 3090). But if
         | you don't, 1 token/sec is much faster than what you get with
         | other existing methods.
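          | 
          | For reference, client code looks roughly like this (a minimal
          | sketch based on the project README; treat the exact class and
          | checkpoint names as assumptions):
          | 
          |     from transformers import BloomTokenizerFast
          |     from petals import DistributedBloomForCausalLM
          | 
          |     # Weights stay on the swarm's servers; only activations
          |     # travel over the network.
          |     name = "bigscience/bloom-petals"
          |     tokenizer = BloomTokenizerFast.from_pretrained(name)
          |     model = DistributedBloomForCausalLM.from_pretrained(name)
          | 
          |     inputs = tokenizer("A cat sat on", return_tensors="pt")
          |     outputs = model.generate(inputs["input_ids"],
          |                              max_new_tokens=5)
          |     print(tokenizer.decode(outputs[0]))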
        
           | dpflan wrote:
           | I have not read the technical details, apologies for
           | ignorance, but is there an opportunity for caching?
        
             | jerpint wrote:
             | Probably not, since you need to compute the activations of
             | unknown inputs and there could be infinitely many
             | variations of them
        
           | KaoruAoiShiho wrote:
           | What are the speeds of other existing methods?
        
             | borzunov wrote:
             | Theoretical best-case for RAM offloading is 5.5 sec/token,
             | for SSD offloading - 22 sec/token. Implementations we've
             | tested are not faster than 10 sec/token though. See details
             | in our paper: https://arxiv.org/pdf/2209.01188.pdf
        
         | colordrops wrote:
         | Why not?
        
           | simongray wrote:
           | How would one make a reliable realtime system that depends
           | entirely on unknown network conditions? Perhaps inside a
           | closed network it is possible.
        
             | _joel wrote:
              | That's orthogonal to a realtime system. You can infer at a
              | fair speed, so realtime would be possible.
        
               | simongray wrote:
               | Guarantees are not orthogonal to realtime feedback, they
               | are essential. If I write a query, it is not irrelevant
               | whether it takes 1 second or 1 minute to return at any
               | given moment.
               | 
               | You write that speed can be inferred, but the analogy
               | that was used here is BitTorrent--and my experience with
               | BitTorrent tells me that it certainly cannot be inferred.
        
               | _joel wrote:
                | If you read the article text and the response from the
                | dev, then yes, inference can happen at 1 token/s or, if
                | parallelised, more. I'm not sure what your parameters are
                | for a realtime system. If you're talking about network
                | reliability, that's a different issue. Yes, it can infer
                | quickly; whether it can do it reliably is another matter.
        
       | rightbyte wrote:
       | Can anyone get it to write code? It just says it has written the
       | code to the file system when I prompt it.
        
         | borzunov wrote:
         | You can switch the chat bot (http://chat.petals.ml) to the
         | "few-shot mode" and provide a couple of "task description &
         | code" examples. Then you can add a new task description and
         | it'll respond with code.
         | 
         | The underlying LM, BLOOM, had a few programming languages in
         | its dataset, so it works at least with Python and C++.
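          | 
          | For example, a few-shot prompt could look like this
          | (illustrative, not taken from the actual app):
          | 
          |     Task: reverse a string s
          |     Code: s[::-1]
          | 
          |     Task: sum a list of numbers xs
          |     Code: sum(xs)
          | 
          |     Task: read the file "data.txt" into a string
          |     Code:
          | 
          | The model then completes the last "Code:" line.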
        
       | [deleted]
        
       | alexb_ wrote:
       | Is the MIT license that this uses compatible with the RAIL
       | license that Bloom uses? Or are there not issues with that?
        
         | borzunov wrote:
         | BLOOM is a large LM, and Petals is a tool for running large LMs
         | (not necessarily BLOOM). People using Petals should still
         | follow the model's terms of use regardless of how the tool is
         | licensed.
        
           | alexb_ wrote:
           | Thanks for the clarification
        
       | rolenthedeep wrote:
       | So this sounds like BOINC but specifically for language neural
       | nets?
       | 
       | It's a very interesting concept, and I quite like the idea of a
       | public, open compute cloud. I'd like to see more detail on
       | security: if I'm going to donate time on my personal machine, I'd
       | like some assurance that the workload is properly sandboxed and
       | can't reasonably access my network or data.
       | 
       | Mostly out of interest, what's the advantage to this over just
       | using the existing BOINC network? I've been running BOINC on and
       | off since the dialup days, it's an extremely mature platform with
       | all kinds of workload capabilities.
        
       | Roark66 wrote:
        | This is a nice effort if it allows you to run BLOOM-176B at 1s
        | per token. Just for comparison's sake: with a last-gen Ryzen CPU
        | (16 cores), it takes me about 90s to run the model with 32 GB of
        | RAM (the model spills a few GB onto NVMe storage too, as 32 GB
        | isn't enough RAM).
        | 
        | However, I wonder how they prevent abuse. The main page doesn't
        | mention it. As they mentioned blockchain, I suspect there will be
        | some sort of credits implemented. I'll definitely be watching
        | where this project goes.
        | 
        | Edit: just to clarify, the 90s is not for the 176B-parameter
        | model; it is the 7B BLOOM version. I forgot to mention it, and it
        | puts the ability to run a 176B model at 1s per token in better
        | perspective.
        
         | borzunov wrote:
         | A Petals dev here. At the moment, we're working on a
         | centralized incentive system, no blockchain involved. It will
         | award points if someone is running a server that consistently
         | stays online and returns correct results. Then, users will be
         | able to spend these points for prioritized inference and
         | (maybe) extra features like increased sequence length/batch
         | size limits. This way, the swarm will prioritize people who
         | actually contribute compute and serve others in the remaining
         | idle time.
        
           | narrator wrote:
           | Maybe you could get Bram Cohen to work on this. Seriously,
           | reach out to him, he loves to work on these game theory sorts
           | of things.
        
             | oefrha wrote:
             | I'd say his reputation suffered quite a bit after the whole
             | Chia (that proof of SSD thrashing coin) BS.
        
           | prettyStandard wrote:
            | Is it possible to have the server shut down predictably when
           | it finishes tasks periodically, and not get penalized? I
           | would like my machine to run while I'm not using it.
        
             | borzunov wrote:
             | Sure! People who disconnect for a while (not necessarily
             | predictably) won't be penalized - it's okay if you suddenly
             | decide to use your GPU for something else, then get back to
             | running a server.
        
       | pmontra wrote:
       | I guess that it could be used to create a private swarm, if one
       | has a lot of hardware at home.
        
       | technocratius wrote:
       | This sounds (very narrowly) similar to the Enigma network, a
       | blockchain-based technology that can be used for fully encrypted
       | multi-party computation (MPC). It was one of the earlier
       | blockchain projects that actually had an interesting use case and
       | technology in this quite "overhyped" space. They rebranded to the
        | Secret network [0] a few years back, and somehow I can't find
        | this use case/promise anymore... the website screams all of the
        | Web3 BS buzzwords, it seems :(
       | 
       | [0] https://scrt.network
        
         | ShamelessC wrote:
          | Well yeah, the whole movement is founded on deliberate
         | ignorance of all the existing, _working_ solutions we already
         | have. Also, apparently none of them watched the HBO comedy
         | Silicon Valley.
        
       | Kerbonut wrote:
       | Anyone participating in the swarm is able to potentially log the
       | tokens that get processed by their node. Obviously a security
       | concern. Is there any way to implement homomorphic computing to
       | securely process the tokens?
        
         | borzunov wrote:
         | A Petals dev here. Indeed, the public swarm should not be used
         | for any kind of sensitive data (we have warnings about that in
         | the instructions). If someone wants to process such data, we
          | recommend setting up a private swarm among the orgs they trust
         | (e.g., a couple of labs/small companies who don't have many
         | GPUs themselves may set up a private swarm and collaborate to
         | process their datasets).
         | 
         | Regarding homomorphic encryption (HE), I'm afraid the current
         | methods to run neural networks in the HE fashion involve
          | a 10-100x slowdown, since they are mostly not designed for
         | floating-point operations. We'd love to find a way to do it
         | faster though, since privacy is obviously an important issue
         | for many tasks.
        
           | Kerbonut wrote:
           | Hi there, thanks for taking the time to answer questions!
           | There are numerous use cases where even a 100x slowdown would
            | be acceptable if it was demonstrably able to process
           | sensitive data. Can you help me understand what kind of a
           | slowdown that is? Could the 10-100x slowdown be overcome by
           | more compute nodes, or would it require the nodes themselves
            | to be 10-100x faster, for example?
        
             | borzunov wrote:
             | If someone wants to process sensitive data and is okay with
             | 10x slowdown, it's better to use offloading. This is
             | another, slower method for running large LMs locally
             | without high-end GPUs, see details here:
             | https://news.ycombinator.com/item?id=34216213
             | 
             | In other words, if Petals nodes became 10-100x slower,
             | Petals would lose its competitive advantage over simpler
             | methods that don't communicate over the Internet.
        
       | thot_experiment wrote:
        | Is there an easy way to run a large language model and/or speech
       | synthesis model locally/in colab? Stable Diffusion is easily
       | accessible and has a vibrant community around AUTOMATIC1111. It's
       | super straightforward to run on a Google Colab. Are there similar
        | open source solutions for LLMs/TTS? I believe I had GPT-2 running
        | locally at one point, as well as ESPnet2? Not 100% sure, it's
        | been a while. Wondering what the state of the art for FOSS
        | neural LLMs and TTS is in 2023.
        
         | takantri wrote:
         | For LLMs, the closest thing that comes to mind is KoboldAI[1].
         | The community isn't as big as Stable Diffusion's, but the
         | Discord server is pretty active. I'm an active member of the
         | community who likes to inform others on it (you can see my
         | previous Hacker News comment was about the same thing, haha).
         | 
         | Like Stable Diffusion, it's a web UI (vaguely reminiscent of
         | NovelAI's) that uses a backend (in this case, Huggingface
         | Transformers). You can use different model architectures, as
         | early as GPT-2 to the newer ones like BigScience's BLOOM,
         | Meta's OPT, and EleutherAI's GPT-Neo and Pythia models, just as
         | long as it was implemented in Huggingface.
         | 
         | They have official support for Google Colab[2][3]; most of the
         | models shown are finetunes on novels (Janeway), choose-your-
         | own-adventures (Nerys / Skein / Adventure), or erotic
         | literature (Erebus / Shinen). You can use the models listed or
         | provide a Huggingface URL.
         | 
         | [1] - https://github.com/koboldai/koboldai-client (source code)
         | 
         | [2] -
         | https://colab.research.google.com/github/koboldai/KoboldAI-C...
         | (TPU colab; 13B and 20B models)
         | 
         | [3] -
         | https://colab.research.google.com/github/koboldai/KoboldAI-C...
         | (GPU colab; 6B models and lower)
        
         | FireInsight wrote:
          | Not sure about TTS, but I've trained GPT-2 (a PyTorch
          | implementation, I think) on my own data and it worked pretty
          | well. I also tried EleutherAI's 6B model but couldn't figure
          | out how to run it. As for an "easy way", I don't think a user
          | interface like what Stable Diffusion has exists as of now.
        
         | borzunov wrote:
          | Really large (GPT-3-sized) language models have many more
          | parameters than diffusion models, so it's difficult to load
          | them locally unless you have a server with 8x 3090 or 3x A100
          | GPUs. Petals is the only way to fine-tune and run inference on
          | 100B+ parameter models from Colab, as far as I know.
        
           | borzunov wrote:
           | clarification: You can also use offloading on Colab, but
           | inference with offloading is at least 10x slower (see other
           | comment threads). So it can't really be used for interactive
           | inference, but may be used for fine-tuning with large
           | batches/sequence lengths.
        
           | thot_experiment wrote:
            | Interesting, how does that work with the multiple GPUs? I'm
            | not familiar with the internal workings of these models; is
            | there anywhere I can get a brief rundown of how the
            | processing is split? I imagine there can't be much swapping
            | between GPUs, as that seems prohibitively slow. How is the
            | model split such that it can be worked on in parallel by
            | multiple GPUs w/o being bottlenecked by IO?
        
             | borzunov wrote:
             | I think this is a relevant link for you:
             | https://huggingface.co/transformers/v4.9.0/parallelism.html
             | 
             | For large LMs, people usually use tensor-parallelism (TP)
             | or pipeline-parallelism (PP). TP involves lots of
             | communication, but uses all GPUs 100% of the time and works
             | faster. PP requires much less communication, but may keep
             | some GPUs idle while they are waiting for data from others.
             | 
             | Usually, TP is used when you have good communication
             | channels between GPUs (e.g., they are in one data center
             | and connected with NVLink), while PP is used when
             | communication is a bottleneck (like in Petals, where the
             | data is sent over the Internet, which is much slower than
             | NVLink).
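              | 
              | A toy sketch of the PP idea (plain PyTorch, purely
              | illustrative - in Petals each "stage" lives on a different
              | server):
              | 
              |     import torch
              |     import torch.nn as nn
              | 
              |     # Four pipeline stages; imagine each on its own host.
              |     stages = [nn.Linear(16, 16) for _ in range(4)]
              | 
              |     def pipeline_forward(x):
              |         # Only the activation tensor x crosses stage
              |         # boundaries, never the stage weights.
              |         for stage in stages:
              |             x = stage(x)  # real setup: send x onward here
              |         return x
              | 
              |     out = pipeline_forward(torch.randn(1, 16))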
        
             | zone411 wrote:
             | You can read all the gory details here:
             | https://arxiv.org/pdf/2207.00032.pdf
        
             | nmitchko wrote:
              | You can split the model across devices with Huggingface's
              | accelerate library.
              | 
              | Check out the infer_auto_device_map method, which will plan
              | a placement for your configuration (multi-GPU, RAM, NVMe),
              | and then run dispatch_model with that device map.
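              | 
              | A minimal sketch of that flow (the function names follow
              | accelerate's public API, but check the docs for your
              | version; the model name and memory caps are placeholders):
              | 
              |     from accelerate import infer_auto_device_map, \
              |         dispatch_model
              |     from transformers import AutoModelForCausalLM
              | 
              |     # Placeholder model; the pattern is the same for
              |     # larger checkpoints.
              |     model = AutoModelForCausalLM.from_pretrained(
              |         "bigscience/bloom-560m")
              | 
              |     # Plan which layers go to GPU 0 and which stay in RAM.
              |     device_map = infer_auto_device_map(
              |         model, max_memory={0: "8GiB", "cpu": "24GiB"})
              |     model = dispatch_model(model, device_map=device_map)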
        
         | Roark66 wrote:
          | There is (for many, but not all, large models). Specifically,
          | there is Huggingface's accelerate library that lets you run
          | the model partially on your GPU, partially on CPU/RAM, and what
          | doesn't fit in RAM is cached in NVMe storage (a mirror of two
          | fast drives is recommended).
          | 
          | I didn't have much luck with stock accelerate, but once the GPU
          | was disabled (so it runs only on the CPU, offloading to NVMe
          | storage where RAM is insufficient), it worked pretty well for
          | me. (There is a small code change that has to be done, as the
          | stock software refuses to run without a GPU - it is a simple
          | change described in its GitHub issues.) My GPU has 8 GB VRAM,
          | but this way I managed to run 7B-parameter models. In principle
          | I could run a lot larger ones, but of course it takes a lot
          | more time. The 7B BLOOM takes 90s for one inference and an
          | additional 60s to load the model (from a spinning-disc array)
          | initially.
        
       | oersted wrote:
       | Are there basic stats on real-time contributors and latency?
        
       | vegabook wrote:
       | Any plans for releasing an API spec that would allow for access
       | from languages other than Python?
        
         | borzunov wrote:
         | There's a lightweight HTTP API for inference:
         | https://github.com/borzunov/chat.petals.ml#http-api-methods
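          | 
          | So any language with an HTTP client can use it. For
          | illustration, in Python (the endpoint and field names below
          | are assumptions; see the linked README for the actual spec):
          | 
          |     import requests
          | 
          |     resp = requests.post(
          |         "http://chat.petals.ml/api/v1/generate",  # assumed
          |         data={"inputs": "A cat sat on",
          |               "max_new_tokens": 8})
          |     print(resp.json())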
        
       | 29athrowaway wrote:
       | People nowadays will use 100 GB in VRAM to run a model that
       | taught itself how to do quicksort.
        
       | Alifatisk wrote:
       | I sometimes tend to forget that you can use decentralization
        | without all this crypto stuff.
        
         | ShamelessC wrote:
         | So does literally the entire "web 3" movement.
        
         | [deleted]
        
       | NicoleJO wrote:
       | What is the copyright status of these models?
        
         | borzunov wrote:
         | Petals runs BLOOM, an open-source, publicly released model of
         | the same size as GPT-3. Here's a description of the data used
         | to train this model: https://huggingface.co/bigscience/bloom
         | (the "Training" section)
        
           | alexb_ wrote:
           | BLOOM is not open source. It has the RAIL license, which
           | exists solely to place restrictions on the use of the
           | software, as well as forcing people to update. Read more:
           | https://bigscience.huggingface.co/blog/the-bigscience-
           | rail-l...
        
       | szundi wrote:
        | Hm, so the employees never turn off their personal computers,
        | please, and I have a chatbot? Makes sense.
        
         | [deleted]
        
       | Labo333 wrote:
        | I would love for most of the blockchain trend to be converted
        | into efforts towards BitTorrent-style projects.
        | 
        | Distributed file sharing or computation without the whole
        | tokenomics which, while interesting, attracts too much attention
        | from scammers.
        
         | Pilottwave wrote:
         | While I understand where you are coming from, this argument is
         | basically "money attracts attention from scammers". As a
         | counter I would say "Money attracts attention, period" and
         | attention is an important resource to foster growth.
         | 
         | Decentralized tech would never be where it is today if it
         | weren't for investor attention and the potential for gains. We
         | just have to separate the wheat from the chaff, and remain
         | vigilant for bad actors.
        
           | yunohn wrote:
           | > Decentralized tech would never be where it is today if it
           | weren't for investor attention
           | 
           | That's exactly why blockchains haven't found Product Market
           | Fit.
           | 
           | Investors != Users
        
           | 2fast4you wrote:
           | *blockchain tech would never be where it is today...
           | 
           | I'll bet blockchain is only as popular as it is because of
           | the money. But other forms of decentralization like Mastodon
           | or Matrix are pretty separate from the whole crypto sphere
        
             | Labo333 wrote:
             | Totally agree on that.
             | 
             | Big corps only invest in blockchain because of the buzz
             | words that are used as marketing by the consulting firms to
             | sell their "expertise" and by VCs to sell their companies.
             | 
             | Sure they hope to gain some money, like luxury brands
             | wanting to sell to crypto-billionaires. But crypto was a
              | useful toy, then a Ponzi scheme, and now it's a closed loop.
             | How long will the bubble last?
        
             | dist1ll wrote:
             | Matrix is separate from the crypto sphere, because it
             | solves a different problem.
             | 
             | Federated platforms appeal to the privacy-oriented "f** big
             | tech" mindset, which is pretty common in the hacker & FOSS
             | crowds. I'd put it in the same category as VPNs, E2E
             | messengers and TOR.
        
               | olivierduval wrote:
                | VPNs are really useful for businesses to link different
                | locations over the internet instead of using (awfully
                | costly) dedicated links... or to allow remote work.
                | 
                | So VPNs are not really in the same category.
        
               | mikepurvis wrote:
               | VPN as a technology, yes. But I think "VPN" in this
               | discussion is referring specifically to the myriad of
               | consumer-oriented paid solutions (SurfShark, NordVPN,
               | whatever) that are pitched as being about protecting your
               | online security, pirating with impunity, and bypassing
               | region-locks.
        
           | ricardobeat wrote:
           | Where exactly is decentralized tech today?
           | 
           | Nobody around me ever uses any of it. Old p2p networks
           | (gnutella, kademlia, emule) had way larger impact on society
           | 20 years ago.
        
             | pmontra wrote:
             | Email and the web are more decentralized than they look.
             | Just think that different FOSS and closed source user
             | agents and servers interoperate without any problem,
             | especially for email.
        
               | feanaro wrote:
               | And in non-web space, there's Matrix
               | (https://matrix.org).
        
             | blamestross wrote:
             | BitTorrent isn't getting any smaller. Mainline DHT is still
             | 10x bigger than Bitcoin.
        
           | sdiacom wrote:
           | The "chaff" and the bad actors are in it for the money.
           | Without them, "decentralized tech" indeed wouldn't be where
           | it is today -- meaning, it wouldn't be overwhelmingly
           | associated with crypto-adjacent grifts.
           | 
           | The _real_ decentralized tech, the one that serves a purpose
           | other than emptying the wallets of naive crypto-enthusiasts,
            | does just fine without a profit motive. You don't need
            | get-rich-quick promises to get an audience if you're actually
           | doing something useful.
        
           | [deleted]
        
           | Labo333 wrote:
            | What I mean is that the current signal-to-noise ratio is way
            | too low.
           | 
           | This created a lot of bubbles. NFTs are already down by a
           | lot, now yield farming
           | (https://www.bloomberg.com/news/articles/2022-04-25/sam-
           | bankm...) just took a big hit from the FTX case. I see way
           | too many "revolutionnary" projects from fresh graduates.
           | There is no way that tens of thousands of inexperienced
           | people with barely enough CS education to pass programming
           | interviews would magically create innovation just because VCs
           | put a ton of money on them.
           | 
           | Also, can you tell me more about where decentralized tech is
           | today? BitTorrent was a revolution as a way of information
           | sharing, Onion was a revolution for privacy and Bitcoin was a
           | revolution for decentralized ledgers.
           | 
           | Starting from that, IPFS is the continuation of BitTorrent
           | with more features and Ethereum is a more efficient
           | (especially since The Merge) and customizable (smart
           | contracts are advanced checkers for write operations) ledger.
           | 
           | But what are the real world applications of those
           | technologies? What are concrete use cases of Ethereum and
           | IPFS besides payments, records and file sharing?
           | 
            | Surely there is exciting progress to be made on the
            | technical side, like zk-SNARKs, but how useful will it be to
            | society?
           | 
           | I think we already have all the technical blocks we need. If
           | there is no real-world adoption maybe we should just wait
           | another 10 years before pumping crazy amounts of money.
        
         | O__________O wrote:
         | Anyone aware of an open source token-based system that allows
         | users to pool hardware assets, but allow some sort of priority
         | and fairness enforcement to reduce network abuse?
        
         | college_physics wrote:
         | Imho the "print-your-own-money" siren call is only one of the
          | aspects that kept the whole blockchain world from
         | delivering the disruption it so much craved. The core
         | architectures themselves are somehow too overengineered for
         | broad applicability. Maybe that is what was needed to support
         | the digital gold use case, but it is manifestly not needed for
         | all sorts of other very relevant decentralized applications
         | (bittorrent, fediverse, messaging, email etc)
         | 
          | It's a moot point whether the whole crypto/blockchain period was
         | a net positive. It certainly made a noisy case for "re-
         | decentralization" given the very real and mostly harmful status
         | quo. One could also argue that it diverted vital resources to
         | potentially dead-end or limited use areas. The recurrent scams
         | may also give decentralization a bad name to an uninformed
         | public that can't distinguish all the different versions.
         | 
         | What matters next is that projects that deliver real benefits
         | to users get attention and traction. Worth keeping in mind that
         | the real trouble starts when you get noticed by vested
         | interests as a potential threat.
        
         | nukemandan wrote:
          | https://ipfs.io, sans the Filecoin aspect that creates
          | incentives for long-term/proven storage of data, is likely
          | what you are asking for.
        
         | boramalper wrote:
         | > Distributed File Sharing or computation without the whole
         | tokenomics
         | 
         | They went hand in hand even back in the day: private torrent
         | trackers were all about tokenomics where tokens were the number
          | of bytes you've seeded (uploaded) minus the bytes you've
          | downloaded.
         | 
         | I'm not saying it's impossible to imagine distributed file
         | sharing otherwise, but to "guarantee" the availability of
         | (especially unpopular) content, you need some incentive
         | mechanisms either built in to the protocol or externally
         | imposed.
        
         | birracerveza wrote:
         | From the project's readme:
         | 
         | >Please do not use the public swarm to process sensitive data.
         | We ask for that because it is an open network, and it is
         | technically possible for peers serving model layers to recover
         | input data and model outputs or modify them in a malicious way.
         | Instead, you can set up a private Petals swarm hosted by people
         | and organization you trust, who are authorized to process your
         | data.
         | 
          | This is what blockchain and staking tokens are for. (Part of the
         | reason, at least)
         | 
         | You act maliciously, the network slashes your stake. "pinky
         | promise not to do bad stuff" only goes so far... and it's
         | really not far at all. You can trust "trusted" organizations or
         | private individuals, but they have no incentives to ensure that
         | the service works as intended, regardless of intent.
        
           | amelius wrote:
           | A blockchain does not magically solve security issues.
           | 
           | In fact, it adds traceability. And data stored in it can
           | never be deleted. Just to name a few issues.
        
             | menzoic wrote:
             | > A blockchain does not magically solve security issues.
             | 
             | This is a weird statement. Blockchain security is real and
             | it isn't "magic". Blockchain is specifically designed to
             | secure decentralized applications.
             | 
             | > In fact, it adds traceability. And data stored in it can
             | never be deleted. Just to name a few issues.
             | 
             | These aren't issues, these are part of the security model.
             | Traceability is fine here because everything is
             | pseudonymous, if you want to avoid that use a chain that
             | has untraceable transactions with zero knowledge proofs
             | (zero traceability).
             | 
             | > And data stored in it can never be deleted. Just to name
             | a few issues.
             | 
             | Storing data on blockchain is extremely expensive. Only
             | hashes are stored on chain, not the data itself. Hashes are
             | much different from encryption because they're
             | irreversible.
        
             | birracerveza wrote:
             | > A blockchain does not magically solve security issues.
             | 
             | No, but staking is certainly an improvement over "pinky
             | promise", and it requires a public blockchain.
             | 
             | > issues
             | 
             | I'm fairly sure those are features, not issues. You are
             | free to disagree.
        
           | taink wrote:
           | Are you arguing for the processing of sensitive data on a
           | public proof-of-stake blockchain?
           | 
           | First, automating the detection of malicious acts against
           | sensitive data seems pretty difficult. So this can't be
           | implemented to systematically occur, and has to be determined
           | after the fact by an investigation. Then, if a malicious act
           | has been detected, the stake is slashed (and the acts are
           | reverted where possible).
           | 
           | Is my understanding sound so far?
           | 
           | Because this would mean in any case where a slashed stake is
           | considered an "acceptable cost" to the bad actor, then the
           | sensitive data is fairly accessible -- the stake is
           | effectively a paywall. And raising the stake is a difficult
           | decision because higher stake means less actors and higher
           | risk of collusion.
           | 
           | I mean this is probably fine for a very large public
           | blockchain where detecting malicious acts is not as difficult
           | or where the malicious act is not very profitable, but
           | sensitive data can, depending on its nature, be extremely
           | profitable to exploit (and as I stated, I don't see how it
           | could be easily detected).
           | 
           | With sensitive data, "trusting" an organization only means
           | having a legal agreement or strategic alliance with a third
           | party. In these circumstances the consequences are usually
           | more serious for the malicious actor than the loss of an
           | arbitrary amount of money.
           | 
           | I've seen suggestions to do sensitive (e.g. medical) data
           | processing on the ethereum blockchain from some enthusiasts
           | and I have never been able to understand this beyond assuming
           | they have a insufficient threat model in mind for this kind
           | of data.
        
         | Galanwe wrote:
          | I agree that unfortunately a lot of crypto projects are way too
          | tokenomics-centered, instead of utility-centered.
         | 
         | BitTorrent style projects are far more restrictive for a lot of
         | applications though. If something is without cost, then it
         | becomes open to abuse.
         | 
         | Take domain names for instance. I would love to have a
         | decentralized name registry, so that no country have censorship
         | power on the _whole_ internet, as we've seen with recent US
         | intervention at the tld level.
         | 
         | DNS is a good example because it's quite trivial to implement
         | with a plain old DHT. The problem though is how do you prevent
         | scammers and squatters in this model?
         | 
         | There needs to be a cost on a distributed database, otherwise
         | after 1 year it will be fully squatted, used as free hosting,
          | used to store illegal content, DDoS'd for fun, etc.
         | 
         | How to set this cost though, while keeping the distributed
          | nature of this database? The simplest solution is to let the
         | users decide, over the price of a token, sold by people running
         | nodes, bought by people using the service.
         | 
         | Honestly I love this idea. The problem with crypto currently is
         | that a whole bunch of parasites jump on these tokens to
         | speculate on their price without giving a.. about the
         | underlying utility. This completely screws the price optimum
          | and creates an inflated price bubble, in turn preventing
         | adoption.
        
           | scotty79 wrote:
           | > ... bunch of parasites jump on these tokens to speculate on
           | their price without giving a.. about the underlying utility.
           | This completely screws the price optimum and creates a
           | inflated price bubble, in turn preventing ...
           | 
           | We have exactly the same problem with real life systems like
           | food, raw materials and real estate.
        
             | Galanwe wrote:
              | Not quite as much, I would say, because crypto tokens are
              | priced based on over-speculation about their potential
              | long-term explosion, rather than their more down-to-earth
              | utility function. This is because their current utility is
              | still to be discovered.
             | 
             | Take the DNS example for instance, this was implemented on
             | Ethereum by "ENS", but the price of ETH/gas at the time
             | made a single ".eth" domain name cost something like $500.
        
               | scotty79 wrote:
               | > Not quite as much I would say
               | 
               | Way more in terms of money involved, just slower.
        
         | [deleted]
        
           | [deleted]
        
         | aortega wrote:
         | While superficially similar, BitTorrent and a blockchain are
         | inherently different designs that target different problems.
         | Blockchain is massive data replication, BitTorrent is massive
         | data distribution (with some replication too).
         | 
          | That's why you can actually attack and shut down a BitTorrent
          | network by targeting the index servers, which are not massively
          | replicated. E.g., The Pirate Bay is often down.
          | 
          | As a solution for this, I'll shamelessly plug my small project
          | here, which combines BitTorrent with the blockchain as an
          | invulnerable Piratebay-like torrent index server, called
          | Blockchain Bay: https://github.com/ortegaalfredo/blockchainbay
          | 
          | It's command-line, and doesn't use any tokenomic scams. You pay
          | the blockchain only for the data you need to upload, which is,
          | fortunately, very little, as BitTorrent magnet links are very
          | small.
        
         | vintermann wrote:
         | That's not going to happen, because "distributed" was always a
         | misnomer when it came to blockchain things. "Massively
         | redundantly replicated" would be better. If work is
         | distributed, every participant has a little piece of the work
         | to do, but in e.g. blockchain contracts, all the participants
         | need to do the whole calculation.
        
           | bluelightning2k wrote:
           | This is a very articulate and interesting way to put it.
           | 
           | "Massively redundantly replicated"
        
           | dsco wrote:
           | How would you incentivize nodes long-term without a
           | reward/token system?
        
             | vintermann wrote:
             | That's a different question. But BitTorrent does fine with
              | simple tit-for-tat rules, no transferable tokens required.
        
             | britneybitch wrote:
             | A token system can incentivize cooperation in a hostile
             | environment with selfish nodes. But in a friendly
             | environment you might not need the same level of
             | incentives.
             | 
             | I see some similarity here to the world of private torrent
             | trackers. You want a Linux ISO, I want a Linux ISO, we're
             | all working towards the same goal. So we're already
             | incentivized to cooperate, without getting money involved.
             | And trackers also have things like minimum seeding ratios
             | to keep people honest. In the case of AI, you and I both
             | want to generate images, so we're also working towards the
             | same goal, so let's help each other out so both of our
             | workloads finish faster. Maybe idealistic, but I think it
             | could work.
        
             | CodexArcana wrote:
              | In the heyday of BitTorrent, sharing was the incentive, and
              | still is, I imagine. We don't need to financialize
             | everything.
        
             | wongarsu wrote:
             | Bittorrent basically runs on social norms, and people's
             | willingness to help their community. Or people go to
             | private trackers, where contributing is a requirement for
             | being in the community and having access to its resources.
             | 
             | Blockchains are built under the assumption that everyone is
             | selfish and untrustworthy. Which is a decent assumption
             | when building a crypto currency, but that doesn't mean that
             | every system has to run like that.
        
               | hazebooth wrote:
               | As much as I like private trackers, very few use the
               | ratio-less model to protect against serial leechers.
               | 
               | Typically on a tracker you're given a currency (although
               | not as sound as some e-coins) and can use that to
               | influence your upload or download statistics, which in
               | turn affect your ratio. Some trackers might employ rules
               | where your user class has to have a certain ratio, or
               | else you'll lose privileges like certain forums or even
               | the ability to download at all. (The trackers are private
               | and can control which peers you can see)
        
             | croes wrote:
             | That's more cryptocurrency than blockchain.
             | 
             | Blockchain as such has nothing to do with the costs of a
             | node and incentives to run one
        
               | aliqot wrote:
               | "That's just the nose, see the nose can exist discretely
               | without the finger to pick it."
        
           | jlokier wrote:
           | That's changing. The high replication to verify untrusted
           | peers is decreasing and there's a realistic prospect of it
            | going away due to gradual adoption and development of new
            | zk-proof techniques.
           | 
           | In zk proofs-of-computation-result, different nodes can
           | perform _different_ intensive parts of a calculation and send
           | the results along with proofs that those are the correct
           | results. Other nodes can accept the results and verify the
           | proofs with remarkable efficiency, then use those partial
           | results for further calculations. To me it still feels
           | counterintuitive and almost magical that any large, arbitrary
           | computation result can be easily verified without repeating
           | the computation, without the verifier needing much memory or
           | data.
           | 
           | For cryptocurrency blockchains this allows smart-contract
           | (computational) transactions to be accepted with only one
           | node having to execute the code, everyone else just
           | efficiently verifies the proof to accept the state change. As
           | proofs can be aggregated, this scales well: it isn't
           | necessary for every node to run all the verifications,
           | either.
           | 
           | For big, distributed calculations like the article's, the
           | whole calculation can progress using those partial results
           | without having to rely on trust and reputation, and everyone
           | can have high confidence that the final result is what it
           | should be, not undermined by subterfuge or subtly inaccurate
           | contributions.
           | 
           | This is an offshoot of zero-knowledge proofs, as ironically
           | zero-knowledge is not required for these types of
           | applications. Just the efficient verifiability part.
           | 
            | (Fwiw, I am working on large, scalable
            | zk-proofs-of-computation in my spare time, in optimised
            | software and with hardware acceleration, if anyone is
            | interested in discussing this stuff.)
        
             | vlovich123 wrote:
             | > To me it still feels counterintuitive and almost magical
             | that any large, arbitrary computation result can be easily
             | verified without repeating the computation, without the
             | verifier needing much memory or data.
             | 
             | Why counterintuitive? That's kind of all of cryptography
             | and most of computer science. Take factoring into primes
             | (which has been done for forever): it's really time
             | consuming and expensive to determine what the prime factors
             | for a number are, particularly if it's a big number and you
             | know it only has two. That's because division is very very
             | difficult and time consuming. Multiplication on the other
             | hand is super cheap so once you _tell_ me the prime
             | factors, I can confirm much more quickly whether or not
             | they're factors.
             | 
             | In computer science, one of the earliest identified
              | computation classes is NP-complete, which has this property.
              | E.g., traveling salesman and the knapsack packing problem are
             | examples. It can be insanely difficult to find a path that
             | exists between two cities in a graph under some cost. But
             | if you give me a solution I can easily confirm whether it
             | meets the criteria (global optimality testing is itself NP
             | complete but if you give me a set of solutions you can
             | verify which one is the cheapest).
             | 
             | I'm not claiming that factorization is NP btw. There are
             | complexity classes beyond NP that share this property.
             | https://cstheory.stackexchange.com/questions/159/is-
             | integer-...
             | 
             | Anyway. ZK proofs themselves are super surprising and not
             | intuitive but not because verification is fast but because
             | verification reveals nothing to the verifier about the
             | solution. That's the mind blowing result.
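              | 
              | The asymmetry in a couple of lines of Python (toy numbers,
              | obviously):
              | 
              |     n = 2021
              |     # Finding p and q from n is the hard part; checking a
              |     # claimed factorization is a single multiplication.
              |     p, q = 43, 47
              |     assert p * q == n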
        
           | javajosh wrote:
           | I don't know much about it, but I would _assume_ that some
           | effort would be made by miners to look in different parts of
            | the search space - you don't want 1000 machines checking
            | the same hashes, in order, after all.
        
           | z3c0 wrote:
           | Why not just "decentralized"? I'm not sure I ever saw a
           | blockchain that was posited as "distributed".
        
             | croes wrote:
             | Blockchain is a distributed ledger.
        
               | z3c0 wrote:
               | Looking into it, you're correct, but only insofar as
               | we're referring to two different meanings of distributed.
               | Bitcoin is a distributed ledger, as in "fault tolerant".
               | But the consensus mechanism is decentralized. Thus, the
               | decision-making is not distributed.
               | 
               | A distributed consensus mechanism would segment the
               | decisions amongst nodes, not poll for a unanimous
               | response.
        
           | uoaei wrote:
           | BitTorrent then, under this perspective, is "less massively,
           | less redundantly replicated" since people only hold onto and
           | seed the torrents whose files they store locally, but they do
           | not just go on and download and seed everything out there.
           | Seems like a nice compromise to me, but obviously risks data
            | becoming unrecoverable in some cases.
           | 
           | Is this also how IPFS works?
        
         | spaceman_2020 wrote:
         | Well-designed tokenomics with strong utility are incredibly
         | powerful tools to not only incentivize usage, but also direct
         | governance and reissue profits as dividends to participants.
         | 
          | Of course it's abused by shady operators out to make a quick
         | buck, but issuing tokens, when done right, is a great
         | innovation by itself.
        
         | comboy wrote:
         | One can't scale without the other.
        
           | [deleted]
        
       | Oras wrote:
       | I think it is a great start, and I suppose there will be many
       | iterations to ensure fair usage.
       | 
        | It would be interesting to reach a point similar to Docker,
        | where you don't need to load each layer again and only need
        | your specific layer. The shared model layers would already be
       | loaded, and running multiple models at once would consume less
       | GPU memory.
        
       | thomastjeffery wrote:
       | What's the point?
       | 
       | So you can get predicted text that looks "coherent". _Then what?_
       | 
       | There is literally no place to add logic. Neural net-based
       | language models are _impressive_ , sure, but it's not hard to see
       | how _useless_ they are.
       | 
       | The only time their output is logically coherent is when they are
       | lucky, and that _seems to_ happen often because most of their
       | _input_ was logically coherent to begin with.
        
         | qznc wrote:
         | Play with https://chat.openai.com/ to experience how powerful
         | predicting text is.
        
         | fasterik wrote:
         | Whether or not the current technology is useless is an
         | empirical question. How many people are using ChatGPT, Stable
         | Diffusion, etc. for economically or personally valuable
         | activities? We actually don't know.
         | 
         | Even if we assume the technology is useless in its current
         | state, it is still incremental progress. Could we have
         | predicted 10 years ago what neural networks would be capable of
         | today? Now, tell me what neural networks will be doing in 10
         | years. If you think you know the answer with any degree of
         | certainty, you're probably deluded.
        
         | borzunov wrote:
         | Chat bot interfaces are only a small part of what can be done
         | with large LMs.
         | 
         | You can use and fine-tune them to solve almost all existing
         | natural language processing tasks: machine translation,
         | recommendation/search, text classification and summarization,
         | code generation, etc.
        
       | sva_ wrote:
       | I tried the chat on http://chat.petals.ml, and it seems to
        | struggle with the current load (as per the disclaimer at the
        | top):
        | 
        |     Human: How is the weather today?
        |     AI: the AI theAI)aultAIAI ) course ) . can?esterday to
        |     people? ? is to think thatified )
       | 
       | Really cool project though, I wanted to work on something
       | similar.
        
         | metadat wrote:
          | It replied to me with:
          | 
          |     It is nice today.
         | 
         | Not garbled, but also extremely shallow.
        
           | borzunov wrote:
           | It varies from time to time. You can also switch to the few-
           | shot mode to try machine translation, code generation, or
           | other tasks involving longer responses
        
           | jck wrote:
           | To be fair, so was the question.
           | 
           | This is a language model, not an oracle or an interface to
           | weather forecast data.
        
         | throwaway743 wrote:
         | Won't even load for me atm
        
       | artist_eren wrote:
       | exciting, will surely check the git repo
        
       | bogwog wrote:
       | So NVLink/NVSwitch pools multiple GPU resources on a single (very
       | expensive) system. A cheaper alternative to that is "offloading",
       | which is a technique that splits the inference process into
        | smaller steps, so it can run on systems with far fewer resources
       | available... and Petals is a 10x faster alternative to _that_.
       | 
       | Did I get that right?
       | 
       | This AI stuff is moving very fast and it's hard to keep up, but
       | it's all fascinating.
        
         | parentheses wrote:
         | Matches my understanding also. Can someone in the know confirm?
        
           | nmitchko wrote:
           | Your understanding is correct, but I can't vouch for the
           | claim's accuracy. This could make the execution of models
            | much more accessible to people who don't have 4x RTX 3090s
            | or better in an ML or mining rig.
        
         | cardine wrote:
         | Offloading is when the computation is done on the CPU instead
         | of the GPU. DeepSpeed is an example of this.
        
           | rolenthedeep wrote:
           | I remember when GPUs were starting to support arbitrary
           | computation and offloading meant shifting work _away_ from
           | the CPU.
        
           | borzunov wrote:
           | In case of offloading, the computations are usually still
           | performed on GPU, but the model is hosted in RAM/SSD instead
           | of the GPU memory (and its chunks are copied to the GPU
           | memory when necessary).
        
             | cardine wrote:
             | A lot of computation is offloaded to the CPU, such as
             | gradients and optimizer states. You are right though that
             | quite a bit of computation is still done on the GPU.
        
         | borzunov wrote:
         | You're right. This comment explains offloading in more detail:
         | https://news.ycombinator.com/item?id=34216213
        
       | m00dy wrote:
       | who is working on blockchain + web3 + AI inference =>
        | Decentralized AI besides me?
        
         | prettyStandard wrote:
         | I have been interested in 2 of those 3. Where are you working
         | on them?
        
         | dpflan wrote:
         | Can you elaborate on what you are doing?
        
         | ShamelessC wrote:
         | Increasingly fewer people are interested in scamming people
         | after the FTX debacle, from what I can tell
        
       ___________________________________________________________________
       (page generated 2023-01-02 23:00 UTC)