[HN Gopher] Petals: Run 100B+ language models at home bit-torren...
___________________________________________________________________
Petals: Run 100B+ language models at home bit-torrent style
Author : antman
Score : 530 points
Date : 2023-01-02 08:28 UTC (14 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| tjoff wrote:
| > _Fine-tuning and inference up to 10x faster than offloading_
|
| What is "offloading" in this context?
| taink wrote:
| It's mentioned in their paper:
| https://arxiv.org/pdf/2209.01188.pdf
|
|     Several recent works aim to democratize LLMs by
|     "offloading" model parameters to slower but cheaper
|     memory (RAM or SSD), then running them on the
|     accelerator layer by layer (Pudipeddi et al., 2020;
|     Ren et al., 2021). This method allows running LLMs
|     with a single low-end accelerator by loading
|     parameters from RAM just-in-time for each forward
|     pass. Offloading can be efficient for processing
|     many tokens in parallel, but it has inherently high
|     latency: for example, generating one token with
|     BLOOM-176B takes at least 5.5 seconds for the fastest
|     RAM offloading setup and 22 seconds for the fastest
|     SSD offloading. In addition, many computers do not
|     have enough RAM to offload 175B parameters.
| dpflan wrote:
| Is a mobile device / edge device a possible participant /
| source of resources?
| borzunov wrote:
| Offloading is another popular method for running large LMs when
| you don't have the GPU memory to fit the entire model. Imagine
| you have an A100 GPU with 80 GB memory and want to generate
| text with BLOOM, a 70-block transformer model with ~2.5 GB of
| weights per block. For each token, offloading will load the
| first ~1/3 of the model (~27 blocks) from RAM/SSD to your GPU
| memory, run a forward pass through them, then free the memory
| and load the next ~1/3, and so on.
|
| It turns out, Petals is faster than offloading even though it
| communicates over the Internet (possibly with servers far away
| from you). That's because Petals only sends NN activations
| between servers (a small amount of data), while offloading
| copies hundreds of GB of NN weights to GPU VRAM to generate
| each new token.
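|
| A minimal sketch of that offloading loop (illustrative
| pseudo-implementation only - the block grouping and the API
| are simplified, and this is not actual Petals or offloading-
| library code):
|
|     import torch
|
|     def generate_step_offloaded(blocks, hidden, group=27):
|         # `blocks` live in CPU RAM; only `group` of them fit
|         # in GPU memory at once.
|         for i in range(0, len(blocks), group):
|             chunk = blocks[i:i + group]
|             for b in chunk:
|                 b.to("cuda")        # RAM -> VRAM: the slow part
|             for b in chunk:
|                 hidden = b(hidden)  # cheap next to the copying
|             for b in chunk:
|                 b.to("cpu")         # free VRAM for next chunk
|             torch.cuda.empty_cache()
|         return hidden  # fed to the LM head to pick the token
|
| The weight copies dominate each step, which is why offloading
| has such high per-token latency.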
| madisonmay wrote:
| Interestingly it sounds like offloading could be made quite
| efficient in a batch setting if you primarily care about
| throughput rather than latency. Though I guess for most
| current LLM applications latency is quite important.
| Seattle3503 wrote:
| Could this approach be used on other types of models, such as
| image models eg stable diffusion?
| ShamelessC wrote:
| There is probably less motivation, as these models are much
| smaller. An M1 Mac can run inference in under a minute; a GPU
| can do it in as little as 3-5 seconds. And it's supposed to
| get as much as 20x faster when the distilled Stable Diffusion
| model is released.
| tommica wrote:
| What an interesting concept - also makes me wonder how BitTorrent
| could be used for more de-centralizing of data, while keeping it
| accessible on-demand.
| _joel wrote:
| Sounds just like https://ipfs.tech/
| Alifatisk wrote:
| I've always wanted to download an IPFS node and run it on my
| PC in the background, but I'm worried it might wear down my
| hard drives.
| jazzyjackson wrote:
| rather, ipfs sounds just like bittorrent
| neiman wrote:
| It has unique identifiers for data (its main feature), a
| naming system, and some other features that make it quite
| different from BitTorrent (as far as two p2p data-sharing
| networks can be different, obviously).
| ShamelessC wrote:
| Right, but one works and has widespread adoption. The
| other does not. And they certainly cover similar ground.
| simongray wrote:
| What a fascinating concept. I guess this won't be useful for
| any kind of realtime feedback system, though?
| borzunov wrote:
| A Petals dev here. It is not real-time, but we think the speed
| of ~1 token/sec may be enough for some interactive apps such as
| chat bots (especially if you show tokens to the user as they
| are generated). You can try one at http://chat.petals.ml
| (heads-up: it may be laggy right now due to lots of HN users
| trying out the system).
|
| Of course, you could do better if you have enough high-end GPUs
| to host the entire model yourself (3x A100 or 8x 3090). But if
| you don't, 1 token/sec is much faster than what you get with
| other existing methods.
| dpflan wrote:
| I have not read the technical details, apologies for
| ignorance, but is there an opportunity for caching?
| jerpint wrote:
| Probably not, since you need to compute the activations of
| unknown inputs and there could be infinitely many
| variations of them
| KaoruAoiShiho wrote:
| What are the speeds of other existing methods?
| borzunov wrote:
| The theoretical best case for RAM offloading is 5.5 sec/token;
| for SSD offloading, 22 sec/token. The implementations we've
| tested are no faster than 10 sec/token, though. See details in
| our paper: https://arxiv.org/pdf/2209.01188.pdf
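|
| The back-of-envelope behind the RAM number (rough figures,
| not the paper's exact setup): every generated token must
| stream all model weights over the CPU-GPU link, so link
| bandwidth sets a hard floor on latency.
|
|     weights_gb = 176  # BLOOM-176B in 8-bit: ~1 byte/param
|     pcie_gb_s = 32    # ~peak PCIe 4.0 x16 host-to-GPU rate
|     print(weights_gb / pcie_gb_s)  # ~5.5 s/token, pre-compute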
| colordrops wrote:
| Why not?
| simongray wrote:
| How would one make a reliable realtime system that depends
| entirely on unknown network conditions? Perhaps inside a
| closed network it is possible.
| _joel wrote:
| That's orthogonal to a realtime system. You can infer at a
| fair speed, so realtime would be possible.
| simongray wrote:
| Guarantees are not orthogonal to realtime feedback, they
| are essential. If I write a query, it is not irrelevant
| whether it takes 1 second or 1 minute to return at any
| given moment.
|
| You write that speed can be inferred, but the analogy
| that was used here is BitTorrent--and my experience with
| BitTorrent tells me that it certainly cannot be inferred.
| _joel wrote:
| If you read the article text and the response from the dev,
| then yes, inference can happen at 1 token/s, or more if
| parallelised. I'm not sure what your parameters are for a
| realtime system. If you're talking about network reliability,
| that's a different issue. Yes, it can infer quickly; whether
| it can do so reliably is another matter.
| rightbyte wrote:
| Can anyone get it to write code? It just says it has written the
| code to the file system when I prompt it.
| borzunov wrote:
| You can switch the chat bot (http://chat.petals.ml) to the
| "few-shot mode" and provide a couple of "task description &
| code" examples. Then you can add a new task description and
| it'll respond with code.
|
| The underlying LM, BLOOM, had a few programming languages in
| its dataset, so it works at least with Python and C++.
| [deleted]
| alexb_ wrote:
| Is the MIT license that this uses compatible with the RAIL
| license that Bloom uses? Or are there not issues with that?
| borzunov wrote:
| BLOOM is a large LM, and Petals is a tool for running large LMs
| (not necessarily BLOOM). People using Petals should still
| follow the model's terms of use regardless of how the tool is
| licensed.
| alexb_ wrote:
| Thanks for the clarification
| rolenthedeep wrote:
| So this sounds like BOINC but specifically for language neural
| nets?
|
| It's a very interesting concept, and I quite like the idea of a
| public, open compute cloud. I'd like to see more detail on
| security: if I'm going to donate time on my personal machine, I'd
| like some assurance that the workload is properly sandboxed and
| can't reasonably access my network or data.
|
| Mostly out of interest, what's the advantage of this over just
| using the existing BOINC network? I've been running BOINC on
| and off since the dialup days; it's an extremely mature
| platform with all kinds of workload capabilities.
| Roark66 wrote:
| This is a nice effort if it allows you to run BLOOM-176B at 1s
| per token. Just for comparison's sake: with a last-gen Ryzen
| CPU (16 cores), it takes me about 90s to run the model with
| 32GB of RAM (the model uses a few GB of NVMe storage too, as
| 32GB isn't enough RAM).
|
| However, I wonder how they prevent abuse. The main page doesn't
| mention it. As they mentioned blockchain, I suspect there will
| be some sort of credits implemented. I'll definitely be
| watching where this project goes.
|
| Edit: just to clarify, the 90s is not for the 176B-parameter
| model - it is the 7B BLOOM version. I forgot to mention it, and
| it puts the ability to run a 176B model at 1s in better
| perspective.
| borzunov wrote:
| A Petals dev here. At the moment, we're working on a
| centralized incentive system, no blockchain involved. It will
| award points if someone is running a server that consistently
| stays online and returns correct results. Then, users will be
| able to spend these points for prioritized inference and
| (maybe) extra features like increased sequence length/batch
| size limits. This way, the swarm will prioritize people who
| actually contribute compute and serve others in the remaining
| idle time.
| narrator wrote:
| Maybe you could get Bram Cohen to work on this. Seriously,
| reach out to him, he loves to work on these game theory sorts
| of things.
| oefrha wrote:
| I'd say his reputation suffered quite a bit after the whole
| Chia (that proof-of-SSD-thrashing coin) BS.
| prettyStandard wrote:
| Is it possible to have the server shut down predictably when
| it periodically finishes tasks, and not get penalized? I
| would like my machine to run it while I'm not using it.
| borzunov wrote:
| Sure! People who disconnect for a while (not necessarily
| predictably) won't be penalized - it's okay if you suddenly
| decide to use your GPU for something else, then get back to
| running a server.
| pmontra wrote:
| I guess that it could be used to create a private swarm, if one
| has a lot of hardware at home.
| technocratius wrote:
| This sounds (very narrowly) similar to the Enigma network, a
| blockchain-based technology that can be used for fully
| encrypted multi-party computation (MPC). It was one of the
| earlier blockchain projects that actually had an interesting
| use case and technology in this quite "overhyped" space. They
| rebranded to the Secret network [0] a few years back, and
| somehow I can't find this use case/promise anymore... the
| website screams all of the Web3 BS buzzwords, it seems :(
|
| [0] https://scrt.network
| ShamelessC wrote:
| Well yeah, the whole movement is founded on deliberate
| ignorance of all the existing, _working_ solutions we already
| have. Also, apparently none of them watched the HBO comedy
| Silicon Valley.
| Kerbonut wrote:
| Anyone participating in the swarm is able to potentially log the
| tokens that get processed by their node. Obviously a security
| concern. Is there any way to implement homomorphic computing to
| securely process the tokens?
| borzunov wrote:
| A Petals dev here. Indeed, the public swarm should not be used
| for any kind of sensitive data (we have warnings about that in
| the instructions). If someone wants to process such data, we
| recommend setting up a private swarm among orgs they trust
| (e.g., a couple of labs/small companies who don't have many
| GPUs themselves may set up a private swarm and collaborate to
| process their datasets).
|
| Regarding homomorphic encryption (HE), I'm afraid the current
| methods to run neural networks in the HE fashion involve
| 10-100x slowdown, since they are mostly not designed for
| floating-point operations. We'd love to find a way to do it
| faster though, since privacy is obviously an important issue
| for many tasks.
| Kerbonut wrote:
| Hi there, thanks for taking the time to answer questions!
| There are numerous use cases where even a 100x slowdown would
| be acceptable if the system were demonstrably able to process
| sensitive data securely. Can you help me understand what kind
| of slowdown it is? Could the 10-100x slowdown be overcome by
| more compute nodes, or would it require the nodes themselves
| to be 10-100x faster, for example?
| borzunov wrote:
| If someone wants to process sensitive data and is okay with
| 10x slowdown, it's better to use offloading. This is
| another, slower method for running large LMs locally
| without high-end GPUs, see details here:
| https://news.ycombinator.com/item?id=34216213
|
| In other words, if Petals nodes became 10-100x slower,
| Petals would lose its competitive advantage over simpler
| methods that don't communicate over the Internet.
| thot_experiment wrote:
| Is there an easy way to run a large language model and/or
| speech synthesis model locally/in Colab? Stable Diffusion is
| easily accessible and has a vibrant community around
| AUTOMATIC1111. It's super straightforward to run on a Google
| Colab. Are there similar open source solutions for LLM/TTS? I
| believe I had GPT-2 running locally at one point, as well as
| ESPnet2? Not 100% sure, it's been a while. Wondering what the
| state of the art for FOSS neural LLMs and TTS is in 2023.
| takantri wrote:
| For LLMs, the closest thing that comes to mind is KoboldAI[1].
| The community isn't as big as Stable Diffusion's, but the
| Discord server is pretty active. I'm an active member of the
| community who likes to inform others about it (you can see
| that my previous Hacker News comment was about the same
| thing, haha).
|
| Like Stable Diffusion, it's a web UI (vaguely reminiscent of
| NovelAI's) that uses a backend (in this case, Huggingface
| Transformers). You can use different model architectures, from
| ones as early as GPT-2 to newer ones like BigScience's BLOOM,
| Meta's OPT, and EleutherAI's GPT-Neo and Pythia models, as
| long as they're implemented in Huggingface.
|
| They have official support for Google Colab[2][3]; most of the
| models shown are finetunes on novels (Janeway), choose-your-
| own-adventures (Nerys / Skein / Adventure), or erotic
| literature (Erebus / Shinen). You can use the models listed or
| provide a Huggingface URL.
|
| [1] - https://github.com/koboldai/koboldai-client (source code)
|
| [2] -
| https://colab.research.google.com/github/koboldai/KoboldAI-C...
| (TPU colab; 13B and 20B models)
|
| [3] -
| https://colab.research.google.com/github/koboldai/KoboldAI-C...
| (GPU colab; 6B models and lower)
| FireInsight wrote:
| Not sure about TTS, but I've trained GPT-2 (a PyTorch
| implementation, I think) on my own data and it worked pretty
| well. I also tried EleutherAI's 6B model but couldn't figure
| out how to run it. As for an "easy way", I don't think a user
| interface like what Stable Diffusion has exists as of now.
| borzunov wrote:
| Really large (GPT-3-sized) language models have many more
| parameters than diffusion models, so it's difficult to load
| them locally unless you have a server with 8x 3090 / 3x A100
| GPUs. Petals is the only way to fine-tune and run inference on
| 100B+ parameter models from Colab, as far as I know.
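|
| Usage from a Colab notebook looks roughly like this
| (paraphrased from the README - check the repo for the exact
| current API):
|
|     from transformers import BloomTokenizerFast
|     from petals import DistributedBloomForCausalLM
|
|     # Embeddings run locally; transformer blocks run on peers.
|     name = "bigscience/bloom-petals"
|     tokenizer = BloomTokenizerFast.from_pretrained(name)
|     model = DistributedBloomForCausalLM.from_pretrained(name)
|
|     inputs = tokenizer("A cat sat", return_tensors="pt")
|     outputs = model.generate(inputs["input_ids"],
|                              max_new_tokens=5)
|     print(tokenizer.decode(outputs[0]))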
| borzunov wrote:
| clarification: You can also use offloading on Colab, but
| inference with offloading is at least 10x slower (see other
| comment threads). So it can't really be used for interactive
| inference, but may be used for fine-tuning with large
| batches/sequence lengths.
| thot_experiment wrote:
| Interesting, how does that work with multiple GPUs? I'm not
| familiar with the internal workings of these models - is there
| anywhere I can get a brief rundown of how the processing is
| split? I imagine there can't be much swapping between GPUs, as
| that seems prohibitively slow. How is the model split such
| that it can be worked on in parallel by multiple GPUs without
| being bottlenecked by IO?
| borzunov wrote:
| I think this is a relevant link for you:
| https://huggingface.co/transformers/v4.9.0/parallelism.html
|
| For large LMs, people usually use tensor-parallelism (TP)
| or pipeline-parallelism (PP). TP involves lots of
| communication, but uses all GPUs 100% of the time and works
| faster. PP requires much less communication, but may keep
| some GPUs idle while they are waiting for data from others.
|
| Usually, TP is used when you have good communication
| channels between GPUs (e.g., they are in one data center
| and connected with NVLink), while PP is used when
| communication is a bottleneck (like in Petals, where the
| data is sent over the Internet, which is much slower than
| NVLink).
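|
| A toy PP sketch (illustrative only - real pipelines batch and
| overlap work, and nn.Linear stands in for transformer blocks):
|
|     import torch
|     import torch.nn as nn
|
|     layers = [nn.Linear(1024, 1024) for _ in range(12)]
|     # Each "server" owns a contiguous slice of the layers.
|     servers = [nn.Sequential(*layers[i:i + 4])
|                for i in range(0, 12, 4)]
|
|     hidden = torch.randn(1, 1024)  # activations: a few KB
|     for server in servers:        # in Petals, each hop is a
|         hidden = server(hidden)   # network call; weights stay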
| zone411 wrote:
| You can read all the gory details here:
| https://arxiv.org/pdf/2207.00032.pdf
| nmitchko wrote:
| You can split the model across devices with Huggingface's
| accelerate library.
|
| Check out the infer_auto_device_map method, which will plan a
| placement for your configuration (multi-GPU, RAM, NVMe); then
| run dispatch_model with that device map.
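|
| A minimal sketch of that flow (the checkpoint path is a
| placeholder and the memory budgets are examples;
| load_checkpoint_and_dispatch combines the dispatch step with
| loading the real weights):
|
|     from accelerate import (init_empty_weights,
|                             infer_auto_device_map,
|                             load_checkpoint_and_dispatch)
|     from transformers import AutoConfig, AutoModelForCausalLM
|
|     # Build the model skeleton without allocating weights.
|     config = AutoConfig.from_pretrained("bigscience/bloom-7b1")
|     with init_empty_weights():
|         model = AutoModelForCausalLM.from_config(config)
|
|     # Plan which layers go to GPU 0, CPU RAM, or disk.
|     device_map = infer_auto_device_map(
|         model, max_memory={0: "8GiB", "cpu": "24GiB"})
|
|     # Load weights per the plan; overflow spills to NVMe.
|     model = load_checkpoint_and_dispatch(
|         model, "/path/to/bloom-checkpoint",
|         device_map=device_map,
|         offload_folder="/path/to/nvme/offload")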
| Roark66 wrote:
| There is (for many, but not all, large models). Specifically,
| there is Huggingface's accelerate library, which lets you run
| the model partially on your GPU, partially on CPU/RAM, and
| whatever doesn't fit in RAM is cached in NVMe storage (a
| mirror of two fast drives is recommended).
|
| I didn't have much luck with stock accelerate, but once the
| GPU is disabled (so it runs only on the CPU, offloading to
| NVMe storage where RAM is insufficient) it worked pretty well
| for me. (There is a small code change that has to be made, as
| the stock software refuses to run without a GPU - it is a
| simple change described in its GitHub issues.) My GPU has 8GB
| of VRAM, but this way I managed to run 7B-parameter models. In
| principle I could run a lot larger ones, but of course it
| takes a lot more time. The 7B BLOOM takes 90s for one
| inference, plus an additional 60s to load the model (from a
| spinning-disc array) initially.
| oersted wrote:
| Are there basic stats on real-time contributors and latency?
| vegabook wrote:
| Any plans for releasing an API spec that would allow for access
| from languages other than Python?
| borzunov wrote:
| There's a lightweight HTTP API for inference:
| https://github.com/borzunov/chat.petals.ml#http-api-methods
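|
| Shown in Python below, but any language with an HTTP client
| works; the endpoint and parameter names here are assumptions
| for illustration - see the linked README for the actual
| methods:
|
|     import requests
|
|     # Hypothetical call shape; consult the README above.
|     resp = requests.post(
|         "http://chat.petals.ml/api/v1/generate",
|         data={"inputs": "A cat sat on",
|               "max_new_tokens": 5})
|     print(resp.json())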
| 29athrowaway wrote:
| People nowadays will use 100 GB in VRAM to run a model that
| taught itself how to do quicksort.
| Alifatisk wrote:
| I sometimes tend to forget that you can use decentralization
| without all this crypto stuff.
| ShamelessC wrote:
| So does literally the entire "web 3" movement.
| [deleted]
| NicoleJO wrote:
| What is the copyright status of these models?
| borzunov wrote:
| Petals runs BLOOM, an open-source, publicly released model of
| the same size as GPT-3. Here's a description of the data used
| to train this model: https://huggingface.co/bigscience/bloom
| (the "Training" section)
| alexb_ wrote:
| BLOOM is not open source. It has the RAIL license, which
| exists solely to place restrictions on the use of the
| software, as well as forcing people to update. Read more:
| https://bigscience.huggingface.co/blog/the-bigscience-
| rail-l...
| szundi wrote:
| Hm, so the employees please never turn off their personal
| computers, and I have a chatbot? Makes sense.
| [deleted]
| Labo333 wrote:
| I would love for most of the blockchain trend to be converted
| into efforts towards BitTorrent-style projects.
|
| Distributed file sharing or computation without the whole
| tokenomics thing, which, while interesting, attracts too much
| attention from scammers.
| Pilottwave wrote:
| While I understand where you are coming from, this argument is
| basically "money attracts attention from scammers". As a
| counter I would say "Money attracts attention, period" and
| attention is an important resource to foster growth.
|
| Decentralized tech would never be where it is today if it
| weren't for investor attention and the potential for gains. We
| just have to separate the wheat from the chaff, and remain
| vigilant for bad actors.
| yunohn wrote:
| > Decentralized tech would never be where it is today if it
| weren't for investor attention
|
| That's exactly why blockchains haven't found Product Market
| Fit.
|
| Investors != Users
| 2fast4you wrote:
| *blockchain tech would never be where it is today...
|
| I'll bet blockchain is only as popular as it is because of
| the money. But other forms of decentralization like Mastodon
| or Matrix are pretty separate from the whole crypto sphere
| Labo333 wrote:
| Totally agree on that.
|
| Big corps only invest in blockchain because of the buzzwords
| that are used as marketing by consulting firms to sell their
| "expertise" and by VCs to sell their companies.
|
| Sure, they hope to gain some money, like luxury brands
| wanting to sell to crypto-billionaires. But crypto was a
| useful toy, then a Ponzi scheme, and now it's a closed loop.
| How long will the bubble last?
| dist1ll wrote:
| Matrix is separate from the crypto sphere, because it
| solves a different problem.
|
| Federated platforms appeal to the privacy-oriented "f** big
| tech" mindset, which is pretty common in the hacker & FOSS
| crowds. I'd put it in the same category as VPNs, E2E
| messengers and TOR.
| olivierduval wrote:
| VPNs are really useful for businesses to link different
| locations over the internet instead of using (awfully costly)
| dedicated links... or to allow remote work.
|
| So VPNs are not really in the same category.
| mikepurvis wrote:
| VPN as a technology, yes. But I think "VPN" in this
| discussion is referring specifically to the myriad of
| consumer-oriented paid solutions (SurfShark, NordVPN,
| whatever) that are pitched as being about protecting your
| online security, pirating with impunity, and bypassing
| region-locks.
| ricardobeat wrote:
| Where exactly is decentralized tech today?
|
| Nobody around me ever uses any of it. Old p2p networks
| (Gnutella, Kademlia, eMule) had a way larger impact on society
| 20 years ago.
| pmontra wrote:
| Email and the web are more decentralized than they look.
| Just think that different FOSS and closed source user
| agents and servers interoperate without any problem,
| especially for email.
| feanaro wrote:
| And in non-web space, there's Matrix
| (https://matrix.org).
| blamestross wrote:
| BitTorrent isn't getting any smaller. Mainline DHT is still
| 10x bigger than Bitcoin.
| sdiacom wrote:
| The "chaff" and the bad actors are in it for the money.
| Without them, "decentralized tech" indeed wouldn't be where
| it is today -- meaning, it wouldn't be overwhelmingly
| associated with crypto-adjacent grifts.
|
| The _real_ decentralized tech, the one that serves a purpose
| other than emptying the wallets of naive crypto-enthusiasts,
| does just fine without a profit motive. You don't need
| get-rich-quick promises to get an audience if you're actually
| doing something useful.
| [deleted]
| Labo333 wrote:
| What I mean is that the current signal-to-noise ratio is way
| too low.
|
| This created a lot of bubbles. NFTs are already down by a
| lot, now yield farming
| (https://www.bloomberg.com/news/articles/2022-04-25/sam-
| bankm...) just took a big hit from the FTX case. I see way
| too many "revolutionary" projects from fresh graduates.
| There is no way that tens of thousands of inexperienced
| people with barely enough CS education to pass programming
| interviews would magically create innovation just because VCs
| put a ton of money on them.
|
| Also, can you tell me more about where decentralized tech is
| today? BitTorrent was a revolution in information sharing,
| onion routing (Tor) was a revolution for privacy, and Bitcoin
| was a revolution for decentralized ledgers.
|
| Starting from that, IPFS is the continuation of BitTorrent
| with more features and Ethereum is a more efficient
| (especially since The Merge) and customizable (smart
| contracts are advanced checkers for write operations) ledger.
|
| But what are the real world applications of those
| technologies? What are concrete use cases of Ethereum and
| IPFS besides payments, records and file sharing?
|
| Surely there is exciting progress to be made on the technical
| side, like zk-SNARKs, but how useful will it be to society?
|
| I think we already have all the technical blocks we need. If
| there is no real-world adoption, maybe we should just wait
| another 10 years before pumping in crazy amounts of money.
| O__________O wrote:
| Anyone aware of an open-source token-based system that allows
| users to pool hardware assets, but with some sort of priority
| and fairness enforcement to reduce network abuse?
| college_physics wrote:
| Imho the "print-your-own-money" siren call is only one of the
| aspects that kept the whole blockchain world from delivering
| the disruption it so craved. The core architectures themselves
| are somehow too overengineered for broad applicability. Maybe
| that is what was needed to support the digital-gold use case,
| but it is manifestly not needed for all sorts of other very
| relevant decentralized applications (bittorrent, fediverse,
| messaging, email, etc.)
|
| It's a moot point whether the whole crypto/blockchain period
| was a net positive. It certainly made a noisy case for
| "re-decentralization" given the very real and mostly harmful
| status quo. One could also argue that it diverted vital
| resources to potentially dead-end or limited-use areas. The
| recurrent scams may also give decentralization a bad name with
| an uninformed public that can't distinguish all the different
| versions.
|
| What matters next is that projects that deliver real benefits
| to users get attention and traction. Worth keeping in mind that
| the real trouble starts when you get noticed by vested
| interests as a potential threat.
| nukemandan wrote:
| https://ipfs.io - sans the Filecoin aspect that creates
| incentives for long-term/proven storage of data - is likely
| what you are asking for.
| boramalper wrote:
| > Distributed File Sharing or computation without the whole
| tokenomics
|
| They went hand in hand even back in the day: private torrent
| trackers were all about tokenomics, where the tokens were the
| number of bytes you've seeded (uploaded) minus the number
| you've downloaded.
|
| I'm not saying it's impossible to imagine distributed file
| sharing otherwise, but to "guarantee" the availability of
| (especially unpopular) content, you need some incentive
| mechanism either built into the protocol or externally
| imposed.
| birracerveza wrote:
| From the project's readme:
|
| >Please do not use the public swarm to process sensitive data.
| We ask for that because it is an open network, and it is
| technically possible for peers serving model layers to recover
| input data and model outputs or modify them in a malicious way.
| Instead, you can set up a private Petals swarm hosted by people
| and organization you trust, who are authorized to process your
| data.
|
| This is what blockchain and staking tokens are for. (Part of
| the reason, at least.)
|
| If you act maliciously, the network slashes your stake.
| "Pinky promise not to do bad stuff" only goes so far... and
| it's really not far at all. You can trust "trusted"
| organizations or private individuals, but they have no
| incentive to ensure that the service works as intended,
| regardless of intent.
| amelius wrote:
| A blockchain does not magically solve security issues.
|
| In fact, it adds traceability. And data stored in it can
| never be deleted. Just to name a few issues.
| menzoic wrote:
| > A blockchain does not magically solve security issues.
|
| This is a weird statement. Blockchain security is real and
| it isn't "magic". Blockchain is specifically designed to
| secure decentralized applications.
|
| > In fact, it adds traceability. And data stored in it can
| never be deleted. Just to name a few issues.
|
| These aren't issues; they are part of the security model.
| Traceability is fine here because everything is pseudonymous;
| if you want to avoid that, use a chain that has untraceable
| transactions with zero-knowledge proofs (zero traceability).
|
| > And data stored in it can never be deleted. Just to name
| a few issues.
|
| Storing data on a blockchain is extremely expensive. Only
| hashes are stored on-chain, not the data itself. Hashes are
| much different from encryption because they're irreversible.
| birracerveza wrote:
| > A blockchain does not magically solve security issues.
|
| No, but staking is certainly an improvement over "pinky
| promise", and it requires a public blockchain.
|
| > issues
|
| I'm fairly sure those are features, not issues. You are
| free to disagree.
| taink wrote:
| Are you arguing for the processing of sensitive data on a
| public proof-of-stake blockchain?
|
| First, automating the detection of malicious acts against
| sensitive data seems pretty difficult. So this can't be
| implemented to systematically occur, and has to be determined
| after the fact by an investigation. Then, if a malicious act
| has been detected, the stake is slashed (and the acts are
| reverted where possible).
|
| Is my understanding sound so far?
|
| Because this would mean that in any case where a slashed stake
| is considered an "acceptable cost" by the bad actor, the
| sensitive data is fairly accessible - the stake is effectively
| a paywall. And raising the stake is a difficult decision,
| because a higher stake means fewer actors and a higher risk of
| collusion.
|
| I mean this is probably fine for a very large public
| blockchain where detecting malicious acts is not as difficult
| or where the malicious act is not very profitable, but
| sensitive data can, depending on its nature, be extremely
| profitable to exploit (and as I stated, I don't see how it
| could be easily detected).
|
| With sensitive data, "trusting" an organization only means
| having a legal agreement or strategic alliance with a third
| party. In these circumstances the consequences are usually
| more serious for the malicious actor than the loss of an
| arbitrary amount of money.
|
| I've seen suggestions to do sensitive (e.g. medical) data
| processing on the Ethereum blockchain from some enthusiasts,
| and I have never been able to understand this beyond assuming
| they have an insufficient threat model in mind for this kind
| of data.
| Galanwe wrote:
| I agree that unfortunately a lot of crypto projects are way too
| tokenomics centered, instead of utility centered.
|
| BitTorrent style projects are far more restrictive for a lot of
| applications though. If something is without cost, then it
| becomes open to abuse.
|
| Take domain names, for instance. I would love to have a
| decentralized name registry, so that no country has censorship
| power over the _whole_ internet, as we've seen with recent US
| intervention at the TLD level.
|
| DNS is a good example because it's quite trivial to implement
| with a plain old DHT. The problem though is how do you prevent
| scammers and squatters in this model?
|
| There needs to be a cost on a distributed database, otherwise
| after 1 year it will be fully squatted, used as free hosting,
| used to store illegal content, DDoS'd for fun, etc.
|
| How do you set this cost, though, while keeping the
| distributed nature of the database? The simplest solution is
| to let the users decide, via the price of a token, sold by
| people running nodes and bought by people using the service.
|
| Honestly, I love this idea. The problem with crypto currently
| is that a whole bunch of parasites jump on these tokens to
| speculate on their price without giving a.. about the
| underlying utility. This completely screws the price optimum
| and creates an inflated price bubble, in turn preventing
| adoption.
| scotty79 wrote:
| > ... bunch of parasites jump on these tokens to speculate on
| their price without giving a.. about the underlying utility.
| This completely screws the price optimum and creates a
| inflated price bubble, in turn preventing ...
|
| We have exactly the same problem with real life systems like
| food, raw materials and real estate.
| Galanwe wrote:
| Not quite as much, I would say, because crypto tokens are
| priced based on over-speculation on their potential long-term
| explosion, rather than on their more down-to-earth utility
| function. This is because their current utility is still to be
| discovered.
|
| Take the DNS example for instance, this was implemented on
| Ethereum by "ENS", but the price of ETH/gas at the time
| made a single ".eth" domain name cost something like $500.
| scotty79 wrote:
| > Not quite as much I would say
|
| Way more in terms of money involved, just slower.
| [deleted]
| [deleted]
| aortega wrote:
| While superficially similar, BitTorrent and a blockchain are
| inherently different designs that target different problems.
| Blockchain is massive data replication, BitTorrent is massive
| data distribution (with some replication too).
|
| That's why you can actually attack and shut down a BitTorrent
| network by targeting the index servers, which are not
| massively replicated - e.g., The Pirate Bay is often down.
|
| As a solution for this, I'll shamelessly plug my small project
| here, which combines BitTorrent with the blockchain as an
| invulnerable Pirate Bay-like torrent index, called Blockchain
| Bay: https://github.com/ortegaalfredo/blockchainbay
|
| It's command-line, and doesn't use any tokenomic scams. You
| pay the blockchain only for the data you need to upload, which
| is fortunately very little, as BitTorrent magnet links are
| very small.
| vintermann wrote:
| That's not going to happen, because "distributed" was always a
| misnomer when it came to blockchain things. "Massively
| redundantly replicated" would be better. If work is
| distributed, every participant has a little piece of the work
| to do, but in e.g. blockchain contracts, all the participants
| need to do the whole calculation.
| bluelightning2k wrote:
| This is a very articulate and interesting way to put it.
|
| "Massively redundantly replicated"
| dsco wrote:
| How would you incentivize nodes long-term without a
| reward/token system?
| vintermann wrote:
| That's a different question. But BitTorrent does fine with
| simple tit-for-tat rules; no transferable tokens required.
| britneybitch wrote:
| A token system can incentivize cooperation in a hostile
| environment with selfish nodes. But in a friendly
| environment you might not need the same level of
| incentives.
|
| I see some similarity here to the world of private torrent
| trackers. You want a Linux ISO, I want a Linux ISO, we're
| all working towards the same goal. So we're already
| incentivized to cooperate, without getting money involved.
| And trackers also have things like minimum seeding ratios
| to keep people honest. In the case of AI, you and I both
| want to generate images, so we're also working towards the
| same goal, so let's help each other out so both of our
| workloads finish faster. Maybe idealistic, but I think it
| could work.
| CodexArcana wrote:
| In the heyday of BitTorrent, sharing was the incentive, and it
| still is, I imagine. We don't need to financialize everything.
| wongarsu wrote:
| Bittorrent basically runs on social norms, and people's
| willingness to help their community. Or people go to
| private trackers, where contributing is a requirement for
| being in the community and having access to its resources.
|
| Blockchains are built under the assumption that everyone is
| selfish and untrustworthy, which is a decent assumption when
| building a cryptocurrency, but that doesn't mean every system
| has to run like that.
| hazebooth wrote:
| As much as I like private trackers, very few use the
| ratio-less model, in order to protect against serial leechers.
|
| Typically on a tracker you're given a currency (although
| not as sound as some e-coins) and can use that to
| influence your upload or download statistics, which in
| turn affect your ratio. Some trackers might employ rules
| where your user class has to have a certain ratio, or
| else you'll lose privileges like certain forums or even
| the ability to download at all. (The trackers are private
| and can control which peers you can see)
| croes wrote:
| That's more cryptocurrency than blockchain.
|
| Blockchain as such has nothing to do with the costs of a node
| or the incentives to run one.
| aliqot wrote:
| "That's just the nose, see the nose can exist discretely
| without the finger to pick it."
| jlokier wrote:
| That's changing. The high replication needed to verify
| untrusted peers is decreasing, and there's a realistic
| prospect of it going away due to gradual adoption and
| development of new zk-proof techniques.
|
| In zk proofs-of-computation-result, different nodes can
| perform _different_ intensive parts of a calculation and send
| the results along with proofs that those are the correct
| results. Other nodes can accept the results and verify the
| proofs with remarkable efficiency, then use those partial
| results for further calculations. To me it still feels
| counterintuitive and almost magical that any large, arbitrary
| computation result can be easily verified without repeating
| the computation, without the verifier needing much memory or
| data.
|
| For cryptocurrency blockchains this allows smart-contract
| (computational) transactions to be accepted with only one
| node having to execute the code, everyone else just
| efficiently verifies the proof to accept the state change. As
| proofs can be aggregated, this scales well: it isn't
| necessary for every node to run all the verifications,
| either.
|
| For big, distributed calculations like the article's, the
| whole calculation can progress using those partial results
| without having to rely on trust and reputation, and everyone
| can have high confidence that the final result is what it
| should be, not undermined by subterfuge or subtly inaccurate
| contributions.
|
| This is an offshoot of zero-knowledge proofs, as ironically
| zero-knowledge is not required for these types of
| applications. Just the efficient verifiability part.
|
| (Fwiw, I am working on large, scalable zk-proofs-of-
| computation in my spare time, in optimised software and with
| hardware acceleration, if anyone is interested in discussing
| this stuff.)
| vlovich123 wrote:
| > To me it still feels counterintuitive and almost magical
| that any large, arbitrary computation result can be easily
| verified without repeating the computation, without the
| verifier needing much memory or data.
|
| Why counterintuitive? That's kind of all of cryptography and
| most of computer science. Take factoring into primes (which
| has been studied forever): it's really time-consuming and
| expensive to determine what the prime factors of a number are,
| particularly if it's a big number and you know it only has
| two. That's because division is very, very difficult and
| time-consuming. Multiplication, on the other hand, is super
| cheap, so once you _tell_ me the prime factors, I can confirm
| much more quickly whether or not they're factors.
|
| In computer science, one of the earliest identified complexity
| classes is NP-complete, which has this property. E.g.,
| traveling salesman and the knapsack packing problem are
| examples. It can be insanely difficult to find a path between
| two cities in a graph under some cost budget, but if you give
| me a solution I can easily confirm whether it meets the
| criteria (global optimality testing is itself NP-complete, but
| if you give me a set of solutions, I can verify which one is
| the cheapest).
|
| I'm not claiming that factorization is NP-complete, btw. There
| are complexity classes beyond NP that share this property.
| https://cstheory.stackexchange.com/questions/159/is-
| integer-...
|
| Anyway. ZK proofs themselves are super surprising and
| unintuitive - not because verification is fast, but because
| verification reveals nothing to the verifier about the
| solution. That's the mind-blowing result.
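|
| To make the cheap-verification point concrete (toy example
| with naive trial division):
|
|     def factor(n):        # slow: the "search" direction
|         for d in range(2, int(n ** 0.5) + 1):
|             if n % d == 0:
|                 return d, n // d
|         return None
|
|     def verify(n, p, q):  # fast: a single multiplication
|         return p * q == n
|
|     p, q = factor(2021)        # takes work to find: (43, 47)
|     print(verify(2021, p, q))  # instant to check: True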
| javajosh wrote:
| I don't know much about it, but I would _assume_ that some
| effort would be made by miners to look in different parts of
| the search space - you don't want 1000 machines checking the
| same hashes, in order, after all.
| z3c0 wrote:
| Why not just "decentralized"? I'm not sure I ever saw a
| blockchain that was posited as "distributed".
| croes wrote:
| Blockchain is a distributed ledger.
| z3c0 wrote:
| Looking into it, you're correct, but only insofar as
| we're referring to two different meanings of distributed.
| Bitcoin is a distributed ledger, as in "fault tolerant".
| But the consensus mechanism is decentralized. Thus, the
| decision-making is not distributed.
|
| A distributed consensus mechanism would segment the
| decisions amongst nodes, not poll for a unanimous
| response.
| uoaei wrote:
| BitTorrent then, under this perspective, is "less massively,
| less redundantly replicated" since people only hold onto and
| seed the torrents whose files they store locally, but they do
| not just go on and download and seed everything out there.
| Seems like a nice compromise to me, but it obviously risks
| data becoming unrecoverable in some cases.
|
| Is this also how IPFS works?
| spaceman_2020 wrote:
| Well-designed tokenomics with strong utility are incredibly
| powerful tools to not only incentivize usage, but also direct
| governance and reissue profits as dividends to participants.
|
| Of course it's abused by shady operators out to make a quick
| buck, but issuing tokens, when done right, is a great
| innovation in itself.
| comboy wrote:
| One can't scale without the other.
| [deleted]
| Oras wrote:
| I think it is a great start, and I suppose there will be many
| iterations to ensure fair usage.
|
| It would be interesting to reach a point similar to Docker,
| where you don't need to load each layer again - you only need
| your specific layer. The shared model layers would already be
| loaded, and running multiple models at once would consume less
| GPU memory.
| thomastjeffery wrote:
| What's the point?
|
| So you can get predicted text that looks "coherent". _Then what?_
|
| There is literally no place to add logic. Neural net-based
| language models are _impressive_ , sure, but it's not hard to see
| how _useless_ they are.
|
| The only time their output is logically coherent is when they are
| lucky, and that _seems to_ happen often because most of their
| _input_ was logically coherent to begin with.
| qznc wrote:
| Play with https://chat.openai.com/ to experience how powerful
| predicting text is.
| fasterik wrote:
| Whether or not the current technology is useless is an
| empirical question. How many people are using ChatGPT, Stable
| Diffusion, etc. for economically or personally valuable
| activities? We actually don't know.
|
| Even if we assume the technology is useless in its current
| state, it is still incremental progress. Could we have
| predicted 10 years ago what neural networks would be capable of
| today? Now, tell me what neural networks will be doing in 10
| years. If you think you know the answer with any degree of
| certainty, you're probably deluded.
| borzunov wrote:
| Chat bot interfaces are only a small part of what can be done
| with large LMs.
|
| You can use and fine-tune them to solve almost all existing
| natural language processing tasks: machine translation,
| recommendation/search, text classification and summarization,
| code generation, etc.
| sva_ wrote:
| I tried the chat on http://chat.petals.ml, and it seems to
| struggle with the current load (as per the disclaimer at the
| top):
|
|     Human: How is the weather today?
|     AI: the AI theAI)aultAIAI ) course ) . can?esterday to
|     people? ? is to think thatified )
|
| Really cool project though, I wanted to work on something
| similar.
| metadat wrote:
| It replied to me with: It is nice today.
|
| Not garbled, but also extremely shallow.
| borzunov wrote:
| It varies from time to time. You can also switch to the few-
| shot mode to try machine translation, code generation, or
| other tasks involving longer responses
| jck wrote:
| To be fair, so was the question.
|
| This is a language model, not an oracle or an interface to
| weather forecast data.
| throwaway743 wrote:
| Won't even load for me atm
| artist_eren wrote:
| exciting, will surely check the git repo
| bogwog wrote:
| So NVLink/NVSwitch pools multiple GPUs' resources on a single
| (very expensive) system. A cheaper alternative to that is
| "offloading", a technique that splits the inference process
| into smaller steps so it can run on systems with far fewer
| resources available... and Petals is a 10x faster alternative
| to _that_.
|
| Did I get that right?
|
| This AI stuff is moving very fast and it's hard to keep up, but
| it's all fascinating.
| parentheses wrote:
| Matches my understanding also. Can someone in the know confirm?
| nmitchko wrote:
| Your understanding is correct, but I can't vouch for the
| claim's accuracy. This could make running these models much
| more accessible to people who don't have 4x RTX 3090s or
| better in an ML or mining rig.
| cardine wrote:
| Offloading is when the computation is done on the CPU instead
| of the GPU. DeepSpeed is an example of this.
| rolenthedeep wrote:
| I remember when GPUs were starting to support arbitrary
| computation and offloading meant shifting work _away_ from
| the CPU.
| borzunov wrote:
| In the case of offloading, the computations are usually still
| performed on the GPU, but the model is hosted in RAM/SSD
| instead of GPU memory (and its chunks are copied to GPU memory
| when necessary).
| cardine wrote:
| A lot of work is offloaded to the CPU, such as handling
| gradients and optimizer states. You are right, though, that
| quite a bit of computation is still done on the GPU.
| borzunov wrote:
| You're right. This comment explains offloading in more detail:
| https://news.ycombinator.com/item?id=34216213
| m00dy wrote:
| Who is working on blockchain + web3 + AI inference =>
| decentralized AI, besides me?
| prettyStandard wrote:
| I have been interested in 2 of those 3. Where are you working
| on them?
| dpflan wrote:
| Can you elaborate on what you are doing?
| ShamelessC wrote:
| Increasingly fewer people are interested in scamming others
| after the FTX debacle, from what I can tell.
___________________________________________________________________
(page generated 2023-01-02 23:00 UTC)