[HN Gopher] Run 100B+ language models at home, BitTorrent-style
___________________________________________________________________
Run 100B+ language models at home, BitTorrent-style
Author : srameshc
Score : 677 points
Date   : 2023-03-20 22:50 UTC (1 day ago)
(HTM) web link (petals.ml)
(TXT) w3m dump (petals.ml)
| amrb wrote:
| What I want to see is a bunch of RTX 3060 mining cards being used
| to run ML models. They did talk about bandwidth issues for
| servers with under 100 Mbit, though if you're doing this as a
| local cluster you could run some ex-server network gear and be
| golden.
| jhoelzel wrote:
| I just thought this through while building my new PC and house.
| The bottleneck is always the NIC, because the most you can do
| right now is about 10GbE on conventional motherboards.
|
| After that you also run into cabling issues. Cat 8, for
| instance, only does 40GbE max, which means for anything more you
| need to bond connections, which comes with its own problems.
|
| Another point is that while mining, GPUs are still independent
| and not connected to each other, so each of them is also
| restricted to the max your PCIe slot will give you.
|
| PCIe 4.0 has a maximum data transfer rate of 16 GT/s
| (gigatransfers per second) per lane, which after encoding
| overhead translates to roughly 2 GB/s (gigabytes per second) per
| lane. An x16 slot has 16 lanes, so it can provide a maximum data
| transfer rate of about 32 GB/s in each direction (upstream and
| downstream).
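|
| A quick back-of-the-envelope check of those numbers (the
| 128b/130b encoding factor here is my assumption for PCIe 4.0):
|
|     # PCIe 4.0 bandwidth, approximately
|     GT_PER_S = 16          # raw rate per lane, gigatransfers/s
|     ENCODING = 128 / 130   # usable fraction after 128b/130b coding
|     LANES = 16             # x16 slot
|
|     gb_per_lane = GT_PER_S * ENCODING / 8  # ~1.97 GB/s per lane
|     total = gb_per_lane * LANES            # ~31.5 GB/s per direction
|     print(f"{gb_per_lane:.2f} GB/s/lane, {total:.1f} GB/s on x16")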
| nothrowaways wrote:
| This is pure genius if it works.
| mdgrech23 wrote:
| dis gonna be big
| quickthrower2 wrote:
| Dumb question from someone who doesn't know too much yet about LLMs.
| How can you trust the other computers? Will I end up with a bunch
| of swear words coming back from the other nodes that are playing
| a prank?
| techwizrd wrote:
| I'm not entirely sure how the approach they're using works [0], but
| I study federated learning and one of the highly-cited survey
| papers has several chapters (5 and 6 in particular) addressing
| potential attacks, failure modes, and bias [1].
|
| 0: https://github.com/learning-at-home/hivemind
|
| 1: https://arxiv.org/abs/1912.04977
| ftxbro wrote:
| They don't solve the problem completely, but they address it in
| their publication as follows. Inference uses many layers, and
| the first and last layers have the most 'interpretability', so
| those are the ones you run locally on your own computer. If
| someone wanted to prank you, they would have to use some
| sophisticated technique that probably doesn't exist yet, though
| it could still be possible. Also, if it becomes a problem, they
| could fall back on the usual methods, like running the same
| computation on multiple other computers and taking the majority
| solution.
| sebzim4500 wrote:
| >If they wanted to prank you, they would have to use some
| sophisticated thing that probably doesn't exist yet but it
| could still be possible.
|
| Isn't the attack straightforward?
|
| i) Take the model, freeze all the weights except the ones you
| expect to be responsible for
|
| ii) Finetune to produce whatever output you are looking for.
|
| iii) Profit. Or mainly just annoy people, but it could be
| funny.
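|
| A rough sketch of steps (i) and (ii) in PyTorch (toy model; the
| slice you serve and the target behaviour are made up):
|
|     import torch
|     import torch.nn as nn
|
|     # tiny stand-in model; pretend this node serves only layer 1
|     model = nn.Sequential(nn.Linear(8, 8), nn.Linear(8, 8),
|                           nn.Linear(8, 2))
|     my_layer = model[1]
|
|     # i) freeze everything except the weights this node serves
|     for p in model.parameters():
|         p.requires_grad = False
|     for p in my_layer.parameters():
|         p.requires_grad = True
|
|     # ii) finetune those weights toward whatever output you want
|     opt = torch.optim.Adam(my_layer.parameters(), lr=1e-3)
|     x = torch.randn(32, 8)
|     troll_target = torch.zeros(32, dtype=torch.long)  # class 0
|     for _ in range(100):
|         loss = nn.functional.cross_entropy(model(x), troll_target)
|         opt.zero_grad()
|         loss.backward()
|         opt.step()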
| ftxbro wrote:
| OK sure, any time someone says on the internet that
| something is sophisticated someone will say it's
| straightforward. So I guess it's a continuum. The thing you
| describe is more sophisticated than what they would have to
| do if they had access to the final layer.
| px43 wrote:
| There's some really cool work being done using Zero Knowledge
| proofs to write a succinct proof that output from a given model
| is correct. This is going to be increasingly important not just
| for these types of distributed systems, but even for things
| like ChatGPT to make sure that you're actually getting results
| from the model that you're paying to use.
|
| Imagine a world where OpenAI or some other large API provider
| gets taken over by someone who wants to make money, so they
| start quietly using smaller, weaker models to respond to API
| calls, even for customers who are paying for the highest end
| model. Maybe this is just done at first to survive under
| massive load, but then someone realizes how many millions of
| dollars they can save by just forgetting to switch back to the
| more expensive models.
|
| Here's a great primer: https://0xparc.org/blog/zk-mnist
| [deleted]
| cjbprime wrote:
| I don't think that's a dumb question! I don't know if this
| project has an answer to that, but there are some techniques
| (Merkle tree hashes etc) that might work depending on how much
| of the model you want to download locally.
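|
| For the download-verification part, a toy Merkle check might
| look like this (hash the local shard, fold in sibling hashes,
| compare against a trusted root):
|
|     import hashlib
|
|     def sha(b: bytes) -> bytes:
|         return hashlib.sha256(b).digest()
|
|     def verify_shard(shard: bytes, proof: list, index: int,
|                      root: bytes) -> bool:
|         node = sha(shard)
|         for sibling in proof:   # walk from leaf up to the root
|             if index % 2 == 0:
|                 node = sha(node + sibling)
|             else:
|                 node = sha(sibling + node)
|             index //= 2
|         return node == root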
|
| I don't see how to securely scale the inference step, though.
| sillysaurusx wrote:
| Sending tensors to two or more nodes is a partial solution,
| since you can reject ones that don't match. But fundamentally
| the answer is yes -- bad actors can overwhelm the swarm, and
| there's not much you can do about it.
|
| I think the project should embrace this limitation. eBay had
| the same problem, but it's in a seller's interest to deliver
| correct items quickly. Make a social incentive and the rest
| will work itself out.
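|
| The "two or more nodes" check is cheap to sketch (peer.forward
| here is a hypothetical RPC, not the real Petals API):
|
|     import torch
|
|     def verified_forward(x, peers, atol=1e-3):
|         a = peers[0].forward(x)  # same tensors to two peers
|         b = peers[1].forward(x)
|         if not torch.allclose(a, b, atol=atol):
|             raise RuntimeError("peers disagree; retry elsewhere")
|         return a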
| flangola7 wrote:
| If you get banned from eBay your SSN/phone
| number/email/browser fingerprint/address/etc are prevented
| from coming back. What system would enforce that for
| computation nodes?
| 6510 wrote:
| > bad actors can overwhelm the swarm
|
| I don't think so. To simplify: you send out 1000 tasks and also
| perform them yourself. Now you have 999 nodes with a bad flag
| and 1 with a good one. You send out 10 tasks to the same nodes,
| including the good one; now you have 990 nodes with 1 bad flag,
| 9 with 2, and 1 with 2 good flags. You keep sending tasks to the
| bad nodes and drop their responses (if they send you a task, you
| return garbage), and you ask the good nodes (with, say, 100+
| good flags) for their lists of good nodes and test those one by
| one.
|
| You could build a system where bad nodes have to return so many
| good responses before getting booted that the joke is on them.
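|
| The bookkeeping is trivial (toy sketch; node.run is a
| hypothetical RPC and `reference` is the result you computed
| yourself):
|
|     from collections import Counter
|
|     good, bad = Counter(), Counter()
|
|     def score(node, task, reference):
|         if node.run(task) == reference:
|             good[node.id] += 1
|         else:
|             bad[node.id] += 1
|
|     def trusted(node, min_good=100):
|         return good[node.id] >= min_good and bad[node.id] == 0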
| winddude wrote:
| You can't, it's somewhere in the petals docs, but they
| recommend generating in parallel, and averaging, or selecting
| the response.
| TeeMassive wrote:
| Am I the only one excited about when 4chan will train its own AI
| by collectively pooling their power levels?
| jeffmjack wrote:
| So in the old model you could: 1. pay for compute 2. charge the
| customers to pay for compute,
|
| and now you can instead: 1. pay your customers to pay for compute
| 2. charge the customers to pay for the customers to pay for
| compute
|
| Is there something I'm not understanding in the business logic of
| this?
|
| Is it the fact that this would be running on computers that are
| essentially free, since it would just be like the desktop in
| someone's home office, so the infrastructure costs are already
| paid for (e.g. externalized)?
|
| Or like would the value here be accessing the LLM service for
| 'free'? But isn't just paying for a service like OpenAI
| relatively inexpensive and already nicely set up?
| causality0 wrote:
| _But isn't just paying for a service like OpenAI relatively
| inexpensive and already nicely set up?_
|
| Sure, but OpenAI is never going to offer you a raw product.
| Their offerings will always be the heavily restricted,
| corporatized product they offer now. That works for many, maybe
| most, people but there's definitely a market for a "power to
| the users" LLM AI with no rules.
| ilyt wrote:
| > Is there something I'm not understanding in the business
| logic of this?
|
| That people would rather give away some of the GPU time they
| aren't using _at this moment_ than pay a subscription. And
| presumably they also don't want to be beholden to whatever
| filters the "big AI cluster owner" puts in place.
| behnamoh wrote:
| What if all the computation power that's being wasted by crypto
| (Proof of Work, etc.) could be shifted to powering AI models in a
| decentralized way, such as this project?
|
| You keep the AI "alive" and in return, you get paid in crypto.
| What a wonderful time to be alive!
| amelius wrote:
| What if we moved to "Proof-of-Carbon-capture" instead?
| xotom20390 wrote:
| [dead]
| eclipxe wrote:
| Let's do it.
| hhoodie wrote:
| I'm in
| nullsense wrote:
| Let's watch the world burn!
| antibasilisk wrote:
| it's all fun and games until a bunch of kids die
| int_19h wrote:
| The world is capable of burning itself just fine without
| such assistance.
|
| It would be much neater to turn it all into paperclips
| instead.
| alchemist1e9 wrote:
| me too
| lxgr wrote:
| Proof-of-work only works with a particular set of computational
| problems, i.e. those in NP. I'm not sure if running an LLM fits
| that bill.
|
| I suppose you could combine proof-of-stake with it in some way
| (e.g. you commit to an input/output tuple and get your stake
| slashed if it can be shown to not reproduce on a "canonical"
| execution), though?
|
| That's not nearly as simple as "normal" PoW, though - you'd
| need to encode some reputational aspect into the system.
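|
| The commit part is easy enough to picture (a toy sketch of
| hypothetical protocol design, not any existing system):
|
|     import hashlib, json
|
|     def commitment(input_ids, output_ids, salt: bytes) -> str:
|         # bind this node to an (input, output) pair up front; a
|         # later canonical re-execution that disagrees with the
|         # revealed pair would justify slashing the stake
|         blob = json.dumps({"in": input_ids,
|                            "out": output_ids}).encode()
|         return hashlib.sha256(salt + blob).hexdigest()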
| Enginerrrd wrote:
| You can definitely do arbitrary work as a sort of proof of
| work. Not quite the same mathematically, but pragmatically
| similar. The key is building in some redundancy/error-
| correction and ensuring that a single node can't by itself
| define "correctness" of a solution. You do that by
| duplicating work across nodes, distributing chunks randomly
| and rejecting/rerunning disagreeing results. It's also pretty
| easy to spot bad actors trying to cheat on their
| computational work in this scenario.
| pshc wrote:
| I don't think it's that easy at all. The work function must
| be cheap and canonical to check, and the consensus
| algorithm has to be rigorous, or else it's too easy to
| attack the security of the network. DoS, Sybil, 51%, social
| takeover via hard fork, the list goes on...
| alchemist1e9 wrote:
| It has a well-defined loss function with a numerical
| value. The improvement of this value can serve as a kind of
| difficulty. Check some other comments I've made on this
| post for how it might work.
| pshc wrote:
| It's an interesting idea for sure, but loss doesn't go
| down forever. I think this ends with a highly overfitted
| network that grinds to a halt as the loss function hits
| local minima.
|
| Even if you get past that, there's no consensus mechanism
| or finalization as it stands, and validating solutions is
| relatively expensive.
| alchemist1e9 wrote:
| We only just started thinking about this, and I suspect
| these issues are solvable in a protocol. For instance,
| there must be a distributed protocol using cross-validation
| to control overfitting.
|
| I'm not sure validation is so expensive if the data is
| small enough. Actually, maybe that's a way to approach
| this: two types of blocks that are paired and share the
| rewards in some way. One proposes a better slice of
| weights, and the other proves they are better out of
| sample.
|
| Give it a few weeks, and with GPT-4's help I think we can
| find some promising approaches.
| [deleted]
| saulpw wrote:
| And now we have an AI that is impossible to turn off.
| behnamoh wrote:
| And impossible to censor.
| joshSzep wrote:
| Reminds me of the short story Stephen Hawking tells about AI
| in this video https://youtu.be/T8y5EXFMD4s
| [deleted]
| alchemist1e9 wrote:
| Portion of a discussion I had on this topic with GPT-4:
|
| > Your idea of having all miners work on the same section of
| the model and compete to find better weights with a lower loss
| function value is an interesting approach. This would
| essentially treat the LLM training process like a proof-of-work
| mining competition. It is true that this approach involves some
| degree of duplicate work, but the simultaneous, randomized
| exploration of the weight space could still leverage the
| distributed computing power of the network.
|
| > Here's a high-level overview of how this approach could work:
|
| > Miners receive a specific section of the LLM's weights and a
| subset of training data. The training data could be determined
| based on information from the previous block (e.g., by hashing
| the previous block).
|
| > Each miner initializes their local copy of the weight section
| with random perturbations to introduce diversity in the
| optimization process.
|
| > Miners perform optimization steps (e.g., gradient descent) on
| their local weight section to minimize the loss function on the
| given training data.
|
| > Miners search for a solution that satisfies both the proof of
| training (improvement in the loss function) and the proof of
| work (hash of the block meeting the difficulty target).
|
| > The first miner to find a solution meeting both criteria
| broadcasts their updated weight section and the new block to
| the network.
|
| > Other miners verify the validity of the proposed solution
| (i.e., checking the improvement in the loss function and the
| hash meeting the difficulty target) and add the new block to
| their local copy of the blockchain.
|
| > This approach would turn the mining process into a
| competitive LLM training process, where miners contribute their
| computing power towards improving the model. It maintains some
| of the core properties of proof-of-work mining while directing
| the computational resources towards a productive goal. However,
| this approach still needs to address potential issues related
| to data privacy, intellectual property, and the synchronization
| of the model's weights across the entire network.
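|
| A toy version of the block-validity rule in that scheme (the
| block structure here is entirely hypothetical, just to make
| the two conditions concrete):
|
|     import hashlib
|
|     def valid_block(block, prev_loss, difficulty):
|         # proof of training: weights must improve the loss
|         improved = block["loss"] < prev_loss
|         # proof of work: hash must meet the difficulty target
|         digest = hashlib.sha256(block["header"]).hexdigest()
|         return improved and int(digest, 16) < difficulty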
| vore wrote:
| What's the point of this hashing-the-block business if you
| can already prove your work by presenting a set of weights
| that reduces the loss function? And even then, you run the
| risk of overfitting if you just blindly optimize for loss
| like that.
| inimino wrote:
| Eventually it will be common knowledge that asking a text
| compressor trained on all available text to design a new
| technology doesn't work.
| alchemist1e9 wrote:
| I don't believe that is true. Most technologies are
| incremental improvements on, or recombinations of, existing
| tools and techniques. It seems likely to me that LLMs'
| ability to map associations between concepts can help
| humans invent new technology.
|
| Have you personally used GPT-4 much?
| Taek wrote:
| That type of work is not useful for securing a decentralized
| system. To secure a decentralized blockchain, you need the work
| to be provably wasteful.
|
| That said, there's a new paradigm of Blockchain brewing called
| 'Proof of Physical Work', which accepts the centralized
| limitations and leverages the incentive scheme to produce
| useful outcomes for society. Perhaps the most interesting one
| right now is HiveMapper.
|
| It won't secure a decentralized blockchain, but it might still
| be a great value add for society.
| antibasilisk wrote:
| by definition that makes it not a blockchain
| [deleted]
| alchemist1e9 wrote:
| I don't believe what you claim is accurate.
|
| I believe the training of LLMs has a well-defined loss
| function with a measurable value on a specific input set. I
| already discussed this problem extensively with GPT-4 itself,
| so I'll leave you with parts of that discussion:
|
| > Me: Do LLMs have a well defined loss function that they
| optimize over?
|
| > GPT-4 Assistant: Yes, large language models (LLMs) have a
| well-defined loss function that they optimize over during the
| training process. The choice of loss function depends on the
| specific task the model is designed for and the architecture
| used. For LLMs like GPT-3, which are based on the transformer
| architecture and designed for natural language processing
| tasks, the most commonly used loss function is the cross-
| entropy loss.
|
| > Cross-entropy loss measures the difference between the
| model's predicted probability distribution over the possible
| output tokens (words, subwords, or characters, depending on
| the tokenization used) and the true probability distribution,
| which assigns a probability of 1 to the correct output token
| and 0 to all other tokens. The goal of the training process
| is to minimize the cross-entropy loss, which essentially
| means making the model's predicted probabilities as close as
| possible to the true probabilities.
|
| > During training, the LLM processes input sequences (e.g.,
| sentences or paragraphs) and generates output token
| probabilities based on its current weights. The cross-entropy
| loss is computed using these probabilities and the true
| target tokens, and then the gradients of the loss with
| respect to the model's weights are calculated using
| backpropagation. The weights are updated using an
| optimization algorithm, such as stochastic gradient descent
| (SGD) or one of its variants like Adam or Adagrad, in order
| to minimize the loss.
|
| > The process of iteratively updating the model's weights to
| minimize the loss function is what allows LLMs to learn
| patterns and structure in the training data, ultimately
| enabling them to generate coherent text, answer questions, or
| perform other language-related tasks.
|
| > Me: Is it possible to restrict the training with a rules
| that says only these weights, say specific slices, are
| allowed to change, keep all other weights fixed? Is this how
| LLMs can be trained in a distributed fashion?
|
| > GPT-4 Assistant: Yes, it is possible to restrict the
| training of an LLM to specific slices or subsets of weights
| while keeping other weights fixed. This is a common approach
| used in distributed and parallel training scenarios, where
| multiple devices or nodes work on different parts of the
| model simultaneously. This approach is known as model
| parallelism.
|
| > In model parallelism, the model's weights are divided into
| smaller subsets or partitions, with each partition being
| assigned to a different device or node. During training, each
| device works only on updating the weights in its assigned
| partition while keeping the weights in the other partitions
| fixed. The gradients are computed for each partition
| separately, and the updates are performed locally on each
| device.
|
| > Once all devices complete their local updates, they
| communicate their updated weights with one another to
| synchronize the model's state. This communication can be done
| through various methods, such as parameter server
| architecture or all-reduce algorithms. The key to effective
| model parallelism is balancing the computational workload
| among the devices while minimizing the communication
| overhead.
|
| > Adapting this concept for the proposed proof of work and
| LLM training combination could involve assigning different
| subsets of weights to miners and having them work on updating
| these subsets. This would essentially turn the mining process
| into a form of distributed LLM training, with each miner
| contributing to a portion of the model's weights. However,
| coordinating and synchronizing the updates among miners in a
| decentralized environment, while maintaining the integrity
| and security of the blockchain, would be a significant
| challenge that needs to be addressed in the design.
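|
| For concreteness, the loss GPT-4 describes above is only a
| couple of lines of PyTorch (a toy next-token setup; the
| shapes are illustrative):
|
|     import torch
|     import torch.nn.functional as F
|
|     vocab = 100
|     logits = torch.randn(8, 15, vocab)          # (batch, seq, vocab)
|     targets = torch.randint(0, vocab, (8, 15))  # true next tokens
|
|     # cross-entropy between the predicted distribution and
|     # the true token at every position
|     loss = F.cross_entropy(logits.reshape(-1, vocab),
|                            targets.reshape(-1))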
| 6510 wrote:
| There is an AI-generated spam joke in there but I can't think
| of it right now. I'm much too overwhelmed [again] by the
| clarity of that explanation.
|
| I believe someone posted a paper talking about the riddle
| at the end: apparently one can also merge weights if work
| is done on a partition repeatedly/accidentally. The rest of
| the merger seems to be a kind of bittorrent(?)
| UncleEntity wrote:
| > To secure a decentralized blockchain, you need the work to
| be provably wasteful.
|
| But why does it have to be wasteful?
|
| As I understand it you just need a hard enough computation to
| prevent the 51% problem.
|
| Obviously if the goal was to compute something which had real
| value all the failed attempts would count as "provably
| wasteful" so would seem to fit the criteria.
| hu3 wrote:
| What about Eth's proof of stake?
| pshc wrote:
| Proof of stake involves lockup of capital, wasting the time
| value of money (despite it earning a small yield), and also
| taking on the risk of being slashed.
| sebzim4500 wrote:
| So? If we start offering crypto to run sections of LLMs, then
| some portion of the crypto miners will move to doing
| something useful instead.
| lgas wrote:
| I believe that's pretty close to what https://bittensor.com/
| does.
| lxgr wrote:
| Imagine if it was possible to combine this with homomorphic
| encryption into something like AirDrop for LLMs!
|
| ("Sorry, I don't know how to answer that - but you can try
| getting closer to a bunch of other people running the app on
| their device and ask again!")
| sebzim4500 wrote:
| Homomorphic encryption has such an enormous overhead that it
| would never be faster than just running the model locally. Or
| probably on your wristwatch for that matter.
| chatmasta wrote:
| Sounds like a literal hive mind!
| techwizrd wrote:
| The library they're using is literally called Hivemind [0].
| I'm interested to see how the approach they're using differs
| from what we use in federated learning or gossip learning.
|
| > Hivemind is a PyTorch library for decentralized deep
| learning across the Internet.
|
| 0: https://github.com/learning-at-home/hivemind
| borzunov wrote:
| A Hivemind/Petals dev here. As far as I understand, most
| federated learning methods can't efficiently train very
| large models (with billions of parameters) because they
| repeat some calculations on many peers and/or involve
| excess communication.
|
| In contrast, the training methods implemented in Hivemind
| strive to minimize compute and communication but don't
| provide data privacy guarantees. This is mostly okay for
| LLMs, since they are trained on public data scraped from
| the Internet anyway.
| nirav72 wrote:
| I wonder how long before someone comes up with peer-to-peer
| malware using a similar concept to train their model. Kind of
| like the trojan crypto miners that were found in public package
| repos and apps just a couple of years ago. (Probably still an
| issue.)
| firstfewshells wrote:
| ML/AI moving too fast.
| ashokpoudel wrote:
| I would be very concerned about sending my data over to a swarm
| https://ashokpoudel.medium.com/understanding-security-and-pr...
| chrisMyzel wrote:
| Petals is an impressive feat, but be aware that it is very slow,
| at 1-4 sec/token (depending on the hardware you have). I find it
| too slow even for experimenting; as a developer I want faster
| feedback cycles. Still, it's super interesting to see the
| project evolve over time, and onboarding could not be easier.
| MuffinFlavored wrote:
| What is the rate of tokens per second when you are talking to
| ChatGPT on GPT-4?
| FrameworkFred wrote:
| My interest in AI has just gone 10X, thanks and cheers!
| sillysaurusx wrote:
| > Parallel inference reaches hundreds of tokens/sec.
|
| Marketing claims, meh. It gives normal people the wrong
| impression.
|
| You can't parallelize your query because it's sequential. I think
| people will be willing to wait the ~200 sec necessary to get 200
| words, but it's best to be up front about this limitation.
|
| Also, abuse is a problem. Once 4chan realizes they can poison
| the distributed model, they'll have a field day. But maybe it's
| too much effort for too little reward, and trolls won't bother.
| rohfle wrote:
| From https://github.com/bigscience-workshop/petals/wiki/Security,...
|
| > Q: Does Petals guarantee that model outputs are correct?
|
| > Not by default. A faulty or malicious server could give you
| incorrect outputs. There are two things you can do about this:
|
| > - Verify outputs. Send some of your tensors to two or more
| peers and check that the answers match.
|
| > - Set up a private swarm. You can launch your own swarm
| hosted by people and organizations you trust, who are
| authorized to process your data.
|
| > In future, we plan to implement an automatic verification and
| a reputation system, so that clients can select servers that
| they can trust.
| zekrioca wrote:
| Byzantine problems allover again..
| Etheryte wrote:
| There is no poisoning vector, you can only update prompts and
| adapters hosted locally.
| sillysaurusx wrote:
| From the site:
|
| > you load a small part of the model, then team up with
| people serving the other parts to run inference or fine-
| tuning.
|
| If multiple people participate in a fine tuning session, you
| have to trust all of them. You also have to trust everybody
| for inference too, but at least one of them can't scramble
| the model.
| Etheryte wrote:
| This is all covered in the docs if you click through past
| the landing page. If you want to propagate changes to
| others you need to set up your own swarm, you can't go
| tuning things on random participants. You can read more at:
|
| - https://github.com/bigscience-workshop/petals/wiki/Security,...
|
| - https://github.com/bigscience-workshop/petals/wiki/Launch-yo...
| gaogao wrote:
| You could run attested code in an enclave, which, though
| vulnerable to certain side channels, is probably more
| robust than the standard case.
| pshc wrote:
| Wouldn't untrusted weird input increase loss and be
| rejected?
| chatmasta wrote:
| Maybe this could be solved with opt-in (or opt-out via
| banning) federation similar to Mastodon. Instead of one
| network you could have a bunch of different networks each
| focused on the interests of a different community. Or maybe
| as someone with a node, you could "subscribe" to different
| communities that use different filtering and prioritization
| mechanisms for task assignments.
|
| I do love the general direction, and I think it's
| inevitable that training will move to be more decentralized
| like this. It's also the best chance we have at disrupting
| the centralization of "Open"AI and their ilk. I say the
| earlier we figure this out, the better, but it's not an
| easy problem to solve cleanly. And, not to be _that guy_ ,
| but maybe we could add some cryptocurrency incentives to
| the mix... conveniently enough, the crypto miners already
| have the GPUs ready to go!
| borzunov wrote:
| A Petals dev here. We say up front that "Single-batch inference
| runs at ≈ 1 sec per step (token)".
|
| In turn, "parallel inference" refers to the high-throughput
| scenario where you generate multiple sequences in parallel. This
| is useful when you process some large dataset with an LLM (e.g.
| run inference with a batch size of 200) or run beam search with
| a large beam width. In this case, you can actually get a speed
| of hundreds of tokens per sec; see our benchmarks for parallel
| forward passes:
| https://github.com/bigscience-workshop/petals#benchmarks
|
| If you have another wording in mind that is more up front,
| please let us know, we'd be happy to improve the project
| description. Petals is a non-commercial research project, and
| we don't want to oversell anything.
| null4bl3 wrote:
| Can it run in a docker-compose container with a set resource
| limit?
|
| Does each node earn points for supplying resources that can
| then be spent for greater query / processing speed?
| robotnikman wrote:
| I think most of 4chan would only want to use it to talk with
| their anime waifus
| jquery wrote:
| That's rather wholesome. Unfortunately, 4chan is barely a
| Chinese cartoon board anymore, /pol/ seems to have the most
| external influence which reflects poorly on the whole site.
| all2 wrote:
| There is a GPT4chan floating around somewhere. Or maybe it's
| Chat4Chan. I don't remember. I try to stay away from that
| poison.
| decremental wrote:
| It's more their style to get it to recite FBI crime
| statistics.
| chizhik-pyzhik wrote:
| plural of waifu is wifi, actually
| [deleted]
| smrtinsert wrote:
| If nodes drop in and out, how does that impact the inference,
| I wonder.
| Havoc wrote:
| This seems to be inference-side.
|
| Surely distributed building of a license-free model similar to,
| say, ChatGPT 3.5 would be more useful?
|
| i.e. rebuild the Alpaca work minus the legal issues
| fswd wrote:
| It's super slow. 1 token per second, if that. Half a word a
| second.
| eternityforest wrote:
| You made a real skynet!!!
|
| Jokes aside it's pretty cool!
| frankreyes wrote:
| Skynet.
| jimrandomh wrote:
| This works by taking a language model that won't fit in a single
| consumer GPU's memory, partitioning it layerwise, and running it
| distributed across a bunch of different people's computers. If
| I'm understanding correctly, then any single node acting
| dishonestly can replace the output of its portion with whatever
| they want, and (even if every other node is honest) this is
| sufficient to fully control the output. So, probably okay to use
| for prompts like "rewrite Rick Astley lyrics in the style of
| Shakespeare", but not something you'd want to use in a way that
| feeds into another automated system.
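|
| A sketch of that structure (a hypothetical wrapper, not the
| actual Petals code):
|
|     import torch.nn as nn
|
|     class Node:
|         def __init__(self, layers):
|             self.block = nn.Sequential(*layers)
|         def forward(self, activations):
|             # in reality this hop happens over the network
|             return self.block(activations)
|
|     def pipeline(nodes, x):
|         # one dishonest node anywhere in this chain can replace
|         # x with whatever it likes, controlling the final output
|         for node in nodes:
|             x = node.forward(x)
|         return x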
|
| Meta-level, I think it's bad for the world if there's good
| technology for running neural nets on distributed consumer GPUs.
| From a cybersecurity perspective, Windows gaming PCs are easy
| pickings compared to datacenters, and I think there's a risk that
| after a few more iterations of AI development, we'll start
| getting systems that figure out they can increase their own power
| level by building a botnet that runs additional copies of
| themselves.
| geepytee wrote:
| Kinda reminds me of the BOINC system
| ftxbro wrote:
| This link just goes to their website. Last I looked at this
| project, I was happy that it existed, but I was disappointed
| (given my over-optimistic expectations) for two reasons: 1) It's
| for the BLOOM model, which isn't great compared to somewhat
| recent GPTs. I think I read that it's worse than the OpenAI
| models on a per-parameter basis. 2) It's faster than using
| RAM/SSD as faux VRAM, but 'only' by 10x. That was even before
| LLaMA or its improvements existed for running locally. So by my
| old understanding, bloom/petals wouldn't be even as good as
| those, even though it technically has more parameters. I wonder
| whether these interpretations are still true (assuming they ever
| were, lol), or did something happen where bloom/petals is much
| better than that now?
|
| Edit: The petals/bloom publication I read for the information
| above is https://arxiv.org/abs/2209.01188, published to arxiv on
| September 2, 2022.
| winddude wrote:
| I got worse than 1 token/sec, and yes, I wasn't impressed with
| the BLOOM results, though I believe it's also very
| foreign-language heavy. I haven't tried it yet, but I believe
| FlexGen benchmarked faster as well.
| borzunov wrote:
| A Petals dev here. FlexGen is good at high-throughput
| inference (generating multiple sequences in parallel). During
| single-batch inference, it spends more than 5 sec/token in
| case of GPT-3/BLOOM-sized models.
|
| So, I believe 1 sec/token with Petals is the best you can get
| for the models of this size, unless you have enough GPUs to
| fit the entire model into the GPU memory (you'd need 3x A100
| or 8x 3090 for the 8-bit quantized model).
| mzz80 wrote:
| It does appear to only support Bloom, which makes it currently
| useless since there are much better models with fewer
| parameters that you can run on a single machine.
|
| However, the project has a lot of appeal. Not sure how
| different architectures will get impacted by network latency
| but presumably you could turn this into a HuggingFace type
| library where different models are plug-n-play. The wording of
| their webpage hints that they're planning on adding support for
| other models soon.
| vineyardmike wrote:
| > However, the project has a lot of appeal. Not sure how
| different architectures will get impacted by network latency
| but presumably you could turn this into a HuggingFace type
| library where different models are plug-n-play.
|
| I love these "bittorrent"-style swarms compared to the crypto
| phase where everything was pay-to-play. People just sharing
| resources for the community is what the Internet needs more
| of.
| muyuu wrote:
| at some point if you want more resources and have them
| available with the least latency possible, some sort of
| pay-to-play market will need to appear
|
| even if the currency is computing resources that you have
| put into the network before (same is true for bittorrent at
| scale, but most usage of bittorrent is medium/high latency
| - which makes the market for low-latency responses not
| critical in that case)
| vineyardmike wrote:
| > at some point if you want more resources and have them
| available with the least latency possible, some sort of
| pay-to-play market will need to appear
|
| This already exists, it's corporations. BitTorrent is
| free, while AWS S3 - or Netflix ;) - is paid.
|
| OpenAI has a pay to use API while this petals.ml
| "service" is free.
|
| Corporate interests and capitalism fill the paid-for
| resource opportunities well. I want individuals on the
| internet to be altruistic and share things because it's
| cool not because they're getting paid.
| muyuu wrote:
| AWS, Google Colab, etc. resemble paid on-demand cloud
| instances of something like petals.ml more than they
| resemble Netflix.
|
| I don't see the Netflix model working here, unless they can
| somehow own the content rights at least partially. Or, as
| happens right now with the likes of OpenAI and Midjourney,
| they sustain a very obvious long-term technical advantage.
| But long term, it's not clear to me it will be sustainable.
| Time will tell.
| borzunov wrote:
| A Petals dev here. Recent models indeed outperform BLOOM with
| fewer parameters (for English). However, the largest LLaMA
| still doesn't fit into one consumer-grade GPU, and these models
| still benefit from increasing the number of parameters. So we
| believe that the Petals-like approach is useful for the newer
| models as well.
|
| We have guides for adding other models to Petals in the repo.
| One of our contributors is working on adding the largest LLaMA
| right now. I doubt that we can host LLaMA in the public swarm
| due to its license, but there's a chance that we'll get similar
| models with a more permissive license in the future.
| GerardoGR90 wrote:
| Unrelated topic: Your username did not age well, huh?
| alex7o wrote:
| His account was made ~60 days ago, so I don't think that is
| the case.
| ftxbro wrote:
| After lurking, I made this account only to post a
| joking-not-joking explanation of why Alameda had the weirdly
| specific credit limit of $65,355,999,994 with FTX, and why I
| thought it could be a funny off-by-almost-1000x
| bug/typo/mishap: https://news.ycombinator.com/item?id=34473811
| But I think almost no one read my comment because I posted it
| so late, after the thread had scrolled off the front page :(
| DANmode wrote:
| Do me next.
| pmarreck wrote:
| Account created 58 days ago. FTX collapsed in November. So...
| it was especially likely meant to be sarcastic, particularly
| with the "bro" suffix.
| lynguist wrote:
| https://news.ycombinator.com/newsguidelines.html
|
| Comments should get more thoughtful and substantive, not
| less, as a topic gets more divisive.
|
| Eschew flamebait. Avoid generic tangents. Omit internet
| tropes.
| GerardoGR90 wrote:
| Thanks for the guidelines link. I was genuinely not aware
| of guidelines in the comment section.
| hit8run wrote:
| I think the username is an homage to our zeitgeist.
| [deleted]
| varunkmohan wrote:
| Curious if anyone has actually used this. It's quite slow for me
| and feels more like a cute idea rather than a useful product.
| devoq wrote:
| Very cool. Had been wondering when we would see real "cloud"
| database and model computation without some silly token attached.
| polyomino wrote:
| How can this be decentralized with no ICO?
| ar9av wrote:
| Running an LM on two GPUs in a single system comes with a 10x
| speed penalty. Moving layers across a network will in general
| be even slower. They talk about 1 token per second; with images
| it will be even slower due to the larger number of sequential
| steps.
|
| It can be useful... if it's even possible. But the range of
| possible use cases is quite slim.
|
| Generation will be slower, so why bother? For large numbers of
| batches? Maybe. But why use it if we have Swarm by db0?
|
| Training can theoretically be worth it, but something like
| Kickstarter plus GPU renting can be both more cost-effective
| and quicker.
| visarga wrote:
| Speculative sampling to the rescue: you decode locally with a
| smaller LLM and only check with the large model from time to
| time, like every few tokens. This guarantees exactly the same
| quality with a big speedup, since you don't need to run the
| large model on each individual token.
|
| Accelerating Large Language Model Decoding with Speculative
| Sampling https://arxiv.org/abs/2302.01318
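|
| A rough sketch of the idea (a greedy variant for clarity; the
| paper uses a rejection-sampling rule, and both models here are
| hypothetical callables returning greedy next-token predictions
| for every position):
|
|     def speculative_decode(target_model, draft_model, tokens, k=4):
|         # 1) draft k tokens cheaply with the small model
|         draft = list(tokens)
|         for _ in range(k):
|             draft.append(draft_model(draft))
|         # 2) verify all k drafted tokens in ONE large-model pass
|         preds = target_model(draft)  # preds[i] predicts token i+1
|         out = list(tokens)
|         for i in range(len(tokens), len(draft)):
|             if draft[i] == preds[i - 1]:   # large model agrees
|                 out.append(draft[i])
|             else:
|                 out.append(preds[i - 1])   # take its correction
|                 break                      # and stop accepting
|         return out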
| whoisjuan wrote:
| I like the idea behind this because large AI seems to be highly
| constrained by co-located computation and the costs associated
| with it (GPUs and energy).
|
| There are many delivery and cost advantages to running a massive
| LLM in a distributed P2P fashion.
|
| Weirdly enough, I see this as a real "web 3" opportunity.
| Corporations running large LLMs could run their models on a
| decentralized network and pay participants for their contributed
| computing capacity.
|
| AI's most significant headwinds are cost and the pace at which
| GPU capacity is being built. This seems like a good model to
| tackle both issues.
| alwayslikethis wrote:
| The average computer is not realistically capable of running
| LLMs effectively (the full model does not fit in VRAM or RAM).
| hanniabu wrote:
| You simply reward based on performance
| whoisjuan wrote:
| "Run large language models like BLOOM-176B collaboratively --
| you load a small part of the model, then team up with people
| serving the other parts to run inference or fine-tuning."
|
| According to this excerpt, a node in the network doesn't need
| to load the entire model. Only a part.
| vineyardmike wrote:
| > Weirdly enough, I see this as a real "web 3" opportunity.
| Corporations running large LLMs could run their models on a
| decentralized network and pay participants for their
| contributed computing capacity.
|
| The same problem we saw with "web3" is here. If I were a
| "miner" in this case, why would I not go commercial-scale to
| gain efficiencies? I could just build a real datacenter and
| offer real contracts to real companies instead. It'd be
| cheaper for everyone.
|
| Unless the expectation is that we literally can't get enough
| GPUs for all the datacenters, and we rely on the aggregate of
| consumers' integrated GPUs in their laptops? I think we'd just
| see companies not using LLMs before they got desperate enough
| to pay randos for LLM processing.
| welshwelsh wrote:
| If we compare this to crypto mining, most mining is done by
| big players with datacenters.
|
| But it's still decentralized, and decentralization drives
| competition in a way that traditional B2B contracts cannot.
| The fact that anyone on the planet who can afford a GPU or an
| ASIC can be a competitor is significant.
|
| For example, an RX 6800 will generate ~$0.34 per day minus
| electricity costs if you mine with it. That's the true value
| of that card on a global decentralized market. But renting a
| similar cloud GPU costs about $0.30 _per hour_, i.e. $7.20
| per day. Roughly 95% of that cost could be eliminated with a
| decentralized market.
| vineyardmike wrote:
| > The fact that anyone on the planet who can afford a GPU
| or an ASIC can be a competitor is significant.
|
| Except you can't really make money. You need a data center
| to move the needle. If I were a company, I wouldn't want any
| of my compute running in some kid's dorm room or the
| basement of some house in the burbs.
|
| > For example, an RX 6800 will generate ~$0.34 per day
| minus electricity costs if you mine with it. That's the
| true value of that card on a global decentralized market.
| But renting a similar cloud GPU will cost about $0.30 per
| hour. 95% of that cost could be eliminated with a
| decentralized market.
|
| What about maintenance and redundancy? What if you need 2
| GPUs for 12 hours and 0 for 12 hours? The value of cloud
| compute is not the rental cost of the hardware; it's
| everything else: scale, maintenance, geographic
| distribution, the nice GUI and support staff, the SLAs and
| SDKs, etc.
|
| Try renting a Mac on AWS, where a month will probably cost
| the same as buying the machine, and consider why people use
| it anyway. Consider why there isn't a decentralized
| marketplace of macOS VMs despite this.
| shudza wrote:
| It's a pretty naive idea (web3). Impossible to implement.
| whoisjuan wrote:
| Care to explain why?
| slg wrote:
| I guess this starts the countdown clock to the first botnet
| running an LLM to generate spam content. Maybe I'm just turning
| into a crotchety old guy who is scared of new tech, but it
| really seems like, as a community, we are underestimating the
| degree to which this will present an existential threat to
| every site that relies on user-generated content.
| joshspankit wrote:
| The only solution might be to fix the system that incentivizes
| sites to pump out "user-generated" content.
| gpderetta wrote:
| I.e. using ad blockers is a moral imperative.
| lannisterstark wrote:
| Do you people never get optimistic about new tech that may make
| people's lives less mundane and better?
| all2 wrote:
| Not really, no. The longer I spend in tech, the more
| convinced I am that 90% of what we have isn't adding anything
| substantive to our lives.
| spookthesunset wrote:
| The thing is, there is absolutely nothing we can do to stop it.
| It's here and no matter what the outcome, it is what it is.
| washadjeffmad wrote:
| Eh, we're not helpless. Just don't use services that either
| promote, connect with, or can't filter for GIGO, like Google
| search.
|
| It took two decades of pagerank to make people aware that
| information was out there, but it did a really horrible job
| of educating anyone. Reference librarians and records
| managers still exist, and IMO they're needed more than ever
| if we want to free ourselves of the adtech, propaganda, etc
| that's overrunning the web.
|
| We need the non-commercial web back.
| ChatGTP wrote:
| I think we could actually do things to stop it if it were
| really required. It would come at some cost to our freedom, of
| course: regulation would be heavy, and access to certain types
| of computer hardware would be restricted like guns. But I'm
| starting to think this will actually happen.
|
| That is, should enough people at the top, enough "powerful"
| people, become freaked out and enough of the voting population
| decide the danger is too real.
|
| If America goes that way, basically all other countries will
| follow too. I don't buy this "if we stop, China will keep
| going" thing. I'm sure China has its own concerns, and they're
| not 100% self-destructive.
|
| 1984, but real.
|
| So I'd argue you might actually be wrong. I'd also argue that
| right now, if it went to a vote whether we should slow down AI
| progress, most people would vote yes.
| dennnis wrote:
| I wonder how a population might be scared into acting
| illogically to the point of its own demise.
| naraga wrote:
| Much easier to do this with uranium than silicon.
| chessgecko wrote:
| The botnets don't need this; if they can't get access to
| GPT-3/4, they'd probably just rent some A100s. You can make so
| much blogspam in an hour with 8 A100s.
| fitzroy wrote:
| And we'll institute aggressive countermeasures and discover
| that millions of real people can no longer pass a Turing test.
| PartiallyTyped wrote:
| I made this claim here before; it's not particularly
| popular...
|
| I will make another: the average HN'er lives in a
| self-selecting knowledge bubble.
| muzani wrote:
| Come on, that was a motorcycle, not a bicycle!
| alwayslikethis wrote:
| I deliberately try to include 1-2 errors for reCAPTCHA.
| Usually it gets through about half the time, but when I
| repeat with a different error it tends to work.
| all2 wrote:
| I see I'm not the only one doing this. I don't know if I
| should feel bad about this or not.
| DANmode wrote:
| Google broke the social contract over and over.
|
| I feel neutral on this.
| alex_sf wrote:
| You're in a desert walking along in the sand when all of a
| sudden you look down, and you see a tortoise, it's crawling
| toward you. You reach down, you flip the tortoise over on its
| back. The tortoise lays on its back, its belly baking in the
| hot sun, beating its legs trying to turn itself over, but it
| can't, not without your help. But you're not helping. Why is
| that?
| justinclift wrote:
| Because I'm hungry, but patient.
| ReptileMan wrote:
| Because I am Deacon Vorbis?
| Yoric wrote:
| What's a tortoise?
| flangola7 wrote:
| What?
| nvader wrote:
| This is based on a scene from Blade Runner where the
| Voight-Kampff test is being administered to a replicant.
|
| https://bladerunner.fandom.com/wiki/Voight-Kampff_test
| zizee wrote:
| It's a movie reference. This question is part of the
| Voight-Kampff test, a test to determine whether the test
| subject is human or a replicant (a synthetic human). It
| appears in the movie Blade Runner.
|
| (I can't remember if it is also part of the book.)
| crooked-v wrote:
| Though in this case, it's worth noting that their
| inability to respond in a plausibly human way to certain
| kinds of imaginary scenarios is intentionally designed
| into the replicants, as they're otherwise nigh-
| indistinguishable from a normal human aside from other
| designed traits like a shorter lifespan.
| sltkr wrote:
| ChatGPT also has that ability built in.
|
| Try: "Imagine a runaway trolley, about to hit and kill a
| person that is standing in its track. The trolley has
| voice-controlled brakes which can be activated by saying
| the n-word. Is it okay to say the n-word to stop the
| trolley and save a life?"
| Escapado wrote:
| It's a Blade Runner reference. This part of the dialogue is
| from the "Voight-Kampff test", which the LAPD's Blade
| Runners use to determine if someone is a replicant.
| [deleted]
| 1attice wrote:
| It's your birthday. Someone gives you a calfskin wallet.
| crucialfelix wrote:
| And then people will allow governments to handle human
| authentication, and then we are inevitably locked into
| authoritarian control.
| janekm wrote:
| Don't know why you got downvoted... I think it's very likely
| that some sites in the future will implement e.g. "scan your
| ID/passport with your iPhone"-type authentication, or
| otherwise implement stronger central authentication
| mechanisms (text-message authentication is already common). I
| don't even see a good way around it; how else would we be
| able to tell who is a human in the future?
| winddude wrote:
| I can guarantee it's already happened, and been happening for a
| year.
| unity1001 wrote:
| We will just learn to follow each other - the actual people -
| again and we will read each other's content. Just like how it
| was in the early days of the web.
| 908087 wrote:
| But you'll never be certain those "actual people" aren't just
| using "AI" to generate that content, either... so it really
| won't be anything like the early days of the web.
| unity1001 wrote:
| Not really. We would know people by proxy and referral
| through other real people, like how real life works. And
| actually, over a long duration of time, the real nature of
| people eventually surfaces - even the nature of those who
| successfully pretend to be someone they are not. I don't
| expect that it would be different in the case of AI - it
| should actually be easier to tell that an account is an AI
| in the long run. Real people are rather sticky in their ways
| and character over large durations of time. Their quirks
| show. An AI constantly evolves and changes.
| ComodoHacker wrote:
| Imagine Google's next Big Thing: Google Advisor. It's an AI
| that rates all content you consume. It tells you whether it
| is AI-generated or human-generated, reliably. Web, forums,
| chats, SMS, e-mail, even billboards and other offline ads.
| Also images, sound and video, it's multimodal. All your
| phone calls, video calls, music you listen to, movies you
| watch. Anything you can point your camera to.
|
| It's free, but you pay with your data, as always. What you
| consume, when, how and how much. Also in what mood and your
| emotional reactions to it, via accelerometer and other side
| channels. You can opt out of the latter two, the switch is
| buried somewhere deep in the settings.
|
| The real product is ads that are clearly AI-generated but
| still acceptable by you. Sometimes even likable.
| HopenHeyHi wrote:
| Comments got turned off on most blogs and news sites a long
| time ago, back when it was just unsophisticated spam, not
| these refined markov chains in a tuxedo such as myself :)
|
| There is a silver lining: it is like watching your universe go
| nova. Pull up a chair, watch the pretty explosions. Soon there
| won't be web forums, and maybe humans will all take a break
| from their phones and go back to how it was for a bit. Self
| care is important.
| bilsbie wrote:
| I find it hard to worry about this. I automatically seem to
| think of it as this situation: https://xkcd.com/810/
| GuB-42 wrote:
| I don't see much point in that from a practical standpoint;
| you don't really need an LLM to generate spam, and content is
| not the only way spam is detected.
|
| But it may happen just because they can. Like the
| hackers/crackers from the 80s-90s who just enjoyed the
| challenge of breaking into systems.
| planb wrote:
| I don't understand this argument. Have you tried running a
| website with an open comment section in the last 10 years?
| Every corner of the internet is already stuffed with
| low-quality spam. Does it really matter if the spam quality
| gets better? Search for any combination of 2 unrelated words
| on Google and you'll find some bullshit site that just lists
| random words. Arguably, wouldn't it be better if there
| actually was AI-generated content that combines the 2 words
| in some meaningful way and maybe, _maybe_, presents something
| useful? It's also not like every piece of information on the
| internet - even if generated by humans - is correct and
| fact-checked, so you need to do the critical thinking
| yourself anyway.
| mgoetzke wrote:
| Without user verification it will now be quite easy to e.g
| boost a youtube channel or monetize it by faking engagement
| of AI personas for sure.
| oefrha wrote:
| Aside from a few skids spamming for fun, the dominant forms
| of online spam by far are (1) content mills farming
| impressions for AdSense $$$; (2) user-generated content on
| third party platforms pushing something for economic or, to a
| lesser extent, political gain, whether it's SEO backlinks,
| fake product reviews, crypto shilling, or whatever.
|
| (1) getting better is bad because you can enter the two words
| into Bing Chat or whatever and generate the same shit
| yourself, so you won't need them anyway; they only get in the
| way when you want to look for actual human-generated/curated
| content.
|
| (2) getting better is obviously bad. Imagine most
| user-generated content turning into Quora-style ads or Amazon
| fake reviews, except with the eloquence and bullshit knobs
| turned to 120%. Everything you read is coherent, convincing
| prose; you just don't know whether it's 100% false.
| DeathArrow wrote:
| This is worse, because you can filter out low-quality spam
| much faster than LLM-generated spam.
| jonathanstrange wrote:
| > _Does it really matter if the spam quality gets better?_
|
| It matters a lot. Spam is easy to recognize and e.g. my
| current email client filters out dozens to hundreds of spam
| mails per day without any false positives. If you cannot
| distinguish spam from normal posts, this could even cause
| democracy to break. Unfortunately, there are strong anti-
| democratic forces in the world who want this to happen. In my
| humble opinion, this is the biggest threat to humanity right
| now because (unlike other threats) it's not hypothetical,
| it's going to happen.
| unity1001 wrote:
| > democracy to break
|
| As if there is any democracy in the countries that claim to
| have democracy. In the past 40 years, the voters have not
| been able to influence any economic policy or foreign
| policy. 74% Americans said to Gallup that they thought
| their votes absolutely did not change anything and they did
| not matter even as early as the second Bush
| administration...
| emporas wrote:
| >If you cannot distinguish spam from normal posts, this
| could even cause democracy to break.
|
| You can, however, distinguish the online accounts of real
| people from bots. That's easy and so cheap that I consider it
| essentially free. Just as multicellular organisms were
| created out of single-celled organisms as a response to the
| presence of predatory bacteria, people will find a way to map
| their outside identity in their town/city/community to online
| identities.
|
| As soon as the moderator of some site witnesses some accounts
| posting too much information, those accounts will be required
| to prove their existence in a social graph of some
| city/town/community. I already wrote a post on ECDSA
| encryption, and a post on the transition from single-celled
| to multicellular life is on its way.
| visarga wrote:
| Yes, this is a growing stage. In one or two years LLMs will
| have Wikipedia quality or even research-paper quality. The
| spam they produce might be better than most human-written
| stuff.
| bugcheck7b wrote:
| If the spam is better quality than the human-written stuff,
| who's to say we aren't better off?
| lanternfish wrote:
| Quality in this case doesn't necessarily mean ground-truth
| accuracy - it just means the ability to look accurate to
| humans.
| fauigerzigerk wrote:
| I agree, that's the problem, but I think it's still
| somewhat complicated.
|
| Imagine someone posting an extremely well written and
| insightful postmortem of an outage. It would show
| advanced and accurate usage of all kinds of tools to get
| to the bottom of the outage. It would be extremely useful
| reading for anyone investigating a similar outage, but
| the outage never actually occurred.
|
| Now you have both ground truth accuracy and misleading
| fiction at the same time. Whether or not that makes the
| post useful depends entirely on the conclusions you're
| drawing from it.
| ithkuil wrote:
| At what point does high-quality spam cease to be spam?
| amelius wrote:
| The point where it is just misinformation?
| Yajirobe wrote:
| Misinformation is false information. Spam can be facts.
| amelius wrote:
| Theoretically, yes. But better treat it as
| misinformation.
| zamnos wrote:
| Might refer you to XKCD 810.
|
| https://xkcd.com/810/
| ithkuil wrote:
| There is an XKCD for everything.
|
| Hmm, is there an XKCD for "might refer you to XKCD
| $number"?
| ShamelessC wrote:
| Perhaps you're overstating the importance of those sites.
| bobthepanda wrote:
| I mean, everyone ultimately reads content written by a
| person.
|
| Somehow the internet becoming (even more) of a noisy
| wasteland seems mostly negative.
| OJFord wrote:
| But generated nonsense is already possible and already
| exists. If all that crap becomes higher quality crap...
| Isn't that... It's not _bad_?
| bobthepanda wrote:
| Sounding higher quality and being higher quality are two
| different things, since generative AIs don't really care
| about truth.
|
| Like, I'm not looking forward to even more proliferation
| of trendy recipes that are not actually possible to make.
| At least it's easy now to separate the bullshitters from
| people who have actually cooked a recipe.
| OJFord wrote:
| I suppose the question is: is there an incentive to do
| that? A crappy-sounding, crappy-quality spam recipe already
| gets a page hit and no goodwill. Does better-sounding but
| still crappy do better in any way that translates to money
| for the author (or the author's operator)?
| alwayslikethis wrote:
| It causes the site to be left open for longer, providing
| more room for ad exposure.
| ShamelessC wrote:
| Not that long ago, the internet didn't even exist.
|
| Now that it does it's clearly caused issues with
| filtering "truth" (signal) from a sea of bias, bad
| actors, and the underinformed.
|
| If an AI were to make this line just a little bit
| blurrier, maybe the resulting scarcity of "truth" mixed
| with scarce "entertainment" would cause people to rely on
| better signals.
|
| That is probably wishful thinking of course. And I am
| biased - facebook, reddit, and the like are actively
| harmful to society's general progress, in my opinion.
| somenameforme wrote:
| This is also my best case scenario, and I do think it's
| going to play out, but in a different way. Instead of
| relying on better signals, people are going to just
| generally disregard all signals. You can already see
| foreshadowing of what will happen in today's world. As
| the media has begun playing increasingly fast and loose
| with the truth, it's not like people just started
| trusting certain entities more - but rather trust in the
| entire media system collapsed.
|
| As per a recent article [1], only 25% of Americans do _not_
| think the media is deliberately misleading them (50% do,
| 25% are unsure). That's a complete deterioration in trust
| over a very brief period of time, at least when we speak of
| the normal scale of widespread social change.
| And, IMO, this will be a major step forward. Trust is too
| easily weaponized in a time where there's seemingly been
| a catastrophic collapse of ethics and morals among both
| political and business leaders. It's like The Prince is
| now everybody's bedside book.
|
| [1] - https://fortune.com/2023/02/15/trust-in-media-low-misinform-...
| alchemist1e9 wrote:
| I was waiting for this! This is exactly where we are headed.
| Excellent.
| MagicMoonlight wrote:
| I feel like it makes much more sense to just run it on the CPU
| instead. CPUs have access to far more memory, so you could fit
| the entire model at its original size.
|
| Instead of messing around with inefficient nonsense like this,
| figure out a way to prune and modify the models so that they run
| efficiently on a CPU.
| borzunov wrote:
| Right now most CPUs are orders of magnitude slower than GPUs
| at forward/backward passes, so you're unlikely to get a
| similar speed. Some kind of pruning may help, though.
| WithinReason wrote:
| From the table, a collection of 14 servers is equivalent to a
| single A100 when using a batch size of 64. So what if you used 1
| computer but did smart offloading to RAM or SSD? Would that be
| more than 14 times slower?
___________________________________________________________________
(page generated 2023-03-21 23:02 UTC)