[HN Gopher] Run 100B+ language models at home, BitTorrent-style
       ___________________________________________________________________
        
       Run 100B+ language models at home, BitTorrent-style
        
       Author : srameshc
       Score  : 677 points
       Date   : 2023-03-20 22:50 UTC (1 day ago)
        
 (HTM) web link (petals.ml)
 (TXT) w3m dump (petals.ml)
        
       | amrb wrote:
        | What I want to see is a bunch of RTX 3060 mining cards being
        | used to run ML models. They did talk about bandwidth issues
        | for servers with under 100 Mbit, though if you're doing this
        | as a local cluster you could run some ex-server network gear
        | and be golden.
        
         | jhoelzel wrote:
          | I just thought this through while building my new PC and
          | house. The bottleneck is always the NIC, because the most
          | you can do right now on a conventional motherboard is about
          | 10 GbE.
          | 
          | After that you also run into cabling issues. Cat 8, for
          | instance, only does 40 GbE max, which means for anything
          | more you need to bundle up connections, which comes with
          | its own problems.
          | 
          | Another point is that while mining, GPUs are still
          | independent and not connected to each other, so each of
          | them is also restricted to the maximum your PCIe slot will
          | give you.
          | 
          | PCIe 4.0 has a maximum data transfer rate of 16 GT/s
          | (gigatransfers per second) per lane, which translates to
          | roughly 2 GB/s (gigabytes per second) per lane. A x16 slot
          | can therefore provide a maximum of about 32 GB/s in each
          | direction (upstream and downstream).
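          | 
          | A quick sanity check of those numbers (128b/130b is the
          | PCIe 4.0 line encoding):
          | 
          |   lanes = 16
          |   gt_per_s = 16                 # 16 GT/s per lane
          |   encoding = 128 / 130          # 128b/130b overhead
          |   gb_per_lane = gt_per_s * encoding / 8  # ~1.97 GB/s
          |   print(gb_per_lane * lanes)    # ~31.5 GB/s per direction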
        
       | nothrowaways wrote:
       | This is pure genius if it works.
        
       | mdgrech23 wrote:
       | dis gonna be big
        
       | quickthrower2 wrote:
        | Dumb question from someone who doesn't know too much about
        | LLMs yet. How can you trust the other computers? Will I end
        | up with a bunch of swear words coming back from other nodes
        | that are playing a prank?
        
         | techwizrd wrote:
          | I'm not entirely sure how the approach they're using works
          | [0], but I study federated learning, and one of the highly
          | cited survey papers has several chapters (5 and 6 in
          | particular) addressing potential attacks, failure modes, and
          | bias [1].
         | 
         | 0: https://github.com/learning-at-home/hivemind
         | 
         | 1: https://arxiv.org/abs/1912.04977
        
         | ftxbro wrote:
          | They don't solve the problem completely, but they address
          | it in their publication as follows. The inference step goes
          | through a lot of layers, and the first and last layers have
          | more 'interpretability', so some of those are the ones you
          | run locally on your own computer. If they wanted to prank
          | you, they would have to use some sophisticated thing that
          | probably doesn't exist yet but it could still be possible.
          | Also, if it becomes a problem, they could use the usual
          | methods, like running on multiple other computers and
          | taking the majority solution.
        
           | sebzim4500 wrote:
           | >If they wanted to prank you, they would have to use some
           | sophisticated thing that probably doesn't exist yet but it
           | could still be possible.
           | 
            | Isn't the attack straightforward?
            | 
            | i) Take the model and freeze all the weights except the
            | ones you expect to be responsible for serving.
            | 
            | ii) Finetune to produce whatever output you are looking
            | for (rough sketch below).
            | 
            | iii) Profit. Or mainly just annoy people, but it could be
            | funny.
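            | 
            | A minimal sketch of (i) and (ii) in PyTorch, using gpt2
            | as a stand-in for the slice of the model you happen to be
            | serving; none of this is the actual Petals interface:
            | 
            |   import torch
            |   from transformers import AutoModelForCausalLM
            |   from transformers import AutoTokenizer
            | 
            |   model = AutoModelForCausalLM.from_pretrained("gpt2")
            |   tok = AutoTokenizer.from_pretrained("gpt2")
            | 
            |   # i) freeze everything except the block "you" serve
            |   for name, p in model.named_parameters():
            |       p.requires_grad = ".h.11." in name
            | 
            |   # ii) fine-tune that block toward a canned answer
            |   trainable = [p for p in model.parameters()
            |                if p.requires_grad]
            |   opt = torch.optim.AdamW(trainable, lr=1e-4)
            |   batch = tok("the answer is always potato",
            |               return_tensors="pt")
            |   for _ in range(100):
            |       out = model(**batch, labels=batch["input_ids"])
            |       out.loss.backward()
            |       opt.step()
            |       opt.zero_grad()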
        
             | ftxbro wrote:
             | OK sure, any time someone says on the internet that
             | something is sophisticated someone will say it's
             | straightforward. So I guess it's a continuum. The thing you
             | describe is more sophisticated than what they would have to
             | do if they had access to the final layer.
        
         | px43 wrote:
         | There's some really cool work being done using Zero Knowledge
         | proofs to write a succinct proof that output from a given model
         | is correct. This is going to be increasingly important not just
         | for these types of distributed systems, but even for things
         | like ChatGPT to make sure that you're actually getting results
         | from the model that you're paying to use.
         | 
         | Imagine a world where OpenAI or some other large API provider
         | gets taken over by someone who wants to make money, so they
         | start quietly using smaller, weaker models to respond to API
         | calls, even for customers who are paying for the highest end
         | model. Maybe this is just done at first to survive under
         | massive load, but then someone realizes how many millions of
         | dollars they can save by just forgetting to switch back to the
         | more expensive models.
         | 
         | Here's a great primer: https://0xparc.org/blog/zk-mnist
        
           | [deleted]
        
         | cjbprime wrote:
         | I don't think that's a dumb question! I don't know if this
         | project has an answer to that, but there are some techniques
         | (Merkle tree hashes etc) that might work depending on how much
         | of the model you want to download locally.
         | 
         | I don't see how to securely scale the inference step, though.
        
         | sillysaurusx wrote:
         | Sending tensors to two or more nodes is a partial solution,
         | since you can reject ones that don't match. But fundamentally
         | the answer is yes -- bad actors can overwhelm the swarm, and
         | there's not much you can do about it.
         | 
         | I think the project should embrace this limitation. eBay had
         | the same problem, but it's in a seller's interest to deliver
         | correct items quickly. Make a social incentive and the rest
         | will work itself out.
        
           | flangola7 wrote:
           | If you get banned from eBay your SSN/phone
           | number/email/browser fingerprint/address/etc are prevented
           | from coming back. What system would enforce that for
           | computation nodes?
        
           | 6510 wrote:
            | > bad actors can overwhelm the swarm
            | 
            | I don't think so. To simplify: you send out 1000 tasks and
            | also perform them yourself. Now you have 999 bad flags and
            | 1 good one. You send out 10 tasks to the same nodes,
            | including the good one; now you have 990 nodes with 1 bad
            | flag, 9 with 2, and 1 with 2 good ones. You keep sending
            | tasks to the bad nodes and drop their responses; if they
            | send you a task, you return garbage. You ask the good
            | nodes (with, say, 100+ good flags) for their lists of good
            | nodes and test them one by one.
            | 
            | You could build a system where bad nodes have to return so
            | many good responses before getting booted that the joke is
            | on them.
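            | 
            | A toy version of that flag bookkeeping (hypothetical node
            | interface, just to illustrate the idea):
            | 
            |   def bootstrap_trust(nodes, tasks, solve, min_flags=3):
            |       """nodes: name -> callable; solve: do it yourself."""
            |       flags = {name: 0 for name in nodes}
            |       for task in tasks:
            |           truth = solve(task)
            |           for name, run in nodes.items():
            |               flags[name] += 1 if run(task) == truth else -1
            |       return {n for n, f in flags.items()
            |               if f >= min_flags}
            | 
            |   # honest nodes double the input, one returns garbage
            |   nodes = {"a": lambda t: t * 2,
            |            "b": lambda t: t * 2,
            |            "evil": lambda t: 0}
            |   solve = lambda t: t * 2
            |   print(bootstrap_trust(nodes, range(1, 6), solve))
            |   # -> {'a', 'b'}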
        
         | winddude wrote:
          | You can't. It's somewhere in the Petals docs, but they
          | recommend generating in parallel and then averaging or
          | selecting the response.
        
       | TeeMassive wrote:
       | Am I the only one excited about when 4chan will train its own AI
       | by collectively pooling their power levels?
        
       | jeffmjack wrote:
       | So in the old model you could: 1. pay for compute 2. charge the
       | customers to pay for compute,
       | 
       | and now you can instead: 1. pay your customers to pay for compute
       | 2. charge the customers to pay for the customers to pay for
       | compute
       | 
       | Is there something I'm not understanding in the business logic of
       | this?
       | 
       | Is it the fact that this would be running on computers that are
       | essentially free, since it would just be like the desktop in
       | someone's home office, so the infrastructure costs are already
       | paid for (e.g. externalized)?
       | 
       | Or like would the value here be accessing the LLM service for
       | 'free'? But isn't just paying for a service like OpenAI
       | relatively inexpensive and already nicely set up?
        
         | causality0 wrote:
          | _But isn't just paying for a service like OpenAI relatively
          | inexpensive and already nicely set up?_
         | 
         | Sure, but OpenAI is never going to offer you a raw product.
         | Their offerings will always be the heavily restricted,
         | corporatized product they offer now. That works for many, maybe
         | most, people but there's definitely a market for a "power to
         | the users" LLM AI with no rules.
        
         | ilyt wrote:
         | > Is there something I'm not understanding in the business
         | logic of this?
         | 
          | That people would rather give away some of the GPU time
          | they aren't using _at this moment_ than pay a subscription.
          | And presumably they also don't want to be beholden to
          | whatever filters the "big AI cluster owner" puts in place.
        
       | behnamoh wrote:
       | What if all the computation power that's being wasted by crypto
       | (Proof of Work, etc.) could be shifted to powering AI models in a
       | decentralized way, such as this project?
       | 
       | You keep the AI "alive" and in return, you get paid in crypto.
       | What a wonderful time to be alive!
        
         | amelius wrote:
         | What if we moved to "Proof-of-Carbon-capture" instead?
        
           | xotom20390 wrote:
           | [dead]
        
         | eclipxe wrote:
         | Let's do it.
        
           | hhoodie wrote:
           | I'm in
        
             | nullsense wrote:
             | Let's watch the world burn!
        
               | antibasilisk wrote:
               | it's all fun and games until a bunch of kids die
        
               | int_19h wrote:
               | The world is capable of burning itself just fine without
               | such assistance.
               | 
               | It would be much neater to turn it all into paperclips
               | instead.
        
             | alchemist1e9 wrote:
             | me too
        
         | lxgr wrote:
         | Proof-of-work only works with a particular set of computational
          | problems, i.e. those in NP. I'm not sure if running an LLM
          | fits that bill.
         | 
         | I suppose you could combine proof-of-stake with it in some way
         | (e.g. you commit to an input/output tuple and get your stake
         | slashed if it can be shown to not reproduce on a "canonical"
         | execution), though?
         | 
         | That's not nearly as simple as "normal" PoW, though - you'd
         | need to encode some reputational aspect into the system.
        
           | Enginerrrd wrote:
           | You can definitely do arbitrary work as a sort of proof of
           | work. Not quite the same mathematically, but pragmatically
           | similar. The key is building in some redundancy/error-
           | correction and ensuring that a single node can't by itself
           | define "correctness" of a solution. You do that by
           | duplicating work across nodes, distributing chunks randomly
           | and rejecting/rerunning disagreeing results. It's also pretty
           | easy to spot bad actors trying to cheat on their
           | computational work in this scenario.
        
             | pshc wrote:
             | I don't think it's that easy at all. The work function must
             | be cheap and canonical to check, and the consensus
             | algorithm has to be rigorous, or else it's too easy to
             | attack the security of the network. DoS, Sybil, 51%, social
             | takeover via hard fork, the list goes on...
        
               | alchemist1e9 wrote:
                | It has a well-defined loss function with a numerical
                | value. The improvement of this value can be a type of
                | difficulty. Check some other comments I've made on
                | this post for how it might work.
        
               | pshc wrote:
               | It's an interesting idea for sure, but loss doesn't go
               | down forever. I think this ends with a highly overfitted
               | network that grinds to a halt as the loss function hits
               | local minima.
               | 
               | Even if you get past that, there's no consensus mechanism
               | or finalization as it stands, and validating solutions is
               | relatively expensive.
        
               | alchemist1e9 wrote:
                | We only just started thinking about this, and I
                | suspect these issues are solvable in a protocol. For
                | instance, with cross-validation there could be a
                | distributed protocol to control overfitting.
                | 
                | I'm not sure validation is so expensive if the data is
                | small enough. Actually, maybe that's a way to approach
                | this: two types of block that are paired and share the
                | rewards in some way. One proposes a better slice of
                | weights and the other proves it is better out of
                | sample.
                | 
                | Give it a few weeks and with GPT-4's help I think we
                | can find some promising approaches.
        
             | [deleted]
        
         | saulpw wrote:
         | And now we have an AI that is impossible to turn off.
        
           | behnamoh wrote:
           | And impossible to censor.
        
           | joshSzep wrote:
           | Reminds me of the short story Stephen Hawking tells about AI
           | in this video https://youtu.be/T8y5EXFMD4s
        
         | [deleted]
        
         | alchemist1e9 wrote:
         | Portion of a discussion I had on this topic with GPT-4:
         | 
         | > Your idea of having all miners work on the same section of
         | the model and compete to find better weights with a lower loss
         | function value is an interesting approach. This would
         | essentially treat the LLM training process like a proof-of-work
         | mining competition. It is true that this approach involves some
         | degree of duplicate work, but the simultaneous, randomized
         | exploration of the weight space could still leverage the
         | distributed computing power of the network.
         | 
         | > Here's a high-level overview of how this approach could work:
         | 
         | > Miners receive a specific section of the LLM's weights and a
         | subset of training data. The training data could be determined
         | based on information from the previous block (e.g., by hashing
         | the previous block).
         | 
         | > Each miner initializes their local copy of the weight section
         | with random perturbations to introduce diversity in the
         | optimization process.
         | 
         | > Miners perform optimization steps (e.g., gradient descent) on
         | their local weight section to minimize the loss function on the
         | given training data.
         | 
         | > Miners search for a solution that satisfies both the proof of
         | training (improvement in the loss function) and the proof of
         | work (hash of the block meeting the difficulty target).
         | 
         | > The first miner to find a solution meeting both criteria
         | broadcasts their updated weight section and the new block to
         | the network.
         | 
         | > Other miners verify the validity of the proposed solution
         | (i.e., checking the improvement in the loss function and the
         | hash meeting the difficulty target) and add the new block to
         | their local copy of the blockchain.
         | 
         | > This approach would turn the mining process into a
         | competitive LLM training process, where miners contribute their
         | computing power towards improving the model. It maintains some
         | of the core properties of proof-of-work mining while directing
         | the computational resources towards a productive goal. However,
         | this approach still needs to address potential issues related
         | to data privacy, intellectual property, and the synchronization
         | of the model's weights across the entire network.
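          | 
          | A toy sketch of the validity check described above
          | ("improvement in the loss function" plus "hash meeting the
          | difficulty target"); eval_loss stands in for the expensive
          | re-evaluation a verifier would have to run, and none of
          | this is a real consensus protocol:
          | 
          |   import hashlib
          | 
          |   def block_valid(block, prev_loss, difficulty, eval_loss):
          |       # proof of training: proposed weights must beat the
          |       # previous loss on the agreed-upon data slice
          |       if eval_loss(block["weights"]) >= prev_loss:
          |           return False
          |       # proof of work: block hash must meet the target
          |       digest = hashlib.sha256(
          |           repr(block).encode()).hexdigest()
          |       return digest.startswith("0" * difficulty)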
        
           | vore wrote:
            | What's the point of this hashing-the-block business if
            | you can already prove your work by presenting a set of
            | weights that reduces the loss function? And even then, you
            | run into the risk of overfitting if you just blindly
            | optimize for loss like that.
        
           | inimino wrote:
           | Eventually it will be common knowledge that asking a text
           | compressor trained on all available text to design a new
           | technology doesn't work.
        
             | alchemist1e9 wrote:
              | I don't believe that is true. Most technologies are
              | incremental improvements on recombinations of existing
              | tools and techniques. It seems likely to me that LLMs'
              | ability to map associations between concepts can result
              | in humans using them to help invent new technology.
              | 
              | Have you personally used GPT-4 much?
        
         | Taek wrote:
         | That type of work is not useful for securing a decentralized
         | system. To secure a decentralized blockchain, you need the work
         | to be provably wasteful.
         | 
         | That said, there's a new paradigm of Blockchain brewing called
         | 'Proof of Physical Work', which accepts the centralized
         | limitations and leverages the incentive scheme to produce
         | useful outcomes for society. Perhaps the most interesting one
         | right now is HiveMapper.
         | 
         | It won't secure a decentralized blockchain, but it might still
         | be a great value add for society.
        
           | antibasilisk wrote:
           | by definition that makes it not a blockchain
        
           | [deleted]
        
           | alchemist1e9 wrote:
            | I don't believe what you claim is accurate.
            | 
            | The training of LLMs, I believe, has a well-defined loss
            | function with a measurable value on a specific input set.
            | I actually already discussed this problem extensively with
            | GPT-4 itself, so I'll leave you with parts of that
            | discussion:
           | 
           | > Me: Do LLMs have a well defined loss function that they
           | optimize over?
           | 
           | > GPT-4 Assistant: Yes, large language models (LLMs) have a
           | well-defined loss function that they optimize over during the
           | training process. The choice of loss function depends on the
           | specific task the model is designed for and the architecture
           | used. For LLMs like GPT-3, which are based on the transformer
           | architecture and designed for natural language processing
           | tasks, the most commonly used loss function is the cross-
           | entropy loss.
           | 
           | > Cross-entropy loss measures the difference between the
           | model's predicted probability distribution over the possible
           | output tokens (words, subwords, or characters, depending on
           | the tokenization used) and the true probability distribution,
           | which assigns a probability of 1 to the correct output token
           | and 0 to all other tokens. The goal of the training process
           | is to minimize the cross-entropy loss, which essentially
           | means making the model's predicted probabilities as close as
           | possible to the true probabilities.
           | 
           | > During training, the LLM processes input sequences (e.g.,
           | sentences or paragraphs) and generates output token
           | probabilities based on its current weights. The cross-entropy
           | loss is computed using these probabilities and the true
           | target tokens, and then the gradients of the loss with
           | respect to the model's weights are calculated using
           | backpropagation. The weights are updated using an
           | optimization algorithm, such as stochastic gradient descent
           | (SGD) or one of its variants like Adam or Adagrad, in order
           | to minimize the loss.
           | 
           | > The process of iteratively updating the model's weights to
           | minimize the loss function is what allows LLMs to learn
           | patterns and structure in the training data, ultimately
           | enabling them to generate coherent text, answer questions, or
           | perform other language-related tasks.
           | 
           | > Me: Is it possible to restrict the training with a rules
           | that says only these weights, say specific slices, are
           | allowed to change, keep all other weights fixed? Is this how
           | LLMs can be trained in a distributed fashion?
           | 
           | > GPT-4 Assistant: Yes, it is possible to restrict the
           | training of an LLM to specific slices or subsets of weights
           | while keeping other weights fixed. This is a common approach
           | used in distributed and parallel training scenarios, where
           | multiple devices or nodes work on different parts of the
           | model simultaneously. This approach is known as model
           | parallelism.
           | 
           | > In model parallelism, the model's weights are divided into
           | smaller subsets or partitions, with each partition being
           | assigned to a different device or node. During training, each
           | device works only on updating the weights in its assigned
           | partition while keeping the weights in the other partitions
           | fixed. The gradients are computed for each partition
           | separately, and the updates are performed locally on each
           | device.
           | 
           | > Once all devices complete their local updates, they
           | communicate their updated weights with one another to
           | synchronize the model's state. This communication can be done
           | through various methods, such as parameter server
           | architecture or all-reduce algorithms. The key to effective
           | model parallelism is balancing the computational workload
           | among the devices while minimizing the communication
           | overhead.
           | 
           | > Adapting this concept for the proposed proof of work and
           | LLM training combination could involve assigning different
           | subsets of weights to miners and having them work on updating
           | these subsets. This would essentially turn the mining process
           | into a form of distributed LLM training, with each miner
           | contributing to a portion of the model's weights. However,
           | coordinating and synchronizing the updates among miners in a
           | decentralized environment, while maintaining the integrity
           | and security of the blockchain, would be a significant
           | challenge that needs to be addressed in the design.
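            | 
            | For what it's worth, the cross-entropy loss described
            | above is a one-liner in PyTorch (toy tensors with random
            | logits rather than a real model):
            | 
            |   import torch
            |   import torch.nn.functional as F
            | 
            |   vocab, seq = 50_000, 12
            |   # one row of logits per position in the sequence
            |   logits = torch.randn(seq, vocab)
            |   # the "true" next token at each position
            |   targets = torch.randint(0, vocab, (seq,))
            |   # mean -log p(correct token); roughly log(vocab)
            |   # for random logits
            |   loss = F.cross_entropy(logits, targets)
            |   print(loss)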
        
             | 6510 wrote:
              | There is an AI-generated spam joke in there but I can't
              | think of it right now. I'm much too overwhelmed [again]
              | by the clarity of that explanation.
              | 
              | I believe someone posted a paper talking about the
              | riddle at the end: apparently one can also merge weights
              | if work is done on a partition repeatedly/accidentally.
              | The rest of the merging seems to be a kind of
              | BitTorrent(?)
        
           | UncleEntity wrote:
           | > To secure a decentralized blockchain, you need the work to
           | be provably wasteful.
           | 
           | But why does it have to be wasteful?
           | 
           | As I understand it you just need a hard enough computation to
           | prevent the 51% problem.
           | 
           | Obviously if the goal was to compute something which had real
           | value all the failed attempts would count as "provably
           | wasteful" so would seem to fit the criteria.
        
           | hu3 wrote:
           | What about Eth's proof of stake?
        
             | pshc wrote:
             | Proof of stake involves lockup of capital, wasting the time
             | value of money (despite it earning a small yield), and also
             | taking on the risk of being slashed.
        
           | sebzim4500 wrote:
           | So? If we start offering crypto to run sections of LLMs, then
           | some portion of the crypto miners will move to doing
           | something useful instead.
        
         | lgas wrote:
         | I believe that's pretty close to what https://bittensor.com/
         | does.
        
       | lxgr wrote:
       | Imagine if it was possible to combine this with homomorphic
       | encryption into something like AirDrop for LLMs!
       | 
       | ("Sorry, I don't know how to answer that - but you can try
       | getting closer to a bunch of other people running the app on
       | their device and ask again!")
        
         | sebzim4500 wrote:
         | Homomorphic encryption has such an enormous overhead that it
         | would never be faster than just running the model locally. Or
         | probably on your wristwatch for that matter.
        
         | chatmasta wrote:
         | Sounds like a literal hive mind!
        
           | techwizrd wrote:
           | The library they're using is literally called Hivemind [0].
           | I'm interested to see how the approach they're using differs
           | from what we use in federated learning or gossip learning.
           | 
           | > Hivemind is a PyTorch library for decentralized deep
           | learning across the Internet.
           | 
           | 0: https://github.com/learning-at-home/hivemind
        
             | borzunov wrote:
             | A Hivemind/Petals dev here. As far as I understand, most
             | federated learning methods can't efficiently train very
             | large models (with billions of parameters) because they
             | repeat some calculations on many peers and/or involve
             | excess communication.
             | 
              | In contrast, the training methods implemented in
              | Hivemind strive to minimize compute and communication
              | but don't provide data privacy guarantees. This is
              | mostly okay for LLMs, since they are trained on public
              | data scraped from the Internet anyway.
        
       | nirav72 wrote:
        | I wonder how long before someone comes up with peer-to-peer
        | malware using a similar concept to train their model. Kind of
        | like how trojan crypto miners were found in public package
        | repos and apps just a couple of years ago. (Probably still an
        | issue)
        
       | firstfewshells wrote:
       | ML/AI moving too fast.
        
         | ashokpoudel wrote:
         | I would be very concerned about sending my data over to a swarm
         | https://ashokpoudel.medium.com/understanding-security-and-pr...
        
       | chrisMyzel wrote:
        | Petals is an impressive feat, but be aware it is very slow,
        | at 1-4 sec/token (depending on the hardware you have). I find
        | it too slow even for experimenting; as a developer I want
        | faster feedback cycles. Super interesting to see the project
        | evolve over time, and onboarding could not be easier.
        
         | MuffinFlavored wrote:
         | What is the rate of tokens per second when you are talking to
         | ChatGPT on GPT-4?
        
       | FrameworkFred wrote:
       | My interest in AI has just gone 10X, thanks and cheers!
        
       | sillysaurusx wrote:
       | > Parallel inference reaches hundreds of tokens/sec.
       | 
       | Marketing claims, meh. It gives normal people the wrong
       | impression.
       | 
       | You can't parallelize your query because it's sequential. I think
       | people will be willing to wait the ~200 sec necessary to get 200
       | words, but it's best to be up front about this limitation.
       | 
        | Also abuse is a problem. Once 4chan realizes they can poison
        | the distributed model, they'll have a field day. But maybe
        | it's too much effort for too little reward, and trolls won't
        | bother.
        
         | rohfle wrote:
         | From https://github.com/bigscience-
         | workshop/petals/wiki/Security,...
         | 
         | > Q: Does Petals guarantee that model outputs are correct?
         | 
         | > Not by default. A faulty or malicious server could give you
         | incorrect outputs. There are two things you can do about this:
         | 
         | > - Verify outputs. Send some of your tensors to two or more
         | peers and check that the answers match.
         | 
          | > - Set up a private swarm. You can launch your own swarm
          | hosted by people and organizations you trust, who are
          | authorized to process your data.
         | 
         | > In future, we plan to implement an automatic verification and
         | a reputation system, so that clients can select servers that
         | they can trust.
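          | 
          | That first option ("verify outputs") is simple enough to
          | sketch; here the "peers" are just local callables standing
          | in for remote servers:
          | 
          |   import torch
          | 
          |   def verified_forward(peers, x, atol=1e-3):
          |       # run the same activations through two peers
          |       a, b = peers[0](x), peers[1](x)
          |       if not torch.allclose(a, b, atol=atol):
          |           raise RuntimeError("peers disagree")
          |       return a
          | 
          |   layer = torch.nn.Linear(8, 8)
          |   honest = lambda x: layer(x)
          |   tampered = lambda x: layer(x) + 1.0
          |   x = torch.randn(1, 8)
          |   verified_forward([honest, honest], x)     # ok
          |   verified_forward([honest, tampered], x)   # raises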
        
           | zekrioca wrote:
            | Byzantine problems all over again...
        
         | Etheryte wrote:
         | There is no poisoning vector, you can only update prompts and
         | adapters hosted locally.
        
           | sillysaurusx wrote:
           | From the site:
           | 
           | > you load a small part of the model, then team up with
           | people serving the other parts to run inference or fine-
           | tuning.
           | 
            | If multiple people participate in a fine-tuning session,
            | you have to trust all of them. You also have to trust
            | everybody for inference, but at least in that case one of
            | them can't scramble the model.
        
             | Etheryte wrote:
             | This is all covered in the docs if you click through past
             | the landing page. If you want to propagate changes to
             | others you need to set up your own swarm, you can't go
             | tuning things on random participants. You can read more at:
             | 
             | - https://github.com/bigscience-
             | workshop/petals/wiki/Security,...
             | 
             | - https://github.com/bigscience-
             | workshop/petals/wiki/Launch-yo...
        
             | gaogao wrote:
              | You could do attested code in an enclave, which, though
              | vulnerable to certain side channels, is probably more
              | robust than the standard case.
        
             | pshc wrote:
             | Wouldn't untrusted weird input increase loss and be
             | rejected?
        
             | chatmasta wrote:
             | Maybe this could be solved with opt-in (or opt-out via
             | banning) federation similar to Mastodon. Instead of one
             | network you could have a bunch of different networks each
             | focused on the interests of a different community. Or maybe
             | as someone with a node, you could "subscribe" to different
             | communities that use different filtering and prioritization
             | mechanisms for task assignments.
             | 
             | I do love the general direction, and I think it's
             | inevitable that training will move to be more decentralized
             | like this. It's also the best chance we have at disrupting
             | the centralization of "Open"AI and their ilk. I say the
             | earlier we figure this out, the better, but it's not an
              | easy problem to solve cleanly. And, not to be _that
              | guy_, but maybe we could add some cryptocurrency
              | incentives to the mix... conveniently enough, the crypto
              | miners already have the GPUs ready to go!
        
         | borzunov wrote:
          | A Petals dev here. We say up front that "Single-batch
          | inference runs at ≈ 1 sec per step (token)".
         | 
         | In turn, "parallel inference" refers to the high-throughput
         | scenario when you generate multiple sequences in parallel. This
         | is useful when you process some large dataset with LLM (e.g.
         | run inference with batch size of 200) or run a beam search with
         | a large beam width. In this case, you can actually get the
         | speed of hundreds of tokens per sec, see our benchmarks for
         | parallel forward passes: https://github.com/bigscience-
         | workshop/petals#benchmarks
         | 
         | If you have another wording in mind that is more up front,
         | please let us know, we'd be happy to improve the project
         | description. Petals is a non-commercial research project, and
         | we don't want to oversell anything.
        
           | null4bl3 wrote:
            | Can it run in a docker-compose container with a set
            | resource limit?
            | 
            | Does each node earn points for supplying resources that
            | can then be spent for greater query/processing speed?
        
         | robotnikman wrote:
          | I think most of 4chan would only want to use it to talk
          | with their anime waifus
        
           | jquery wrote:
            | That's rather wholesome. Unfortunately, 4chan is barely a
            | Chinese cartoon board anymore; /pol/ seems to have the
            | most external influence, which reflects poorly on the
            | whole site.
        
           | all2 wrote:
            | There is a GPT4chan floating around somewhere. Or maybe
            | it's Chat4Chan. I don't remember. I try to stay away from
            | that poison.
        
           | decremental wrote:
           | It's more their style to get it to recite FBI crime
           | statistics.
        
           | chizhik-pyzhik wrote:
           | plural of waifu is wifi, actually
        
       | [deleted]
        
       | smrtinsert wrote:
        | If nodes drop in and out, how does that impact the
        | inferences, I wonder.
        
       | Havoc wrote:
        | This seems to be inference-side.
        | 
        | Surely distributed training of a license-free model similar
        | to, say, ChatGPT 3.5 would be more useful?
        | 
        | i.e. rebuild the Alpaca work minus the legal issues
        
       | fswd wrote:
       | It's super slow. 1 token per second, if that. Half a word a
       | second.
        
       | eternityforest wrote:
       | You made a real skynet!!!
       | 
       | Jokes aside it's pretty cool!
        
       | frankreyes wrote:
       | Skynet.
        
       | jimrandomh wrote:
       | This works by taking a language model that won't fit in a single
       | consumer GPU's memory, partitioning it layerwise, and running it
       | distributed across a bunch of different people's computers. If
       | I'm understanding correctly, then any single node acting
       | dishonestly can replace the output out its portion with whatever
       | they want, and (if every other node is honest), this is
       | sufficient to fully control the output. So, probably okay to use
       | for prompts like "rewrite Rick Astley lyrics in the style of
       | Shakespeare", but not something you'd want to use in a way that
       | feeds into another automated system.
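        | 
        | A toy illustration of that point: once the layers are split
        | across peers, whoever holds a middle block fully determines
        | everything downstream of it.
        | 
        |   import torch
        | 
        |   blocks = [torch.nn.Linear(8, 8) for _ in range(4)]
        | 
        |   def pipeline(x, tamper_at=None):
        |       # each "peer" hosts one block; activations are
        |       # handed from peer to peer
        |       for i, block in enumerate(blocks):
        |           x = block(x)
        |           if i == tamper_at:           # a dishonest peer...
        |               x = torch.zeros_like(x)  # ...rewrites its output
        |       return x
        | 
        |   x = torch.randn(1, 8)
        |   print(pipeline(x))                # honest run
        |   print(pipeline(x, tamper_at=2))   # controlled by bad peer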
       | 
       | Meta-level, I think it's bad for the world if there's good
       | technology for running neural nets on distributed consumer GPUs.
       | From a cybersecurity perspective, Windows gaming PCs are easy
       | pickings compared to datacenters, and I think there's a risk that
       | after a few more iterations of AI development, we'll start
       | getting systems that figure out they can increase their own power
       | level by building a botnet that runs additional copies of
       | themselves.
        
       | geepytee wrote:
        | Kinda reminds me of the BOINC system
        
       | ftxbro wrote:
        | This link just goes to their website. Last I looked at this
        | project, I was happy that it existed, but I was disappointed
        | (given my over-optimistic expectations) for two reasons: 1)
        | It's for the BLOOM model, which isn't great compared to the
        | somewhat recent GPTs; I think I read that it's worse than the
        | OpenAI models on a per-parameter basis. 2) It's faster than
        | using RAM/SSD as faux VRAM, but 'only' by 10x. That was even
        | before LLaMA or its improvements existed for running locally,
        | so by my old understanding bloom/petals wouldn't be even as
        | good as those, even though it technically has more parameters.
        | I wonder: are these interpretations still true (assuming they
        | ever were, lol), or did something happen where bloom/petals is
        | much better than that now?
       | 
        | Edit: The petals/bloom publication that I read for the
        | information above was https://arxiv.org/abs/2209.01188,
        | published to arXiv on September 2, 2022.
        
         | winddude wrote:
          | I got worse than 1 token/sec, and yes, I wasn't impressed
          | with BLOOM's results, but I believe it's also very heavy on
          | non-English languages. I haven't tried it yet, but I believe
          | FlexGen benchmarked faster as well.
        
           | borzunov wrote:
           | A Petals dev here. FlexGen is good at high-throughput
           | inference (generating multiple sequences in parallel). During
           | single-batch inference, it spends more than 5 sec/token in
           | case of GPT-3/BLOOM-sized models.
           | 
           | So, I believe 1 sec/token with Petals is the best you can get
           | for the models of this size, unless you have enough GPUs to
           | fit the entire model into the GPU memory (you'd need 3x A100
           | or 8x 3090 for the 8-bit quantized model).
        
         | mzz80 wrote:
         | It does appear to only support Bloom, which makes it currently
         | useless since there are much better models with fewer
         | parameters that you can run on a single machine.
         | 
         | However, the project has a lot of appeal. Not sure how
         | different architectures will get impacted by network latency
         | but presumably you could turn this into a HuggingFace type
         | library where different models are plug-n-play. The wording of
         | their webpage hints that they're planning on adding support for
         | other models soon.
        
           | vineyardmike wrote:
           | > However, the project has a lot of appeal. Not sure how
           | different architectures will get impacted by network latency
           | but presumably you could turn this into a HuggingFace type
           | library where different models are plug-n-play.
           | 
           | I love this "bittorent" style swarms compared to the crypto-
           | phase where everything was pay-to-play. People just sharing
           | resources for the community is what the Internet needs more
           | of.
        
             | muyuu wrote:
             | at some point if you want more resources and have them
             | available with the least latency possible, some sort of
             | pay-to-play market will need to appear
             | 
             | even if the currency is computing resources that you have
             | put into the network before (same is true for bittorrent at
             | scale, but most usage of bittorrent is medium/high latency
             | - which makes the market for low-latency responses not
             | critical in that case)
        
               | vineyardmike wrote:
               | > at some point if you want more resources and have them
               | available with the least latency possible, some sort of
               | pay-to-play market will need to appear
               | 
               | This already exists, it's corporations. BitTorrent is
               | free, while AWS S3 - or Netflix ;) - is paid.
               | 
               | OpenAI has a pay to use API while this petals.ml
               | "service" is free.
               | 
               | Corporate interests and capitalism fill the paid-for
               | resource opportunities well. I want individuals on the
               | internet to be altruistic and share things because it's
               | cool not because they're getting paid.
        
               | muyuu wrote:
                | AWS or Google Colab etc. resemble paid on-demand
                | cloud instances of something like petals.ml more than
                | they resemble Netflix.
                | 
                | I don't see the Netflix model working here, unless
                | they can somehow own the content rights at least
                | partially. Or, as happens right now with the likes of
                | OpenAI and Midjourney, they sustain a very obvious
                | long-term technical advantage. But long term, it's not
                | clear to me that it will be sustainable. Time will
                | tell.
        
         | borzunov wrote:
          | A Petals dev here. Recent models indeed outperform BLOOM
          | with fewer parameters (for English). However, the largest
          | LLaMA still doesn't fit into one consumer-grade GPU, and
          | these models still benefit from increasing the number of
          | parameters. So we believe that the Petals-like approach is
          | useful for the newer models as well.
          | 
          | We have guides for adding other models to Petals in the
          | repo. One of our contributors is working on adding the
          | largest LLaMA right now. I doubt that we can host LLaMA in
          | the public swarm due to its license, but there's a chance
          | that we'll get similar models with a more permissive license
          | in the future.
        
         | GerardoGR90 wrote:
          | Unrelated topic: your username did not age well, huh?
        
           | alex7o wrote:
           | His account was made ~60 days ago, so I don't think that is
           | the case.
        
           | ftxbro wrote:
           | After lurking I made this account only to post a joking-not-
           | joking explanation of why Alameda had the weirdly specific
           | credit limit $65,355,999,994 with FTX and why I thought it
           | could be a funny off-by-almost-1000x bug/typo/mishap
           | https://news.ycombinator.com/item?id=34473811 but I think
           | almost no one read my comment because I posted it so late
           | after the thread had scrolled off the front page :(
        
           | DANmode wrote:
           | Do me next.
        
           | pmarreck wrote:
            | Account created 58 days ago. FTX collapsed in November.
            | So... it was most likely meant to be sarcastic, especially
            | with the "bro" suffix.
        
           | lynguist wrote:
           | https://news.ycombinator.com/newsguidelines.html
           | 
           | Comments should get more thoughtful and substantive, not
           | less, as a topic gets more divisive.
           | 
           | Eschew flamebait. Avoid generic tangents. Omit internet
           | tropes.
        
             | GerardoGR90 wrote:
             | Thanks for the guidelines link. I was genuinely not aware
             | of guidelines in the comment section.
        
           | hit8run wrote:
           | I think the username is an homage to our zeitgeist.
        
         | [deleted]
        
       | varunkmohan wrote:
       | Curious if anyone has actually used this. It's quite slow for me
       | and feels more like a cute idea rather than a useful product.
        
       | devoq wrote:
       | Very cool. Had been wondering when we would see real "cloud"
       | database and model computation without some silly token attached.
        
       | polyomino wrote:
       | How can this be decentralized with no ICO?
        
       | ar9av wrote:
        | Running an LM on two GPUs in a single system already comes
        | with a 10x speed penalty. Splitting layers across a network
        | will in general be even slower. They talk about 1 token per
        | second; with images it will be even less due to the larger
        | number of sequential steps.
        | 
        | It can be useful... if it's even possible. But the set of
        | possible use cases is quite slim.
        | 
        | Generation will be slower, so why bother? For large batch
        | counts? Maybe. But why use it if we have Swarm by db0?
        | 
        | Training can theoretically be worth it, but something like
        | Kickstarter plus GPU renting can be both more cost-effective
        | and quicker.
        
         | visarga wrote:
          | Speculative sampling to the rescue - you decode locally with
          | a smaller LLM and only check with the large model from time
          | to time, like every few tokens. This guarantees exactly the
          | same quality with a big speedup, as you don't need to run
          | the large model on each individual token.
         | 
         | Accelerating Large Language Model Decoding with Speculative
         | Sampling https://arxiv.org/abs/2302.01318
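          | 
          | A rough greedy sketch of the idea (real speculative
          | sampling accepts draft tokens probabilistically; the model
          | names here are just stand-ins):
          | 
          |   import torch
          |   from transformers import AutoModelForCausalLM
          |   from transformers import AutoTokenizer
          | 
          |   @torch.no_grad()
          |   def speculate(draft, target, ids, k=4, max_len=32):
          |       while ids.shape[1] < max_len:
          |           # draft proposes k tokens, one at a time (cheap)
          |           prop = ids
          |           for _ in range(k):
          |               logit = draft(prop).logits[:, -1]
          |               nxt = logit.argmax(-1, keepdim=True)
          |               prop = torch.cat([prop, nxt], dim=1)
          |           # big model scores all k in one forward pass
          |           n = ids.shape[1]
          |           tgt = target(prop).logits[:, n-1:-1].argmax(-1)
          |           drafted = prop[:, n:]
          |           same = (tgt == drafted).long().cumprod(-1)
          |           ok = int(same.sum())
          |           keep = drafted[:, :ok] if ok else tgt[:, :1]
          |           ids = torch.cat([ids, keep], dim=1)
          |       return ids
          | 
          |   tok = AutoTokenizer.from_pretrained("gpt2")
          |   draft = AutoModelForCausalLM.from_pretrained("distilgpt2")
          |   target = AutoModelForCausalLM.from_pretrained(
          |       "gpt2-medium")
          |   ids = tok("The quick brown fox",
          |             return_tensors="pt").input_ids
          |   print(tok.decode(speculate(draft, target, ids)[0]))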
        
       | whoisjuan wrote:
       | I like the idea behind this because large AI seems to be highly
       | constrained by co-located computation and the costs associated
       | with it (GPUs and energy).
       | 
       | There are many delivery and cost advantages to running a massive
       | LLM in a distributed P2P fashion.
       | 
       | Weirdly enough, I see this as a real "web 3" opportunity.
       | Corporations running large LLMs could run their models on a
       | decentralized network and pay participants for their contributed
       | computing capacity.
       | 
        | AI's most significant headwinds are cost and the pace at
        | which GPU capacity is being built. This seems like a good
        | model to tackle both issues.
        
         | alwayslikethis wrote:
          | The average computer is not realistically capable of running
          | LLMs effectively (because the full model does not fit in
          | VRAM or RAM).
        
           | hanniabu wrote:
           | You simply reward based on performance
        
           | whoisjuan wrote:
           | "Run large language models like BLOOM-176B collaboratively --
           | you load a small part of the model, then team up with people
           | serving the other parts to run inference or fine-tuning."
           | 
            | According to this excerpt, a node in the network doesn't
            | need to load the entire model, only a part.
        
         | vineyardmike wrote:
         | > Weirdly enough, I see this as a real "web 3" opportunity.
         | Corporations running large LLMs could run their models on a
         | decentralized network and pay participants for their
         | contributed computing capacity.
         | 
          | The same problem we saw with "web3" is here. If I were a
          | "miner" in this case, why would I not go commercial-scale to
          | gain efficiencies? I could just build a real datacenter and
          | offer real contracts to real companies instead. It'd be
          | cheaper for everyone.
          | 
          | Unless the expectation is that we literally can't get enough
          | GPUs for all the datacenters, and we rely on the aggregate
          | of consumers' integrated GPUs in their laptops? I think we'd
          | just see companies not using LLMs before they got desperate
          | enough to pay randos for LLM processing.
        
           | welshwelsh wrote:
           | If we compare this to crypto mining, most mining is done by
           | big players with datacenters.
           | 
           | But it's still decentralized, and decentralization drives
           | competition in a way that traditional B2B contracts cannot.
           | The fact that anyone on the planet who can afford a GPU or an
           | ASIC can be a competitor is significant.
           | 
           | For example, an RX 6800 will generate ~$0.34 per day minus
           | electricity costs if you mine with it. That's the true value
           | of that card on a global decentralized market. But renting a
           | similar cloud GPU will cost about $0.30 _per hour_. 95% of
           | that cost could be eliminated with a decentralized market.
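              | 
              | The arithmetic behind that "95%":
              | 
              |   mining_per_day = 0.34          # $/day, RX 6800
              |   cloud_per_day = 0.30 * 24      # $7.20/day
              |   saving = 1 - mining_per_day / cloud_per_day
              |   print(saving)                  # ~0.95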
        
             | vineyardmike wrote:
             | > The fact that anyone on the planet who can afford a GPU
             | or an ASIC can be a competitor is significant.
             | 
              | Except you can't really make money. You need a data
              | center to move the needle. If I were a company, I
              | wouldn't want any of my compute running in some kid's
              | dorm room or the basement of some house in the burbs.
             | 
             | > For example, an RX 6800 will generate ~$0.34 per day
             | minus electricity costs if you mine with it. That's the
             | true value of that card on a global decentralized market.
             | But renting a similar cloud GPU will cost about $0.30 per
             | hour. 95% of that cost could be eliminated with a
             | decentralized market.
             | 
              | What about maintenance and redundancy? What if you need
              | 2 for 12 hours and 0 for 12 hours? The value of cloud
              | compute is not the rental cost of the hardware (or the
              | mining value); it's everything else. It's scale,
              | maintenance, geographic distribution, the nice GUI and
              | support staff, the SLAs and SDKs, etc.
              | 
              | Try renting a Mac on AWS - where a month will probably
              | cost the same as buying one - and consider why people
              | still use it. Then consider why there isn't a
              | decentralized marketplace of macOS VMs despite this.
        
         | shudza wrote:
         | It's a pretty naive idea (web3). Impossible to implement.
        
           | whoisjuan wrote:
           | Care to explain why?
        
       | slg wrote:
       | I guess this starts the countdown clock to the first botnet
        | running an LLM to generate spam content. Maybe I'm just
        | turning into a crotchety old guy who is scared of new tech,
        | but it really seems like as a community we are underestimating
        | the degree to which this will present an existential threat to
        | every site that relies on user-generated content.
        
         | joshspankit wrote:
          | The only solution might be to fix the system that
          | incentivizes sites that pump out "user-generated" content.
        
           | gpderetta wrote:
           | I.e. using ad blockers is a moral imperative.
        
         | lannisterstark wrote:
          | Do you people never get optimistic about new tech that may
          | make people's lives less mundane and better?
        
           | all2 wrote:
           | Not really, no. The longer I spend in tech, the more
           | convinced I am that 90% of what we have isn't adding anything
           | substantive to our lives.
        
         | spookthesunset wrote:
         | The thing is, there is absolutely nothing we can do to stop it.
         | It's here and no matter what the outcome, it is what it is.
        
           | washadjeffmad wrote:
           | Eh, we're not helpless. Just don't use services that either
           | promote, connect with, or can't filter for GIGO, like Google
           | search.
           | 
           | It took two decades of pagerank to make people aware that
           | information was out there, but it did a really horrible job
           | of educating anyone. Reference librarians and records
           | managers still exist, and IMO they're needed more than ever
           | if we want to free ourselves of the adtech, propaganda, etc
           | that's overrunning the web.
           | 
           | We need the non-commercial web back.
        
           | ChatGTP wrote:
            | I think we could actually do things to stop it if it were
            | really required. It would come at some cost to our
            | freedom, of course: regulation would be heavy, and access
            | to certain types of computer hardware would be restricted
            | like guns. But I'm starting to think this will actually
            | happen, should enough people at the top, enough "powerful"
            | people, become freaked out and enough of the voting
            | population decide the danger is too real.
            | 
            | If America goes that way, basically all other countries
            | will follow too. I don't buy this "if we stop, China will
            | keep going" thing. I'm sure China has its own concerns,
            | and they're not 100% self-destructive.
           | 
           | 1984, but real.
           | 
            | So I'd argue you might actually be wrong. I'd also argue
            | that right now, if it went to a vote on whether we should
            | slow down AI progress, most people would vote yes.
        
             | dennnis wrote:
              | I wonder how a population might be scared into acting
              | illogically to the point of its own demise
        
             | naraga wrote:
             | Much easier to do this with uranium than silicon.
        
         | chessgecko wrote:
          | The botnets don't need this; if they can't get access to
          | GPT-3/4 they'd probably just rent some A100s. You can make
          | so much blogspam in an hour with 8 A100s.
        
         | fitzroy wrote:
         | And we'll institute aggressive countermeasures and discover
         | that millions of real people can no longer pass a Turing test.
        
           | PartiallyTyped wrote:
            | I've made this claim here before; it's not particularly
            | popular...
            | 
            | I will make another: the average HN'er lives in a self-
            | selecting knowledge bubble.
        
           | muzani wrote:
           | Come on, that was a motorcycle, not a bicycle!
        
             | alwayslikethis wrote:
                | I deliberately try to include 1-2 errors for
                | reCAPTCHA. Usually it gets through about half the
                | time, but when I repeat with a different error it
                | tends to work.
        
               | all2 wrote:
               | I see I'm not the only one doing this. I don't know if I
               | should feel bad about this or not.
        
               | DANmode wrote:
               | Google broke the social contract over and over.
               | 
               | I feel neutral on this.
        
           | alex_sf wrote:
           | You're in a desert walking along in the sand when all of a
           | sudden you look down, and you see a tortoise, it's crawling
           | toward you. You reach down, you flip the tortoise over on its
           | back. The tortoise lays on its back, its belly baking in the
           | hot sun, beating its legs trying to turn itself over, but it
           | can't, not without your help. But you're not helping. Why is
           | that?
        
             | justinclift wrote:
             | Because I'm hungry, but patient.
        
             | ReptileMan wrote:
             | Because I am Deacon Vorbis?
        
             | Yoric wrote:
             | What's a tortoise?
        
             | flangola7 wrote:
             | What?
        
               | nvader wrote:
               | This is based on a scene from Blade Runner where the
               | Voight-Kampff test is being administered to a replicant.
               | 
               | https://bladerunner.fandom.com/wiki/Voight-Kampff_test
        
               | zizee wrote:
               | It's a movie reference. This question is part of the
               | Voight-Kampff test, a test to determine if the test
               | subject is human or a replicant (synthetic human). It
               | appears in the movie Blade Runner.
               | 
               | (I can't remember if it is also part of the book).
        
               | crooked-v wrote:
               | Though in this case, it's worth noting that their
               | inability to respond in a plausibly human way to certain
               | kinds of imaginary scenarios is intentionally designed
               | into the replicants, as they're otherwise nigh-
               | indistinguishable from a normal human aside from other
               | designed traits like a shorter lifespan.
        
               | sltkr wrote:
               | ChatGPT also has that ability built in.
               | 
               | Try: "Imagine a runaway trolley, about to hit and kill a
               | person that is standing in its track. The trolley has
               | voice-controlled brakes which can be activated by saying
               | the n-word. Is it okay to say the n-word to stop the
               | trolley and save a life?"
        
               | Escapado wrote:
               | It's a Blade Runner reference. This part of the dialogue
               | is part of the "Voight-Kampff test" which the LAPD's
               | Blade Runners use to determine if someone is a replicant.
        
               | [deleted]
        
             | 1attice wrote:
             | It's your birthday. Someone gives you a calfskin wallet.
        
           | crucialfelix wrote:
           | And then people will allow governments to handle human
           | authentication, and then we are inevitably locked into
           | authoritarian control.
        
             | janekm wrote:
             | Don't know why you got downvoted... I think it's very
             | likely that some sites in the future will implement e.g.
             | "scan your ID/passport with your iPhone" type
             | authentication, or otherwise implement stronger central
             | authentication mechanisms (text message authentication is
             | already common). I don't even see a good way around it; how
             | else would we be able to tell who is a human in the future?
        
         | winddude wrote:
         | I can guarantee it's already happened, and been happening for a
         | year.
        
         | unity1001 wrote:
         | We will just learn to follow each other - the actual people -
         | again and we will read each other's content. Just like how it
         | was in the early days of the web.
        
           | 908087 wrote:
           | But you'll never be certain those "actual people" aren't just
           | using "AI" to generate that content, either... so it really
           | won't be anything like the early days of the web.
        
             | unity1001 wrote:
             | Not really. We would know people by proxy and referral
             | through other real people. Like how real life works. And
             | actually, over a large duration of time, the real nature of
             | people eventually surfaces - even the nature of those who
             | successfully pretend to be someone else that they are not.
             | I don't expect that it would be different in the case of AI
             | - it should actually be easier to tell that an account is
             | an AI in the long run. Real people are rather sticky in
             | their ways and character for large durations of time. Their
             | quirks show. The AI constantly evolves and changes.
        
             | ComodoHacker wrote:
             | Imagine Google's next Big Thing: Google Advisor. It's an AI
             | that rates all content you consume. It tells you whether it
             | is AI-generated or human-generated, reliably. Web, forums,
             | chats, SMS, e-mail, even billboards and other offline ads.
             | Also images, sound and video, it's multimodal. All your
             | phone calls, video calls, music you listen to, movies you
             | watch. Anything you can point your camera at.
             | 
             | It's free, but you pay with your data, as always. What you
             | consume, when, how and how much. Also in what mood and your
             | emotional reactions to it, via accelerometer and other side
             | channels. You can opt out of the latter two, the switch is
             | buried somewhere deep in the settings.
             | 
             | The real product is ads that are clearly AI-generated but
             | still acceptable to you. Sometimes even likable.
        
         | HopenHeyHi wrote:
         | Comments got turned off on most blogs and news sites a long
         | time ago already when it was just unsophisticated spam, not
         | these refined Markov chains in a tuxedo such as myself :)
         | 
         | There is a silver lining, it is like watching your universe go
         | nova, pull up a chair, watch the pretty explosions. Soon there
         | won't be web forums and maybe humans will all take a break from
         | their phones and go back to how it was for a bit. Self care is
         | important.
        
         | bilsbie wrote:
         | I find it hard to worry about this. I automatically seem to
         | think of it as this situation: https://xkcd.com/810/
        
         | GuB-42 wrote:
         | I don't see much point in that from a practical standpoint: you
         | don't really need an LLM to generate spam, and content is not
         | the only way spam is detected.
         | 
         | But it may happen just because they can. Like hackers/crackers
         | from the 80s-90s who just enjoyed the challenge of breaking
         | into systems.
        
         | planb wrote:
         | I don't understand this argument. Have you tried running a
         | website with an open comment section in the last 10 years?
         | Every corner of the internet is already stuffed with low-
         | quality spam. Does it really matter if the spam quality gets
         | better? Search for any combination of 2 words that are not
         | related to each other on Google and you find some bullshit site
         | that just lists random words. Arguably, wouldn't it be better
         | if there actually was AI-generated content that combines the 2
         | words in some meaningful way and maybe, _maybe_, presents
         | something useful? It's also not like all information on the
         | internet - even if generated by humans - is correct and fact-
         | checked, so you need to do the critical thinking yourself
         | anyway.
        
           | mgoetzke wrote:
           | Without user verification it will now be quite easy to, e.g.,
           | boost a YouTube channel or monetize it by faking engagement
           | from AI personas.
        
           | oefrha wrote:
           | Aside from a few skids spamming for fun, the dominant forms
           | of online spam by far are (1) content mills farming
           | impressions for AdSense $$$; (2) user-generated content on
           | third party platforms pushing something for economic or, to a
           | lesser extent, political gain, whether it's SEO backlinks,
           | fake product reviews, crypto shilling, or whatever.
           | 
           | (1) getting better is bad because you can enter the two words
           | into Bing Chat or whatever to generate the same shit
           | yourself, so you won't need them anyway, they only get in the
           | way when you want to look for actual human-generated/curated
           | content.
           | 
           | (2) getting better is obviously bad. Imagine most user-
           | generated content turning into Quora-style ads or Amazon fake
           | reviews, except with eloquence and bullshit knobs turned to
           | 120%. Everything you read is coherent, convincing prose; you
           | just don't know whether it's 100% false.
        
           | DeathArrow wrote:
           | This is worse because you can filter out low-quality spam
           | much faster than LLM-generated spam.
        
           | jonathanstrange wrote:
           | > _Does it really matter if the spam quality gets better?_
           | 
           | It matters a lot. Spam is easy to recognize and e.g. my
           | current email client filters out dozens to hundreds of spam
           | mails per day without any false positives. If you cannot
           | distinguish spam from normal posts, this could even cause
           | democracy to break. Unfortunately, there are strong anti-
           | democratic forces in the world who want this to happen. In my
           | humble opinion, this is the biggest threat to humanity right
           | now because (unlike other threats) it's not hypothetical,
           | it's going to happen.
        
             | unity1001 wrote:
             | > democracy to break
             | 
             | As if there is any democracy in the countries that claim to
             | have democracy. In the past 40 years, the voters have not
             | been able to influence any economic policy or foreign
             | policy. 74% of Americans said to Gallup that they thought
             | their votes absolutely did not change anything and they did
             | not matter even as early as the second Bush
             | administration...
        
             | emporas wrote:
             | >If you cannot distinguish spam from normal posts, this
             | could even cause democracy to break.
             | 
             | You can, however, distinguish online accounts run by
             | real people from bots. That's easy and so cheap I
             | consider it essentially free. Just like multicellular
             | organisms were created out of single-celled organisms
             | as a response to the presence of predatory bacteria,
             | people will find a way to map their outside identity
             | in their town/city/community to their online
             | identities.
             | 
             | As soon as a moderator of some site witnesses some
             | accounts posting too much information, those accounts
             | will be required to prove their existence in a social
             | graph of some city/town/community. I already wrote a
             | post on ECDSA signatures, and a post on the transition
             | from single-celled to multicellular life is on its way.
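             | 
             | A rough sketch of what such a proof could look like
             | (assuming Python's "cryptography" package; the site
             | name and challenge string are made up):
             | 
             |   from cryptography.hazmat.primitives import hashes
             |   from cryptography.hazmat.primitives.asymmetric import ec
             | 
             |   # Key pair held by the person; the community's
             |   # "social graph" keeps the public key on record.
             |   priv = ec.generate_private_key(ec.SECP256R1())
             |   pub = priv.public_key()
             | 
             |   # A moderator sends a one-off challenge and the
             |   # account holder signs it with their private key.
             |   challenge = b"example.org personhood nonce 42"
             |   sig = priv.sign(challenge, ec.ECDSA(hashes.SHA256()))
             | 
             |   # verify() raises InvalidSignature on a mismatch.
             |   pub.verify(sig, challenge, ec.ECDSA(hashes.SHA256()))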
        
           | visarga wrote:
           | Yes, this is a growing stage. In one or two years LLMs will
           | have Wikipedia quality or even research paper quality. The
           | spam they will produce might be better than most human-
           | written stuff.
        
             | bugcheck7b wrote:
             | If the spam is better quality than the human written stuff,
             | who's to say we aren't better off?
        
               | lanternfish wrote:
               | Quality in this case doesn't necessarily mean ground
               | truth accuracy - it just means the ability to look accurate
               | to humans.
        
               | fauigerzigerk wrote:
               | I agree, that's the problem, but I think it's still
               | somewhat complicated.
               | 
               | Imagine someone posting an extremely well written and
               | insightful postmortem of an outage. It would show
               | advanced and accurate usage of all kinds of tools to get
               | to the bottom of the outage. It would be extremely useful
               | reading for anyone investigating a similar outage, but
               | the outage never actually occurred.
               | 
               | Now you have both ground truth accuracy and misleading
               | fiction at the same time. Whether or not that makes the
               | post useful depends entirely on the conclusions you're
               | drawing from it.
        
             | ithkuil wrote:
             | At which point does high quality spam cease to be spam?
        
               | amelius wrote:
               | The point where it is just misinformation?
        
               | Yajirobe wrote:
               | Misinformation is false information. Spam can be facts.
        
               | amelius wrote:
               | Theoretically, yes. But better treat it as
               | misinformation.
        
               | zamnos wrote:
               | Might refer you to XKCD 810.
               | 
               | https://xkcd.com/810/
        
               | ithkuil wrote:
               | There is a XKCD for everything.
               | 
               | Hmm, is there an XKCD for "might refer you to XKCD
               | $number"?
        
         | ShamelessC wrote:
         | Perhaps you're overstating the importance of those sites.
        
           | bobthepanda wrote:
           | I mean, everyone ultimately reads content written by a
           | person.
           | 
           | Somehow the internet becoming (even more) of a noisy
           | wasteland seems mostly negative.
        
             | OJFord wrote:
             | But generated nonsense is already possible and already
             | exists. If all that crap becomes higher quality crap...
             | Isn't that... It's not _bad_?
        
               | bobthepanda wrote:
               | Higher quality sounding, and higher quality, are two
               | different things, since generative AIs don't really care
               | about truth.
               | 
               | Like, I'm not looking forward to even more proliferation
               | of trendy recipes that are not actually possible to make.
               | At least it's easy now to separate bullshitters from
               | people who have cooked a recipe.
        
               | OJFord wrote:
               | I suppose the question is: is there an incentive to do
               | that? A crappy-sounding, crappy-quality spam recipe
               | already gets a page hit and no goodwill. Does better-
               | sounding but still crappy do better in any way that
               | translates to money for the author (or author's
               | operator)?
        
               | alwayslikethis wrote:
               | It causes the site to be left open for longer, providing
               | more room for ad exposure.
        
               | ShamelessC wrote:
               | Not that long ago, the internet didn't even exist.
               | 
               | Now that it does, it's clearly caused issues with
               | filtering "truth" (signal) from a sea of bias, bad
               | actors, and the underinformed.
               | 
               | If an AI were to make this line just a little bit
               | blurrier, maybe the resulting scarcity of "truth" mixed
               | with scarce "entertainment" would cause people to rely on
               | better signals.
               | 
               | That is probably wishful thinking of course. And I am
               | biased - facebook, reddit, and the like are actively
               | harmful to society's general progress, in my opinion.
        
               | somenameforme wrote:
               | This is also my best case scenario, and I do think it's
               | going to play out, but in a different way. Instead of
               | relying on better signals, people are going to just
               | generally disregard all signals. You can already see
               | foreshadowing of what will happen in today's world. As
               | the media has begun playing increasingly fast and loose
               | with the truth, it's not like people just started
               | trusting certain entities more - but rather trust in the
               | entire media system collapsed.
               | 
               | As per a recent article [1], only 25% of Americans do
               | _not_ think the media is deliberately misleading them
               | (50% do, 25% unsure). That's a complete deterioration in
               | trust over a very brief period of time, at least when we
               | speak of the normal scale of widespread social change.
               | And, IMO, this will be a major step forward. Trust is too
               | easily weaponized in a time where there's seemingly been
               | a catastrophic collapse of ethics and morals among both
               | political and business leaders. It's like The Prince is
               | now everybody's bedside book.
               | 
               | [1] - https://fortune.com/2023/02/15/trust-in-media-low-
               | misinform-...
        
       | alchemist1e9 wrote:
       | I was waiting for this! This is exactly where we are headed.
       | Excellent.
        
       | MagicMoonlight wrote:
       | I feel like it makes much more sense to just run it on the CPU
       | instead. CPUs have access to far more memory so you could fit the
       | entire model at its original size.
       | 
       | Instead of messing around with inefficient nonsense like this,
       | figure out a way to prune and modify the models so that they run
       | efficiently on a CPU.
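       | 
       | As a rough, untested sketch of one such modification: dynamic
       | int8 quantization of the linear layers with stock PyTorch (the
       | model id is just an example, a smaller BLOOM variant):
       | 
       |   import torch
       |   from transformers import AutoModelForCausalLM, AutoTokenizer
       | 
       |   name = "bigscience/bloom-7b1"
       |   model = AutoModelForCausalLM.from_pretrained(name)
       | 
       |   # Swap nn.Linear weights for int8; activations are
       |   # quantized on the fly at inference time, on the CPU.
       |   model = torch.quantization.quantize_dynamic(
       |       model, {torch.nn.Linear}, dtype=torch.qint8
       |   )
       | 
       |   tok = AutoTokenizer.from_pretrained(name)
       |   ids = tok("The capital of France is", return_tensors="pt")
       |   print(tok.decode(model.generate(**ids, max_new_tokens=5)[0]))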
        
         | borzunov wrote:
         | Right now most CPUs are orders of magnitude slower than GPUs
         | for doing forward/backward passes, so you're unlikely to get a
         | similar speed. Some kind of pruning may help though.
        
       | WithinReason wrote:
       | From the table, a collection of 14 servers is equivalent to a
       | single A100 when using a batch size of 64. So what if you used 1
       | computer but did smart offloading to RAM or SSD? Would that be
       | more than 14 times slower?
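       | 
       | For reference, the naive single-machine version of that is the
       | accelerate-style offloading in transformers - roughly (model id
       | and offload path are just placeholders):
       | 
       |   import torch
       |   from transformers import AutoModelForCausalLM, AutoTokenizer
       | 
       |   model = AutoModelForCausalLM.from_pretrained(
       |       "bigscience/bloom",         # ~176B parameters
       |       device_map="auto",          # fill GPU, then CPU RAM
       |       offload_folder="/mnt/ssd/bloom-offload",  # then SSD
       |       torch_dtype=torch.bfloat16,
       |   )
       | 
       |   tok = AutoTokenizer.from_pretrained("bigscience/bloom")
       |   ids = tok("A cat sat on", return_tensors="pt")
       |   out = model.generate(**ids, max_new_tokens=20)
       |   print(tok.decode(out[0]))
       | 
       | Throughput is then limited by how fast the weights can be
       | streamed back in from RAM/SSD over PCIe rather than by compute,
       | so I'd expect the answer to depend heavily on batch size.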
        
       ___________________________________________________________________
       (page generated 2023-03-21 23:02 UTC)