[HN Gopher] Llama.cpp 30B runs with only 6GB of RAM now
___________________________________________________________________
Llama.cpp 30B runs with only 6GB of RAM now
Author : msoad
Score : 329 points
Date : 2023-03-31 20:37 UTC (2 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| TaylorAlexander wrote:
| Great to see this advancing! I'm curious if anyone knows what the
| best repo is for running this stuff on an Nvidia GPU with 16GB
| vram. I ran the official repo with the leaked weights and the
| best I could run was the 7B parameter model. I'm curious if
| people have found ways to fit the larger models on such a system.
| terafo wrote:
                 | I'd _assume_ that the 33B model should fit with this (the
                 | only repo I know of that implements SparseGPT and GPTQ for
                 | LLaMA); I personally haven't tried, though. But you can try
                 | your luck: https://github.com/lachlansneff/sparsellama
| enlyth wrote:
| https://github.com/oobabooga/text-generation-webui
| w1nk wrote:
| Does anyone know how/why this change decreases memory consumption
| (and isn't a bug in the inference code)?
|
| From my understanding of the issue, mmap'ing the file is showing
| that inference is only accessing a fraction of the weight data.
|
| Doesn't the forward pass necessitate accessing all the weights
| and not a fraction of them?
| matsemann wrote:
| Maybe lots of the data is embedding values or tokenizer stuff,
| where a single prompt uses a fraction of those values. And then
| the rest of the model is quite small.
| w1nk wrote:
                 | That shouldn't be the case. 30B refers to the number of
                 | parameters in the model itself, not the size of any other
                 | components.
| detrites wrote:
| The pace of collaborative OSS development on these projects is
| amazing, but the _rate_ of optimisations being achieved is almost
| unbelievable. What has everyone been doing wrong all these years
| _cough_ sorry, I mean to say weeks?
|
| Ok I answered my own question.
| datadeft wrote:
| I have predicted that LLaMA will be available on mobile phones
| before the end of this year. We are very close.
| terafo wrote:
| You mean in contained app? It can already run on a phone. GPU
| acceleration would be nice at this point, though.
| rickrollin wrote:
                 | People have actually run it on phones.
| politician wrote:
| Roughly: OpenAIs don't employ enough jarts.
|
| In other words, the groups of folks working on training models
| don't necessarily have access to the sort of optimization
| engineers that are working in other areas.
|
| When all of this leaked into the open, it caused a lot of
| people knowledgeable in different areas to put their own
| expertise to the task. Some of those efforts (mmap) pay off
| spectacularly. Expect industry to copy the best of these
| improvements.
| bee_rider wrote:
| The professional optimizes well enough to get management off
| their back, the hobbyist can be irrationally good.
| hedgehog wrote:
| They have very good people but those people have other
| priorities.
| kmeisthax wrote:
| >What has everyone been doing wrong all these years
|
| So it's important to note that all of these improvements are
| the kinds of things that are cheap to run on a pretrained
| model. And all of the developments involving large language
| models recently have been the product of hundreds of thousands
| of dollars in rented compute time. Once you start putting six
| digits on a pile of model weights, that becomes a capital cost
| that the business either needs to recuperate or turn into a
| competitive advantage. So everyone who scales up to this point
| doesn't release model weights.
|
| The model in question - LLaMA - isn't even a public model. It
| leaked and people copied[0] it. But because such a large model
| leaked, now people can actually work on iterative improvements
| again.
|
| Unfortunately we don't really have a way for the FOSS community
| to pool together that much money to buy compute from cloud
| providers. Contributions-in-kind through distributed computing
| (e.g. a "GPT@home" project) would require significant changes
| to training methodology[1]. Further compounding this, the
| state-of-the-art is actually kind of a trade secret now. Exact
| training code isn't always available, and OpenAI has even gone
| so far as to refuse to say anything about GPT-4's architecture
| or training set to prevent open replication.
|
| [0] I'm avoiding the use of the verb "stole" here, not just
| because I support filesharing, but because copyright law likely
| does not protect AI model weights alone.
|
| [1] AI training has very high minimum requirements to get in
| the door. If your GPU has 12GB of VRAM and your model and
| gradients require 13GB, you can't train the model. CPUs don't
| have this limitation but they are ridiculously inefficient for
| any training task. There are techniques like ZeRO to give
| pagefile-like state partitioning to GPU training, but that
| requires additional engineering.
| seydor wrote:
| > we don't really have a way for the FOSS community to pool
| together that much money
|
| There must be open source projects with enough money to pool
| into such a project. I wonder whether wikimedia or apache are
| considering anything.
| terafo wrote:
                 | _AI training has very high minimum requirements to get in the
                 | door. If your GPU has 12GB of VRAM and your model and
                 | gradients require 13GB, you can't train the model. CPUs
                 | don't have this limitation but they are ridiculously
                 | inefficient for any training task. There are techniques like
                 | ZeRO to give pagefile-like state partitioning to GPU
                 | training, but that requires additional engineering._
|
                 | You can't if you have one 12GB GPU. You can if you have a
                 | couple dozen. And then Petals-style training is possible.
                 | It is all very, very new and there are many unsolved
                 | hurdles, but I think it can be done.
| webnrrd2k wrote:
| Maybe a good candidate for the SETI@home treatment?
| terafo wrote:
                 | It is a good candidate. The tech is a good 6-18 months
                 | away, though.
| dplavery92 wrote:
| Sure, but when one 12gb GPU costs ~$800 new (e.g. for the
| 3080 LHR), "a couple of dozens" of them is a big barrier to
| entry to the hobbyist, student, or freelancer. And cloud
| computing offers an alternative route, but, as stated,
| distribution introduces a new engineering task, and the
| month-to-month bills for the compute nodes you are using
| can still add up surprisingly quickly.
| terafo wrote:
                 | We are talking groups, not individuals. I think it is
                 | quite possible for a couple hundred people to cooperate
                 | and train something at least as big as LLaMA 7B in a week
                 | or two.
| xienze wrote:
| > but the rate of optimisations being achieved is almost
| unbelievable. What has everyone been doing wrong all these
| years cough sorry, I mean to say weeks?
|
| It's several things:
|
| * Cutting-edge code, not overly concerned with optimization
|
| * Code written by scientists, who aren't known for being the
| world's greatest programmers
|
| * The obsession the research world has with using Python
|
| Not surprising that there's a lot of low-hanging fruit that can
| be optimized.
| Miraste wrote:
| Why does Python get so much flak for inefficiencies? It's
| really not that slow, and in ML the speed-sensitive parts are
| libraries in lower level languages anyway. Half of the
| optimization from this very post is in Python.
| wkat4242 wrote:
| Wow I continue being amazed by the progress being made on
| language models in the scope of weeks. I didn't expect
 | optimisations to move this quickly. Only a few weeks ago we were
 | amazed by ChatGPT, knowing it would never be something to run at
 | home, requiring $100,000 in hardware (8x A100 cards).
| kossTKR wrote:
| Does this mean that we can also run the 60B model on a 16GB ram
| computer now?
|
| I have the M2 air and can't wait until further optimisation with
| the Neural Engine / multicore gpu + shared ram etc.
|
| I find it absolutely mind boggling that GPT-3.5(4?) level quality
| may be within reach locally on my $1500 laptop / $800 m2 mini.
| thomastjeffery wrote:
 | I doubt it: text size and text _pattern_ size don't scale
 | linearly.
| kossTKR wrote:
 | Interesting, I wonder what the scaling function is.
| abujazar wrote:
| I love how LLMs have got the attention of proper programmers such
| that the Python mess is getting cleaned up.
| jart wrote:
| Author here. For additional context, please read
| https://github.com/ggerganov/llama.cpp/discussions/638#discu...
| The loading time performance has been a huge win for usability,
| and folks have been having the most wonderful reactions after
| using this change. But we don't have a compelling enough theory
| yet to explain the RAM usage miracle. So please don't get too
| excited just yet! Yes things are getting more awesome, but like
| all things in science a small amount of healthy skepticism is
| warranted.
| conradev wrote:
| > But we don't have a compelling enough theory yet to explain
| the RAM usage miracle.
|
| My guess would be that the model is faulted into memory lazily
| page by page (4K or 16K chunks) as the model is used, so only
| the actual parts that are needed are loaded.
|
| The kernel also removes old pages from the page cache to make
| room for new ones, and especially so if the computer is using a
| lot of its RAM. As with all performance things, this approach
| trades off inference speed for memory usage, but likely faster
| overall because you don't have to read the entire thing from
| disk at the start. Each input will take a different path
| through the model, and will require loading more of it.
|
| The cool part is that this memory architecture should work just
| fine with hardware acceleration, too, as long as the computer
| has unified memory (anything with an integrated GPU). This
| approach likely won't be possible with dedicated GPUs/VRAM.
|
| This approach _does_ still work to run a dense model with
| limited memory, but the time/memory savings would just be less.
| The GPU doesn't multiply every matrix in the file literally
| simultaneously, so the page cache doesn't need to contain the
| entire model at once.
| jart wrote:
| I don't think it's actually trading away inference speed. You
| can pass an --mlock flag, which calls mlock() on the entire
| 20GB model (you need root to do it), then htop still reports
| only like 4GB of RAM is in use. My change helps inference go
| faster. For instance, I've been getting inference speeds of
| 30ms per token after my recent change on the 7B model, and I
| normally get 200ms per eval on the 30B model.
| conradev wrote:
| Very cool! Are you testing after a reboot / with an empty
| page cache?
| jart wrote:
| Pretty much. I do my work on a headless workstation that
| I SSH into, so it's not like competing with Chrome tabs
| or anything like that. But I do it mostly because that's
| what I've always done. The point of my change is you
| won't have to be like me anymore. Many of the devs who
| contacted after using my change have been saying stuff
| like, "yes! I can actually run LLaMA without having to
| close all my apps!" and they're so happy.
| Miraste wrote:
| This is incredible, great work. Have you tried it with the
| 65B model? Previously I didn't have a machine that could
| run it. I'd love to know the numbers on that one.
| liuliu wrote:
           | Only recent versions of Metal (macOS 13 / iOS 16) support
           | mmap and can use the mapped memory on the GPU directly. CUDA
           | does have a unified memory mode even on dedicated GPUs; it
           | would be interesting to try that out. It would probably slow
           | things down quite a bit, but it's still an interesting
           | possibility.
| zone411 wrote:
| It really shouldn't act as a sparse model. I would bet on
| something being off.
| world2vec wrote:
| >I'm glad you're happy with the fact that LLaMA 30B (a 20gb
| file) can be evaluated with only 4gb of memory usage!
|
           | Isn't LLaMA 30B a set of 4 files (60.59 GB)?
|
| -edit- nvm, It's quantized. My bad
| smaddox wrote:
| Based on that discussion, it definitely sounds like some sort
| of bug is hiding. Perhaps run some evaluations to compare
| perplexity to the standard implementation?
| nynx wrote:
| Why is it behaving sparsely? There are only dense operations,
| right?
| w1nk wrote:
| I also have this question, yes it should be. The forward pass
| should require accessing all the weights AFAIK.
| [deleted]
| thomastjeffery wrote:
| How diverse is the training corpus?
| dchest wrote:
| https://arxiv.org/abs/2302.13971
| eternalban wrote:
         | Great work. Is the new file format described anywhere? Skimming
         | the issue comments I have a vague sense that read-only data was
         | colocated somewhere for zero-copy mmap, or is there more to it?
| sillysaurusx wrote:
| Hey, I saw your thoughtful comment before you deleted it. I
| just wanted to apologize -- I had no idea this was a de facto
| Show HN, and certainly didn't mean to make it about something
| other than this project.
|
| The only reason I posted it is because Facebook had been
| DMCAing a few repos, and I wanted to reassure everyone that
| they can hack freely without worry. That's all.
|
| I'm really sorry if I overshadowed your moment on HN, and I
| feel terrible about that. I'll try to read the room a little
| better before posting from now on.
|
| Please have a wonderful weekend, and thanks so much for your
| hard work on LLaMA!
|
| EDIT: The mods have mercifully downweighted my comment, which
| is a relief. Thank you for speaking up about that, and sorry
| again.
|
| If you'd like to discuss any of the topics you originally
| posted about, you had some great points.
| d3nj4l wrote:
| Maybe off topic, but I just wanted to say that you're an
| inspiration!
| htrp wrote:
| Just shows how inefficient some of the ML research code can be
| robrenaud wrote:
| Training tends to require a lot more precision and hence
| memory than inference. I bet many of the tricks here won't
| work well for training.
| sr-latch wrote:
| Have you tried running it against a quantized model on
| HuggingFace with identical inputs and deterministic sampling to
| check if the outputs you're getting are identical? I think that
| should confirm/eliminate any concern of the model being
| evaluated incorrectly.
| intelVISA wrote:
         | Didn't expect to see two titans today: ggerganov AND jart. Can
         | y'all slow down? You make us mortals look bad :')
|
| Seeing such clever use of mmap makes me dread to imagine how
| much Python spaghetti probably tanks OpenAI's and other "big
| ML" shops' infra when they should've trusted in zero copy
| solutions.
|
| Perhaps SWE is dead after all, but LLMs didn't kill it...
| brucethemoose2 wrote:
| Does that also mean 6GB VRAM?
|
| And does that include Alpaca models like this?
| https://huggingface.co/elinas/alpaca-30b-lora-int4
| terafo wrote:
         | No (llama.cpp is CPU-only) and no (you need to requantize the
         | model).
| sp332 wrote:
| According to
| https://mobile.twitter.com/JustineTunney/status/164190201019...
| you can probably use the conversion tools from the repo on
| Alpaca and get the same result.
|
| If you want to run larger Alpaca models on a low VRAM GPU, try
| FlexGen. I think https://github.com/oobabooga/text-generation-
| webui/ is one of the easier ways to get that going.
| brucethemoose2 wrote:
| Yeah, or deepspeed presumably. Maybe torch.compile too.
|
           | I dunno why I thought llama._cpp_ would support GPUs.
           | _shrug_
| lukev wrote:
| Has anyone done any comprehensive analysis on exactly how much
| quantization affects the quality of model output? I haven't seen
| any more than people running it and being impressed (or not) by a
| few sample outputs.
|
| I would be very curious about some contrastive benchmarks between
| a quantized and non-quantized version of the same model.
| corvec wrote:
| Define "comprehensive?"
|
| There are some benchmarks here:
| https://www.reddit.com/r/LocalLLaMA/comments/1248183/i_am_cu...
| and here: https://nolanoorg.substack.com/p/int-4-llama-is-not-
| enough-i...
|
| Check out the original paper on quantization, which has some
| benchmarks: https://arxiv.org/pdf/2210.17323.pdf and this
| paper, which also has benchmarks and explains how they
| determined that 4-bit quantization is optimal compared to
| 3-bit: https://arxiv.org/pdf/2212.09720.pdf
|
| I also think the discussion of that second paper here is
| interesting, though it doesn't have its own benchmarks:
| https://github.com/oobabooga/text-generation-webui/issues/17...
| mlgoatherder wrote:
| I've done some experiments here with Llama 13B, in my
| subjective experience the original fp16 model is significantly
| better (particularly on coding tasks). There are a bunch of
         | synthetic benchmarks such as wikitext2 PPL, and all the whiz-bang
| quantization schemes seem to score well but subjectively
| something is missing.
|
| I've been able to compare 4 bit GPTQ, naive int8, LLM.int8,
| fp16, and fp32. LLM.int8 does impressively well but inference
| is 4-5x slower than native fp16.
|
         | Oddly, I recently ran a fork of the model on the ONNX runtime,
         | and I'm convinced that the model performed better than
         | pytorch/transformers; perhaps subtle differences in floating
         | point behavior between kernels on different hardware
         | significantly influence performance.
|
| The most promising next step in the quantization space IMO has
| to be fp8, there's a lot of hardware vendors adding support,
| and there's a lot of reasons to believe fp8 will outperform
| most current quantization schemes [1][2]. Particularly when
| combined with quantization aware training / fine tuning (I
| think OpenAI did something similar for GPT3.5 "turbo").
|
| If anybody is interested I'm currently working on an open
| source fp8 emulation library for pytorch, hoping to build
| something equivalent to bitsandbytes. If you are interested in
| collaborating my email is in my profile.
|
| 1. https://arxiv.org/abs/2208.09225 2.
| https://arxiv.org/abs/2209.05433
| bakkoting wrote:
| Some results here:
| https://github.com/ggerganov/llama.cpp/discussions/406
|
| tl;dr quantizing the 13B model gives up about 30% of the
| improvement you get from moving from 7B to 13B - so quantized
| 13B is still much better than unquantized 7B. Similar results
| for the larger models.
| terafo wrote:
           | I wonder where such a difference between llama.cpp and the
           | [1] repo comes from. The F16 difference in perplexity is .3
           | on the 7B model, which is not insignificant. ggml's quirks
           | definitely need to be fixed.
|
| [1] https://github.com/qwopqwop200/GPTQ-for-LLaMa
| bakkoting wrote:
| I'd guess the GPTQ-for-LLaMa repo is using a larger context
| size. Poking around it looks like GPTQ-for-llama is
| specifying 2048 [1] vs the default 512 for llama.cpp [2].
| You can just specify a longer size on the CLI for llama.cpp
| if you are OK with the extra memory.
|
| [1] https://github.com/qwopqwop200/GPTQ-for-
| LLaMa/blob/934034c8e...
|
| [2] https://github.com/ggerganov/llama.cpp/tree/3525899277d
| 2e2bd...
| gliptic wrote:
| GPTQ-for-LLaMa recently implemented some quantization
| tricks suggested by the GPTQ authors that improved 7B
| especially. Maybe llama.cpp hasn't been evaluated with
| those in place?
| terafo wrote:
| For this specific implementation here's info from llama.cpp
| repo:
|
| _Perplexity - model options
|
| 5.5985 - 13B, q4_0
|
| 5.9565 - 7B, f16
|
| 6.3001 - 7B, q4_1
|
| 6.5949 - 7B, q4_0
|
| 6.5995 - 7B, q4_0, --memory_f16_
|
           | According to this repo[1] the difference is about 3% in their
           | implementation with the right group size. If you'd like to
           | know more, I think you should read the GPTQ paper[2].
|
| [1] https://github.com/qwopqwop200/GPTQ-for-LLaMa
|
| [2] https://arxiv.org/abs/2210.17323
| bsaul wrote:
 | How is LLaMA's performance relative to ChatGPT? Is it as good
 | as GPT-3 or even 4?
| terafo wrote:
   | It is as good as GPT-3 at most sizes. An instruct layer needs
   | to be put on top for it to compete with GPT-3.5 (which powers
   | ChatGPT). That can be done with comparatively little compute
   | (a couple hundred bucks' worth for small models; I'd assume
   | low thousands for 65B).
| arka2147483647 wrote:
 | What is LLaMA? What can it do?
| terafo wrote:
| Read readme in repo.
| UncleOxidant wrote:
| What's the difference between llama.cpp and alpaca.cpp?
| cubefox wrote:
| I assume the former is just the foundation model (which only
| predicts text) while the latter is instruction tuned.
| [deleted]
| [deleted]
| danShumway wrote:
| I messed around with 7B and 13B and they gave interesting
| results, although not quite consistent enough results for me to
| figure out what to do with them. I'm curious to try out the 30B
| model.
|
| Start time was also a huge issue with building anything usable,
| so I'm glad to see that being worked on. There's potential here,
| but I'm still waiting on more direct API/calling access. Context
| size is also a little bit of a problem. I think categorization is
| a potentially great use, but without additional alignment
| training and with the context size fairly low, I had trouble
| figuring out where I could make use of tagging/summarizing.
|
| So in general, as it stands I had a lot of trouble figuring out
| what I could personally build with this that would be genuinely
| useful to run locally and where it wouldn't be preferable to
| build a separate tool that didn't use AI at all. But I'm very
| excited to see it continue to get optimized; I think locally
| running models are very important right now.
| cubefox wrote:
| I don't understand. I thought each parameter was 16 bit (two
| bytes) which would predict minimally 60GB of RAM for a 30 billion
| parameter model. Not 6GB.
| gamegoblin wrote:
| Parameters have been quantized down to 4 bits per parameter,
| and not all parameters are needed at the same time.
| heap_perms wrote:
| I was thinking something similar. Turns out that you don't need
| all the weights for any given prompt.
|
| > LLaMA 30B appears to be a sparse model. While there's 20GB of
| weights, depending on your prompt I suppose only a small
| portion of that needs to be used at evaluation time [...]
|
| Found the answer from the author of this amazing pull request:
| https://github.com/ggerganov/llama.cpp/discussions/638#discu...
| qwertox wrote:
| Is the 30B model clearly better than the 7B?
|
 | I played with Pi3141/alpaca-lora-7B-ggml two days ago and it was
 | super disappointing. On a scale where 0% = alpaca-lora-7B-ggml
 | and 100% = GPT-3.5, where would LLaMA 30B land?
| Rzor wrote:
| I haven't been able to run it myself yet, but according to what
| I read so far from people who did, the 30B model is where the
| "magic" starts to happen.
| singularity2001 wrote:
| Does that only happen with the quantized model or also with the
| float16 / float32 model? Is there any reason to use float models
| at all?
| ducktective wrote:
 | I wonder if Georgi or jart use GPT in their programming and
 | design. I'd guess the training data was lacking for the sort of
 | stuff they do, given their fields of work, jart's especially.
| jart wrote:
| Not yet. GPT-4 helped answer some questions I had about the
| WIN32 API but that's the most use I've gotten out of it so far.
| I'd love for it to be able to help me more, and GPT-4 is
| absolutely 10x better than GPT 3.5. But it's just not strong
| enough at the kinds of coding I do that it can give me
| something that I won't want to change completely. They should
| just train a ChatJustine on my code.
| Dwedit wrote:
| > 6GB of RAM
|
| > Someone mentioning "32-bit systems"
|
 | Um no, you're not mapping 6GB of RAM on a 32-bit system. The
 | address space simply doesn't exist.
| jiggawatts wrote:
| Windows Server could use up to 64 GB for a 32-bit operating
| system. Individual processes couldn't map more than 4 GB, but
| the total could be larger:
| https://en.wikipedia.org/wiki/Physical_Address_Extension
| sillysaurusx wrote:
| On the legal front, I've been working with counsel to draft a
| counterclaim to Meta's DMCA against llama-dl. (GPT-4 is
| surprisingly capable, but I'm talking to a few attorneys:
| https://twitter.com/theshawwn/status/1641841064800600070?s=6...)
|
| An anonymous HN user named L pledged $200k for llama-dl's legal
| defense:
| https://twitter.com/theshawwn/status/1641804013791215619?s=6...
|
| This may not seem like much vs Meta, but it's enough to get the
| issue into the court system where it can be settled. The tweet
| chain has the details.
|
| The takeaway for you is that you'll soon be able to use LLaMA
| without worrying that Facebook will knock you offline for it. (I
| wouldn't push your luck by trying to use it for commercial
| purposes though.)
|
| Past discussion: https://news.ycombinator.com/item?id=35288415
|
| I'd also like to take this opportunity to thank all of the
| researchers at MetaAI for their tremendous work. It's because of
| them that we have access to such a wonderful model in the first
| place. They have no say over the legal side of things. One day
| we'll all come together again, and this will just be a small
| speedbump in the rear view mirror.
|
| EDIT: Please do me a favor and skip ahead to this comment:
| https://news.ycombinator.com/item?id=35393615
|
| It's from jart, the author of the PR the submission points to. I
| really had no idea that this was a de facto Show HN, and it's
| terribly rude to post my comment in that context. I only meant to
| reassure everyone that they can freely hack on llama, not make a
| huge splash and detract from their moment on HN. (I feel awful
| about that; it's wonderful to be featured on HN, and no one
| should have to share their spotlight when it's a Show HN.
| Apologies.)
| terafo wrote:
| Wish you all luck in the world. We need much more clarity in
| legal status of these models.
| sillysaurusx wrote:
| Thanks! HN is pretty magical. I think they saw
| https://news.ycombinator.com/item?id=35288534 and decided to
| fund it.
|
| I'm grateful for the opportunity to help protect open source
| projects such as this one. It will at least give Huggingface
| a basis to resist DMCAs in the short term.
| [deleted]
| [deleted]
| sheeshkebab wrote:
 | All models trained on public data need to be made public. As it
 | is, their outputs are not copyrightable; it's not a stretch to
 | say the models are public domain.
| sillysaurusx wrote:
| I'm honestly not sure. RLHF seems particularly tricky --- if
| someone is shaping a model by hand, it seems reasonable to
| extend copyright protection to them.
|
| For the moment, I'm just happy to disarm corporations from
| using DMCAs against open source projects. The long term
| implications will be interesting.
| xoa wrote:
| You seem to be mixing a few different things together here.
| There's a huge leap from something not being copyrightable to
| saying there is grounds for it to be _made_ public. No
| copyright would greatly limit the ability of model makers to
| legally restrict distribution if they made it to the public,
| but they 'd be fully within their rights to keep them as
| trade secrets to the best of their ability. Trade secret law
| and practice is its own thing separate from copyright, lots
| of places have private data that isn't copyrightable (pure
| facts) but that's not the same as it being made public.
| Indeed part of the historic idea of certain areas of IP like
| patents was to encourage more stuff to be made public vs kept
| secret.
|
| > _As it is their outputs are not copyrightable, it's not a
| stretch to say models are public domain._
|
| With all respect this is kind of nonsensical. "Public domain"
| only applies to stuff that is copyrightable, if they simply
| aren't then it just never enters into the picture. And it not
| being patentable or copyrightable doesn't mean there is any
| requirement to share it. If it does get out though then
| that's mostly their own problem is all (though depending on
| jurisdiction and contract whoever did the leaking might get
| in trouble), and anyone else is free to figure it out on
| their own and share that and they can't do anything.
| sheeshkebab wrote:
     | Public domain applies to uncopyrightable works, among other
     | things (including previously copyrighted works). In this case
     | the models are uncopyrightable, and I think FB (or any of
     | these newfangled AI cos) would have an interesting time
     | proving otherwise, if they ever try.
|
| https://en.m.wikipedia.org/wiki/Public_domain
| electricmonk wrote:
| _IANYL - This is not legal advice._
|
| As you may be aware, a counter-notice that meets the statutory
| requirements will result in reinstatement unless Meta sues over
| it. So the question isn't so much whether your counter-notice
| covers all the potential defenses as whether Meta is willing to
| sue.
|
| The primary hurdle you're going to face is your argument that
| weights are not creative works, and not copyrightable. That
   | argument is unlikely to succeed for the following reasons
| (just off the top of my head): (i) The act of selecting
| training data is more akin to an encyclopedia than the white
| pages example you used on Twitter, and encyclopedias are
| copyrightable as to the arrangement and specific descriptions
| of facts, even though the underlying facts are not; and (ii)
| LLaMA, GPT-N, Bard, etc, all have different weights, different
| numbers of parameters, different amounts of training data, and
| different tuning, which puts paid to the idea that there is
| only one way to express the underlying ideas, or that all of it
| is necessarily controlled by the specific math involved.
|
| In addition, Meta has the financial wherewithal to crush you
| even were you legally on sound footing.
|
| The upshot of all of this is that you may win for now if Meta
| doesn't want to file a rush lawsuit, but in the long run, you
| likely lose.
| sva_ wrote:
| Thank you for putting your ass on the line and deciding to
| challenge $megacorp on their claims of owning the copyright on
| NN weights that have been trained on public (and probably, to
| some degree, also copyrighted) data. This seems to very much be
| uncharted territory in the legal space, so there are a lot of
| unknowns.
|
| I don't consider it ethical to compress the corpus of human
| knowledge into some NN weights and then closing those weights
| behind proprietary doors, and I hope that legislators will see
| this similarly.
|
| My only worry is that they'll get you on some technicality,
| like that (some version of) your program used their servers
| afaik.
| cubefox wrote:
| Even if using LLaMA turns out to be legal, I very much doubt it
| is ethical. The model got leaked while it was only intended for
| research purposes. Meta engineered and paid for the training of
| this model. It's theirs.
| Uupis wrote:
| I feel like most-everything about these models gets really
| ethically-grey -- at worst -- very quickly.
| willcipriano wrote:
| What did they train it on?
| cubefox wrote:
| On partly copyrighted text. Same as you and me.
| faeriechangling wrote:
| Did Meta ask permission from every user they trained their
| model on? Did all those users consent, and when I say consent
| I'm saying was there a meeting of minds not something buried
| in page 89 of a EULA, to Meta building an AI with their data?
|
| Turnabout is fair play. I don't feel the least bit sorry for
| Meta.
| terafo wrote:
     | LLaMa wasn't trained on data of Meta users, though.
| cubefox wrote:
| But it doesn't copy any text one to one. The largest one
| was trained on 1.4 trillion tokens, if I recall correctly,
| but the model size is just 65 billion parameters. (I
| believe they use 16 bit per token and parameter.) It seems
| to be more like a human who has read large parts of the
| internet, but doesn't remember anything word by word.
| Learning from reading stuff was never considered a
| copyright violation.
| Avicebron wrote:
| > It seems to be more like a human who has read large
| parts of the internet, but doesn't remember anything word
| by word. Learning from reading stuff was never considered
| a copyright violation.
|
| This is one of the most common talking points I see
| brought up, especially when defending things like ai
| "learning" from the style of artists and then being able
| to replicate that style. On the surface we can say, oh
| it's similar to a human learning from an art style and
| replicating it. But that implies that the program is
| functioning like a human mind (as far as I know the jury
| is still out on that and I doubt we know exactly how a
| human mind actually "learns" (I'm not a neuroscientist)).
|
| Let's say for the sake of experiment I ask you to cut out
| every word of pride and prejudice, and keep them all
| sorted. Then when asked to write a story in the style of
| jane austen you pull from that pile of snipped out words
| and arranged them in a pattern that most resembles her
         | writing, did you transform it? Sure, maybe; if a human did
         | that, I bet they could even copyright it. But a machine
         | took those words and phrases and applied an algorithm to
         | generate output; even with stochastic elements, the direct
         | backwards traceability (albeit through a 65B-parameter
         | convolution) means that the essence of the copyrighted
         | materials has been directly translated.
|
| From what I can see we can't prove the human mind is
| strictly deterministic. But an ai very well might be in
| many senses. So the transference of non-deterministic
| material (the original) through a deterministic transform
| has to root back to the non-deterministic model (the
| human mind and therefore the original copyright holder).
| shepardrtc wrote:
| They don't ask permission when they're stealing users'
| data, so why should users ask permission for stealing their
| data?
|
| https://www.usatoday.com/story/tech/2022/09/22/facebook-
| meta...
| seydor wrote:
     | It's an index of the web and our own comments, barely
     | something they can claim ownership of, let alone resell.
|
| But OTOH, by preventing commercial use, they have sparked the
| creation of an open source ecosystem where people are
| building on top of it because it's fun, not because they want
| to build a moat to fill it with sweet VC $$$money.
|
| It's great to see that ecosystem being built around it, and
| soon someone will train a fully open source model to replace
| Llama
| dodslaser wrote:
   | Meta as a company has shown pretty blatantly that they don't
   | really care about ethics, nor the law for that matter.
| [deleted]
| victor96 wrote:
| Less memory than most Electron apps!
| terafo wrote:
| With all my dislike to Electron, I struggle to remember even
| one Electron app that managed to use 6 gigs.
| baobabKoodaa wrote:
| I assume it was a joke
| mrtksn wrote:
   | I've seen WhatsApp doing it. It starts at 1.5GB anyway, so
   | after some images and stuff it inflates quite a lot.
| yodsanklai wrote:
| Total noob questions.
|
| 1. How does this compare with ChatGPT3
|
| 2. Does it mean we could eventually run a system such as ChatGPT3
| on a computer
|
| 3. Could LLM eventually replace Google (in the sense that answers
| could be correct 99.9% of the time) or is the tech inherently
| flawed
| addisonl wrote:
 | Minor correction, ChatGPT uses GPT-3.5 and (most recently, if
 | you pay $20/month) GPT-4. Their branding definitely needs some
 | work haha. We are on track for you to be able to run something
 | like ChatGPT locally!
___________________________________________________________________
(page generated 2023-03-31 23:00 UTC)