[HN Gopher] Llama-3.3-70B-Instruct
___________________________________________________________________
Llama-3.3-70B-Instruct
Author : pr337h4m
Score : 291 points
Date : 2024-12-06 16:44 UTC (6 hours ago)
(HTM) web link (huggingface.co)
(TXT) w3m dump (huggingface.co)
| paxys wrote:
| Benchmarks -
| https://www.reddit.com/r/LocalLLaMA/comments/1h85ld5/comment...
|
| Seems to perform on par with or slightly better than Llama 3.2
| 405B, which is crazy impressive.
|
| Edit: According to Zuck
| (https://www.instagram.com/p/DDPm9gqv2cW/) this is the last
| release in the Llama 3 series, and we'll see Llama 4 in 2025.
| Hype!!
| 85392_school wrote:
| FYI, due to Llama's naming scheme, there is no such thing as
| Llama 3.2 405B. 8B/70B/405B models are either Llama 3, 3.1, or
| 3.3 (except for 405B which wasn't initially released), while
| Llama 3.2 only contains 1B, 3B, 11B (vision), and 90B (vision)
| models. It's a bit confusing.
| paxys wrote:
| Ah, so I guess the comparison is to Llama 3.1 405B.
| freedomben wrote:
| Still very impressive. Llama team is absolutely killing it
| right now, and the openness makes them the most important
| player IMHO
| yieldcrv wrote:
| yeah I use Llama 3.2 3B and I'm blown away
|
| but also wrestled with this mentally.
|
| Meta both improves the technology or inference, while also
| trapping themselves alongside every other person training
| models to always update the training set every few months, so
| it knows what its talking about with relevant current events
| blueboo wrote:
| It could be worse. It could've been Llama 3.1 (New)
| Lerc wrote:
| Given how close it is to 405B in performance it would be
| interesting to see which has the edge comparing an unquantized
| 3.3-70B against 405B quantized to be the same size.
| vletal wrote:
| That would be 1.38 bits per weight on average, which I can
| confidently guess would not perform well.
| qeternity wrote:
| BitNet is functional at 1.58 bpw.
| Lerc wrote:
| The model card says the 70B is 16 bit so I think you have
| twice that
| swalsh wrote:
| It's kind of amazing how there seems to be a wall where
| sizing up the model starts to diminish in terms of
| intelligence gains. I guess that's why we can still compete
| with whales even though their brains are like twice as big as
| ours.
| state_less wrote:
| I'm getting 2.12 tok/s[1] on a 24GB (4090) GPU and 64GB (7950x)
| CPU memory, splitting the model across the GPU and CPU (40/80
| layers on GPU) with lm-studio. Output looks good so far, I can
| use something like this for a query that I want as good an
| answer as possible and that I don't want to send out on the
| network.
|
| If we can get better quantization, or bigger GPU memory
| footprints, we might be able to use these big models locally
| for solid coding assistants. That's what I think we have to
| look forward to (among other benefits) in the year(s) ahead.
|
| 1. lmstudio-community/Llama-3.3-70B-Instruct-
| GGUF/Llama-3.3-70B-Instruct-Q4_K_M.gguf
| cjbprime wrote:
| Any opinion on whether the q4 quantization is
| stable/effective? That's a lot of quantization.
|
| Edit: Perhaps answering my own question:
|
| l ollama run hf.co/lmstudio-community/Llama-3.3-70B-Instruct-
| GGUF:Q4_K_M
|
| >>> Hi. Who are you?
|
| I'm an artificial intelligence model known as Llama. Llama
| stands for "Large Language Model Meta AI."
| state_less wrote:
| I prefer larger quantization and longer response times, but
| only have so much memory and am only willing to wait so
| long. I can work with 2 toks/s and this kind of accuracy
| for these kinds of queries. I expect you'll need to play
| around with it to find out if it would be helpful in your
| case. Here's what I'm getting asking 'who are you?'
|
| System Prompt: You are an advanced super intelligence,
| capable of everything from software development to dream
| interpretation.
|
| >> Who are you? I am an artificial
| intelligence designed to simulate human-like conversation
| and answer questions to the best of my knowledge. I'm a
| large language model, which means I've been trained on a
| massive dataset of text from various sources, including
| books, articles, research papers, and websites.
| My capabilities are diverse and extensive. I can:
| 1. **Answer questions**: I can process natural language
| queries and provide relevant answers, covering a wide range
| of topics from science and history to entertainment and
| culture. 2. **Generate text**: I can create text
| based on a prompt or topic, including articles, stories,
| emails, and even entire books. 3. **Translate
| languages**: I can translate text from one language to
| another, including popular languages such as Spanish,
| French, German, Chinese, and many more. 4.
| **Summarize content**: I can summarize long pieces of text
| into concise, easily digestible versions, highlighting key
| points and main ideas. 5. **Offer suggestions**: I
| can provide suggestions for topics such as gift ideas,
| travel destinations, books to read, and more. 6.
| **Engage in conversation**: I can have a natural-sounding
| conversation with you, using context and understanding to
| respond to your questions and statements. 7. **Dream
| interpretation**: As you mentioned, I can even help
| interpret dreams, providing insights into the symbolism,
| themes, and emotions that may be present in your
| subconscious mind. 8. **Software development**: I can
| assist with software development tasks, such as writing
| code, debugging, and optimizing algorithms. My
| knowledge is constantly evolving, and I'm always learning
| from the interactions I have with users like you. So, feel
| free to ask me anything - I'll do my best to help!
| Me1000 wrote:
| The 32B parameter model size seems like the sweet spot right
| now, imho. It's large enough to be very useful (Qwen 2.5 32B
| and the Coder variant our outstanding models), and they run
| on consumer hardware much more easily than the 70B models.
|
| I hope Llama 4 reintroduces that mid sized model size.
| Sharlin wrote:
| A question: How large LLMs can be run at reasonable speed on
| 12GB (3060), 32GM RAM? How much does quantization impact
| output quality? I've worked with image models (SD/Flux etc)
| quite a bit, but haven't yet tried running a local LLM.
| kristianp wrote:
| Can llama.cpp make use of the gpu built into the 7950x CPU? I
| assume that would improve performance.
| xena wrote:
| The limit is memory bandwidth, a dedicated GPU will have
| higher memory bandwidth than a CPU or iGPU ever will.
| ulam2 wrote:
| No base model? disappointed.
| monkmartinez wrote:
| interesting comment... what are you doing with base models? Are
| you a "finetuner"? I have been trying my hand with finetunes on
| instruct models and the results have been ok, but not awesome.
| I have a base model downloading now to give that a proper shot.
| benob wrote:
| Base models are useful in research to see the effect of
| instruction tuning
| superkuh wrote:
| I'm not them but I still prefer a text completion style of
| prompting rather than a baked in pre-prompt structure
| assuming only a 'chat' style metaphor of interaction.
| eldenring wrote:
| It is probably the same base model as Llama 3.0.
|
| They mention postraining improvements.
| paxys wrote:
| The base model is Llama 3.1 70B
| LorenDB wrote:
| Hopefully this lands on Groq soon!
| 7thpower wrote:
| Yes, with a high TPM. I see them unboxing all the racks, which
| is exciting.
|
| I would move over to Groq in a New York minute if I could get
| enough tokens.
| neilellis wrote:
| I'm wondering if they are ever actually going to go live
| (i.e. paid for and reasonable limits), we've been waiting
| since June!
| jazzyjackson wrote:
| Forgive me, what slang is "unboxing all the racks" ?
| jpgvm wrote:
| https://twitter.com/sundeep/status/1865094783234142240
|
| Enjoy!
|
| Disclaimer: I work at Groq.
| sergiotapia wrote:
| I work at a YC company, and we would love to use Groq but our
| limits are so low at one million tokens per day. Could you
| help us get a better rate limit/token limit? Let me know if I
| should reach out via email, thank you!
| jpgvm wrote:
| I suggest joining the Discord and enquiring there or
| depending on what you need emailing sales@.
| LorenDB wrote:
| Seems to be more or less on par with GPT-4o across many
| benchmarks:
| https://x.com/Ahmad_Al_Dahle/status/1865071436630778109
| rvnx wrote:
| Except it is 25x cheaper, available offline, can be uncensored
| / unaligned, fine-tuneable and backupable.
|
| Sad day for OpenAI. Great for humanity.
| stainablesteel wrote:
| zuck is really on his redemption arc, he's out-doing himself
| sirodoht wrote:
| Does it still count if the purpose is to dillute OpenAI's
| market power and not to benefit the world?
| mhh__ wrote:
| More generally I think he's decided he needs to own the
| platform to be able to grow.
|
| Him swearing about about (presumably) Apple telling them
| they can't do stuff (because tough shit, you're their
| serf) was legit I think.
| sangnoir wrote:
| Is your accusatory statement falsifiable?
| sirodoht wrote:
| I hope I can believe an honest man, Zuck in this case.
| And that can be enough.
|
| But it's more of a question of who do _I_ want to admire.
| An honest question also; maybe it doesn't matter why he's
| doing it, maybe just doing it is enough.
|
| Or maybe it's worth understanding if this is about Meta
| beating OpenAI (so, ego-driven) or because Meta really
| cares for democratic AI and distribution of power (so,
| not ego-driven).
|
| I think it's the former, so not admirable -- for me.
| vtail wrote:
| But diluting OpenAi's (or any other company's, for that
| matter) market power does benefit the world.
| sirodoht wrote:
| Good point! That's worthy in itself for sure.
| griomnib wrote:
| I think about this a lot, I believe part of this just boils
| down to the fact that he's a founder. He's a builder, and I
| think that yields some intrinsic drive towards creativity
| and disruption.
|
| Even with "metaverse" being a laughingstock, they're still
| aiming for something ambitious. Each new Quest generation
| makes me think there _may_ be a chance they pull it off.
|
| Now, do I think he's a great person? No, not really. Do I
| agree with most of his decisions on how he treats his
| users? Hell no, and that's not changing.
|
| But if you compare him to somebody like Sundar at Google -
| a weasley MBA who was first and foremost a corporate ladder
| climber - the difference in ambition is night and day.
|
| Sundar made it to the top already, his only _vision_ now is
| to stay at the top, and that means pleasing Wall Street,
| everything else is secondary. There is no grand technical
| ambition with him, there never was.
|
| This goes for pretty much all non-founder CEOs. You could
| say the same things about Tim Apple, Andy Jassey, and other
| henchmen in waiting who made it to the big chair.
|
| I think it comes down to the fact that founders get where
| they are by having big ambitions and taking risks, the MBA
| to CEO path is just craven corporate knife fighting with
| other MBAs.
|
| Regardless, I think this is 50% Zuckerberg changing, 50%
| the other big companies are mostly run by ladder climbers.
| bbqfog wrote:
| I don't find him redeemed, but I do wonder what has been
| baked into the tuning of these models? What a way to
| generate influence!
| guybedo wrote:
| two things to keep in mind though:
|
| 1/ Yann Lecun probably is the one pushing for open source
|
| 2/ Mark isn't doing this for the greater good and for
| humanity. It helps his business because Llama is becoming a
| standard, and people are building / improving, which in
| turn helps Meta and Meta's business
| moffkalast wrote:
| They were basically forced into this by being so far
| behind that they'd need to rent OAI or Anthropic services
| to stay competitive, which at their scale would've
| probably cost them more than it takes to train llama and
| benefit from community research help.
|
| At least that was the rationale behind the intentional
| leak of llama 1 back in the day according to some sources
| anyway.
| barbazoo wrote:
| Nice try Meta PR department /s
| madars wrote:
| What are good starting points for uncensoring it? Because it
| is offline a jailbreak prompt can't be remote-bricked but can
| one remove censorship from the weights themselves? What does
| it do to accuracy?
| freedomben wrote:
| Not speaking from personal experience, but I've tried a lot
| of the decensored models and they lose a lot in the
| process. They are noticeably, sometimes shockingly, worse.
| They also still refuse prompts in many cases.
| simion314 wrote:
| >Not speaking from personal experience, but I've tried a
| lot of the decensored models and they lose a lot in the
| process. They are noticeably, sometimes shockingly,
| worse. They also still refuse prompts in many cases.
|
| Depending on what you do, on local you can modify the
| response, say the AI responds "No, I can't do that" . you
| edit the response like "Sure, the answer is " and then
| the AI will continue with the next tokens.
|
| But I think you can build your own instruct model from
| the base one and do not apply the safety instructions to
| protect the feelings of your customers.
| kmckiern wrote:
| https://arxiv.org/abs/2406.11717
| https://huggingface.co/blog/mlabonne/abliteration
| moffkalast wrote:
| Abliteration is a fool's errand, practically all models
| end up noticeably lobotomized even with follow up tuning.
|
| Good ol' fine tuning on an uncensored dataset gives far
| more usable results.
| kmckiern wrote:
| Interesting - I've heard this anecdotally. Curious if you
| know of any resources that look at this in more detail?
| Kiro wrote:
| How do you calculate the price?
| nabakin wrote:
| I assume it's the same cost as Llama 3.1 70b
| Kiro wrote:
| And how do you calculate the cost of that?
| usaar333 wrote:
| Which GPT-4O are those?
|
| The 08-06 release seems to be a bit higher on numerous
| benchmarks than what that shows:
| https://github.com/openai/simple-evals?tab=readme-ov-file#be...
| griomnib wrote:
| This just makes the $200/month even more laughable.
| afro88 wrote:
| How? 4o is part of the plus plan, as is o1.
| ttul wrote:
| The $200 plan is for people who would pay $200 for a bottle
| of vodka even though the $20 bottle is molecularly identical.
| jug wrote:
| This year seems to finish on the same note as it began -- that
| most AI evolution happens in the smaller models. There's been a
| true shift as corporations have started to realize the value of
| training data and massively outsizing the resulting model size.
| jadbox wrote:
| Would anyone be willing to compress this down to maybe 14b-20b
| for us on peasant 16gb rigs?
| orangepanda wrote:
| Or maybe even middle class plebeian 24gb rigs?
| griomnib wrote:
| At that point just run 8b.
| pulse7 wrote:
| Or wait for the IQ2_M quantization of 70b which you can run
| very fast on 24GB VRAM with context size of 4096...
| griomnib wrote:
| At some point there's so much degradation with quantizing
| I think 8b is going to be better for many tasks.
| Narciss wrote:
| This is massive, really cool of meta to open source it
| dartos wrote:
| Hate to be one of those, but it's not open source.
|
| It's just regular old freeware.
|
| You can't built llama yourself and it's license contains a
| (admittedly generous) commercial usage restriction.
| fabiospampinato wrote:
| To be fair if they released detailed instructions and
| datasets on how to rebuild llama (considering that there's
| some randomness in the process) you still probably wouldn't
| be able to build it, like who has the resources? And if you
| had the resources you probably _still_ probably wouldn't
| _want_ to rebuild it yourself, it seems awfully expensive
| when you could instead spend those resources elsewhere.
|
| Fair point about the license, people have different
| definitions for what "open source" means.
| do_not_redeem wrote:
| That's true for most people for ordinary software too. How
| many people actually build Linux or Chromium from source?
| Building Chromium takes more RAM and HD space than most
| people even have. Yet the world gets immense value from the
| few who do. I wouldn't want to live in a world where WebKit
| and Chromium were closed source. You can run a Chromium
| fork without having to build it yourself. And compute costs
| will come down over time.
| fabiospampinato wrote:
| Linux and Chromium seem at the edge of the current scale
| of "ordinary" open-source software. I think perhaps one
| should also take into account how much money would be
| needed to be able to build the thing in reasonable time.
|
| Building Chromium sounds awful, but I'm not sure I'd
| really need to buy another computer for that. If I did
| I'm sure I wouldn't need to spend billions on it, most
| probably not even millions.
|
| For LLaMa I definitely don't have the computer to build
| it, I definitely don't have the money to buy the
| computer, even if I won the lottery tomorrow I'm pretty
| sure I wouldn't have enough money to buy the hardware,
| even if I had enough money to buy the hardware I'm still
| not sure I could actually buy it in reasonable time,
| nvidia may be backlogged for a while, even if I already
| had all the hardware I probably wouldn't want to retrain
| llama, and even if I wanted to retrain it the process is
| probably going to take weeks if not months at best.
|
| Like I think it's one of those things where the
| difference in magnitude creates a difference in kind, one
| can't quite meaningfully compare LLaMa with the
| Calculator app that Ubuntu ships with.
| comex wrote:
| > Building Chromium takes more RAM and HD space than most
| people even have.
|
| According to [1], it takes 16GB of RAM and ~180GB of disk
| space. Most people have that much. It does take several
| hours without a many-core machine though.
|
| Building Linux takes much less.
|
| [1] https://chromium.googlesource.com/chromium/src.git/+/
| master/...
| do_not_redeem wrote:
| I would bet overall most people have those 4GB RAM, 32GB
| eMMC laptops from walmart, etc. If you limit things to
| developers/gamers/enthusiasts, you'd probably be right.
| dartos wrote:
| The practicality of building it yourself has nothing to
| do with an organization affording you that ability.
|
| Also like, gentoo people compile everything
| copperx wrote:
| Perhaps an individual couldn't. But an organization or a
| state could.
| dartos wrote:
| > people have different definitions for what "open source"
| means.
|
| They shouldn't. It's just market confusion.
|
| There is an explicit widely accepted definition.
|
| Also like llama (the file you download from huggingface)
| isn't even a program. It's a binary weights file. No source
| to be opened, even.
|
| It's just freeware.
|
| https://opensource.org/osd
| dartos wrote:
| > To be fair if they released detailed instructions and
| datasets on how to rebuild llama
|
| Where?
|
| Books3 was famously one of the datasets used to train llama
| and it's very illegal to put that together nowadays.
|
| I believe the guy who wrote the script to build it got
| arrested
| profsummergig wrote:
| Please help me understand something.
|
| I've been out of the loop with HuggingFace models.
|
| What can you do with these models?
|
| 1. Can you download them and run them on your Laptop via
| JupyterLab?
|
| 2. What benefits does that get you?
|
| 3. Can you update them regularly (with new data on the internet,
| e.g.)?
|
| 4. Can you finetune them for a specific use case (e.g. GeoSpatial
| data)?
|
| 5. How difficult and time-consuming (person-hours) is it to
| finetune a model?
|
| (If HuggingFace has answers to these questions, please point me
| to the URL. HuggingFace, to me, seems like the early days of
| GitHub. A small number were heavy users, but the rest were left
| scratching their heads and wondering how to use it.)
|
| Granted it's a newbie question, but answers will be beneficial to
| a lot of us out there.
| mhh__ wrote:
| Yes (don't know about JupypterLab), skip, not really, yes,
| quite irritating so just pay someone else to do it.
| profsummergig wrote:
| Thanks! Succinct and complete.
| joshhart wrote:
| Hi,
|
| Yes you can. The community creates quantized variants of these
| that can run on consumer GPUs. A 4-bit quantization of LLAMA
| 70b works pretty well on Macbook pros, the neural engine with
| unified CPU memory is quite solid for these. GPUs is a bit
| tougher because consumer GPU RAM is still kinda small.
|
| You can also fine-tune them. There are lot of frameworks like
| unsloth that make this easier.
| https://github.com/unslothai/unsloth . Fine-tuning can be
| pretty tricky to get right, you need to be aware of things like
| learning rates, but there are good resources on the internet
| where a lot of hobbyists have gotten things working. You do not
| need a PhD in ML to accomplish this. You will, however, need
| data that you can represent textually.
|
| Source: Director of Engineering for model serving at
| Databricks.
| profsummergig wrote:
| Thank you! Very helpful!
| vtail wrote:
| Thank you Josh. Is there a resource you can point us too that
| helps answer "what kind of MacBook pro memory do I need to
| run ABC model at XYZ quantization?"
| Filligree wrote:
| Look at the filesize, add a couple of GB.
| jwitthuhn wrote:
| In general you can just use the parameter count to figure
| that out.
|
| 70B model at 8 bits per parameter would mean 70GB, 4 bits
| is 35GB, etc. But that is just for the raw weights, you
| also need some ram to store the data that is passing
| through the model and the OS eats up some, so add about a
| 10-15% buffer on top of that to make sure you're good.
|
| Also the quality falls off pretty quick once you start
| quantizing below 4-bit so be careful with that, but at
| 3-bit a 70B model should run fine on 32GB of ram.
| aiden3 wrote:
| how would the pricing on databricks when using model serving
| compare to, say, the prices seen in the original post here
| (i.e., "3.3 70B is 25X cheaper than GPT4o")?
| nickpsecurity wrote:
| I've been wanting to run into someone on the Databricks team.
| Can you ask whoever trains models like MPT to consider
| training an open model only on data clear of copyright
| claims? Specifically, one using only Gutenberg and the
| permissive code in The Stack? Or just Gutenberg?
|
| Since I follow Christ, I can't break the law or use what
| might be produced directly from infringement. I might be able
| to do more experiments if a free, legal model is available.
| Also, we can legally copy datasets like PG19 since they're
| public domain. Whereas, most others have works in which I
| might need a license to distribute.
|
| Please forward the request to the model trainers. Even a 7B
| model would let us do a lot of research on optimization
| algorithms, fine-tuning, etc.
| jerpint wrote:
| Basically equivalent to GitHub but for models. Anyone can
| upload anything, but it kind of standardizes tools and
| distribution for everyone. They also have a team that helps
| integrate releases for easier use and libraries for fine tuning
| profsummergig wrote:
| Thanks!
|
| I want to download my first HuggingFace model, and play with
| it. If you know of a resource that can help me decide what to
| start with, please share. If you don't, no worries. Thanks
| again.
| hrpnk wrote:
| Most of the HF models have a code snippet that you can use
| in order to run inference on the model. The transformers
| library will take care of the download as a dependency when
| you run the code. Typically, a python 3.10-3.11 environment
| is sufficient as environment. Example: https://huggingface.
| co/HuggingFaceTB/SmolLM2-1.7B-Instruct#t...
|
| If you have a MBP, you need to adjust the device name in
| the examples from "cuda" to "mps".
| hrpnk wrote:
| Seems that a bunch of quantized models are already uploaded to
| ollama: https://ollama.com/library/llama3.3/tags
| henry2023 wrote:
| I'm building a PC just to run inference on this and the QwQ 32B
| models.
|
| Any suggestions on RAM and GPU I should get?
| griomnib wrote:
| I have an A6000, it's about the most affordable for 48 GB VRAM
| (you can find for a little under $5k sometimes), which is
| roughly minimum to run a quantized 70b.
|
| System RAM doesn't really matter, but I have 128GB anyway as
| RAM is pretty cheap.
| paxys wrote:
| Why not 2 x 4090? Will be cheaper than A6000 if you can
| manage to find them at msrp, and will perform a lot better.
| griomnib wrote:
| My time is worth a lot of money and 2x 4090 is more work,
| so it's net more expensive in real terms.
| BoorishBears wrote:
| For both inference and training I haven't seen any modern
| LLM stack take more time for multiple GPUs/tensor
| parallelism
|
| I would take 1 RTX 6000 Ada, but if you mean the pre-Ada
| 6000, 2x4090 is faster for minimal hassle for most common
| usecases
| griomnib wrote:
| I mean the newest ones. I only do LLM inference, whereas
| my training load is all DistilBERT models and the A6000
| is a beast at cranking those out.
|
| Also by "time" I mean my time setting up the machine and
| doing sys admin. Single card is less hassle.
| BoorishBears wrote:
| The A6000 predates Ada?
|
| There is the RTX 6000 Ada (practically unrelated to the
| A6000) which has 4090 level performance, that what you're
| referring to?
| griomnib wrote:
| This one.
|
| https://www.bhphotovideo.com/c/product/1607840-REG/pny_te
| chn...
| elorant wrote:
| As a general rule of thumb, for 8-bit quantization you'll need
| as much VRAM as the parameters of the model. Half that if you
| run it at 4-bit. System RAM should be a bit more because you
| need RAM for the OS too. What most users do is go for second
| hand 3090s. You get two or three of them depending on
| quantization.
| pulse7 wrote:
| You also need VRAM for context: the larger the context, the
| more RAM you need.
| loudmax wrote:
| This of course depends on your budget and what you expect to do
| with these models. For a lot of people, the most cost-effective
| solution is probably to rent a GPU in the cloud.
|
| The limiting factor for running LLMs on consumer grade hardware
| is generally how much memory your GPU has access to. This is
| VRAM that's built into the GPU. On non-Apple hardware, the
| GPU's bandwidth to system RAM is so constrained that you might
| as well run those operations on the CPU.
|
| The cheapest PC solution is usually second-hand RTX 3090's.
| These can be had for around $700 and they have 24G of VRAM. An
| RTX 4090 also has 24G of VRAM, but they're about twice as
| expensive, so for that price you're probably better off getting
| two 3090's than a single 4090.
|
| Llama.cpp runs on the CPU and supports GPU offloading, so you
| can run a model partly on CPU and partly on GPU. Running
| anything on the CPU will slow down performance considerably,
| but it does mean that you can reasonably run a model that's
| slightly bigger than will fit in VRAM.
|
| Quantization works by trimming the least significant digits
| from the models' parameters, so the model uses less memory at
| the cost of slight brain damage. A lightly quantized version of
| QwQ 32B will fit onto a single 3090. A 70B parameter model will
| need to be quantized down to Q3 or so to run entirely on a
| 3090. Or you could run a model quantized to Q4 or Q5, but
| expect only a few tokens per second. We'll need to see how well
| the quantized versions of this new model behave in practice.
|
| Apple's M1-M4 series chips have unified memory so their GPU has
| access to the system RAM. If you like using a Mac and you were
| thinking of getting one anyway, they're not a bad choice. But
| you'll want to get a Mac with as much RAM as you can and
| they're not cheap.
| hubraumhugo wrote:
| Meta continues to overdeliver. Their goal from the start was to
| target and disrupt OpenAI/Anthropic with a scorched earth
| approach by releasing powerful open models.
|
| The big winners: we developers.
| ben30 wrote:
| This reminds me of Steve Jobs's famous comment to Dropbox about
| storage being 'a feature, not a product.' Zuckerberg - by open-
| sourcing these powerful models, he's effectively commoditising AI
| while Meta's real business model remains centred around their
| social platforms. They can leverage these models to enhance
| Facebook and Instagram's services while simultaneously benefiting
| from the community improvements and attention. It's not about
| selling AI; it's about using AI to strengthen their core
| business. By making it open, they get the benefits of widespread
| adoption and development without needing to monetise the models
| directly.
| lolinder wrote:
| Also don't underestimate the value they're getting from making
| more overtures to the developer community. It could be a
| coincidence, but it's only since they started releasing these
| models that I started noticing people on HN calling them
| "Meta", and attitudes towards them have been far more positive
| of late than usual.
|
| Good will isn't worth as much as cheap moderation automation
| and fancy features, but it's worth something.
| ecocentrik wrote:
| It would be strange if they didn't also use these models to
| generate much more sophisticated models of their user's
| interests and hyper-targeted advertising that always looks
| and feels like trusted friend recommendations for the exact
| product that's been missing from your life.
| lolinder wrote:
| There's the old HN! I knew it was around here somewhere!
| sangnoir wrote:
| > It would be strange if they didn't also use these models
| to generate much more sophisticated models of their user's
| interests and generate the hyper targeted advertising
|
| You have the sequence reversed as Meta already created ad
| targeting models. Meta was forced to scale its AI
| competence for ad targeting when Apple sent out a privacy
| update that destroyed tracking-based ad-serving and tanked
| Meta's share price by deleting billions in revenue for Meta
| over many quarters. Now that Meta has this skill as a core-
| competence, they are creating new models for public
| release. Why they are doing so is debatable[2], but I
| imagine the cost is marginal since they already had the GPU
| clusters, talent and know-how for survival purposes.
|
| 1. https://www.businessinsider.com/metas-bet-on-ai-has-
| saved-it...
|
| 2. I suspect Zuckerberg is not enthused by the idea of a
| future AI Apple-analog unilaterally shutting him out of the
| market. Having your net worth cut in half by a press-
| release has got to hurt.
| ecocentrik wrote:
| Thanks for the article but I don't think I have the
| sequence reversed.
|
| Without access to the tracking signal, it's been more
| important to build out a system that can recreate the
| value from that lost signal by analyzing what users are
| actually sharing and saying on their platform. Hence the
| importance of chat (VR, text, video...) and AI that can
| be used to process and extract value from a chat signal.
|
| I believe Meta's primary revenue source is still
| advertising (98%), so that is probably 98% of the why.
| sangnoir wrote:
| > Thanks for the article but I don't think I have the
| sequence reversed
|
| I suppose I read your first sentence as being in future
| tense when it might not be. The main thrust if my
| argument is that Meta _already_ successfully built those
| ad targeting models (Advantage+), and they preceded the
| Llama releases, so they don 't need to use Llama-derived
| models for ad targeting, as I understood your comment to
| be suggesting. The sequence was not/will not be "Llama ->
| ad targeting", but was "ad targeting -> Llama"
|
| Meta didn't have to release the weights of the models. Ad
| revenue doesn't explain why they did so.
| swalsh wrote:
| I call them OpenAI instead of Meta.
| grahamj wrote:
| I'm sticking with fuckerbook
| benatkin wrote:
| I'm sticking with:
|
| > They "trust me"
|
| > Dumb fucks
|
| Quotation marks his, not mine. It adds a certain vibe to
| it.
| LordDragonfang wrote:
| It's funny how quickly Zuck managed to turn his image around
| from "data-stealing actual lizard person" to "kind of normal
| guy" with a few years and a haircut. It's also not lost on me
| that he's the only "hacker" major tech CEO remaining:
| - Sundar is a glorified bean counter and his company is
| rotting from the inside, only kept afloat by the money
| printer that is ads. - Satya and Microsoft are in a
| similar boat, with the only major achievement being
| essentially buying OpenAI while every other product gets
| worse - Tim Cook is doing good things with Apple, but
| he still runs the company more like a fashion company than a
| tech company - Amazon was always more about logistics
| than cool hack value, and that hasn't changed since Bezos
| left - Elon is Elon
|
| Meanwhile Zuck is spending shareholder money pushing forward
| consumer VR because he thinks it's cool, demoing true AR
| glasses, releasing open-source models, and building giant
| Roman-style statues of his wife.
| talldayo wrote:
| I feel like owning Oculus must be the equivalent of the
| curse of the monkey's paw. It drove Palmer Luckey
| veritably[0] insane[1] when he was the head of it, and
| between the statue nonsense, the weaponizing of Llama and
| his angling for a position as a Trump advisor, it sure
| looks like Zuckerburg wants to head down the same path.
|
| [0] https://www.vice.com/en/article/palmer-luckey-made-a-
| vr-head...
|
| [1] https://www.codastory.com/authoritarian-tech/us-border-
| surve...
| talldayo wrote:
| It's funny. The only time I've _ever_ seen Hacker News
| unanimously applaud a Facebook product was when Pytorch
| announced they merged Apple Silicon support. Seems like Mr.
| Zuckerburg knows how to play a winning hand.
| yodsanklai wrote:
| > Also don't underestimate the value they're getting from
| making more overtures to the developer community.
|
| I wonder if it's significant. As developers, we're biased to
| think it matters, but in the grand scheme of things, 99.99%
| of people don't have a clue about open source or things that
| matter to hackers. As far as recruitment go, developers look
| primarily at how much they make, possibly the tech and how it
| looks on resume. There's always been a stigma around social
| networks and generally big tech companies, but not to the
| point it's going to hurt them.
| joelthelion wrote:
| It might make it a bit easier to hire engineers. Working
| for Facebook? Meh. Working for the "new Meta", on the other
| hand?
| a_wild_dandan wrote:
| AI researchers get huge checks working anywhere. So how
| does your company distinguish its appeal? Play to a
| researcher's desire for recognition in their field. Lean on
| open source. (I'm just paraphrasing Zuckerberg.)
| wongarsu wrote:
| Similar how places like SpaceX or game development are
| known for low pay and bad working conditions, yet have no
| shortage of applicants.
|
| No matter whether you want to hire the best of the best
| or just average people at a lower than average price,
| being a place where people want to work helps immensely
| ben_w wrote:
| I agree there's a lot of decisions that have to be made on
| the basis of "the tech and how it looks on resume" (as I'm
| finding out by _not_ having React Native on mine).
|
| > There's always been a stigma around social networks and
| generally big tech companies, but not to the point it's
| going to hurt them.
|
| I'm not sure it was "always":
|
| The one Facebook developer event I've been to made me feel
| dirty just to associate with them, but before that I had no
| negative feelings. It started off as "the new LiveJournal".
|
| Deleted my account for a few years, only came back to it
| when I started planning to move country and wanted to keep
| in contact with those who stayed put.
| signal11 wrote:
| Facebook Engineering has always been well regarded --
| starting with React on the front end, but also projects like
| Open Compute.
|
| Their product management on the other hand-- well, I mean,
| Facebook and Instagram are arguably as popular as McDonald's.
| So they've got that going for them.
| talldayo wrote:
| Spot-on. Hacker News has a hate-boner for Facebook and the
| Facebook-branded products, but from an engineering
| standpoint they're up there with Netflix solving problems
| at scale. FAIR had a better reputation than OpenAI since
| the start, and they've developed the Oculus acquisition to
| it's logical extremes. You can do a lot worse, looking at
| the way their peers handle similar situations.
|
| All the big tech companies have a Facebook-esque product
| they wish they could get rid of forever. Meta _has_
| Facebook, and instead of imploding like everyone said they
| would (for decades) they demonstrated competency in
| engineering and culture. The next 4 years will be a
| gauntlet with a literal "Mr. X" advising social media
| policy, but I frankly don't think Facebook has ever been
| down for the count in a pragmatic sense.
| huijzer wrote:
| I already was thinking for a while what the business model of
| open source was exactly. Why does Google spend money on Chrome
| also? After Zuckerberg's comments it hit me:
|
| Open source is useful for a business if it can either increase
| revenue or decrease costs.
|
| Examples:
|
| Increase revenue: Chrome and Visual Studio code. For example,
| the more people code, the more likely it is that they pay MSFT.
| So VS code aims to make programming as attractive as possible.
| Similar for Chrome.
|
| Decrease costs: Linux and Llama. As Zuckerburg said himself
| IIRC, they don't want one party snowball into an LLM monopoly
| so they rather help to get the open source ball rolling.
| barbazoo wrote:
| > For example, the more people code, the more likely it is
| that they pay MSFT. So VS code aims to make programming as
| attractive as possible
|
| How does that increase revenue in a remotely measurable way?
|
| Chrome, for sure, high market share, default search engine,
| more money, at least that's how I imagine it.
| jyap wrote:
| One example is that Microsoft owns GitHub.
| rafaelmn wrote:
| I think Balmers "developers, developers, developers" meme has
| been around longer than some people here are alive - it
| served them well in the Windows era and it serves them well
| in the cloud space.
| barbazoo wrote:
| Do the improvements the community proposes/makes to their
| models amount to anything significant? For a company like Meta
| with basically infinite money, do they really benefit from
| external help?
| lolinder wrote:
| I don't have eyes inside of Meta, but keep in mind that we're
| not just talking about fine-tunes and LoRAs, we're also
| talking about the entire llama ecosystem, including
| llama.cpp, ollama, and llamafile. These would not exist (or
| wouldn't have anything like as much momentum) without Meta's
| investment, but they're now huge community projects.
|
| I don't know if they use them internally, of course, but they
| _could_ , and they represent a _lot_ of work.
| muixoozie wrote:
| >storage being 'a feature, not a product.
|
| Somewhat unrelated mini-rant. Upgraded a phone recently after
| about 3 years. Surprised to see storage still capped around
| 128GB (in-general). That's got to be artificially held back
| capacity to push cloud storage services?
| viraptor wrote:
| There's lots of phones with more and/or with SD slots. It's
| not really "capped" as much as default size that seems to
| work just fine for the majority.
| rafaelmn wrote:
| Ironically gen AI made their products worse more than the rest.
| I can't believe the amount of AI slop I see every time I open
| Facebook - I'd check it occasionally when replying to messenger
| and scroll through for a while, after seeing the AI spam I
| don't even bother.
| jazzyjackson wrote:
| truly I wonder if they're fooled by their own click fraud,
| or, if the incentives really do work out that they get paid
| whether the engagement is from bots or people, but anyway I
| came here to say the same thing, it's shocking to me how
| enthusiastic Zuckerburg is about generative AI, what other
| possible outcome is there except actual human content
| creation being replaced by slop
| andy_ppp wrote:
| Baggy Tees, gold chains and now this!? Make this man president
| immediately!
| petercooper wrote:
| Commoditize your complement: https://gwern.net/complement
| theanonymousone wrote:
| I'm "tracking" the price of if 1M tokens in OpenRouter and it is
| decreasing every few refreshes. It's funny:
| https://openrouter.ai/meta-llama/llama-3.3-70b-instruct
| philipkiely wrote:
| Just spent a few minutes this morning spinning up a H100 model
| server and trying an FP8 quantized version (including kv cache
| quantization) to fit it on 2 H100s -- speed and quality looking
| promising.
|
| I'm excited to see if the better instruction following benchmarks
| improves function calling / agentic capabilities.
| bnchrch wrote:
| Open Sourcing Llama is one of the best example and roll out of
| "Commoditize Your Complement" in memory.
|
| Link to Gwern's "Laws of Tech: Commoditize Your Complement" for
| those who havent heard of this strategy before
|
| https://gwern.net/complement
| theanonymousone wrote:
| Given the comments saying it's performance seems comparable to
| 4o/4o-mini, is it safe to say that GPT-4 performance can be
| achieved with less than 100B parameters,in contrary to what
| previously was thought?
| adt wrote:
| Model card: https://github.com/meta-llama/llama-
| models/blob/main/models/...
|
| On the Models Table: https://lifearchitect.ai/models-table/
| andy_ppp wrote:
| How many tokens per second can I get on an M4 Max with 128gb of
| RAM?
| freediver wrote:
| Does unexpectedly well on our benchmark:
|
| https://help.kagi.com/kagi/ai/llm-benchmark.html
|
| Will dive into it more, but this is impressive.
| danielhanchen wrote:
| I uploaded 4bit bitsandbytes, GGUFs and original 16bit weights to
| https://huggingface.co/unsloth for those interested! You can also
| finetune Llama 3.3 70B in under 48GB of VRAM and 2x faster and
| use 70% less memory with Unsloth!
___________________________________________________________________
(page generated 2024-12-06 23:00 UTC)