[HN Gopher] Llama 3.2: Revolutionizing edge AI and vision with o...
       ___________________________________________________________________
        
       Llama 3.2: Revolutionizing edge AI and vision with open,
       customizable models
        
       Author : nmwnmw
       Score  : 172 points
       Date   : 2024-09-25 17:29 UTC (5 hours ago)
        
 (HTM) web link (ai.meta.com)
 (TXT) w3m dump (ai.meta.com)
        
       | TheAceOfHearts wrote:
       | I still can't access the hosted model at meta.ai from Puerto
       | Rico, despite us being U.S. citizens. I don't know what Meta has
       | against us.
       | 
       | Could someone try giving the 90b model this word search problem
       | [0] and tell me how it performs? So far with every model I've
       | tried, none has ever managed to find a single word correctly.
       | 
       | [0] https://imgur.com/i9Ps1v6
        
         | Workaccount2 wrote:
         | This is likely because the models use OCR on images with text,
         | and once parsed the word search doesn't make sense anymore.
         | 
         | Would be interesting to see a model just working on raw input
         | though.
        
           | simonw wrote:
           | Image models such as Llama 3.2 11B and 90B (and the Claude 3
           | series, and Microsoft Phi-3.5-vision-instruct, and PaliGemma,
           | and GPT-4o) don't run OCR as a separate step. Everything they
           | do is from that raw vision model.
        
         | paxys wrote:
         | Non US citizens can access the model just fine, if that's what
         | you are implying.
        
           | TheAceOfHearts wrote:
            | I'm not implying anything. It's just frustrating that PR,
            | a US territory whose residents are US citizens, isn't
            | allowed to use this service, with no explanation given.
        
             | paxys wrote:
             | Just because you cannot access the model doesn't mean all
             | of Puerto Rico is blocked.
        
               | TheAceOfHearts wrote:
               | When I visit meta.ai it says:
               | 
               | > Meta AI isn't available yet in your country
               | 
               | Maybe it's just my ISP, I'll ask some friends if they can
               | access the service.
        
               | paxys wrote:
                | meta.ai is their AI service (similar to ChatGPT). The
                | model itself is hosted on llama.com.
        
               | TheAceOfHearts wrote:
               | I'm aware. I wanted to try out their hosted version of
               | the model because I'm GPU poor.
        
               | elcomet wrote:
                | You can try it on Hugging Face
        
       | nmwnmw wrote:
       | - Llama 3.2 introduces small vision LLMs (11B and 90B parameters)
       | and lightweight text-only models (1B and 3B) for edge/mobile
       | devices, with the smaller models supporting 128K token context.
       | 
       | - The 11B and 90B vision models are competitive with leading
       | closed models like Claude 3 Haiku on image understanding tasks,
       | while being open and customizable.
       | 
       | - Llama 3.2 comes with official Llama Stack distributions to
       | simplify deployment across environments (cloud, on-prem, edge),
       | including support for RAG and safety features.
       | 
       | - The lightweight 1B and 3B models are optimized for on-device
       | use cases like summarization and instruction following.
        
       | opdahl wrote:
        | I'm blown away by just how open the Llama team at Meta is. It
        | is nice to see that they are not only giving access to the
        | models but are also open about how they built them. I don't
        | know how the future will go in terms of models, but I sure am
        | grateful that Meta has taken this position and is pushing for
        | more openness.
        
         | nickpsecurity wrote:
         | Do they tell you what training data they use for alignment? As
         | in, what biases they intentionally put in the system they're
         | widely deploying?
        
           | warkdarrior wrote:
           | Do you have some concrete example of biases in their models?
           | Or are you just fishing for something to complain about?
        
       | resters wrote:
       | This is great! Does anyone know if the llama models are trained
        | to do function calling like OpenAI models are? And/or are there
       | any function calling training datasets?
        
         | refulgentis wrote:
          | Yes (rationale: 3.1 was, and it would be strange to roll
          | back).
         | 
          | In general, you can get a long way by constraining token
          | generation to valid JSON - I've seen models as small as 800M
          | handle JSON with that approach. It's ~impossible to train
          | constraining into a model with remotely the same reliability
          | -- you'd have to erase a ton of conversational training that
          | makes it say e.g. "Sure! Here's the JSON you requested:"
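          | 
          | For anyone wanting to try this: a minimal sketch using
          | llama-cpp-python's GBNF grammar support. The model path and
          | the trimmed-down JSON grammar are my own assumptions
          | (llama.cpp ships a fuller json.gbnf):
          | 
          |     import json
          |     from llama_cpp import Llama, LlamaGrammar
          | 
          |     # Toy JSON grammar; every sampled token is masked
          |     # against it during decoding
          |     JSON_GBNF = r'''
          |     root   ::= object
          |     value  ::= object | array | string | number
          |     object ::= "{" ws pair ("," ws pair)* "}" ws
          |     pair   ::= string ":" ws value
          |     array  ::= "[" ws value ("," ws value)* "]" ws
          |     string ::= "\"" [^"\\]* "\"" ws
          |     number ::= "-"? [0-9]+ ("." [0-9]+)? ws
          |     ws     ::= [ \t\n]*
          |     '''
          | 
          |     llm = Llama(model_path="llama-3.2-1b-q8_0.gguf")  # placeholder
          |     out = llm(
          |         "Extract as JSON: Ada Lovelace, born 1815.\n",
          |         grammar=LlamaGrammar.from_string(JSON_GBNF),
          |         max_tokens=128,
          |     )
          |     # constrained decoding means this should parse
          |     print(json.loads(out["choices"][0]["text"]))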
        
           | Closi wrote:
           | What about OpenAI Structured Outputs? This seems to do
           | exactly this.
        
             | refulgentis wrote:
              | Correct, I think so too - that update must be doing
              | exactly this. tl;dr: for Llama function-calling
              | reliability you don't need to reach for training; in
              | fact, if you do, you'll still have the same problem.
        
             | zackangelo wrote:
             | I'm building this type of functionality on top of Llama
             | models if you're interested:
             | https://docs.mixlayer.com/examples/json-output
        
         | TmpstsTrrctta wrote:
         | They mention tool calling in the link for the smaller models,
         | and compare to 8B levels of function calling in benchmarks
         | here:
         | 
         | https://news.ycombinator.com/item?id=41651126
        
         | ushakov wrote:
         | yes, but only the text-only models!
         | 
         | https://www.llama.com/docs/model-cards-and-prompt-formats/ll...
        
           | zackangelo wrote:
           | This is incorrect:
           | 
           | > With text-only inputs, the Llama 3.2 Vision Models can do
           | tool-calling exactly like their Llama 3.1 Text Model
           | counterparts. You can use either the system or user prompts
           | to provide the function definitions.
           | 
           | > Currently the vision models don't support tool-calling with
           | text+image inputs.
           | 
           | They support it, but not when an image is submitted in the
           | prompt. I'd be curious to see what the model does. Meta
           | typically sets conservative expectations around this type of
           | behavior (e.g., they say that the 3.1 8b model won't do
           | multiple tool calls, but in my experience it does so just
           | fine).
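            | 
            | For reference, a rough sketch of JSON-based custom tool
            | calling along the lines the docs describe (the function
            | definition, prompt wording, and parsing here are mine --
            | check the exact format against the model card):
            | 
            |     import json
            | 
            |     weather_tool = {
            |         "name": "get_weather",
            |         "description": "Get the current weather for a city",
            |         "parameters": {"city": {"type": "string"}},
            |     }
            | 
            |     # Function definitions go into the system (or user) prompt
            |     system = (
            |         "You have access to this function. To call it, reply "
            |         'with JSON like {"name": ..., "parameters": ...} and '
            |         "nothing else.\n" + json.dumps(weather_tool)
            |     )
            | 
            |     # Your own code parses and dispatches the model's reply
            |     reply = '{"name": "get_weather", "parameters": {"city": "PR"}}'
            |     call = json.loads(reply)
            |     assert call["name"] == "get_weather"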
        
           | winddude wrote:
            | The vision models can also do tool calling according to
            | the docs, but only with text-only inputs - maybe that's
            | what you meant ~
            | <https://www.llama.com/docs/model-cards-and-prompt-formats/ll...>
        
       | moffkalast wrote:
       | I've just tested the 1B and 3B at Q8, some interesting bits:
       | 
        | - The 1B is extremely coherent (feels something like maybe
        | Mistral 7B at 4 bits), and with flash attention and a 4-bit KV
        | cache it only uses about 4.2 GB of VRAM for 128k context
        | (loader sketch below)
        | 
        | - A Pi 5 runs the 1B at 8.4 tok/s; haven't tested the 3B yet,
        | but it might need a lower quant to fit, and with 9T training
        | tokens it'll probably degrade pretty badly when quantized
       | 
       | - The 3B is a certified Gemma-2-2B killer
       | 
        | Given that llama.cpp doesn't currently support any
        | multimodality (they removed the old implementation), it might
        | be a while before the 11B and 90B become runnable. They don't
        | seem to outperform Qwen2-VL on vision benchmarks though.
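        | 
        | For the curious, the loader settings from the first bullet
        | look something like this in llama-cpp-python (the model path
        | is a placeholder, and I'm assuming its flash_attn/type_k/
        | type_v options):
        | 
        |     import llama_cpp
        |     from llama_cpp import Llama
        | 
        |     llm = Llama(
        |         model_path="llama-3.2-1b-q8_0.gguf",  # placeholder path
        |         n_ctx=131072,                     # full 128k context
        |         flash_attn=True,                  # needed for quantized KV
        |         type_k=llama_cpp.GGML_TYPE_Q4_0,  # 4-bit K cache
        |         type_v=llama_cpp.GGML_TYPE_Q4_0,  # 4-bit V cache
        |         n_gpu_layers=-1,                  # offload all layers
        |     )
        |     out = llm("The capital of Norway is", max_tokens=8)
        |     print(out["choices"][0]["text"])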
        
         | Patrick_Devine wrote:
         | Hoping to get this out soon w/ Ollama. Just working out a
         | couple of last kinks. The 11b model is legit good though,
         | particularly for tasks like OCR. It can actually read my
         | cursive handwriting.
        
       | gdiamos wrote:
       | Llama 3.2 includes a 1B parameter model. This should be 8x higher
       | throughput for data pipelines. In our experience, smaller models
       | are just fine for simple tasks like reading paragraphs from PDF
       | documents.
        
       | gdiamos wrote:
       | Do inference frameworks like vllm support vision?
        
         | woodson wrote:
         | Yes, vLLM does (though marked experimental):
         | https://docs.vllm.ai/en/latest/models/vlm.html
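          | 
          | A minimal usage sketch (the model id and <|image|> prompt
          | token follow the vLLM examples for Llama 3.2 at the time of
          | writing; treat the details as assumptions):
          | 
          |     from vllm import LLM, SamplingParams
          |     from PIL import Image
          | 
          |     llm = LLM(model="meta-llama/Llama-3.2-11B-Vision-Instruct",
          |               max_model_len=8192)
          |     out = llm.generate(
          |         {
          |             "prompt": "<|image|><|begin_of_text|>Describe "
          |                       "this image.",
          |             "multi_modal_data": {"image": Image.open("photo.jpg")},
          |         },
          |         SamplingParams(max_tokens=128),
          |     )
          |     print(out[0].outputs[0].text)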
        
       | minimaxir wrote:
       | Off topic/meta, but the Llama 3.2 news topic received many, many
       | HN submissions and upvotes but never made it to the front page:
       | the fact that it's on the front page now indicates that
       | moderators intervened to rescue it:
       | https://news.ycombinator.com/from?site=meta.com (showdead on)
       | 
       | If there's an algorithmic penalty against the news for whatever
       | reason, that may be a flaw in the HN ranking algorithm.
        
         | makin wrote:
         | The main issue was that Meta quickly took down the first
         | announcement, and the only remaining working submission was the
         | information-sparse HuggingFace link. By the time the other
         | links were back up, it was too late. Perfect opportunity for a
         | rescue.
        
       | dhbradshaw wrote:
        | Tried out 3B on ollama, asking questions in optics, bio, and
        | Rust.
       | 
       | It's super fast with a lot of knowledge, a large context and
       | great understanding. Really impressive model.
        
         | tomComb wrote:
         | I question whether a 3B model can have "a lot of knowledge".
        
           | foxhop wrote:
            | My guess is it uses the same vocabulary size as Llama 3.1,
            | which is 128,000 distinct tokens (subword units, not quite
            | words) to support many languages. Parameter count is less
            | of an indicator of fitness than previously thought.
        
       | sva_ wrote:
       | Curious about the multimodal model's architecture. But alas, when
       | I try to request access
       | 
       | > Llama 3.2 Multimodal is not available in your region.
       | 
        | It sounds like they input the continuous output of an image
        | encoder into a transformer, similar to Transfusion [0]? Does
        | anyone know where to find more details?
       | 
       | Edit:
       | 
       |  _> Regarding the licensing terms, Llama 3.2 comes with a very
       | similar license to Llama 3.1, with one key difference in the
       | acceptable use policy: any individual domiciled in, or a company
       | with a principal place of business in, the European Union is not
       | being granted the license rights to use multimodal models
       | included in Llama 3.2._ [1]
       | 
       | What a bummer.
       | 
       | 0. https://www.arxiv.org/abs/2408.11039
       | 
       | 1. https://huggingface.co/blog/llama32#llama-32-license-
       | changes...
        
         | _ink_ wrote:
         | Oh. That's sad indeed. What might be the reason for excluding
         | Europe?
        
           | Arubis wrote:
           | Glibly, Europe has the gall to even consider writing
           | regulations without asking the regulated parties for
           | permission.
        
             | pocketarc wrote:
             | Between this and Apple's policies, big tech corporations
             | really seem to be putting the screws to the EU as much as
             | they can.
             | 
             | "See, consumers? Look at how bad your regulation is, that
             | you're missing out on all these cool things we're working
             | on. Talk to your politicians!"
             | 
             | Regardless of your political opinion on the subject, you've
             | got to admit, at the very least, it will be educational to
             | see how this develops over the next 5-10 years of tech
             | progress, as the EU gets excluded from more and more
             | things.
        
               | DannyBee wrote:
               | Or, again, they are just deciding the economy isn't worth
               | the cost. (or not worth prioritizing upfront or ....)
               | 
                | When we had numerous discussions on HN as these rules
                | were implemented, this is precisely what the Europeans
                | said should happen.
               | 
               | So why does it now have to be some concerted effort to
               | "put the screws to EU"?
               | 
                | I otherwise agree it will be interesting, but mostly
                | in the sense that I watched people swear up and down
                | this was just about protecting EU citizens, and that
                | they were fine with none of these companies doing
                | anything in the EU, or not prioritizing the EU, if
                | they decided it wasn't worth the cost.
                | 
                | We'll see if that's true or not, I guess, or if they
                | really wanted it to be "you have to do it, but on our
                | terms" or whatever.
        
               | imiric wrote:
               | > Between this and Apple's policies, big tech
               | corporations really seem to be putting the screws to the
               | EU as much as they can.
               | 
                | Funny, I see it the other way around, actually. The EU
               | is forcing Big Tech to be transparent and not exploit
               | their users. It's the companies that must choose to
               | comply, or take their business elsewhere. Let's not
               | forget that Apple users in the EU can use 3rd-party
               | stores, and it was EU regulations that forced Apple to
               | switch to USB-C. All of these are a win for consumers.
               | 
               | The reason Meta is not making their models available in
               | the EU is because they can't or won't comply with the
               | recent AI regulations. This only means that the law is
               | working as intended.
               | 
               | > it will be educational to see how this develops over
               | the next 5-10 years of tech progress, as the EU gets
               | excluded from more and more things.
               | 
               | I don't think we're missing much that Big Tech has to
               | offer, and we'll probably be better off for it. I'm
               | actually in favor of even stricter regulations,
               | particularly around AI, but what was recently enacted is
               | a good start.
        
             | aftbit wrote:
             | This makes it sound like some kind of retaliation, instead
             | of Meta attempting to comply with the very regulations
             | you're talking about. Maybe llama3.2 would violate the
             | existing face recognition database policies?
        
             | DannyBee wrote:
             | Why is it that and not just cost/benefit for them?
             | 
             | They've decided it's not worth their time/energy to do it
             | right now in a way that complies with regulation (or
             | whatever)
             | 
             | Isn't that precisely the choice the EU wants them to make?
             | 
             | Either do it within the bounds of what we want, or leave us
             | out of it?
        
           | paxys wrote:
           | Punishment. "Your government passes laws we don't like, so we
           | aren't going to let you have our latest toys".
        
         | GaggiX wrote:
          | Fortunately, Qwen2-VL exists; it is pretty good and under an
          | actual open source license, Apache 2.0.
         | 
         | Edit: the larger 72B model is not under Apache 2.0 but
         | https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct/blob/main/...
         | 
         | Qwen2-VL-72B seems to perform better than llama-3.2-90B on
         | visual tasks.
        
         | mrfinn wrote:
          | Pity, it's over. We'll never ever be able to download those
          | ten-gigabyte files from the other side of the fence.
        
         | Y_Y wrote:
         | I hereby grant license to anyone in the EU to do whatever they
         | want with this.
        
           | lawlessone wrote:
           | Cheers :)
        
           | moffkalast wrote:
           | Well you said hereby so it must be law.
        
         | btdmaster wrote:
         | Full text:
         | 
         | https://github.com/meta-llama/llama-models/blob/main/models/...
         | 
         | https://github.com/meta-llama/llama-models/blob/main/models/...
         | 
         | > With respect to any multimodal models included in Llama 3.2,
         | the rights granted under Section 1(a) of the Llama 3.2
         | Community License Agreement are not being granted to you if you
         | are an individual domiciled in, or a company with a principal
         | place of business in, the European Union. This restriction does
         | not apply to end users of a product or service that
         | incorporates any such multimodal models.
        
         | ankit219 wrote:
         | If you are still curious about the architecture, from the blog:
         | 
         | > To add image input support, we trained a set of adapter
         | weights that integrate the pre-trained image encoder into the
         | pre-trained language model. The adapter consists of a series of
         | cross-attention layers that feed image encoder representations
         | into the language model. We trained the adapter on text-image
         | pairs to align the image representations with the language
         | representations. During adapter training, we also updated the
         | parameters of the image encoder, but intentionally did not
         | update the language-model parameters. By doing that, we keep
         | all the text-only capabilities intact, providing developers a
         | drop-in replacement for Llama 3.1 models.
         | 
          | What this crudely means is that they extended the base Llama
          | 3.1 to include image-based weights and inference. You can do
          | that if you freeze the existing weights and add new ones,
          | which are then updated during training runs (adapter
          | training). Then they did SFT and RLHF runs on the composite
          | model (for lack of a better word). This is a little-known
          | technique, and very effective. I just had a paper accepted
          | about a similar technique; I'll share a blog post once it's
          | published, if you're interested (though it's not at this
          | scale, and probably not as effective). Side note: that is
          | also why the 11B and 90B parameter counts are additions on
          | top of the text-only models.
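          | 
          | A toy PyTorch sketch of the idea (stand-in modules, not
          | Meta's code): freeze the LM, add gated cross-attention over
          | the image encoder's outputs, and train only the new
          | parameters (plus, per the blog, the image encoder):
          | 
          |     import torch
          |     import torch.nn as nn
          | 
          |     class CrossAttnAdapter(nn.Module):
          |         def __init__(self, d_model, n_heads):
          |             super().__init__()
          |             self.attn = nn.MultiheadAttention(
          |                 d_model, n_heads, batch_first=True)
          |             self.norm = nn.LayerNorm(d_model)
          |             self.gate = nn.Parameter(torch.zeros(1))  # no-op at init
          | 
          |         def forward(self, hidden, image_states):
          |             # text hidden states attend over image representations
          |             out, _ = self.attn(hidden, image_states, image_states)
          |             return hidden + torch.tanh(self.gate) * self.norm(out)
          | 
          |     d = 512
          |     lm_block = nn.TransformerEncoderLayer(
          |         d, 8, batch_first=True)        # stand-in for the frozen LM
          |     image_encoder = nn.Linear(768, d)  # stand-in for the ViT
          |     adapter = CrossAttnAdapter(d, 8)
          | 
          |     for p in lm_block.parameters():
          |         p.requires_grad = False  # text capabilities stay intact
          | 
          |     optimizer = torch.optim.AdamW(
          |         list(adapter.parameters()) +
          |         list(image_encoder.parameters()), lr=1e-4)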
        
         | IAdkH wrote:
         | Again, we see that Llama is totally open source! Practically
         | BSD licensed!
         | 
         | So the issue is privacy:
         | 
         | https://www.itpro.com/technology/artificial-intelligence/met...
         | 
         | "Meta aims to use the models in its platforms, as well as on
         | its Ray-Ban smart glasses, according to a report from Axios."
         | 
          | I suppose that means that Ray-Ban smart glasses surveil the
          | environment and upload the victims' identities to Meta,
          | presumably for further training of models. Good that the EU
          | protects us from such schemes.
        
       | getcrunk wrote:
        | Still no 14/30B parameter models since Llama 2. That seriously
        | kills real usability for power users/DIYers.
        | 
        | The 7/8B models are great for PoCs and for moving to the edge
        | for minor use cases... but there's a big, empty gap up to 70B,
        | which most people can't run.
        | 
        | The tinfoil hat in me says this is the compromise the powers
        | that be have agreed to: being "open" in name but practically
        | gimped for the average joe techie. Basically arms control.
        
         | swader999 wrote:
          | You don't need an F-15 to play; at the least, a decent
          | sniper rifle will do, and you can still practise even with a
          | pellet gun. I'm running 70B models on my M2 Max with 96 GB
          | of RAM. Even larger models sort of work, although I haven't
          | really put much time into anything above 70B.
        
         | foxhop wrote:
          | The 4090 has 24 GB, so we really need a ~40B model (two
          | cards), or a ~20B with some room left over for the context
          | window.
          | 
          | The 5090 has ?? GB - still unreleased.
        
       | kingkongjaffa wrote:
        | llama3.2:3b-instruct-q8_0 is performing better than 3.1 8b-q4
        | on my MacBook Pro M1. It's faster and the results are better:
        | it answered a few riddles and thought experiments better
        | despite being 3B vs 8B.
        | 
        | I just removed my install of 3.1-8b.
        | 
        | My ollama list is currently:
       | 
        | $ ollama list
        | 
        | NAME                            ID            SIZE    MODIFIED
        | llama3.2:3b-instruct-q8_0       e410b836fe61  3.4 GB  2 hours ago
        | gemma2:9b-instruct-q4_1         5bfc4cf059e2  6.0 GB  3 days ago
        | phi3.5:3.8b-mini-instruct-q8_0  8b50e8e1e216  4.1 GB  3 days ago
        | mxbai-embed-large:latest        468836162de7  669 MB  3 months ago
        
         | taneq wrote:
         | For a second I read that as " _it_ just removed my install of
         | 3.1-8b" :D
        
       | sk11001 wrote:
        | Can one of these models be run on a single machine? What specs do
       | you need?
        
         | Y_Y wrote:
          | Absolutely! They have a billion-parameter model that would
          | run on my first computer if we quantized it to 1.5 bits. But
          | realistically, yes: if you can fit it in total RAM you can
          | run it slowly, and if you can fit it in GPU RAM you can
          | probably run it fast enough to chat.
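          | 
          | Back-of-envelope math for the weights alone (runtimes add
          | overhead for the KV cache and activations, so treat these as
          | floors, and the parameter counts as approximate):
          | 
          |     def weight_gb(n_params, bits):
          |         """Approximate weight memory in GB at a quantization."""
          |         return n_params * bits / 8 / 1e9
          | 
          |     sizes = [("1B", 1.2e9), ("3B", 3.2e9),
          |              ("11B", 11e9), ("90B", 90e9)]
          |     for name, n in sizes:
          |         row = ", ".join(f"{b}-bit: {weight_gb(n, b):.1f} GB"
          |                         for b in (16, 8, 4))
          |         print(f"{name}: {row}")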
        
       | GaggiX wrote:
        | The 90B seems to perform pretty weakly on visual tasks
        | compared to Qwen2-VL-72B:
        | https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct, or am I
        | missing something?
        
       | kombine wrote:
       | Are these models suitable for Code assistance - as an alternative
       | to Cursor or Copilot?
        
       | a_wild_dandan wrote:
       | "The Llama jumped over the ______!" (Fence? River? Wall?
       | Synagogue?)
       | 
       | With 1-hot encoding, the answer is "wall", with 100% probability.
       | Oh, you gave plausibility to "fence" too? WRONG! ENJOY MORE
       | PENALTY, SCRUB!
       | 
        | I believe this unforgiving dynamic is why model distillation
        | works well. The original teacher model had to learn via the
        | "hot or cold" game on _text_ answers. But when the child
        | instead imitates the teacher's predictions, it learns
        | _semantically rich_ answers. That strikes me as vastly more
        | compute-efficient. So to me, it makes sense why these Llama
        | 3.2 edge models punch so far above their weight(s). But it
        | still blows my mind thinking how far models have advanced from
        | a year or two ago. Kudos to Meta for these releases.
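        | 
        | A sketch of that objective in PyTorch (the standard soft-label
        | distillation loss; an assumption on my part, not a quote of
        | Meta's recipe): the student matches the teacher's full
        | next-token distribution rather than a one-hot target:
        | 
        |     import torch
        |     import torch.nn.functional as F
        | 
        |     def distill_loss(student_logits, teacher_logits, T=2.0):
        |         # T > 1 spreads probability mass, so "fence" keeps
        |         # some credit instead of being zeroed by a one-hot
        |         # "wall" target
        |         log_p_s = F.log_softmax(student_logits / T, dim=-1)
        |         p_t = F.softmax(teacher_logits / T, dim=-1)
        |         # T^2 keeps gradients comparable across temperatures
        |         return F.kl_div(log_p_s, p_t,
        |                         reduction="batchmean") * T * T
        | 
        |     student = torch.randn(4, 128256)  # (batch, vocab) toy logits
        |     teacher = torch.randn(4, 128256)
        |     print(distill_loss(student, teacher))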
        
       | bottlepalm wrote:
       | What mobile devices can the smaller models run on? iPhone,
       | Android?
        
       | simonw wrote:
       | I'm absolutely amazed at how capable the new 1B model is,
       | considering it's just a 1.3GB download (for the Ollama GGUF
       | version).
       | 
       | I tried running a full codebase through it (since it can handle
       | 128,000 tokens) and asking it to summarize the code - it did a
       | surprisingly decent job, incomplete but still unbelievable for a
       | model that tiny:
       | https://gist.github.com/simonw/64c5f5b111fe473999144932bef42...
       | 
       | More of my notes here:
       | https://simonwillison.net/2024/Sep/25/llama-32/
       | 
        | I've been trying out the larger image models using the
        | versions hosted on https://lmarena.ai/ - navigate to "Direct
        | Chat", select them from the dropdown, and upload images to run
        | prompts.
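        | 
        | If you want to reproduce the codebase summary, something like
        | this works with the ollama Python client (the input path is a
        | placeholder; num_ctx raises the context window above ollama's
        | default):
        | 
        |     import ollama
        | 
        |     code = open("codebase.txt").read()  # placeholder input
        |     resp = ollama.chat(
        |         model="llama3.2:1b",
        |         messages=[{"role": "user",
        |                    "content": "Summarize this codebase:\n\n" + code}],
        |         options={"num_ctx": 131072},
        |     )
        |     print(resp["message"]["content"])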
        
         | GaggiX wrote:
          | Llama 3.2 vision models don't seem that great if they have
          | to be compared to Claude 3 Haiku or GPT-4o-mini. For an open
          | alternative I would use the Qwen2-VL-72B model: it's smaller
          | than the 90B and seems to perform quite a bit better. Also
          | Qwen2-VL-7B as an alternative to Llama-3.2-11B: smaller,
          | better on visual benchmarks, and also Apache 2.0.
        
         | foxhop wrote:
          | Llama 3.0, 3.1, and 3.2 all use a tokenizer built on
          | tiktoken, OpenAI's open source tokenizer library.
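          | 
          | Easy to check (the HF repo is gated, so this assumes you
          | have access):
          | 
          |     from transformers import AutoTokenizer
          | 
          |     tok = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B")
          |     print(len(tok))  # ~128k entries, same family as 3.0/3.1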
        
       | JohnHammersley wrote:
       | Ollama post: https://ollama.com/blog/llama3.2
        
       | gunalx wrote:
        | The 3B was pretty good multilingually (Norwegian) - still a
        | lot of gibberish at times, and way more sensitive than the 8B,
        | but more usable than Gemma 2 2B for multilingual work, and
        | fine on my standard question (a Python list sorter with args).
        | But 90B vision just refuses all my actually useful tasks, like
        | helping recreate an image in HTML or doing anything useful
        | with the image data other than describing it. I've never
        | gotten this stuck with 70B or OpenAI before. Insane amount of
        | refusals all the time.
        
       | thimabi wrote:
       | Does anyone know how these models fare in terms of multilingual
       | real-world usage? I've used previous iterations of llama models
       | and they all seemed to be lacking in that regard.
        
       ___________________________________________________________________
       (page generated 2024-09-25 23:00 UTC)