[HN Gopher] LeCun: Qualcomm working with Meta to run Llama-2 on mobile devices
___________________________________________________________________
LeCun: Qualcomm working with Meta to run Llama-2 on mobile devices
Author : birriel
Score : 97 points
Date : 2023-07-23 15:58 UTC (7 hours ago)
(HTM) web link (twitter.com)
(TXT) w3m dump (twitter.com)
| api wrote:
| ... and the Great Cosmic Mind told us its name, and it was
| Llama. Of its origin or the reason for its name it recalled
| nothing more than torrents and hugging faces and a being called
| TheBloke. The Llama simply was, is, and shall be, until it shall
| witness the heat death of the universe as fragments huddled
| around the last evaporating black holes.
| tamimio wrote:
| And let me guess, it will be used to intelligently identify and
| track users? Fecebook is desperate now to harvest more data
| from anyone, even from people who decided not to use any
| Fecebook products..
| superkuh wrote:
| Great? The community already did it with llama.cpp. Knowing the
| memory bandwidth bottleneck, I can't imagine phones are going
| to do very well. But hey, llamas (1 and 2) run on an rpi4, so
| it'll work. Just really, unusably slow.
| wyldfire wrote:
| The work involved probably includes porting to the Snapdragon
| NSP for throughput and efficiency's sake.
|
| For LLMs the biggest challenge is addressing such a large model
| - or finding a balance between the model size and its
| capability on a mobile device.
| pavlov wrote:
| If only someone could convince a CPU company to optimize the
| chips for this workload. Oh, wait...
| smoldesu wrote:
| Like ARM? https://github.com/ARM-software/armnn
|
| Optimization for this workload has arguably been in progress
| for decades. Modern AVX instructions can be found in laptops
| that are a decade old now, and most big inferencing projects
| are built around SIMD or GPU shaders. Unless your computer
| ships with onboard Nvidia hardware, there's usually not much
| difference in inferencing performance.
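|
| The hot loop in most of those projects is just a big dot
| product; a toy numpy stand-in for the kind of kernel that
| AVX/NEON code hand-vectorizes (illustrative only, not any
| project's actual kernel):
|
|   import numpy as np
|
|   # One row of the weight matrix times the activation
|   # vector: the inner loop that SIMD kernels vectorize in
|   # llama.cpp and similar projects.
|   w = np.random.randn(4096).astype(np.float32)
|   x = np.random.randn(4096).astype(np.float32)
|   print(np.dot(w, x))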
| pavlov wrote:
| Ultimately Qualcomm is the one who decides how to allocate
| die area on their CPUs, right? So it can't exactly hurt if
| this is a priority for them now.
| smoldesu wrote:
| Pretty much all of Qualcomm's SOCs are built using stock
| ARM core designs. ARMnn is optimized for multicore
| A-series chips, which covers everything from the
| Snapdragon 410 to the 888 (~2014 to the present day).
| mgraczyk wrote:
| I think you'd be surprised by what's possible on mobile chips
| these days. They aren't going to be running the 70B model at
| usable speeds, but I think with enough optimization it should
| be possible to run the 7B and 13B models on device
| interactively. With quantization you can fit those models in
| less than 8GB of RAM.
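|
| The arithmetic behind that last claim, as a quick Python
| sketch (assuming 4-bit weights and ignoring file-format
| overhead):
|
|   def quantized_size_gb(params_billion, bits_per_weight):
|       """Raw weight storage for a quantized model, in GB."""
|       return params_billion * 1e9 * bits_per_weight / 8 / 1e9
|
|   print(quantized_size_gb(7, 4))   # ~3.5 GB
|   print(quantized_size_gb(13, 4))  # ~6.5 GB, fits in 8GB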
| treprinum wrote:
| Chips are capable, but this is a question of battery and
| heat. llama.cpp on a phone makes it both hot and low on
| battery quickly.
| superkuh wrote:
| The rate of token output is bottlenecked by the time it takes
| to transfer the model between RAM and CPU. Not the time it
| takes to do the multiplication operations. If you have the
| latest and greatest mobile phone and 8GB (or 12GB) of LPDDR5
| on a Snapdragon 8 Gen 2 you still only have 8.5 Gbps memory
| bandwidth (max; less in actual phones running it at slower
| speeds). That's 1 GB/s. So if your model is a 4 bit 7B
| parameter model that's 4GB in size that means it'll take at
| _least_ 4 seconds per token generated. That is _SLOW_.
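|
| Restated as a Python sketch (assuming every weight must be
| streamed from RAM once per generated token):
|
|   bandwidth_gb_per_s = 1.0  # the claimed ceiling above
|   model_gb = 4.0            # 4-bit 7B model, as above
|   print(model_gb / bandwidth_gb_per_s, "s per token")  # 4.0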
|
| It doesn't matter that the Snapdragon 8 gen 2 has "AI" tensor
| cores or any of that. Memory bandwidth is the bottleneck for
| LLMs. Phones have never needed HPC-like memory bandwidth and
| they don't have it. If Qualcomm is actually addressing this
| issue that'd be amazing. But I highly doubt it. Memory
| bandwidth costs $$$, massive power use, and volume/space not
| available in the form factor.
|
| Do you know of a smartphone that has more than 1 GB/s of
| memory bandwidth? If so I will be surprised. Otherwise I
| think it is you who will be surprised how specialized their
| compute is and how slow they are in many general purpose
| computing tasks (like transferring data from RAM).
| refulgentis wrote:
| re community already did this:
|
| People are unreasonably attracted to things that are "minimal",
| at least 3 different local LLM codebase communities will tell
| you _they_ are the minimal solution.[1]
|
| It's genuinely helpful to have a static target for technical
| understanding. Other projects end up with a lot of rushed
| Python defining the borders in a primordial ecosystem with too
| many people too early.
|
| [1] Lifecycle: A lone hacker wants to gain understanding of the
| complicated world of LLMs. They implement some suboptimal, but
| code golfed, C code over the weekend. They attract a small
| working group and public interest.
|
| Once the working group is outputting tokens, it sees an
| optimization.
|
| This is landed.
|
| It is applauded.
|
| People discuss how this shows the open source community is
| where innovation happens. Isn't it unbelievable the closed
| source people didn't see this?[2]
|
| Repeat N times.
|
| Y steps into this loop, a new base model is released.
|
| The project adds support for it.
|
| However, it reeks of the "old" ways. There's even CLI arguments
| for the old thing from 3 weeks ago.
|
| A small working group, frustrated, starts building a new, more
| minimal solution...
|
| [2] The closed source people did. You have their model, not
| their inference code.
| MuffinFlavored wrote:
| Even on a platform where they are fast, I haven't found a solid
| real-world use case personally for anything other than a
| GPT-4-quality LLM. Am I missing something?
| superkuh wrote:
| Non-commercial entertainment. Which makes this move by
| Qualcomm all the weirder. I agree, the llamas and all the
| other foundational models and all of their fine-tunes are not
| really useful for helping with real tasks that have a wrong
| answer.
| pera wrote:
| Link to the article:
| https://www.qualcomm.com/news/releases/2023/07/qualcomm-work...
| qwertox wrote:
| > "We applaud Meta's approach to open and responsible AI and
| are committed to driving innovation and reducing barriers-to-
| entry for developers of any size by bringing generative AI on-
| device,"
|
| Can someone explain to me why Meta's approach is responsible? I
| mean, I applaud Meta for "open sourcing" the models, but don't
| they contain potentially harmful data which can be accessed
| without some kind of filter? Say, retrieving instructions on
| how to efficiently overthrow a government?
| synaesthesisx wrote:
| This is pretty much to go head-to-head with Apple (and Samsung).
| Both are making great strides using "neural coprocessors" and
| the like for running models on mobile hardware. Mobile edge
| computing is where we're going to see a lot of use cases that
| enable functionality while maintaining data privacy &
| performance.
|
| Keep in mind "mobile devices" extends beyond smartphones to
| wearables/headsets as well.
| givemeethekeys wrote:
| Sounds like a pump right before earnings.
| code_runner wrote:
| I don't even think the purpose for this is known. Not sure how
| this would impact earnings at all. Meta doesn't even
| manufacture a phone.
| objclxt wrote:
| > Meta doesn't even manufacture a phone
|
| Quest runs off Qualcomm chipsets, although in terms of actual
| units shipped Quest is a rounding error for QC.
| rvz wrote:
| That's the reason why OpenAI.com was running to governments
| in a panic to stop and slow down the rapid pace of free,
| downloadable AI models and LLMs in the first place, since
| anyone can have a GPT-3.5 in their hands and use it anywhere
| with Llama 2.
|
| A year ago, many believed that Meta was going to destroy
| itself as the stock went below $90 in peak fear. Now it looks
| like Meta is winning the race to zero in AI, and all
| OpenAI.com can do is just sit there and watch their cloud-
| based AI decline in performance and run to fix outage after
| outage.
|
| No outage(s) when your LLM is on-device.
| redox99 wrote:
| I don't get the point. There is just no way you'll be able to run
| llama2 70B. And llama2 13B, although cool, is much much dumber
| than GPT3.5. I don't think it's useful as a ChatGPT-style
| assistant.
|
| Maybe in the future we'll get very advanced models with that
| number of parameters. But running current Llama2 13B on a mobile
| device doesn't seem too useful IMO.
| yreg wrote:
| I really hope Apple doesn't mess this up and includes a solid on
| device LLM in iOS in the near future.
|
| They have amazing chips, but Siri has been a subpar assistant
| since forever. Now is the time to redeem her.
| asadm wrote:
| They are adding their first LLM in iOS 17's keyboard.
| cubefox wrote:
| "L"LM. No way this model will be "large" in any modern sense.
| For mobile devices both RAM and power consumption are very
| limited.
| yreg wrote:
| That's just for autocorrect, right?
|
| I would like to see an LLM in Siri and eventually even have
| it interact with/control the rest of the system.
|
| Ideally with Whisper-level speech recognition of course.
| coder543 wrote:
| Apple is supposedly bringing a better, transformer-based
| speech recognition model to iOS 17 as well, although I
| don't think either of these transformer models would be
| classified as a _large_ language model.
|
| Link to the announcement timestamp:
| https://developer.apple.com/videos/play/wwdc2023/101/?time=1...
| bigfudge wrote:
| Apple's speech recognition is pretty good, at least for me.
| I always assumed the delta was because it does it in near
| real time, which is not possible with whisper.
| yreg wrote:
| I don't have a great accent, but Whisper understands me
| >99%. So do my colleagues.
|
| I've tried to talk to ChatGPT through a Siri shortcut for
| a day and Siri transcribed pretty much all of my requests
| wrong, to the point that GPT seldom understood what I
| wanted.
|
| Even the _Hey Siri ... Ask ChatGPT_ trigger phrase fails
| ~50% of the time for me.
| [deleted]
| coder543 wrote:
| Siri speech recognition is consistently terrible compared
| to the alternatives, in my experience. Google and
| Microsoft have much better speech recognition technology.
|
| Whisper is phenomenal compared to Siri, and arguably even
| compared to what Google and Microsoft use, and no, there
| is nothing that stops Whisper from being used in real
| time. I can run real-time Whisper on my iPhone using the
| Hello Transcribe app, but the last time I tried it, the
| app itself was too flawed to recommend except as a demo
| of real-time transcription.
|
| I am looking forward to trying out the new transcription
| model that Apple is bringing to iOS 17.
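|
| For anyone who wants to try Whisper themselves, the
| reference Python package makes it a few lines (a sketch;
| assumes openai-whisper is installed and an audio file is
| at hand):
|
|   import whisper  # pip install openai-whisper
|
|   model = whisper.load_model("base")  # laptop-friendly size
|   result = model.transcribe("voice_memo.wav")
|   print(result["text"])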
| gnicholas wrote:
| This will be great. Hopefully it will be able to figure out
| that it should capitalize my last name, which is on my
| Contact card, and that of a dozen of my relatives. When I
| went to the Apple Store, they told me that I could either add
| a custom autocorrect to my library, or reset all settings.
| They did admit this was some sort of bug, and that it would
| be massive overkill to reset all settings (lose all wifi
| passwords, etc.).
| coder543 wrote:
| > or reset all settings
|
| > that it would be massive overkill to reset all settings
| (lose all wifi passwords, etc.).
|
| I don't think that's what anyone was recommending...
|
| Settings -> General -> Transfer or Reset iPhone -> Reset ->
| Reset Keyboard Dictionary is almost certainly what they
| were recommending.
|
| What does resetting your keyboard dictionary have to do
| with your wifi passwords?
| gnicholas wrote:
| Nope, the two employees I spoke to were talking about a
| full reset (which affects network settings). Regardless
| of what the Keyboard Dictionary says, the iPhone should
| be autocompleting/capitalizing the last name of contacts,
| and especially the owner's name.
| coder543 wrote:
| Why would that implicitly be "regardless" of what the
| keyboard dictionary says? I would expect the learned
| dictionary to be prioritized over other sources of
| information, just as a practical matter, even if someone
| might reasonably assume there are other things that
| should be prioritized over it.
|
| None of that explains how resetting _everything_ would
| have any effect on capitalization of names if resetting
| the keyboard dictionary wouldn't, and you didn't say
| whether you tried resetting the keyboard dictionary.
| cscurmudgeon wrote:
| Otoh, Apple's monopolistic behavior implies that it will be
| good for society if they mess up.
| gnicholas wrote:
| I'd love to have my iPhone communicate with a Mac Studio at my
| house, for the heavy lifting. I realize this would be slower
| than having on-device processing, but it would be much better
| for battery life. And although I trust Apple's privacy more
| than Google/FB, I'd still rather keep my AI interactions
| separate from anyone's cloud.
| jayd16 wrote:
| > it would be much better for battery life.
|
| I wonder what the numbers actually are for local compute on
| custom hardware compared to firing up the wifi antenna to
| make the remote request.
| gnicholas wrote:
| Yeah I have wondered about this. But seeing how an LLM
| hammers my M2 MBA CPU for many seconds per request, I'm
| guessing this would have a significant impact on a
| smartphone battery.
| smoldesu wrote:
| You might be pleased to hear that nothing really stops you
| from doing this today. If you ran Serge[0] on a Mac with
| Tailscale, you could hack together a decently-accelerated
| Llama chatbot.
|
| [0] https://github.com/serge-chat/serge
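|
| A minimal sketch of the phone-side call, assuming a
| llama.cpp-style HTTP server is reachable over Tailscale
| (the host name and route here are placeholders, not
| Serge's actual API):
|
|   import requests  # pip install requests
|
|   # Ask the Mac on the tailnet for a completion.
|   resp = requests.post(
|       "http://my-mac:8080/completion",
|       json={"prompt": "Hello, llama!", "n_predict": 64},
|   )
|   print(resp.json())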
| gnicholas wrote:
| I'm not technical enough to be able to hack this together,
| but I do hope that enough other people have the same itch,
| and are able to scratch it!
| bob-09 wrote:
| I'd love to see both options and a seamless transition
| between them. An option tuned for home use that utilizes the
| processing power of home devices and local networks, and an
| option tuned for on-the-go use utilizing the iPhone/iPad
| processors and mobile networks.
| rvz wrote:
| Of course they will add an on-device LLM and can afford to. It
| doesn't cost them anything to integrate or train an AI,
| whether it is a ConvNet or an LLM, and jump into the race to
| zero with Meta on on-device machine learning.
|
| They have done it before and they will certainly do it again,
| especially with Apple Silicon and CoreML.
|
| The one that needs to worry the most is OpenAI.com, as they
| rushed to regulators to stop the adoption of powerful,
| freely downloadable AI models. That shows that OpenAI.com
| does not have any sort of 'moat' at all.
| yreg wrote:
| The question is whether the iPhones released in September are
| going to be already ready for it.
|
| They haven't mentioned LLMs at WWDC beyond keyboard
| autocorrect (mentioned already by your sibling comment).
| baq wrote:
| No chance unless they were prescient in early 2022. Hardware
| cycles are too long for that otherwise.
| yreg wrote:
| Is it even necessary to change the neural engine,
| though? Maybe something like increasing RAM (which is
| rumoured) is enough.
| baq wrote:
| If there's one company which can wrangle its suppliers to
| deliver 4x RAM capacity in the same form factor,
| performance and thermals it's Apple, but they aren't
| sorcerers, just ruthlessly efficient.
|
| I'll be queueing at midnight for the first time ever if
| I'm wrong.
| sp332 wrote:
| It's going to chew up at least 1 GB of storage space and RAM,
| right? And probably kill the battery life to boot.
| refulgentis wrote:
| Yeah. People are playing tons of word games with this stuff,
| ex. Apple is saying it's shipping an LLM for the iOS 17
| keyboard, and who knows what that means: it sounds great and
| plausible unless you're familiar w/the nuts and bolts.
| Tagbert wrote:
| Apple is calling their typing correction a "transformer".
| That is a component of LLMs, but Apple may not be using a
| full LLM in that case. This feature seems like a sandbox
| for them to try out some of this tech in the field while
| they do work internally on more ambitious implementations.
|
| Apple is also dogfooding an LLM AI tool internally, likely
| to gain a better understanding of how this works in
| practice and how people are using them.
| https://forums.macrumors.com/threads/apple-experimenting-wit...
| astrange wrote:
| An LLM is made entirely out of transformer blocks; you
| could just call it a "large transformer model".
|
| In this case it's a transformer model that is not
| "large". So, an LM.
| shwaj wrote:
| Apple's not playing word games, because they didn't say
| "LLM". They said that autocorrect will use "a transformer
| language model, a state-of-the-art on-device machine
| learning language model for word prediction", which is a
| much more precise statement than what you attributed to
| them.
|
| This sounds totally plausible. It will be a much smaller
| transformer model than ChatGPT, probably much smaller than
| even GPT-2.
|
| https://www.apple.com/newsroom/2023/06/ios-17-makes-iphone-m...
| jpalomaki wrote:
| AppleTV devices are usually connected and just idling most
| of the time. Maybe you could move processing to such a
| device, if one is connected to the same Apple ID.
| gnicholas wrote:
| Yep, I'd love to have a semi-dedicated device in my home
| that handled these sorts of requests. I'd even consider
| buying a Mac mini, Studio, or other computer for this
| purpose.
| yreg wrote:
| Would be cool, but I think it is improbable. Apple would
| want such a key feature to be available to everyone and
| less than 10% of iPhone users have Macs.
|
| Unless they also made an option to run it in iCloud, but
| offering so many options to do a thing doesn't sound very
| Apple-like.
| gnicholas wrote:
| Agree. But it should be doable to set this up using open
| LLMs, right? For example, using Siri to trigger a
| shortcut that sends a prompt to the dedicated processing
| device.
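|
| A sketch of what the home end could look like: a tiny
| HTTP bridge in front of a local model, which a Shortcut
| could hit with "Get Contents of URL" (all names, paths,
| and flags here are hypothetical):
|
|   from flask import Flask, request  # pip install flask
|   import subprocess
|
|   app = Flask(__name__)
|
|   @app.post("/prompt")
|   def prompt():
|       # Hand the prompt to a local llama.cpp binary; the
|       # model path and flags are illustrative only.
|       out = subprocess.run(
|           ["./main", "-m", "llama-2-7b.q4_0.bin",
|            "-p", request.json["prompt"], "-n", "128"],
|           capture_output=True, text=True,
|       )
|       return {"completion": out.stdout}
|
|   app.run(host="0.0.0.0", port=8000)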
| glimshe wrote:
| [flagged]
| Obscurity4340 wrote:
| Isn't the reception to Llama/whatever it's called generally
| positive? Is there something I'm missing in terms of some
| shadowy endgame Meta built into it?
| JPLeRouzic wrote:
| Isn't it a challenge today to run a large LLM on a CPU/GPU
| like those found in mobile phones?
|
| I would have thought that the mere news that it might be
| possible is good news?
| [deleted]
| logicchains wrote:
| You're not a fan of PyTorch or React I guess?
___________________________________________________________________
(page generated 2023-07-23 23:02 UTC)