[HN Gopher] MobileLLM: Optimizing Sub-Billion Parameter Language...
___________________________________________________________________
MobileLLM: Optimizing Sub-Billion Parameter Language Models for On-
Device Use
Author : tosh
Score : 246 points
Date    : 2024-07-09 11:48 UTC (1 day ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| sourcecodeplz wrote:
| Nice, could one use this to train models for Windows PCs also? I
| don't have a lot of RAM.
| skiexperte wrote:
| Training models is not OS dependent. RAM usage depends on the
| model size, and I would argue a model this small should be a lot
| easier to finetune with less GPU RAM.
|
| Nonetheless, the end goal will probably be downloading a model
| like this (or paying for finetuning and then downloading it) and
| using it through an optimized neural chip.
|
| It's currently more a question of when this will happen. The
| newest Windows certification already requires some neural chip,
| and even my Google Pixel 8 Pro can host small models (I know the
| Pixel is not a cheap phone, but the coprocessor should still be
| much more affordable than a big GPU).
| mmastrac wrote:
| > MobileLLM-125M/350M attains a remarkable 2.7%/4.3% accuracy
| boost over preceding 125M/350M SoTA models on zero-shot
| commonsense reasoning tasks
|
| Small models, slightly improved, probably still not good enough
| for the same use as online models. Nothing wrong with incremental
| progress, however.
|
| The 1.5B parameter model does seem to be a pretty decent step
| up, even beating larger models by a wide margin. I'm not sure
| why they didn't go larger -- having a more efficient model that
| fits on hardware the size of the RPi could be a gamechanger
| (IIRC TinyLlama 7B does run, barely).
| phkahler wrote:
| >> Small models, slightly improved, probably still not good
| enough for the same use as online models. Nothing wrong with
| incremental progress, however.
|
| An even smaller language model should still be useful as part
| of a speech-to-text system. Such systems should benefit from
| using a language model to narrow down which word was spoken in
| the face of ambiguity or noise.
| woodson wrote:
| ASR systems already use language models during decoding,
| though mostly not large decoder-only LLMs. However,
| incorporating LLMs into ASR is currently at the center of a
| lot of research, e.g. using a speech encoder like wav2vec 2.0
| or the Whisper encoder with a Q-Former etc., plus a LoRA
| adapter on an LLM trained for ASR.
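|
| A rough sketch of that recipe, for the curious. The model
| choices, the plain linear projector in place of a Q-Former,
| the crude downsampling, and the loss masking are illustrative
| assumptions, not any particular paper's exact setup:
|
|     import torch
|     import torch.nn as nn
|     from transformers import AutoModelForCausalLM, WhisperModel
|     from peft import LoraConfig, get_peft_model
|
|     encoder = WhisperModel.from_pretrained("openai/whisper-tiny").encoder
|     llm = AutoModelForCausalLM.from_pretrained("gpt2")  # small stand-in LLM
|     hidden = llm.config.hidden_size
|
|     # Freeze the speech encoder; train only the projector and
|     # the LoRA adapters injected into the LLM.
|     for p in encoder.parameters():
|         p.requires_grad = False
|     llm = get_peft_model(llm, LoraConfig(task_type="CAUSAL_LM", r=8))
|
|     # Project encoder states into the LLM's embedding space.
|     proj = nn.Linear(encoder.config.d_model, hidden)
|
|     def asr_loss(mel, transcript_ids):
|         # Crude 4x frame downsampling to fit GPT-2's 1024-token
|         # context; real setups stack frames or use a Q-Former.
|         speech = proj(encoder(mel).last_hidden_state[:, ::4])
|         text = llm.get_input_embeddings()(transcript_ids)
|         inputs = torch.cat([speech, text], dim=1)
|         # Mask the speech positions so only transcript tokens
|         # contribute to the next-token loss.
|         ignore = torch.full(speech.shape[:2], -100, dtype=torch.long)
|         labels = torch.cat([ignore, transcript_ids], dim=1)
|         return llm(inputs_embeds=inputs, labels=labels).loss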
| omarelb wrote:
| Really interested in this! Do you know of some good reading
| in this area?
| cjtrowbridge wrote:
| Llama-3-8B runs fine on a Raspberry Pi.
| inhumantsar wrote:
| how fast is that for you?
| choppaface wrote:
| But imagine if these models were baked into your Instagram app
| and then used for ad targeting using your own compute. Then
| Facebook gets to look at tons of other data, at less cost (and
| much less litigation risk) to them.
|
| In this application it's unfair to compare tiny models to cloud
| models. Moreover any incremental precision boosts to tiny
| models would be notable (and directly translate to revenue).
| HanClinto wrote:
| > I'm not sure why they didn't go larger -- having a more
| efficient model that fits on hardware the size of the RPi could
| be a gamechanger (IIRC TinyLlama 7B does run, barely).
|
| I'm not sure that RPi is the right target for the next step of
| local LLMs, and I think that it's worth considering web-
| deployment on engines like WebLLM [1].
|
| A 7B model may "run fine" on a Raspberry Pi, but I've
| (personally) found 7B models to be a bit larger than I want to
| download / run for web-based interfaces.
|
| However, a solid 125M model is the sort of thing that I can run
| on a webpage, and the time it takes to download to the local
| user's browser (combined with my bandwidth costs) isn't
| exorbitant.
|
| [1] https://github.com/mlc-ai/web-llm
| zurfer wrote:
| While this is interesting, I wonder what the use case is, other
| than better autocomplete?
| potatoman22 wrote:
| It could power simple agents like Siri under the hood. Helping
| with natural language understanding, intent classification,
| retrieval, and other agent tasks.
| rvnx wrote:
| Like the Rabbit R1 or Humane AI Pin
| redox99 wrote:
| Local agent like siri that can do simple tasks, and route more
| complex requests.
| skiexperte wrote:
| Reading emails, replying to emails, scheduling tasks, using
| APIs for services.
|
| Basically everything which doesn't need knowledge, just
| actions.
|
| "Tell my wife I'm late" and it will use some configured magic
| to talk to service xy and just do it.
|
| Siri is very good at doing home automation without the
| internet; the old Google Assistant and Alexa were absolutely
| not, and I don't think they were ever available offline.
|
| This basically gives you a good, working local (local-first!)
| assistant.
| Narhem wrote:
| It would be very nice to have my schedule automatically
| managed by Siri. It already has a few nice features, but I
| genuinely have trust issues, especially with AI.
| lovethevoid wrote:
| You can get very far with the Shortcuts app, by the way.
| Some examples: using your current location to estimate when
| you should leave to get to your next meeting on your
| calendar, or letting those included in the calendar event
| know you're running late. Highly recommend it; the learning
| curve isn't steep, just a bunch of drag and drop!
| throwthrowuknow wrote:
| You could possibly fine-tune it for narrow domain tasks like
| they did with tiny-agent:
| https://bair.berkeley.edu/blog/2024/05/29/tiny-agent/
|
| I like the approach that Apple seems to be taking with fine-
| tuned small models that handle routine tasks and then defer to
| larger off-device models for things they can't confidently do.
| I imagine you could construct a training set containing
| examples that should produce low-confidence answers, add an
| output option that is essentially a "call for help", and train
| the model to choose it. Smaller models also mean you could
| have more running in parallel and use another to route
| requests to the appropriate expert.
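|
| A minimal sketch of what such a training set could look like;
| the JSON action format and the "defer_to_cloud_model" label
| are invented for illustration:
|
|     train_examples = [
|         {"prompt": "Set a timer for 10 minutes",
|          "target": '{"action": "set_timer", "minutes": 10}'},
|         {"prompt": "Remind me to call mom at 5pm",
|          "target": '{"action": "create_reminder", "time": "17:00"}'},
|         # Low-confidence territory: the right answer is to
|         # call for help rather than guess.
|         {"prompt": "Draft a contract for my freelance client",
|          "target": '{"action": "defer_to_cloud_model"}'},
|     ]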
| Narhem wrote:
| Probably hacking foreign intelligence codes.
| nsonha wrote:
| The use cases are those of LLMs generally, from a mobile UI
| (so every AI use case there is), whenever you need privacy
| from big tech's AI APIs.
|
| I'm just so amazed by statements like "LLMs can ONLY be used
| for autocomplete"; am I supposed to be impressed by the
| smugness?
| nl wrote:
| The question was more about the capability and knowledge in a
| sub-1B LLM: at that size, what is it capable of doing beyond
| excellent autocompletion?
| barronli wrote:
| It can be fine-tuned for device-related actions. In other
| words, the small model can virtually inherit the capabilities
| of your device's applications and services: it can dispatch a
| user request expressed in natural language to those
| applications and orchestrate them, and it can forward requests
| beyond the device's capabilities to a cloud model (see the
| sketch below). This is powerful since it changes how you
| interact with your devices.
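|
| A minimal sketch of that dispatch idea, assuming the model
| emits a structured action. All names here are hypothetical
| stand-ins, not a real API:
|
|     def open_settings(section: str) -> str:
|         return f"opened settings: {section}"
|
|     def send_message(to: str, text: str) -> str:
|         return f"sent to {to}: {text}"
|
|     LOCAL_ACTIONS = {"open_settings": open_settings,
|                      "send_message": send_message}
|
|     def dispatch(action: dict, prompt: str) -> str:
|         handler = LOCAL_ACTIONS.get(action.get("name"))
|         if handler is None:  # beyond the device's capabilities
|             return f"[forwarded to cloud model] {prompt}"
|         return handler(**action.get("args", {}))
|
|     print(dispatch({"name": "send_message",
|                     "args": {"to": "wife", "text": "running late"}},
|                    "Tell my wife I'm late"))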
| syassami wrote:
| https://www.meta.com/smart-glasses/
| simion314 wrote:
| I tested the Google AI on my phone: I had the browser open and
| asked it to read the page to me, and it responded that it does
| not have access to the internet.
|
| So I would like an AI assistant that:
|
| 1. Can understand English and my native language.
|
| 2. Is aware that it runs on Android (or KDE/Linux) and can
| understand commands like "open the Android Settings,
| Applications section", "read the page that is open in the
| browser", or "read the text in the popup that is now open".
| Basically, it should be integrated with the OS via public and
| open APIs. Big AI companies could compete on selling us better
| assistants, especially for multilingual people.
|
| 3. The model should be small; it should not know geography,
| history, music bands, etc. For tasks where the user asks such
| questions, there should be an option for the model to forward
| the question to a search engine or even an online LLM.
| Havoc wrote:
| What apps can one currently use to run them on, say, an
| iPhone? I'm only aware of the MLC one, which has literally
| three old models.
| 5cott0 wrote:
| wat
|
| https://huggingface.co/mlc-ai
| Havoc wrote:
| On my iPhone there doesn't seem to be an option to download
| more.
|
| I vaguely recall there being a button initially, but I don't
| see it anymore.
| pickettd wrote:
| The Android APK for MLC is updated frequently, with recent
| models built in. And a Samsung S24+ can comfortably run 7-8B
| models at reasonable speeds (10ish tokens/sec).
|
| https://llm.mlc.ai/docs/deploy/android.html
| woadwarrior01 wrote:
| I have an (mlc-llm based) app on the App Store that supports
| over 2 dozen models, including some recent ones.
| ukuina wrote:
| cnvrs runs GGUFs on iOS:
| https://testflight.apple.com/join/ERFxInZg
| yshvrdhn wrote:
| Am I missing something, or couldn't something like
| distillation help here?
| imurray wrote:
| The paper says they tried that:
| https://arxiv.org/abs/2402.14905
|
| Deep link to the relevant snippet in html version:
| https://ar5iv.labs.arxiv.org/html/2402.14905#S3.SS5
|
| _" So far, we trained compact models from scratch using next
| tokens as hard labels. We explored Knowledge Distillation
| (KD)... Unfortunately KD increases training time (slowdown of
| 2.6-3.2x) and exhibits comparable or inferior accuracy to
| label-based training (details in appendix)."_
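|
| For reference, the generic KD setup being compared against is
| roughly a soft-label KL loss against the teacher's logits (a
| standard formulation; the paper's exact recipe may differ):
|
|     import torch.nn.functional as F
|
|     def kd_loss(student_logits, teacher_logits, t: float = 2.0):
|         # KL divergence between temperature-softened
|         # distributions, scaled by t^2 to keep gradient
|         # magnitudes comparable across temperatures.
|         soft_teacher = F.softmax(teacher_logits / t, dim=-1)
|         log_student = F.log_softmax(student_logits / t, dim=-1)
|         return F.kl_div(log_student, soft_teacher,
|                         reduction="batchmean") * (t * t)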
| PoignardAzur wrote:
| I wonder how much you can push the "deeper and thinner" part.
| At some point your entire FFN fits into your L2 cache, and
| you're bound to get some performance jumps.
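|
| Some back-of-the-envelope arithmetic (the width and
| quantization are illustrative guesses, not the paper's exact
| configs):
|
|     hidden = 576          # a "thin" sub-billion model width
|     ffn_mult = 4          # classic FFN expansion factor
|     bytes_per_weight = 1  # int8-quantized
|
|     # Up- and down-projection matrices of one FFN block:
|     ffn_bytes = 2 * hidden * (ffn_mult * hidden) * bytes_per_weight
|     print(ffn_bytes / 1024, "KiB per layer")  # 2592.0 KiB, ~2.5 MiB
|
| So one layer's FFN is already in the rough range of big-core
| L2 caches; go thinner and it fits comfortably.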
| sigmoid10 wrote:
| Other research from Meta FAIR actually suggests that you should
| prune deeper layers if you want to improve performance while
| maintaining accuracy [1]. So there must be a cutoff point for
| smaller networks where this approach still works, otherwise the
| results are contradictory. Or we could drastically improve
| these new models even further.
|
| [1] https://arxiv.org/html/2403.17887v1
| woodson wrote:
| That reminds me of the findings of Google's paper on
| EfficientT5 (https://arxiv.org/abs/2109.10686). They refer to
| it as "DeepNarrow".
| ejdhshsuwisjsh wrote:
| Is anyone aware of custom mobile LLMs?
|
| Optimizing and loading in your own voice, selecting your
| primary language, and adding a little bit of personal
| knowledge like nicknames, location and such?
|
| My Pixel 8 apparently can use/load local models, but I don't
| have the time right now to follow that rabbit hole.
| euniceee3 wrote:
| Tensor chips are not open enough for an optimized mobile LLM
| to be run on them.
| vhiremath4 wrote:
| It seems like the smaller models get the largest size decrease
| from embedding sharing/weight tying between the linear head
| and token embeddings. Is there any research going into how to
| further reduce size from there?
| cztomsik wrote:
| If you mean that the LM head is just the (transposed)
| embedding matrix, then this was already done in GPT-2.
|
| Unfortunately, the only thing I found out about this is that
| bigger models benefit from a separate layer. But this was only
| mentioned somewhere on Discord, so there's no paper to read,
| and my personal hunch is that it should work for bigger models
| too. After all, GPT-3 was just a scaled-up GPT-2.
|
| From my personal experiments, models learn better if you give
| them a harder task, and tied weights could be one such thing.
| Multi-token prediction could be another, and BitNet could also
| be considered such... (and dropout too).
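|
| For concreteness, the tying being discussed is just this
| (shapes assumed, roughly in the MobileLLM-125M range):
|
|     import torch.nn as nn
|
|     vocab, hidden = 32000, 576
|     embed = nn.Embedding(vocab, hidden)
|     lm_head = nn.Linear(hidden, vocab, bias=False)
|     lm_head.weight = embed.weight  # one shared (vocab x hidden) matrix
|
|     # Saves vocab * hidden ~= 18.4M parameters, a big fraction
|     # of a 125M-parameter budget, hence the outsized benefit
|     # for the smallest models.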
| cjtrowbridge wrote:
| Why no MMLU or GSM8K?
| lawlessone wrote:
| Does it have to stay on mobile devices? It's a bit of a niche,
| but if it's not a resource hog it could be handy for giving
| NPCs in games more interesting dialogue without having to use
| a big model.
|
| Even better if it could be tuned in some way to allow dialogue
| to influence NPC behavior or actions.
| janalsncm wrote:
| It would be fascinating if NPCs had more backstory to them and
| more complex behaviors, although I would imagine it would be
| near impossible to test, since anything could influence their
| behavior.
| lawlessone wrote:
| Yeah, testing would definitely be a nightmare, especially if
| conversations could influence the wider game.
|
| You'd have someone on YouTube cheesing games by scamming NPCs.
| HanClinto wrote:
| I'm definitely interested in exploring this sort of thing.
| How much can we do with creating interesting characters and
| interesting circumstances?
|
| Makes me think of the way that characters are set up in AI
| Alibis -- each with their own secrets, but also with clues
| about other NPC's secrets. That feels like clever design, and
| it's the first use-case of using LLMs for NPC dialogue that
| feels interesting to me:
| https://news.ycombinator.com/item?id=40921990
| kevingadd wrote:
| Would it be _interesting_ dialogue? You could generate more
| dialogue, but would it have anything underpinning it of
| interest to the player? i.e. you could suddenly have
| townspeople that would talk about local scenery or their
| relationships with other NPCs, but none of that stuff they
| describe would actually _exist_ in the game. I would personally
| be weirded out if NPCs started making stuff up.
|
| I can imagine training some sort of LLM _on_ your game data
| such that NPCs are able to actually describe the game world,
| but I can't imagine what kind of scale you'd need to operate
| at for that to be cheaper than just paying someone to write
| the dialogue. Maybe at Ubisoft's scale, where team sizes are
| in the thousands. (AFAIK, they have been investigating using
| AI for writing, but it's mostly for things like combat barks,
| which are very repetitive and basically noise.)
| lawlessone wrote:
| > Would it be interesting dialogue?
|
| It would definitely depend a lot on the implementation. I
| think it could work great for some indie devs. Not all, of
| course; devs who like writing understandably won't like it.
| KTibow wrote:
| When Gemma 2 2B is released, it would be interesting to
| compare its scaling with this.
| pmontra wrote:
| Interesting research, but Meta doesn't have any device worth
| talking about (at least at scale), unless they want to ship
| this as part of their apps.
| TeMPOraL wrote:
| They have Oculus.
| ynx wrote:
| Dismissiveness like this tends to radiate ignorance, not
| insight.
|
| Quest headsets have shipped roughly half as many units as the
| PS5. That's certainly a scale only a handful of
| technologically advanced product lines outside of phones ever
| reach.
|
| Incidentally, the enabling technology for the Quest? On-device
| ML that grew out of - you guessed it - developing on-device
| inference for their apps.
| HanClinto wrote:
| 125M parameters feels very feasible to ship as part of apps --
| even web-based apps.
| banish-m4 wrote:
| Hey HN. I actually have a current need for on-device wake-
| word-like STT. Which model(s) have the lowest WER and can run
| on an RPi 4B? I've been looking at openWakeWord. It's for a
| DIY inventory system.
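|
| For context, the openWakeWord detection loop I'm looking at
| is roughly this, based on its README; the pretrained model
| name and the 16 kHz / 1280-sample framing are the library's
| documented defaults, so double-check the current docs:
|
|     import numpy as np
|     from openwakeword.model import Model
|
|     # Assumes the pretrained model files have already been
|     # downloaded per the README.
|     oww = Model(wakeword_models=["hey_jarvis"])
|
|     def on_audio_frame(frame: np.ndarray) -> None:
|         # frame: int16 PCM, 16 kHz, 1280 samples (80 ms)
|         for name, score in oww.predict(frame).items():
|             if score > 0.5:
|                 print(f"wake word: {name} ({score:.2f})")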
___________________________________________________________________
(page generated 2024-07-10 23:02 UTC)