[HN Gopher] Introducing Gemma 3n
       ___________________________________________________________________
        
       Introducing Gemma 3n
        
       Author : bundie
       Score  : 245 points
       Date   : 2025-06-26 17:03 UTC (5 hours ago)
        
 (HTM) web link (developers.googleblog.com)
 (TXT) w3m dump (developers.googleblog.com)
        
       | wiradikusuma wrote:
        | I still don't understand the difference between Gemma and
        | Gemini for on-device use, since neither needs network
        | access. From https://developer.android.com/ai/gemini-nano :
       | 
       | "Gemini Nano allows you to deliver rich generative AI experiences
       | without needing a network connection or sending data to the
       | cloud." -- replace Gemini with Gemma and the sentence still
       | valid.
        
         | readthenotes1 wrote:
          | Perplexity.ai gave an easier-to-understand response than
          | Gemini 2.5, afaict.
         | 
         | Gemini nano is for Android only.
         | 
         | Gemma is available for other platforms and has multiple size
         | options.
         | 
          | So it seems like Gemini Nano might be a very focused Gemma
          | everywhere, if you follow the biology metaphor instead of
          | the Italian-name interpretation.
        
           | ridruejo wrote:
           | The fact that you need HN and competitors to explain your
           | offering should make Google reflect ...
        
             | gardnr wrote:
             | The Gemini billing dashboard makes me feel sad and
             | confused.
        
         | tyushk wrote:
          | Licensing. You can't use Gemini Nano weights directly (at
          | least commercially) and must interact with them through
          | Android MLKit or similar Google-approved runtimes.
         | 
         | You can use Gemma commercially using whatever runtime or
         | framework you can get to run it.
        
           | littlestymaar wrote:
            | It's not even clear that you can license language model
            | weights, though.
            | 
            | I'm not a lawyer, but the analyses I've read had a pretty
            | strong argument that there's no human creativity involved
            | in the training, which is an entirely automatic process,
            | and as such the weights cannot be copyrighted in any way
            | (the same way you cannot put a license on a software
            | artifact just because you compiled it yourself; you must
            | have copyright ownership of the _source code_ you're
            | compiling).
        
             | skissane wrote:
             | IANAL either but the answer likely depends on the
             | jurisdiction
             | 
             | US standards for copyrightability require human creativity
             | and model weights likely don't have the right kind of human
             | creativity in them to be copyrightable in the US. No court
             | to my knowledge has ruled on the question as yet, but
             | that's the US Copyright Office's official stance.
             | 
              | By contrast, standards for copyrightability in the UK
              | are a lot weaker than in the US. While no court has
              | ruled on the issue in the UK yet either, it seems
              | likely a UK court would hold model weights to be
              | copyrightable.
             | 
             | So from Google/Meta/etc's viewpoint, asserting copyright
             | makes sense, since even if the assertion isn't legally
             | valid in the US, it likely is in the UK - and not just the
             | UK, many other major economies too. Australia, Canada,
             | Ireland, New Zealand tend to follow UK courts on copyright
             | law not US courts. And many EU countries are closer to the
             | UK than the US on this as well, not necessarily because
             | they follow the UK, often because they've reached a similar
             | position based on their own legal traditions
             | 
              | Finally: don't be surprised if Congress steps in and
              | tries to legislate model weights as copyrightable in
              | the US too, or grants them some sui generis form of
              | legal protection which is legally distinct from
              | copyright but similar to it. I can already hear the
              | lobbyist argument: "the US AI industry risks falling
              | behind Europe because copyrightability of AI models in
              | the US is legally uncertain, and that legal uncertainty
              | is discouraging investment". I'm sceptical that is
              | actually true, but something doesn't have to be true
              | for lobbyists to convince Congress that it is.
        
               | simonw wrote:
               | > US standards for copyrightability require human
               | creativity and model weights likely don't have the right
               | kind of human creativity in them to be copyrightable in
               | the US. No court to my knowledge has ruled on the
               | question as yet, but that's the US Copyright Office's
               | official stance.
               | 
               | Has the US copyright office said that about model
               | weights? I've only heard them saying that about images
               | produced entirely from a prompt to a model.
        
             | AlanYx wrote:
             | That's one of the reasons why they gate Gemini Nano with
             | the "Gemini Nano Program Additional Terms of Service". Even
             | if copyright doesn't subsist in the weights or if using
             | them would be fair use, they still have recourse in breach
             | of contract.
        
               | skissane wrote:
                | The problem is that contracts don't bind subsequent
                | recipients; copyright does.
               | 
               | Google gives the model to X who gives it to Y who gives
               | it to Z. X has a contract with Google, so Google can sue
               | X for breach of contract if they violate its terms. But
               | do Y and Z have such a contract? Probably not. Of course,
               | Google can put language in their contract with X to try
               | to make it bind Y and Z too, but is that language going
               | to be legally effective? More often than not, no. The
               | language may enable Google to successfully sue X over Y
               | and Z's behaviour, but not successfully sue Y and Z
               | directly. Whereas, with copyright, Y and Z are directly
               | liable for violations just as X is
        
               | jinlisp wrote:
                | Thank you, this is a nice point to consider. I don't
                | know whether using the weights could be considered
                | equivalent to, or as implying acceptance of, the
                | terms of service from the weights' creators.
        
               | derefr wrote:
               | I've wondered about this for a while now (where e.g. some
               | models of HuggingFace require clickwrap license
               | agreements to download, that try to prohibit you from
               | using the model in certain ways.)
               | 
               | It seems to me that if some anonymous ne'er-do-well were
               | to publicly re-host the model files for separate
               | download; and you acquired the files from that person,
               | rather than from Google; then you wouldn't be subject to
               | their license, as you never so much as saw the clickwrap.
               | 
               | (And you wouldn't be committing IP theft by acquiring it
               | from that person, either, because of the non-
               | copyrightability.)
               | 
               | I feel that there must be something wrong with that
               | logic, but I can't for the life of me think of what it
               | is.
        
             | km3r wrote:
             | Why not? Training isn't just "data in/data out". The
             | process for training is continuously tweaked and adjusted.
             | With many of those adjustments being specific to the type
             | of model you are trying to output.
        
               | skissane wrote:
                | The US copyright office's position is basically
                | this: under US law, copyrightability requires direct
                | human creativity, and an automated training process
                | involves no direct human creativity, so it cannot
                | produce copyright. Now,
               | we all know there is a lot of creative human effort in
               | selecting what data to use as input, tinkering with
               | hyperparameters, etc - but the copyright office's
               | position is that doesn't legally count - creative human
               | effort in overseeing an automated process doesn't change
               | the fact that the automated process itself doesn't
               | _directly_ involve any human creativity. So the human
               | creativity in model training fails to make the model
               | copyrightable because it is too indirect
               | 
                | By contrast, UK copyright law accepts the "mere
                | sweat of the brow" doctrine: the mere fact that you
                | spent money on training is likely sufficient to make
                | its output copyrightable. UK law doesn't impose the
                | same requirement of a direct human creative
                | contribution.
        
             | IncreasePosts wrote:
             | Doesn't that imply just the training process isn't
             | copyrightable? But weights aren't just training, they're
             | also your source data. And if the training set shows
             | originality in selection, coordination, or arrangement,
             | isn't that copyrightable? So why wouldn't the weights also
             | be copyrightable?
        
               | skissane wrote:
               | The problem is, can you demonstrate that originality of
               | selection and arrangement actually survives in the
               | trained model? It is legally doubtful.
               | 
               | Nobody knows for sure what the legal answer is, because
               | the question hasn't been considered by a court - but the
               | consensus of expert legal opinion is copyrightability of
               | models is doubtful under US law, and the kind of argument
               | you make isn't strong enough to change that. As I said,
               | different case for UK law, nobody really needs your
               | argument there because model weights likely are
               | copyrightable in the UK already
        
               | rvnx wrote:
               | The weights are mathematical facts. As raw numbers, they
               | are not copyrightable.
        
               | IncreasePosts wrote:
               | `en_windows_xp_professional_with_service_pack_3_x86_cd_vl
               | _x14-73974.iso` is also just raw numbers, but I believe
               | Windows XP was copyrightable
        
               | badsectoracula wrote:
               | For the same reason GenAI output isn't copyrightable
               | regardless of how much time you spend tweaking your
               | prompts.
               | 
                | Also I'm pretty sure none of the AI companies would
                | really want to touch the concept of having the
                | copyright of source data affect the weights' own
                | copyright, considering all of them pretty much hoover
                | up the entire Internet without caring about those
                | copyrights (and IMO claiming that they should be able
                | to ignore the copyrights of training data, and that
                | the GenAI output is not under copyright, while at the
                | same time trying to claim copyright for the weights,
                | is dishonest, if not outright leechy).
        
         | jabroni_salad wrote:
         | Gemma is open source and apache 2.0 licensed. If you want to
         | include it with an app you have to package it yourself.
         | 
          | Gemini Nano is an Android API that you don't control at
          | all.
        
           | nicce wrote:
           | > Gemma is open source and apache 2.0 licensed
           | 
           | Closed source but open weight. Let's not ruin the definition
           | of the term in advantage of big companies.
        
             | zackangelo wrote:
             | Your reply adds more confusion, imo.
             | 
             | The inference code and model architecture IS open source[0]
             | and there are many other high quality open source
             | implementations of the model (in many cases contributed by
             | Google engineers[1]). To your point: they do not publish
             | the data used to train the model so you can't re-create it
             | from scratch.
             | 
             | [0] https://github.com/google-deepmind/gemma [1]
             | https://github.com/vllm-project/vllm/pull/2964
        
               | candiddevmike wrote:
               | If for some reason you had the training data, is it even
               | possible to create an exact (possibly same hash?) copy of
               | the model? Seems like there are a lot of other pieces
               | missing like the training harness, hardware it was
               | trained on, etc?
        
               | OneDeuxTriSeiGo wrote:
               | to be entirely fair that's quite a high bar even for most
               | "traditional" open source.
               | 
               | And even if you had the same data, there's no guarantee
               | the random perturbations during training are driven by a
               | PRNG and done in a way that is reproducible.
               | 
               | Reproducibility does not make something open source.
               | Reproducibility doesn't even necessarily make something
               | free software (under the GNU interpretation). I mean
               | hell, most docker containers aren't even hash-
               | reproducible.
        
               | zackangelo wrote:
               | Yes, this is true. A lot of times labs will hold back
               | necessary infrastructure pieces that allow them to train
               | huge models reliably and on a practical time scale. For
               | example, many have custom alternatives to Nvidia's NCCL
               | library to do fast distributed matrix math.
               | 
               | Deepseek published a lot of their work in this area
               | earlier this year and as a result the barrier isn't as
               | high as it used to be.
        
               | nicce wrote:
               | I am not sure if this adds even more confusion. Linked
               | library is about fine-tuning which is completely
               | different process.
               | 
               | Their publications about producing Gemma is not accurate
               | enough that even with data you would get the same
               | results.
        
               | zackangelo wrote:
               | In the README of the linked library they have a code
               | snippet showing how to have a conversation with the
               | model.
               | 
               | Also, even if it were for fine tuning, that would require
               | an implementation of the model's forward pass (which is
               | all that's necessary to run it).
        
               | nicce wrote:
                | That is a completely different discussion.
                | Otherwise, even Gemini 2.5 Pro would be open source
                | by this logic, since the clients for interacting
                | with the cloud APIs are open source.
        
             | Imustaskforhelp wrote:
              | Yes!! But I doubt how many are truly open-source
              | models, since most just confuse open source with open
              | weights; the definition has really been eroded, smh.
        
           | cesarb wrote:
           | > Gemma is open source and apache 2.0 licensed.
           | 
           | Are you sure? On a quick look, it appears to use its own
           | bespoke license, not the Apache 2.0 license. And that license
           | appears to have field of use restrictions, which means it
           | would not be classified as an open source license according
           | to the common definitions (OSI, DFSG, FSF).
        
         | impure wrote:
          | I suspect the difference is in the training data. Gemini
          | is much more locked down, and if it tries to repeat
          | something from the training data verbatim you will get a
          | 'recitation error'.
        
       | danielhanchen wrote:
       | Made some GGUFs if anyone wants to run them!
       | 
       | ./llama.cpp/llama-cli -hf unsloth/gemma-3n-E4B-it-GGUF:UD-Q4_K_XL
       | -ngl 99 --jinja --temp 0.0
       | 
       | ./llama.cpp/llama-cli -hf unsloth/gemma-3n-E2B-it-GGUF:UD-Q4_K_XL
       | -ngl 99 --jinja --temp 0.0
       | 
       | I'm also working on an inference + finetuning Colab demo! I'm
       | very impressed since Gemma 3N has audio, text and vision!
       | https://docs.unsloth.ai/basics/gemma-3n-how-to-run-and-fine-...
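        | 
        | If you'd rather call it from Python, here's a minimal sketch
        | with llama-cpp-python (untested; it pulls the same quant
        | from the same HF repo as the commands above):
        | 
        | from llama_cpp import Llama
        | 
        | # Downloads the same quant as the CLI commands above.
        | llm = Llama.from_pretrained(
        |     repo_id="unsloth/gemma-3n-E4B-it-GGUF",
        |     filename="*UD-Q4_K_XL*",  # glob over the repo's files
        |     n_gpu_layers=-1,          # like -ngl 99: offload all layers
        | )
        | out = llm.create_chat_completion(
        |     messages=[{"role": "user", "content": "Hello!"}],
        |     temperature=0.0,
        | )
        | print(out["choices"][0]["message"]["content"])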
        
         | upghost wrote:
         | Literally was typing out "Unsloth, do your thing!!" but you are
         | way ahead of me. You rock <3 <3 <3
         | 
         | Thank you!
        
           | danielhanchen wrote:
           | :) Thanks!
        
         | knowaveragejoe wrote:
         | What is `jinja` in this context?
        
           | gowld wrote:
            | https://jinja.palletsprojects.com/en/stable/
            | 
            | (The --jinja flag tells llama.cpp to use the model's
            | Jinja-format chat template.)
        
         | bilsbie wrote:
         | Thanks! What kind of rig do I need?
        
       | turnsout wrote:
       | This looks amazing given the parameter sizes and capabilities
       | (audio, visual, text). I like the idea of keeping simple tasks
       | local. I'll be curious to see if this can be run on an M1
       | machine...
        
         | bigyabai wrote:
         | This should run fine on most hardware - CPU inference of the
         | E2B model on my Pixel 8 Pro gives me ~9tok/second of decode
         | speed.
        
         | Fergusonb wrote:
          | Sure it can. The easiest way is to get ollama, then
          | `ollama run gemma3n`. You can pair it with tools like
          | simonw's LLM to pipe stuff to it.
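          | 
          | And from Python, a rough sketch with the ollama client
          | library (assumes the gemma3n tag has already been pulled):
          | 
          | import ollama
          | 
          | # Same model tag as `ollama run gemma3n` above.
          | resp = ollama.chat(
          |     model="gemma3n",
          |     messages=[{"role": "user", "content": "Hello!"}],
          | )
          | print(resp["message"]["content"])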
        
       | minimaxir wrote:
       | LM Studio has MLX variants of the model out:
       | http://huggingface.co/lmstudio-community/gemma-3n-E4B-it-MLX...
       | 
       | However it's still 8B parameters and there are no quantized
       | models just yet.
        
       | Workaccount2 wrote:
       | Anyone have any idea on the viability of running this on a Pi5
       | 16GB? I have a few fun ideas if this can handle working with
       | images (or even video?) well.
        
         | gardnr wrote:
          | The 4-bit quant weighs 4.25 GB, and then you need space
          | for the rest of the inference process. So, yeah, you can
          | definitely run the model on a Pi; you may just have to
          | wait some time for results.
         | 
         | https://huggingface.co/unsloth/gemma-3n-E4B-it-GGUF
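          | 
          | Back-of-envelope memory math (the shapes below are assumed
          | placeholders, not the real Gemma 3n layout, which also has
          | per-layer embeddings and KV sharing):
          | 
          | weights_gb = 4.25                      # UD-Q4_K_XL file size
          | ctx, layers, d_model = 4096, 35, 2048  # assumed shapes
          | kv_gb = 2 * ctx * layers * d_model * 2 / 1e9  # K+V, fp16
          | print(f"~{weights_gb + kv_gb:.1f} GB plus runtime overhead")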
        
         | refulgentis wrote:
         | See here, long story short, this is another in a series of blog
         | posts that would lead you to believe this was viable, but it
         | isn't :/ https://news.ycombinator.com/item?id=44389793
        
       | impure wrote:
       | I've been playing around with E4B in AI Studio and it has been
       | giving me really great results, much better than what you'd
       | expect from an 8B model. In fact I'm thinking of trying to
       | install it on a VPS so I can have an alternative to pricy APIs.
        
       | tgtweak wrote:
       | Any readily-available APKs for testing this on Android?
        
         | refulgentis wrote:
          | APK link here:
          | https://github.com/google-ai-edge/gallery?tab=readme-ov-file...
        
           | tgtweak wrote:
           | Ah, I already had edge installed and it had gemma 3n-e4b
           | downloaded... is this the same model that was previously
           | released?
        
             | makeramen wrote:
              | Seems like that was a preview model; unknown if this
              | released version is different.
        
               | tgtweak wrote:
               | I think it's only pulling the older model - I see it's
               | using the liteRT models from May.
        
       | refulgentis wrote:
        | Something's really screwy with on-device models from Google.
        | I can't put my finger on what, and I think being ex-Google
        | is screwing with my ability to evaluate.
       | 
       | Cherry-picking something that's quick to evaluate:
       | 
       | "High throughput: Processes up to 60 frames per second on a
       | Google Pixel, enabling real-time, on-device video analysis and
       | interactive experiences."
       | 
        | You can download an APK from the official Google project for
        | this, linked from the blog post:
        | https://github.com/google-ai-edge/gallery?tab=readme-ov-file...
       | 
        | If I download it and run it on a Pixel Fold with the
        | _actual_ 2B model, which is half the size of the ones the 60
        | fps claim is made for, it takes 6.2-7.5 seconds to begin
        | responding (3 samples, 3 diff photos). Generation speed is
        | shown at 4-5 tokens per second, slightly slower than what
        | llama.cpp does on my phone. (I maintain an AI app that,
        | inter alia, wraps llama.cpp on all platforms.)
       | 
       | So, *0.16* frames a second, not 60 fps.
       | 
        | The blog post is jammed up with so many claims re: this
        | being special for on-device use and performance that
        | just... seemingly aren't true. At all.
       | 
       | - Are they missing a demo APK?
       | 
       | - Was there some massive TPU leap since the Pixel Fold release?
       | 
       | - Is there a lot of BS in there that they're pretty sure won't be
       | called out in a systematic way, given the amount of effort it
       | takes to get this inferencing?
       | 
        | - I used to work on Pixel, and I remember thinking that it
        | seemed like there weren't _actually_ public APIs for the
        | TPU. Is that what's going on?
       | 
       | In any case, either:
       | 
       | A) I'm missing something, big or
       | 
       | B) they are lying, repeatedly, big time, in a way that would be
       | shown near-immediately when you actually tried building on it
       | because it "enables real-time, on-device video analysis and
       | interactive experiences."
       | 
       | Everything I've seen the last year or two indicates they are
       | lying, big time, regularly.
       | 
       | But if that's the case:
       | 
       | - How are they getting away with it, over this length of time?
       | 
       | - How come I never see anyone else mention these gaps?
        
         | catchmrbharath wrote:
         | The APK that you linked, runs the inference on CPU and does not
         | run it on Google Tensor.
        
           | refulgentis wrote:
           | That sounds fair, but opens up another N questions:
           | 
           | - Are there APK(s) that _run on Tensor_?
           | 
           | - Is it possible to run on Tensor if you're not Google?
           | 
            | - Is there _anything at all from anyone_ I can download
            | that'll run it on Tensor?
           | 
            | - If there isn't, why not? (i.e. this isn't the first
            | on-device model release by any stretch, so I can't give
            | the benefit of the doubt at this point)
        
             | catchmrbharath wrote:
             | > Are there APK(s) that run on Tensor?
             | 
              | No. The AICore service internally runs the inference
              | on Tensor (http://go/android-dev/ai/gemini-nano)
             | 
             | > Is there anything at all from anyone I can download
             | that'll run it on Tensor?
             | 
             | No.
             | 
              | > If there isn't, why not? (i.e. this isn't the first
              | on-device model release by any stretch, so I can't
              | give the benefit of the doubt at this point)
             | 
              | Mostly because 3P support has not been an engineering
              | priority.
        
               | refulgentis wrote:
                | > Mostly because 3P support has not been an
                | engineering priority.
               | 
               | Got it: assuming you're at Google, in eng. parlance, it's
               | okay if it's not Prioritized(tm) but then
               | product/marketing/whoever shouldn't be publishing posts
               | around the premise it's running 60 fps multimodal
               | experiences on device.
               | 
                | They're very, very lucky that the ratio of people
                | vaguely interested in this to people who follow
                | through on using it is high, so comments like mine
                | end up at -1.
        
           | lostmsu wrote:
           | How does their demo work then? It's been 3 months since 3n
           | was first released publicly.
        
         | mlsu wrote:
          | It looks to me from the marketing copy that it's the
          | vision encoder that can run at 60 fps:
          | 
          | > MobileNet-V5-300M
          | 
          | Which makes sense, as it's 300M in size and probably far
          | less complex; not a multi-billion-parameter transformer.
        
           | refulgentis wrote:
            | I agree that's the most likely interpretation - does it
            | read as a shell game to you? Like, it _can_ do that, but
            | once you get _the thing that can use the output_
            | involved it's 1/100th of that? Do they have anything
            | that does stuff with the outputs from _just_ MobileNet?
            | If they don't, how are they sure I can build the 60 fps
            | realtime audiovisual experiences they say I can?
        
             | namibj wrote:
             | Classify/similarity/clustering works fine with just an
             | encoder, doesn't it?
             | 
              | I guess there's benefit to running that step without
              | subsampling to the initial 256 tokens per image/frame
              | (https://ai.google.dev/gemma/docs/gemma-3n/model_card#inputs_...).
              | To go on from that,
              | https://github.com/antimatter15/reverse-engineering-gemma-3n
              | suggests these are 2048-dimensional tokens, which
              | makes this 60 Hz frame digestion rate produce just
              | under 31.5 million floats-of-your-chosen-precision per
              | second. At least at the high (768x768) input
              | resolution, this is a bit less than one float per
              | pixel.
             | 
             | I guess maybe with very heavy quantizing to like 4 bit that
             | could beat sufficiently-artifact-free video coding for then
             | streaming the tokenized vision to a (potentially cloud)
             | system that can keep up with the 15360 token/s at
             | (streaming) prefill stage?
             | 
             | Or I could imagine just local on-device visual semantic
             | search by expanding the search query into a bunch of tokens
             | that have some signed desire/want-ness each and where the
             | search tokens get attended to the frame's encoded tokens,
             | activation function'd, scaled (to positive/negative) by the
             | search token's desire score, and then just summed over each
             | frame to get a frame score which can be used for ranking
             | and other such search-related tasks.
             | 
              | (For that last thought, I asked Gemini 2.5 Pro to
              | calculate the FLOPS load, and it came out to 1.05
              | MFLOPS per frame per search token; Reddit suggests the
              | current Pixel's TPU does around 50 TOPS, so if these
              | reasonably match terminology-wise, and assuming we're
              | spending about 20% of its compute on the search/match
              | aspect, it comes out to an unreasonably(-seeming)
              | ~190k tokens that the search query could get expanded
              | to. I interpret this result to imply that
              | quality/accuracy issues in the searching/filtering
              | mechanism would hit before encountering throughput
              | issues.)
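              | 
              | (Sanity-checking my own arithmetic, under the
              | assumptions above:)
              | 
              | fps, tokens_per_frame, dim = 60, 256, 2048
              | tok_s = fps * tokens_per_frame  # 15,360 tok/s prefill
              | floats_s = tok_s * dim          # ~31.5M floats/s
              | pix_s = fps * 768 * 768         # ~35.4M pixels/s
              | print(tok_s, floats_s, floats_s / pix_s)  # ratio < 1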
        
       | lucb1e wrote:
       | I read the general parts and skimmed the inner workings but I
       | can't figure out what the high-level news is. What does this
       | concretely do that Gemma didn't already do, or what
       | benchmark/tasks did it improve upon?
       | 
        | Before it goes into the inner details (MatFormer, per-layer
        | embeddings, caching...), the only sentence I've found that
        | concretely mentions a new thing is "the first model under 10
        | billion parameters to reach [an LMArena score over 1300]".
        | So it's supposed to be better than other models up to those
        | that use 10 GB+ of RAM, if I understand that right?
        
         | awestroke wrote:
         | > What does this concretely do that Gemma didn't already do
         | 
         | Open weights
        
           | lucb1e wrote:
           | Huh? I'm pretty sure I ran Gemma on my phone last month. Or
           | is there a difference between downloadable (you get the
           | weights because it's necessary to run the thing) and "open"
           | weights?
        
             | throwaway2087 wrote:
             | Wasn't it a preview version?
        
               | lucb1e wrote:
               | Oh, that could be. So this is the first on-device model
               | that Google releases, that's the news?
        
       | conradev wrote:
       | Kevin Kwok did a great job taking it apart:
       | https://github.com/antimatter15/reverse-engineering-gemma-3n
        
       | ghc wrote:
       | I just tried gemma3 out and it seems to be prone to getting stuck
       | in loops where it outputs an infinite stream of the same word.
        
         | sigmoid10 wrote:
         | Sounds a lot like an autoregressive sampling problem. Maybe try
         | to set temperature and repeat penalty differently.
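          | 
          | With ollama's Python client, for example (a sketch; option
          | names vary between runtimes):
          | 
          | import ollama
          | 
          | resp = ollama.chat(
          |     model="gemma3n",
          |     messages=[{"role": "user", "content": "Name 3 fruits."}],
          |     options={
          |         "temperature": 0.7,     # break deterministic loops
          |         "repeat_penalty": 1.1,  # discourage repeats
          |         "top_p": 0.9,
          |     },
          | )
          | print(resp["message"]["content"])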
        
       | actinium226 wrote:
       | I'm not a fan of this anarchic naming convention that OpenAI has
       | apparently made standard across the industry.
        
         | unsupp0rted wrote:
         | What would you have called it?
        
       | ericvolp12 wrote:
       | The Y-axis in that graph is fucking hilarious
        
       | lostmsu wrote:
        | I made a simple website[0] to quickly check a model's MMLU
        | online (it runs a subset), and Gemma 3n consistently loses
        | to LLaMA 3.3 (~61% vs ~66%) and definitely loses to LLaMA 4
        | Scout (~86%). I suspect that means its rating on the LMArena
        | leaderboard is just some form of gaming the metric.
       | 
        | What's interesting is that it beats smarter models in my
        | Turing Test Battle Royale[1]. I wonder if that means it is a
        | better talker.
       | 
       | 0. https://mmlu.borgcloud.ai/
       | 
       | 1. https://trashtalk.borg.games/
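        | 
        | (For the curious, the core of such a check is roughly the
        | following; a sketch assuming a local ollama model and the
        | cais/mmlu dataset, not my actual harness:)
        | 
        | import ollama
        | from datasets import load_dataset
        | 
        | ds = load_dataset("cais/mmlu", "all", split="test")
        | ds = ds.shuffle(seed=0).select(range(100))  # small subset
        | correct = 0
        | for row in ds:
        |     opts = "\n".join(f"{l}. {c}" for l, c in
        |                      zip("ABCD", row["choices"]))
        |     prompt = (f"{row['question']}\n{opts}\n"
        |               "Reply with one letter: A, B, C, or D.")
        |     resp = ollama.chat(model="gemma3n", messages=[
        |         {"role": "user", "content": prompt}])
        |     ans = resp["message"]["content"].strip().upper()
        |     correct += ans.startswith("ABCD"[row["answer"]])
        | print(f"{correct} / 100 correct")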
        
       | bravetraveler wrote:
        | Updated Ollama to use this; now neither the old nor the new
        | model works - much productivity
        
         | rvnx wrote:
         | Well, see it the other way, there is something positive:
         | commenters here on HN claim that AI is useless. You can now
         | also join the bandwagon of people who have free time.
        
       | lowbatt wrote:
       | If I wanted to run this locally at somewhat decent speeds, is an
       | RK3588S board (like OrangePi 5) the cheapest option?
        
         | ac29 wrote:
          | RK3588 uses a 7-year-old CPU design, and the OrangePi 5
          | looks expensive (well over $100).
          | 
          | A used sub-$100 x86 box is going to be much better.
        
           | lowbatt wrote:
           | you're right. For my purposes, I was thinking of something I
           | could use if I wanted to manufacture a new (smallish) product
        
         | jm4 wrote:
         | It depends on your idea of decent speeds and what you would use
         | it for. I just tried it on a laptop with an AMD HX 370 running
         | on battery in power save mode and it's not especially
         | impressive, although it runs much better in balanced or
         | performance mode. I gave it the prompt "write a fizzbuzz
         | program in rust" and it took almost a minute and a half. I
         | expect it to be pretty terrible on an SBC. Your best bet is to
         | try it out on the oldest hardware you have and figure out if
         | you can tolerate worse performance.
        
           | lowbatt wrote:
           | good idea, will test that out
        
         | babl-yc wrote:
         | I'm going to attempt to get it running on the BeagleY-AI
         | https://www.beagleboard.org/boards/beagley-ai
         | 
         | Similar form factor to raspberry pi but with 4 TOPS of
         | performance and enough RAM.
        
       | nsingh2 wrote:
        | What are some use cases for these local small models, for
        | individuals? It seems like for programming-related work the
        | proprietary models are significantly better, and that's all
        | I really use LLMs for personally.
       | 
       | Though I can imagine a few commercial applications where
       | something like this would be useful. Maybe in some sort of
       | document processing pipeline.
        
         | russdill wrote:
         | Hoping to try it out with home assistant.
        
         | toddmorey wrote:
          | I think speech-to-text is the highlight use case for local
          | models, because they are now really good at it and there's
          | no network latency.
        
         | androng wrote:
         | filtering out spam SMS messages without sending all SMS to the
         | cloud
        
         | thimabi wrote:
         | I'm thinking about building a pipeline to mass generate
         | descriptions for the images in my photo collection, to
         | facilitate search. Object recognition in local models is
         | already pretty good, and perhaps I can pair it with models to
         | recognize specific people by name as well.
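          | 
          | Roughly this shape, as a sketch (the model tag is a
          | placeholder for whichever vision-capable local model you
          | run; ollama accepts image file paths in the message):
          | 
          | import json
          | from pathlib import Path
          | import ollama
          | 
          | index = {}
          | for img in Path("~/Pictures").expanduser().rglob("*.jpg"):
          |     resp = ollama.chat(model="gemma3n", messages=[{
          |         "role": "user",
          |         "content": "Describe this photo in one searchable "
          |                    "sentence, naming objects and setting.",
          |         "images": [str(img)],
          |     }])
          |     index[str(img)] = resp["message"]["content"]
          | 
          | Path("captions.json").write_text(json.dumps(index, indent=2))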
        
         | jsphweid wrote:
          | For me? Handling data like private voice memos, pictures,
          | videos, calendar information, emails, some code, etc.
          | Stuff I wouldn't want to share on the internet / have a
          | model potentially slurp up and regurgitate as part of its
          | memory when the data is invariably used in some future
          | training process.
        
         | msabalau wrote:
          | I just like having quick access to a reasonable model that
          | runs comfortably on my phone, even if I'm in a place
          | without connectivity.
        
       | thimabi wrote:
        | Suppose I'd like to use models like this one to perform web
        | searches. Is there anything available in the open-source
        | world that would let me do that without much tinkering?
       | 
       | I think it's something that even Google should consider:
       | publishing open-source models with the possibility of grounding
       | their replies in Google Search.
        
         | vorticalbox wrote:
          | I have been using ollama + Open WebUI. Open WebUI already
          | has a web search tool; all you would need to do is click
          | the toggle for it under the chat.
        
           | zettabomb wrote:
           | Unfortunately the OWUI web search is really slow and just not
           | great overall. I would suggest using an MCP integration
           | instead.
        
         | joerick wrote:
          | Google do have an API for this. It has limits, but it's
          | perfectly good for personal use.
         | 
         | https://developers.google.com/custom-search/v1/overview
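          | 
          | Minimal usage is something like this (a sketch; bring your
          | own API key and programmable search engine ID):
          | 
          | import requests
          | 
          | def google_search(query, api_key, cx, n=5):
          |     r = requests.get(
          |         "https://www.googleapis.com/customsearch/v1",
          |         params={"key": api_key, "cx": cx,
          |                 "q": query, "num": n},
          |         timeout=10,
          |     )
          |     r.raise_for_status()
          |     return [(i["title"], i["link"], i.get("snippet", ""))
          |             for i in r.json().get("items", [])]
          | 
          | # Feed the (title, link, snippet) tuples into the local
          | # model's prompt to ground its answer.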
        
       | kccqzy wrote:
        | It seems way worse than other small models, including
        | responding with complete non sequiturs. I think my favorite
        | small model is still DeepSeek distilled into Llama 8B.
        
       | rvnx wrote:
        | Is there a chance that we'll see an uncensored version of
        | this?
        
         | throwaway2087 wrote:
          | Can you apply abliteration? I'm not sure if their
          | MatFormer architecture is compatible with current
          | techniques.
        
       | pilooch wrote:
        | This model is fully compatible with anything previously done
        | with gemma3. I just passed it to one of my VLM fine-tuning
        | scripts and it started without issues (HF transformers
        | code). On a single GPU with LoRA, the E4B model takes 18 GB
        | of VRAM at batch size 1, where gemma-4B was 21 GB. Nice one
        | from DeepMind; the gemma3 family tops the open-weight vision
        | LLMs.
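        | 
        | For reference, the LoRA side of such a script is roughly
        | this (a sketch with peft; the checkpoint id and target
        | modules are assumptions, adjust to your setup):
        | 
        | from transformers import AutoModelForImageTextToText
        | from peft import LoraConfig, get_peft_model
        | 
        | model = AutoModelForImageTextToText.from_pretrained(
        |     "google/gemma-3n-E4B-it", device_map="auto")
        | lora = LoraConfig(
        |     r=16, lora_alpha=32, lora_dropout=0.05,
        |     target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
        |     task_type="CAUSAL_LM",
        | )
        | model = get_peft_model(model, lora)
        | model.print_trainable_parameters()  # tiny fraction of ~8B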
        
       | refulgentis wrote:
        | My post politely describing how this blog post does not
        | match Google's own app running inference on Pixel is
        | downvoted to -1, below dead posts with one-off short jokes.
        | 
        | I am posting again because I've been here 16 years now, it
        | is _very_ suspicious that this happened, and given the
        | replies to it, we now know this blog post is false.
       | 
       | There is no open model that you can download today and run at
       | even 1% of the claims in the blog post.
       | 
        | You can read a reply from someone indicating they have
        | inside knowledge of this, who notes this won't work as
        | advertised unless you're Google (i.e. internally, they have
        | it binding to a privileged system process that can access
        | the Tensor core, and this isn't available to third parties;
        | anyone else is getting 1/100th of the speeds in the post).
        | 
        | This post promises $150K in prizes for on-device multimodal
        | apps and tells you the model runs at up to 60 fps. They know
        | it runs at 0.1 fps; engineering says that's because they
        | haven't prioritized 3rd parties yet. And somehow, Google is
        | getting away with this.
        
       | simonw wrote:
       | I tried my "Generate an SVG of a pelican riding a bicycle" prompt
       | against Gemma 3n 7.5GB from Ollama and 15GB for mlx-vlm and got a
       | pleasingly different result for the two quantization sizes:
       | https://simonwillison.net/2025/Jun/26/gemma-3n/
        
         | JohnKemeny wrote:
         | Is that actually a useful benchmark, or is it just for the
         | laughs? I've never really understood that.
        
           | OtherShrezzing wrote:
            | For me, it shows whether LLMs are generalising from
            | their training data. LLMs understand all of the words in
            | the prompt. They understand the spec for SVG better than
            | any human. They know what a bird is. They know what a
            | bike is. They know how to draw (and given access to
            | computer use could probably ace this test). They can
            | plan and execute on those plans.
            | 
            | Everything here should be trivial for an LLM, but
            | they're quite poor at it because there's almost no "how
            | to draw complex shapes in SVG" type content in their
            | training set.
        
       | zknowledge wrote:
        | Anyone know how much it costs to use the deployed version of
        | Gemma 3n? The docs indicate you can use the Gemini API for
        | deployed Gemma 3n, but the pricing page just shows
        | "unavailable".
        
       | kgwxd wrote:
       | Can popular sci-fi go 30 seconds without some lame wad naming
       | themselves or a product after it?
        
       ___________________________________________________________________
       (page generated 2025-06-26 23:00 UTC)