[HN Gopher] Introducing Gemma 3n
___________________________________________________________________
Introducing Gemma 3n
Author : bundie
Score : 245 points
Date : 2025-06-26 17:03 UTC (5 hours ago)
(HTM) web link (developers.googleblog.com)
(TXT) w3m dump (developers.googleblog.com)
| wiradikusuma wrote:
| I still don't understand the difference between Gemma and Gemini
| for on-device, since both don't need network access. From
| https://developer.android.com/ai/gemini-nano :
|
| "Gemini Nano allows you to deliver rich generative AI experiences
| without needing a network connection or sending data to the
| cloud." -- replace Gemini with Gemma and the sentence still
| valid.
| readthenotes1 wrote:
| Perplexity.ai gave an easier-to-understand response than Gemini
| 2.5, afaict.
|
| Gemini nano is for Android only.
|
| Gemma is available for other platforms and has multiple size
| options.
|
| So it seems like Gemini nano might be a very focused Gemma
| everywhere to follow the biology metaphor instead of the
| Italian name interpretation
| ridruejo wrote:
| The fact that you need HN and competitors to explain your
| offering should make Google reflect ...
| gardnr wrote:
| The Gemini billing dashboard makes me feel sad and
| confused.
| tyushk wrote:
| Licensing. You can't use Gemini Nano weights directly (at least
| commercially) and must interact with them through Android
| MLKit or similar Google-approved runtimes.
|
| You can use Gemma commercially using whatever runtime or
| framework you can get to run it.
| littlestymaar wrote:
| It's not even clear you can license language model weights,
| though.
|
| I'm not a lawyer but the analysis I've read had a pretty
| strong argument that there's no human creativity involved in
| the training, which is an entirely automatic process, and as
| such it cannot be copyrighted in any way (the same way you
| cannot put a license on a software artifact just because you
| compiled it yourself, you must have copyright ownership on
| the _source code_ you're compiling).
| skissane wrote:
| IANAL either but the answer likely depends on the
| jurisdiction
|
| US standards for copyrightability require human creativity
| and model weights likely don't have the right kind of human
| creativity in them to be copyrightable in the US. No court
| to my knowledge has ruled on the question as yet, but
| that's the US Copyright Office's official stance.
|
| By contrast, standards for copyrightability in the UK are a
| lot weaker than in the US - and so while no court has ruled on
| the issue in the UK yet either, it seems likely a UK court
| would hold model weights to be copyrightable.
|
| So from Google/Meta/etc's viewpoint, asserting copyright
| makes sense, since even if the assertion isn't legally
| valid in the US, it likely is in the UK - and not just the
| UK, many other major economies too. Australia, Canada,
| Ireland, New Zealand tend to follow UK courts on copyright
| law not US courts. And many EU countries are closer to the
| UK than the US on this as well, not necessarily because
| they follow the UK, often because they've reached a similar
| position based on their own legal traditions
|
| Finally: don't be surprised if Congress steps in and tries
| to legislate model weights as copyrightable in the US too,
| or grants them some sui generis form of legal protection
| which is legally distinct from copyright but similar to
| it-I can already hear the lobbyist argument, "US AI
| industry risks falling behind Europe because
| copyrightability of AI models in the US is legally
| uncertain and that legal uncertainty is discouraging
| investment"-I'm sceptical that is actually true, but
| something doesn't have to be true for lobbyists to convince
| Congress that it is
| simonw wrote:
| > US standards for copyrightability require human
| creativity and model weights likely don't have the right
| kind of human creativity in them to be copyrightable in
| the US. No court to my knowledge has ruled on the
| question as yet, but that's the US Copyright Office's
| official stance.
|
| Has the US copyright office said that about model
| weights? I've only heard them saying that about images
| produced entirely from a prompt to a model.
| AlanYx wrote:
| That's one of the reasons why they gate Gemini Nano with
| the "Gemini Nano Program Additional Terms of Service". Even
| if copyright doesn't subsist in the weights or if using
| them would be fair use, they still have recourse in breach
| of contract.
| skissane wrote:
| The problem is that contracts don't bind subsequent
| recipients, copyright does
|
| Google gives the model to X who gives it to Y who gives
| it to Z. X has a contract with Google, so Google can sue
| X for breach of contract if they violate its terms. But
| do Y and Z have such a contract? Probably not. Of course,
| Google can put language in their contract with X to try
| to make it bind Y and Z too, but is that language going
| to be legally effective? More often than not, no. The
| language may enable Google to successfully sue X over Y
| and Z's behaviour, but not successfully sue Y and Z
| directly. Whereas, with copyright, Y and Z are directly
| liable for violations just as X is
| jinlisp wrote:
| Thank you, this is a nice point to consider. I don't know
| whether using the weights could be considered equivalent to,
| or as implying, acceptance of the terms of service from the
| weights' creators.
| derefr wrote:
| I've wondered about this for a while now (where e.g. some
| models on HuggingFace require clickwrap license
| agreements to download, that try to prohibit you from
| using the model in certain ways.)
|
| It seems to me that if some anonymous ne'er-do-well were
| to publicly re-host the model files for separate
| download; and you acquired the files from that person,
| rather than from Google; then you wouldn't be subject to
| their license, as you never so much as saw the clickwrap.
|
| (And you wouldn't be committing IP theft by acquiring it
| from that person, either, because of the non-
| copyrightability.)
|
| I feel that there must be something wrong with that
| logic, but I can't for the life of me think of what it
| is.
| km3r wrote:
| Why not? Training isn't just "data in/data out". The
| process for training is continuously tweaked and adjusted,
| with many of those adjustments being specific to the type
| of model you are trying to output.
| skissane wrote:
| The US copyright office's position is basically this-
| under US law, copyrightability requires direct human
| creativity, an automated training process involves no
| direct human creativity so cannot produce copyright. Now,
| we all know there is a lot of creative human effort in
| selecting what data to use as input, tinkering with
| hyperparameters, etc - but the copyright office's
| position is that doesn't legally count - creative human
| effort in overseeing an automated process doesn't change
| the fact that the automated process itself doesn't
| _directly_ involve any human creativity. So the human
| creativity in model training fails to make the model
| copyrightable because it is too indirect
|
| By contrast, UK copyright law accepts the "mere sweat of
| the brow" doctrine, the mere fact you spent money on
| training is likely sufficient to make its output
| copyrightable, UK law doesn't impose the same
| requirements for a direct human creative contribution
| IncreasePosts wrote:
| Doesn't that imply just the training process isn't
| copyrightable? But weights aren't just training, they're
| also your source data. And if the training set shows
| originality in selection, coordination, or arrangement,
| isn't that copyrightable? So why wouldn't the weights also
| be copyrightable?
| skissane wrote:
| The problem is, can you demonstrate that originality of
| selection and arrangement actually survives in the
| trained model? It is legally doubtful.
|
| Nobody knows for sure what the legal answer is, because
| the question hasn't been considered by a court - but the
| consensus of expert legal opinion is copyrightability of
| models is doubtful under US law, and the kind of argument
| you make isn't strong enough to change that. As I said,
| different case for UK law, nobody really needs your
| argument there because model weights likely are
| copyrightable in the UK already
| rvnx wrote:
| The weights are mathematical facts. As raw numbers, they
| are not copyrightable.
| IncreasePosts wrote:
| `en_windows_xp_professional_with_service_pack_3_x86_cd_vl
| _x14-73974.iso` is also just raw numbers, but I believe
| Windows XP was copyrightable
| badsectoracula wrote:
| For the same reason GenAI output isn't copyrightable
| regardless of how much time you spend tweaking your
| prompts.
|
| Also i'm pretty sure none of the AI companies would
| really want to touch the concept of having the copyright
| of source data affect the weight's own copyright,
| considering all of them pretty much hoover up the entire
| Internet without caring about those copyrights (and IMO
| trying to claim that they should be able to ignore the
| copyrights of training data and also that the GenAI
| output is not under copyright, but at the same time trying
| to claim copyright for the weights, is dishonest,
| if not outright leechy).
| jabroni_salad wrote:
| Gemma is open source and apache 2.0 licensed. If you want to
| include it with an app you have to package it yourself.
|
| Gemini Nano is an Android API that you don't control at all.
| nicce wrote:
| > Gemma is open source and apache 2.0 licensed
|
| Closed source but open weight. Let's not ruin the definition
| of the term to the advantage of big companies.
| zackangelo wrote:
| Your reply adds more confusion, imo.
|
| The inference code and model architecture IS open source[0]
| and there are many other high quality open source
| implementations of the model (in many cases contributed by
| Google engineers[1]). To your point: they do not publish
| the data used to train the model so you can't re-create it
| from scratch.
|
| [0] https://github.com/google-deepmind/gemma [1]
| https://github.com/vllm-project/vllm/pull/2964
| candiddevmike wrote:
| If for some reason you had the training data, is it even
| possible to create an exact (possibly same hash?) copy of
| the model? Seems like there are a lot of other pieces
| missing like the training harness, hardware it was
| trained on, etc?
| OneDeuxTriSeiGo wrote:
| to be entirely fair that's quite a high bar even for most
| "traditional" open source.
|
| And even if you had the same data, there's no guarantee
| the random perturbations during training are driven by a
| PRNG and done in a way that is reproducible.
|
| Reproducibility does not make something open source.
| Reproducibility doesn't even necessarily make something
| free software (under the GNU interpretation). I mean
| hell, most docker containers aren't even hash-
| reproducible.
| zackangelo wrote:
| Yes, this is true. A lot of times labs will hold back
| necessary infrastructure pieces that allow them to train
| huge models reliably and on a practical time scale. For
| example, many have custom alternatives to Nvidia's NCCL
| library to do fast distributed matrix math.
|
| Deepseek published a lot of their work in this area
| earlier this year and as a result the barrier isn't as
| high as it used to be.
| nicce wrote:
| I am not sure if this adds even more confusion. The linked
| library is about fine-tuning, which is a completely different
| process.
|
| Their publications about producing Gemma are not accurate
| enough that even with the data you would get the same
| results.
| zackangelo wrote:
| In the README of the linked library they have a code
| snippet showing how to have a conversation with the
| model.
|
| Also, even if it were for fine tuning, that would require
| an implementation of the model's forward pass (which is
| all that's necessary to run it).
| nicce wrote:
| That is a completely different discussion. Otherwise, even
| Gemini 2.5 Pro would be open-source with this logic since
| clients are open-source for interacting with the cloud
| APIs.
| Imustaskforhelp wrote:
| Yes!! But I doubt how many are truly, truly open source
| models, since most just confuse open source with open
| weights and the definition has really been changed, smh.
| cesarb wrote:
| > Gemma is open source and apache 2.0 licensed.
|
| Are you sure? On a quick look, it appears to use its own
| bespoke license, not the Apache 2.0 license. And that license
| appears to have field of use restrictions, which means it
| would not be classified as an open source license according
| to the common definitions (OSI, DFSG, FSF).
| impure wrote:
| I suspect the difference is in the training data. Gemini is
| much more locked down and if it tries to repeat something from
| the training data verbatim you will get a 'recitation error'.
| danielhanchen wrote:
| Made some GGUFs if anyone wants to run them!
|
| ./llama.cpp/llama-cli -hf unsloth/gemma-3n-E4B-it-GGUF:UD-Q4_K_XL
| -ngl 99 --jinja --temp 0.0
|
| ./llama.cpp/llama-cli -hf unsloth/gemma-3n-E2B-it-GGUF:UD-Q4_K_XL
| -ngl 99 --jinja --temp 0.0
|
| I'm also working on an inference + finetuning Colab demo! I'm
| very impressed since Gemma 3N has audio, text and vision!
| https://docs.unsloth.ai/basics/gemma-3n-how-to-run-and-fine-...
| upghost wrote:
| Literally was typing out "Unsloth, do your thing!!" but you are
| way ahead of me. You rock <3 <3 <3
|
| Thank you!
| danielhanchen wrote:
| :) Thanks!
| knowaveragejoe wrote:
| What is `jinja` in this context?
| gowld wrote:
| https://jinja.palletsprojects.com/en/stable/
| bilsbie wrote:
| Thanks! What kind of rig do I need?
| turnsout wrote:
| This looks amazing given the parameter sizes and capabilities
| (audio, visual, text). I like the idea of keeping simple tasks
| local. I'll be curious to see if this can be run on an M1
| machine...
| bigyabai wrote:
| This should run fine on most hardware - CPU inference of the
| E2B model on my Pixel 8 Pro gives me ~9tok/second of decode
| speed.
| Fergusonb wrote:
| Sure it can. The easiest way is to get Ollama, then `ollama run
| gemma3n`. You can pair it with tools like simonw's LLM to pipe
| stuff to it.
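|
| For example, a minimal sketch (assuming the `llm-ollama` plugin
| and that the model is published under the `gemma3n` tag; adjust
| names to whatever `ollama list` shows):
|
| ollama pull gemma3n            # fetch the weights locally
| llm install llm-ollama         # let simonw's LLM talk to Ollama
| cat notes.txt | llm -m gemma3n 'Summarize this in three bullets'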
| minimaxir wrote:
| LM Studio has MLX variants of the model out:
| http://huggingface.co/lmstudio-community/gemma-3n-E4B-it-MLX...
|
| However it's still 8B parameters and there are no quantized
| models just yet.
| Workaccount2 wrote:
| Anyone have any idea on the viability of running this on a Pi5
| 16GB? I have a few fun ideas if this can handle working with
| images (or even video?) well.
| gardnr wrote:
| The 4-bit quant weighs 4.25 GB, and then you need space for the
| rest of the inference process. So, yeah, you can definitely run
| the model on a Pi; you may just have to wait some time for
| results.
|
| https://huggingface.co/unsloth/gemma-3n-E4B-it-GGUF
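|
| On a Pi you'd be running CPU-only, so roughly the command from
| upthread minus the GPU offload flag (a sketch; `-t 4` just pins
| llama.cpp to the Pi 5's four cores):
|
| ./llama.cpp/llama-cli -hf unsloth/gemma-3n-E4B-it-GGUF:UD-Q4_K_XL \
|   --jinja --temp 0.0 -t 4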
| refulgentis wrote:
| See here, long story short, this is another in a series of blog
| posts that would lead you to believe this was viable, but it
| isn't :/ https://news.ycombinator.com/item?id=44389793
| impure wrote:
| I've been playing around with E4B in AI Studio and it has been
| giving me really great results, much better than what you'd
| expect from an 8B model. In fact I'm thinking of trying to
| install it on a VPS so I can have an alternative to pricy APIs.
| tgtweak wrote:
| Any readily-available APKs for testing this on Android?
| refulgentis wrote:
| APK link here: https://github.com/google-ai-
| edge/gallery?tab=readme-ov-file...
| tgtweak wrote:
| Ah, I already had edge installed and it had gemma 3n-e4b
| downloaded... is this the same model that was previously
| released?
| makeramen wrote:
| Seems like that was a preview model, unknown if this
| released version is different
| tgtweak wrote:
| I think it's only pulling the older model - I see it's
| using the liteRT models from May.
| refulgentis wrote:
| Something's really screwy with on-device models from Google. I
| can't put my finger on what, and I think being ex-Google is
| screwing with my ability to evaluate it.
|
| Cherry-picking something that's quick to evaluate:
|
| "High throughput: Processes up to 60 frames per second on a
| Google Pixel, enabling real-time, on-device video analysis and
| interactive experiences."
|
| You can download an APK from the official Google project for
| this, linked from the blogpost: https://github.com/google-ai-
| edge/gallery?tab=readme-ov-file...
|
| If I download it and run it on a Pixel Fold with the _actual_ 2B
| model, which is half the size of the ones the 60 fps claim is
| made for, it takes 6.2-7.5 seconds to begin responding (3
| samples, 3 diff photos). Generation speed is shown at 4-5 tokens
| per second, slightly slower than what llama.cpp does on my
| phone. (I maintain an AI app that, inter alia, wraps llama.cpp
| on all platforms.)
|
| So, *0.16* frames a second, not 60 fps.
|
| The blog post is so jammed up with so many claims re: this is
| special for on-device and performance that just...seemingly
| aren't true. At all.
|
| - Are they missing a demo APK?
|
| - Was there some massive TPU leap since the Pixel Fold release?
|
| - Is there a lot of BS in there that they're pretty sure won't be
| called out in a systematic way, given the amount of effort it
| takes to get this inferencing?
|
| - I used to work on Pixel, and I remember thinking that it seemed
| like there weren't _actually_ public APIs for the TPU. Is that
| what's going on?
|
| In any case, either:
|
| A) I'm missing something big, or
|
| B) they are lying, repeatedly, big time, in a way that would be
| shown near-immediately when you actually tried building on it
| because it "enables real-time, on-device video analysis and
| interactive experiences."
|
| Everything I've seen the last year or two indicates they are
| lying, big time, regularly.
|
| But if that's the case:
|
| - How are they getting away with it, over this length of time?
|
| - How come I never see anyone else mention these gaps?
| catchmrbharath wrote:
| The APK that you linked runs the inference on CPU and does not
| run it on Google Tensor.
| refulgentis wrote:
| That sounds fair, but opens up another N questions:
|
| - Are there APK(s) that _run on Tensor_?
|
| - Is it possible to run on Tensor if you're not Google?
|
| - Is there _anything at all from anyone_ I can download
| that'll run it on Tensor?
|
| - If there isn't, why not? (i.e. this isn't the first on
| device model release by any stretch, so I can't give benefit
| of the doubt at this point)
| catchmrbharath wrote:
| > Are there APK(s) that run on Tensor?
|
| No. The AICore service internally runs the inference on Tensor
| (http://go/android-dev/ai/gemini-nano)
|
| > Is there anything at all from anyone I can download
| that'll run it on Tensor?
|
| No.
|
| > If there isn't, why not? (i.e. this isn't the first on
| device model release by any stretch, so I can't give
| benefit of the doubt at this point)
|
| Mostly because 3P support has not been an engineering
| priority.
| refulgentis wrote:
| > Mostly because 3P support has not been a engineering
| priority.
|
| Got it: assuming you're at Google, in eng. parlance, it's
| okay if it's not Prioritized(tm) but then
| product/marketing/whoever shouldn't be publishing posts
| around the premise it's running 60 fps multimodal
| experiences on device.
|
| They're very, very lucky that the ratio of people vaguely
| interested in this to people who follow through on using it
| is high, so comments like mine end up at -1.
| lostmsu wrote:
| How does their demo work then? It's been 3 months since 3n
| was first released publicly.
| mlsu wrote:
| It looks to me from the marketing copy that the vision encoder
| can run at 60 FPS.
|
| > MobileNet-V5-300M
|
| Which makes sense, as it's 300M in size and probably far less
| complex - not a multi-billion-parameter transformer.
| refulgentis wrote:
| I agree that's the most likely interpretation - does it read
| as a shell game to you? Like, it _can_ do that but once you
| get _the thing that can use the output_ involved it's
| 1/100th of that? Do they have anything that does stuff with
| the outputs from _just_ MobileNet? If they don't, how are
| they sure I can build 60 fps realtime audiovisual experiences
| they say I can?
| namibj wrote:
| Classify/similarity/clustering works fine with just an
| encoder, doesn't it?
|
| I guess there's benefit to running that step without
| subsampling to the initial 256 tokens per image/frame
| (https://ai.google.dev/gemma/docs/gemma-3n/model_card#inputs_...)
| to go on from that.
| https://github.com/antimatter15/reverse-engineering-gemma-3n
| suggests these are 2048-dimensional tokens, which makes this
| 60 Hz frame digestion rate produce just under 31.5 million
| floats-of-your-chosen-precision per second.
| At least at the high (768x768) input resolution, this is a
| bit less than one float per pixel.
|
| I guess maybe with very heavy quantizing to like 4 bit that
| could beat sufficiently-artifact-free video coding for then
| streaming the tokenized vision to a (potentially cloud)
| system that can keep up with the 15360 token/s at
| (streaming) prefill stage?
|
| Or I could imagine just local on-device visual semantic
| search by expanding the search query into a bunch of tokens
| that have some signed desire/want-ness each and where the
| search tokens get attended to the frame's encoded tokens,
| activation function'd, scaled (to positive/negative) by the
| search token's desire score, and then just summed over each
| frame to get a frame score which can be used for ranking
| and other such search-related tasks.
|
| (For that last thought, I asked Gemini 2.5 Pro to calculate
| flops load, and it came out to 1.05 MFLOPS per frame per
| search token; Reddit suggests the current Pixel's TPU does
| around 50 TOPS, so if these reasonably match each other
| terminology-wise, and assuming we're spending about 20% of its
| compute on the search/match aspect, it comes out to an
| unreasonable(-seeming) roughly 190k tokens that the search query
| could get expanded to. I interpret this result to imply
| that quality/accuracy issues in the searching/filtering
| mechanism would hit before encountering throughput issues
| in this.)
| lucb1e wrote:
| I read the general parts and skimmed the inner workings but I
| can't figure out what the high-level news is. What does this
| concretely do that Gemma didn't already do, or what
| benchmark/tasks did it improve upon?
|
| Until it goes into the inner details (MatFormer, per-layer
| embeddings, caching...), the only sentence I've found that
| concretely mentions a new thing is "the first model under 10
| billion parameters to reach [an LMArena score over 1300]". So
| it's supposed to be better than other models up to those that use
| 10GB+ RAM, if I understand that right?
| awestroke wrote:
| > What does this concretely do that Gemma didn't already do
|
| Open weights
| lucb1e wrote:
| Huh? I'm pretty sure I ran Gemma on my phone last month. Or
| is there a difference between downloadable (you get the
| weights because it's necessary to run the thing) and "open"
| weights?
| throwaway2087 wrote:
| Wasn't it a preview version?
| lucb1e wrote:
| Oh, that could be. So this is the first on-device model
| that Google releases, that's the news?
| conradev wrote:
| Kevin Kwok did a great job taking it apart:
| https://github.com/antimatter15/reverse-engineering-gemma-3n
| ghc wrote:
| I just tried gemma3 out and it seems to be prone to getting stuck
| in loops where it outputs an infinite stream of the same word.
| sigmoid10 wrote:
| Sounds a lot like an autoregressive sampling problem. Maybe try
| to set temperature and repeat penalty differently.
| actinium226 wrote:
| I'm not a fan of this anarchic naming convention that OpenAI has
| apparently made standard across the industry.
| unsupp0rted wrote:
| What would you have called it?
| ericvolp12 wrote:
| The Y-axis in that graph is fucking hilarious
| lostmsu wrote:
| I made a simple website[0] to check online model MMLU quickly
| (runs a subset), and Gemma 3n consistently loses to LLaMA 3.3
| (~61% vs ~66%), and definitely loses to LLaMA 4 Scout (~86%). I
| suspect that means its rating on LMArena Leaderboard is just some
| form of gaming the metric.
|
| What's interesting is that it beats smarter models in my Turing
| Test Battle Royale[1]. I wonder if it means it is a better
| talker.
|
| 0. https://mmlu.borgcloud.ai/
|
| 1. https://trashtalk.borg.games/
| bravetraveler wrote:
| Updated Ollama to use this, now neither the old nor the new one
| works - much productivity
| rvnx wrote:
| Well, see it the other way, there is something positive:
| commenters here on HN claim that AI is useless. You can now
| also join the bandwagon of people who have free time.
| lowbatt wrote:
| If I wanted to run this locally at somewhat decent speeds, is an
| RK3588S board (like OrangePi 5) the cheapest option?
| ac29 wrote:
| RK3588 uses a 7 year old CPU design and OrangePi 5 looks
| expensive (well over $100).
|
| A used sub-$100 x86 box is going to be much better
| lowbatt wrote:
| you're right. For my purposes, I was thinking of something I
| could use if I wanted to manufacture a new (smallish) product
| jm4 wrote:
| It depends on your idea of decent speeds and what you would use
| it for. I just tried it on a laptop with an AMD HX 370 running
| on battery in power save mode and it's not especially
| impressive, although it runs much better in balanced or
| performance mode. I gave it the prompt "write a fizzbuzz
| program in rust" and it took almost a minute and a half. I
| expect it to be pretty terrible on an SBC. Your best bet is to
| try it out on the oldest hardware you have and figure out if
| you can tolerate worse performance.
| lowbatt wrote:
| good idea, will test that out
| babl-yc wrote:
| I'm going to attempt to get it running on the BeagleY-AI
| https://www.beagleboard.org/boards/beagley-ai
|
| Similar form factor to raspberry pi but with 4 TOPS of
| performance and enough RAM.
| nsingh2 wrote:
| What are some use cases for these local small models, for
| individuals? Seems like for programming related work, the
| proprietary models are significantly better and that's all I
| really use LLMs for personally.
|
| Though I can imagine a few commercial applications where
| something like this would be useful. Maybe in some sort of
| document processing pipeline.
| russdill wrote:
| Hoping to try it out with home assistant.
| toddmorey wrote:
| I think speech to text is the highlight use case for local
| models, because they are now really good at it and there's no
| network latency.
| androng wrote:
| filtering out spam SMS messages without sending all SMS to the
| cloud
| thimabi wrote:
| I'm thinking about building a pipeline to mass generate
| descriptions for the images in my photo collection, to
| facilitate search. Object recognition in local models is
| already pretty good, and perhaps I can pair it with models to
| recognize specific people by name as well.
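|
| A rough sketch of that pipeline, assuming a local multimodal
| backend exposed through simonw's LLM tool that accepts image
| attachments via `-a` (the `gemma3n` model name is an
| assumption):
|
| # caption every photo into a tab-separated index you can grep
| for img in ~/Photos/*.jpg; do
|   printf '%s\t' "$img" >> captions.tsv
|   llm -m gemma3n 'Describe this photo in one sentence.' \
|     -a "$img" >> captions.tsv
| done
| grep -i 'beach' captions.tsv   # later: search by description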
| jsphweid wrote:
| For me? Handling data like private voice memos, pictures,
| videos, calendar information, emails, some code etc. Stuff I
| wouldn't want to share on the internet / have a model
| potentially slurp up and regurgitate as part of its memory when
| the data is invariably used in some future training process.
| msabalau wrote:
| I just like having quick access to a reasonable model that runs
| comfortably on my phone, even if I'm in a place without
| connectivity.
| thimabi wrote:
| Suppose I'd like to use models like this one to perform web
| searches. Is there anything available in the open-source world
| that would let me do that without much tinkering needed?
|
| I think it's something that even Google should consider:
| publishing open-source models with the possibility of grounding
| their replies in Google Search.
| vorticalbox wrote:
| I have been using ollama + Open WebUI. Open WebUI already has a
| web search tool; all you would need to do is click the toggle
| for it under the chat.
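|
| For reference, a sketch of the usual way to stand Open WebUI up
| next to a local Ollama install (based on the project's documented
| quick start; flags may differ for your setup):
|
| docker run -d -p 3000:8080 \
|   --add-host=host.docker.internal:host-gateway \
|   -v open-webui:/app/backend/data \
|   --name open-webui ghcr.io/open-webui/open-webui:main
| ollama pull gemma3n   # the model Open WebUI will talk to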
| zettabomb wrote:
| Unfortunately the OWUI web search is really slow and just not
| great overall. I would suggest using an MCP integration
| instead.
| joerick wrote:
| Google do have an API for this. It has limits but it's perfectly
| good for personal use.
|
| https://developers.google.com/custom-search/v1/overview
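|
| A quick way to try it from the shell (a sketch; `YOUR_KEY` and
| `YOUR_CX` are the API key and Programmable Search Engine ID you
| create in the Google console):
|
| curl -s "https://www.googleapis.com/customsearch/v1?key=YOUR_KEY&cx=YOUR_CX&q=gemma+3n" \
|   | jq '.items[] | {title, link, snippet}'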
| kccqzy wrote:
| It seems way worse than other small models, including responding
| with complete non sequiturs. I think my favorite small model is
| still DeepSeek distilled with Llama 8B.
| rvnx wrote:
| Is there a chance that we'll see an uncensored version of this?
| throwaway2087 wrote:
| Can you apply abliteration? I'm not sure if their MatFormer
| architecture is compatible with current techniques.
| pilooch wrote:
| This model is fully compatible with anything previously done with
| gemma3. Just passed it to one of my VLM fine-tuning scripts and
| it started without issues (HF Transformers code). On a single GPU
| with LoRA, the E4B model takes 18 GB of VRAM at batch size 1,
| where gemma-4B was 21 GB. Nice one from DeepMind; the gemma3
| family tops the open-weights VLMs.
| refulgentis wrote:
| My post politely describing how this blog post does not match
| Google's own app running inference on Pixel is downvoted to -1,
| below dead posts with one-off short jokes.
|
| I am posting again because I've been here 16 years now, it is
| _very_ suspicious that happened, and given the replies to it, we
| now know this blog post is false.
|
| There is no open model that you can download today and run at
| even 1% of the claims in the blog post.
|
| You can read a reply from someone indicating they have inside
| knowledge on this, who notes this won't work as advertised unless
| you're Google (i.e. internally, they have it binding to a
| privileged system process that can access the Tensor core, and
| this isn't available to third parties. Anyone else is getting
| 1/100th of the speeds in the post)
|
| This post promises $150K in prizes for on-device multimodal apps
| and tells you it's running at up to 60 fps, when they know it
| runs at 0.1 fps. Engineering says it is because they haven't
| prioritized 3rd parties yet, and somehow, Google is getting away
| with this.
| simonw wrote:
| I tried my "Generate an SVG of a pelican riding a bicycle" prompt
| against Gemma 3n 7.5GB from Ollama and 15GB for mlx-vlm and got a
| pleasingly different result for the two quantization sizes:
| https://simonwillison.net/2025/Jun/26/gemma-3n/
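|
| For anyone who wants to reproduce that locally, something like
| this should work (assuming the `gemma3n:latest` Ollama tag and
| the `llm-ollama` plugin; both names are assumptions):
|
| ollama pull gemma3n:latest
| llm -m gemma3n:latest 'Generate an SVG of a pelican riding a bicycle' > pelican.svg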
| JohnKemeny wrote:
| Is that actually a useful benchmark, or is it just for the
| laughs? I've never really understood that.
| OtherShrezzing wrote:
| For me, it shows whether LLMs are generalising from their
| training data. LLMs understand all of the words in the prompt.
| They understand the spec for SVG better than any human. They
| know what a bird is. They know what a bike is. They know how to
| draw (and given access to computer-use could probably ace
| this test). They can plan and execute on those plans.
|
| Everything here should be trivial for LLMs, but they're quite
| poor at it because there's almost no "how to draw complex
| shapes in SVG" type content in their training set.
| zknowledge wrote:
| anyone know how much it costs to use the deployed version of
| gemma 3n? The docs indicate you can use the gemini api for
| deployed gemma 3n but the pricing page just shows "unavailable"
| kgwxd wrote:
| Can popular sci-fi go 30 seconds without some lame wad naming
| themselves or a product after it?
___________________________________________________________________
(page generated 2025-06-26 23:00 UTC)