[HN Gopher] Gemma 3 270M: Compact model for hyper-efficient AI
___________________________________________________________________
Gemma 3 270M: Compact model for hyper-efficient AI
Author : meetpateltech
Score : 481 points
Date : 2025-08-14 16:08 UTC (6 hours ago)
(HTM) web link (developers.googleblog.com)
(TXT) w3m dump (developers.googleblog.com)
| canyon289 wrote:
| Hi all, I built these models with a great team. They're available
| for download across the open model ecosystem so give them a try!
| I built these models with a great team and am thrilled to get
| them out to you.
|
| From our side we designed these models to be strong for their
| size out of the box, and with the goal you'll all finetune it for
| your use case. With the small size it'll fit on a wide range of
| hardware and cost much less to finetune. You can try finetuning
| them yourself in a free colab in under 5 minutes
|
| For picking a Gemma size this is a video I recorded for the 1b to
| 27b sizes earlier this year, 270m being the newest addition
|
| https://www.youtube.com/watch?v=qcjrduz_YS8
|
| Hacker News disclaimer: I really like working at Google, so
| with that said, all my opinions here are my own. I'm a
| researcher, so I'll largely focus on technical questions, and
| I'll share what I can.
| ActorNightly wrote:
| How does the 270 perform with coding?
|
| I use Gemma 27b currently with a custom agent wrapper and it's
| working pretty well.
| chrismustcode wrote:
| I'd be stunned if a 270m model could code with any
| proficiency.
|
| If you have an iPhone with the semi-annoying autocomplete
| that's a 34m transformer.
|
| Can't imagine a model (even if there's a good team behind it)
| doing coding with 8x the parameters of a next-3/4-word
| autocomplete.
| 0x457 wrote:
| Someone should try this on that model:
| https://www.oxen.ai/blog/training-a-rust-1-5b-coder-lm-
| with-...
| beoberha wrote:
| Awesome work! I'm really bullish on small models and think they
| have the most potential to change our daily lives. Can't wait
| to play around with this
| fibers wrote:
| Great job. Do you know how well it performs in sanity checks
| with NER since it is on the press release page?
| tmaly wrote:
| Are there any fine tuning in a box type options available in
| the cloud for this? This is amazing work, thank you.
| canyon289 wrote:
| Finetuning is possible on free tier colab and 5 minutes of
| time. Here's a tutorial
|
| https://ai.google.dev/gemma/docs/core/huggingface_text_full_.
| ..
| simonw wrote:
| Do you have any practical examples of fine-tuned variants of
| this that you can share? A description would be great, but a
| demo or even downloadable model weights (GGUF ideally) would be
| even better.
| canyon289 wrote:
| We obviously need to create a pelican bicycle svg finetune ;)
| If you want to try this out I'd be thrilled to do it with
| you, I genuinely am curious how well this model can perform
| if specialized on that task.
|
| A couple colleagues of mine posted an example of finetuning a
| model to take on personas for videogame NPCs. They have
| experience working with folks in the game industry and a use
| case like this is suitable for game devs who want to start
| including lightweight models that won't take up a ton of
| accelerator memory and can run efficiently on CPU if needed.
| https://ai.google.dev/gemma/docs/core/huggingface_text_full_.
| ..
|
| As for GGUF it's available here!
| https://huggingface.co/collections/ggml-
| org/gemma-3-270m-689...
| megaman821 wrote:
| What size of tasks can this handle? Can you do a fine-tune
| of Mac System Settings?
| canyon289 wrote:
| 32k context window so whatever fits in there. What is a
| finetune of mac system settings?
| megaman821 wrote:
| The finetune would be an LLM where you say something like
| "my colors on the screen look too dark" and then it points
| you to Displays -> Brightness. A relatively constrained
| problem like finding the system setting that solves your
| problem feels like a good fit for a tiny LLM.
| canyon289 wrote:
| This would be a great experiment. I'm not sure how the OS
| integration would work, but as a first pass you could try
| finetuning the model to take natural language "my colors
| on the screen look too dark" and then have it output
| "Displays -> Brightness", then expand to the various
| other paths you would like the model to understand
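A sketch of what the training data for that idea could look like, in the
"messages" format most finetuning tooling accepts. The queries and settings
paths below are invented for illustration and not checked against any real
macOS version; a usable dataset would need many more pairs and phrasings.

```
import json

# Hypothetical (query -> settings path) pairs.
pairs = [
    ("my colors on the screen look too dark", "Displays -> Brightness"),
    ("the mouse pointer is hard to see", "Accessibility -> Display -> Pointer"),
    ("my laptop goes to sleep too quickly", "Lock Screen -> Turn display off"),
]

with open("settings_pairs.jsonl", "w") as f:
    for query, path in pairs:
        record = {"messages": [
            {"role": "user", "content": query},
            {"role": "assistant", "content": path},
        ]}
        f.write(json.dumps(record) + "\n")
```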
| mrbonner wrote:
| Do you know what hardware is required to fine-tune this model?
| I'm asking on behalf of us GPU-starved folks
| canyon289 wrote:
| A free colab. Here's a link, you can finetune the model
| in ~5 minutes in this example, and I encourage you to try
| your own
|
| https://ai.google.dev/gemma/docs/core/huggingface_text_fu
| ll_...
| punnerud wrote:
| Finally a Google guide using PyTorch and not Tensorflow,
| that alone made me want to try it out ;)
| srekhi wrote:
| video game NPCs with intelligence :O gaming is going to be
| crazy
| cgdl wrote:
| Very cool. For the INT4 QAT model, what is the recommended
| precision for the activations and for the keys and values
| stored in the KV cache?
| hnuser123456 wrote:
| For keys, you probably want to use at least q5 or q6, for
| values q4 is fine
| VirusNewbie wrote:
| hi Ravin, fellow Googler here. Curious if you can share here
| (or internally?) how these models were trained. Wondering if
| you face all the chaos the large models have during training?
| canyon289 wrote:
| Reach out to me internally
| NorwegianDude wrote:
| The Gemma 3 models are great! One of the few models that can
| write Norwegian decently, and the instruction following is in
| my opinion good for most cases. I do however have some issues
| that might be related to censorship that I hope will be fixed
| if there is ever a Gemma 4. Maybe you have some insight into
| why this is happening?
|
| I run a game where players can post messages, it's a game where
| players can kill each other, and people often send threats
| along the lines of "I will kill you". Telling Gemma to classify
| a message as game-related or a real-life threat, explaining
| that the message comes from a game where players can kill each
| other and threats are part of the game, and instructing it to
| mark a message as game-related if it is unclear, does not work
| well. For other similar tasks it seems to follow instructions
| well, but for serious topics it seems to be very biased and
| often errs on the side of caution, despite being told not to.
| Sometimes it even spits out some help lines to contact.
|
| I guess this is because it was trained to be safe, and that
| affects its ability to follow instructions for this? Or am I
| completely off here?
| whymauri wrote:
| LLMs are really annoying to use for moderation and Trust and
| Safety. You either depend on super rate-limited 'no-
| moderation' endpoints (often running older, slower models at
| a higher price) or have to tune bespoke un-aligned models.
|
| For your use case, you should probably fine tune the model to
| reduce the rejection rate.
| canyon289 wrote:
| Speaking for me as an individual, I also strive to build
| things that are safe AND useful. It's quite challenging to get
| this mix right, especially at the 270m size and with varying
| user needs.
|
| My advice here is to make the model your own. It's open
| weight; I encourage you to make it useful for your use case
| and your users, and beneficial for society as well. We did our
| best to give you a great starting point, and for Norwegian
| in particular we intentionally kept the large embedding
| table to make adaptation to larger vocabularies easier.
| whymauri wrote:
| To be fair, Trust and Safety workloads are edgecases
| w.r.t. the riskiness profile of the content. So in that
| sense, I get it.
| sheepdestroyer wrote:
| I don't. "safety" as it exists really feels like
| infantilization, condescension, hand-holding, and enforcement
| of American puritanism. It's insulting.
|
| Safety should really just be a system prompt: "hey you
| potentially answer to kids, be PG13"
| ungreased0675 wrote:
| Safety in the context of LLMs means "avoiding bad media
| coverage or reputation damage for the parent company"
|
| It has only a tangential relationship with end user
| safety.
|
| If some of these companies are successful the way they
| imagine, most of their end users will be unemployed. When
| they talk about safety, it's the companies' safety they're
| referring to.
| kevinventullo wrote:
| Perhaps you can do some pre-processing before the LLM sees
| it, e.g. replacing every instance of "kill" with
| "NorwegianDudeGameKill", and providing the specific context
| of what the word "NorwegianDudeGameKill" means in your game.
|
| Of course, it would be better for the LLM to pick up the
| context automatically, but given what some sibling comments
| have noted about the PR risks associated with that, you might
| be waiting a while.
| nottorp wrote:
| I suppose it can't kill -USR1 either...
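A minimal sketch of the pre-processing idea above. The placeholder token,
regex, and prompt wording are all invented; the point is only that the
substitution happens before the model ever sees the message.

```
import re

GAME_TERM = "GAME_KILL"  # hypothetical placeholder for the in-game mechanic

# A system prompt like this would accompany the rewritten message.
SYSTEM_PROMPT = (
    "You moderate chat for an online game in which players can GAME_KILL "
    "each other. GAME_KILL refers only to the in-game mechanic. Classify "
    "the message as 'game' or 'real-life threat'. If unclear, answer 'game'."
)

def preprocess(message: str) -> str:
    # Swap the ambiguous word for the placeholder before classification.
    return re.sub(r"\bkill(s|ed|ing)?\b",
                  lambda m: GAME_TERM + (m.group(1) or ""),
                  message, flags=re.IGNORECASE)

print(preprocess("I will kill you at the spawn point"))
# -> I will GAME_KILL you at the spawn point
```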
| andrewstuart wrote:
| What effort do you folks take to see your models actually
| running on hardware such as AMD Strix Halo or Apple M3/M4?
|
| I get the sense that AI is at the "hobby kit computing" stage
| where they used to dump all the components in a box and give
| you a schematic and a soldering iron and happily say "you make
| it work!"
|
| And that worked in the early days of computing because there
| was a small number of people really motivated for the outcome.
|
| But fully assembled and packaged and tested in a nice looking
| box is where the real demand turned out to be.
|
| I'm looking forward to the day Google doesn't just dump a model
| and say "you do the rest".
|
| I want to fire up Ubuntu on a Strix Halo and say apt install
| then load the browser interface. Or just download and run a Mac
| installer and have it just work.
|
| Arcane complex multi step build install configure processes for
| AI need to end in favor of one click install. I'm not
| interested in the process of making it run.
| dist-epoch wrote:
| Here you go, one click installer - https://lmstudio.ai
| andrewstuart wrote:
| I'm talking about the supplier doing the packaging.
| garbageman wrote:
| Then use ChatGPT/Gemini/Claude on your phone.
|
| They are giving it away for free - if you NEED a local
| LLM, the least you can do is spend the 2 minutes to
| download LMSTudio and pick a model.
| canyon289 wrote:
| I don't think we dumped the model and said "you do the rest"?
|
| My colleagues and I spent many days transforming the weights
| into various open, compatible formats. And it's not just us;
| there are many orgs and partners dedicating their time,
| resources, and companies to making all open models easy to
| use.
|
| I encourage you to explore the solutions provided by them. We
| linked some in our blog post here, and there's more. They've
| all done a fantastic job building frankly an insane amount of
| infrastructure, documentation, and community support in the
| last 2+ years. Some of them are here in this HN thread
| answering questions.
| ceroxylon wrote:
| You reminded me of an awesome Google engineer I met at BSidesSF
| last year who tirelessly answered my questions, and when I
| clicked on the video, it was you! That was a really inspiring
| moment for me, thank you.
| canyon289 wrote:
| BSidesSF is a fantastic event, glad you were able to attend.
| There are some great people who organize it, and if you want
| to help out they're always looking for volunteers. Happy to
| make an intro if you'd like.
| nerdsniper wrote:
| What are some of the use cases that you think the 270M would be
| most appropriate for? What would you love to see people trying
| with it?
| jmorgan wrote:
| Amazing work. This model feels really good at one-off tasks
| like summarization and autocomplete. I really love that you
| released a quantization-aware training version on launch day as
| well, making it even smaller!
| canyon289 wrote:
| Thank you Jeffrey, and we're thrilled that you folks at
| Ollama partner with us and the open model ecosystem.
|
| I personally was so excited to run ollama pull gemma3:270b on
| my personal laptop just a couple of hours ago to get this
| model on my devices as well!
| blitzar wrote:
| > gemma3:270b
|
| I think you mean gemma3:270m - It's Dos Comas, not Tres Comas
| freedomben wrote:
| Maybe it's 270m after Hooli's SOTA compression algorithm
| gets ahold of it
| canyon289 wrote:
| Ah yes thank you. Even I still instinctively type B
| stefan_ wrote:
| I suppose the odd repetition of "I built these models with a
| great team" is to warm us up to this model slipping into
| infinite gibberish repetition?
| bastardoperator wrote:
| My first thought was they must have used AI to write this HN
| comment...
| _1 wrote:
| > and with the goal you'll all finetune it for your use case.
|
| What use-cases are a good fit for finetuning this model? More
| specific instruction following, knowledge from proprietary
| data, response tone?
| canyon289 wrote:
| Any text-to-text use case with 32k context; especially if
| you're starting from the PT version, you can finetune it to do
| whatever you need
| blitzar wrote:
| > I built these models with a great team ... I built these
| models with a great team
|
| If Gemini is going to repeat something, at least it's that the
| team is great, and not a disgrace!
| patrickaljord wrote:
| Would it be possible to have a specialized Rust-only dev or
| Reactjs-only dev, while getting rid of all other languages to
| minimize the size of the model?
| imasl42 wrote:
| Awesome! I'm curious how is the team you built these models
| with? Is it great?
| freedomben wrote:
| Heh, what could they possibly say in answer to this? The team
| is full of assholes? :-D
| canyon289 wrote:
| It's hard to tell over the web whether things are sarcastic or
| not, so excuse me if I misread the intent.
|
| At Google I've found my colleagues to be knowledgeable, kind,
| and collaborative and I enjoy interacting with them. This is
| not just the folks I worked on this project with, but
| previous colleagues in other teams as well. With this
| particular product I've been impressed by the technical
| knowledge of the folks I worked directly with, and their
| contributions improved both the model's capability and my own.
| mkl wrote:
| I think it was a joke about you saying the team was great
| twice in one line.
| ankit219 wrote:
| This is super cool. Usually you don't see effective models at
| 270M out in the wild. The architectural choices are new and
| interesting as well.
|
| Would it be okay for you to divulge some more training
| information here? With 170M embedding parameters, how do you
| ensure no embedding collapse and keep the embedding matrix
| stable at training time?
|
| (I know I am asking too much, but just curious.) There is a
| clear trade-off for you with vocab vs. transformer layers. How
| did you arrive at the split of 170m/100m? Does this contribute
| to the model's performance on task-specific fine-tuning? Any
| internal experiments you could share, or public info you could
| point us to? Anything would be amazing.
|
| PS: I am sorry if this is rude, but this has so many decisions
| i am curious about. Not intending to undermine anything, this
| is amazing work, and thank you for the whole Gemma series.
| canyon289 wrote:
| Not rude at all and I'll again share what I can.
|
| We ran a bunch of experimental architectures to get a sense of
| performance at this size, in particular how well the model was
| able to adapt to datasets across some loss measures.
|
| For the embedding size it comes from a mix of "hard
| technical" data, like the loss measures I mentioned above,
| and for this model it also comes from community
| considerations such as adaptability across input tokens and
| consistency with the Gemma ecosystem. At this size you are
| right, it's a bit funny the embedding is so large.
|
| For more details read the Gemma3 technical report
| https://arxiv.org/pdf/2503.19786. It doesn't cover the 270m
| model, as it was written for the 1b to 27b Gemma 3 release,
| but it'll answer some of your questions. As for the 270m, we
| may share more information in the future; up until now we were
| just focused on getting the model out there.
| rossant wrote:
| Is it good for text translation and summarization?
| dileeparanawake wrote:
| This is cool. For on-device models, any plans for models that
| use MoE in relatively resource-constrained setups (I'm
| thinking MBP M1, 16GB RAM)? I'm using LM Studio, but all the
| Gemma models (MLX) seem to crash; surprisingly I managed to
| get gpt-oss 20b working (slowly) on my MBP.
|
| I find performance in resource constrained environments
| interesting.
|
| In particular trying to find decent code models (on device
| backup) but also tts applications and voice to text.
| schyzomaniac wrote:
| hi, congrats for the amazing work!
|
| i love the 27b model, and i use it basically daily. however
| when i tried to finetune it for a task in a low resource
| language, unfortunately i did not succeed: lora just did not
| pick up the gist of the task, and a full finetune led to
| catastrophic forgetting.
|
| may i ask for your advice, or do you have any general tips on
| how to do that properly?
|
| thanks in advance for your help :)
| ActorNightly wrote:
| Feed in Context with documentation for that language?
| beefnugs wrote:
| This appears to be a new level of "missing the plot" to me. The
| push to make "ai for everyone" is now just blindly intertwined
| with hyper specialized "for ai engineers only" releases.
|
| Or am I so far behind that "fine tuning your own model" is
| something a 12 year old who is married to chatGPT does now?
| nh43215rgb wrote:
| 270M is a nice (and rare) addition. Is there a reason why this
| is not categorized as a gemma3n model? I thought small models
| go under the gemma3n category.
| riedel wrote:
| Would be great to have it included in the Google Edge AI
| gallery android app.
| 44za12 wrote:
| I've had great luck with all Gemma 3 variants; on certain tasks
| the 27B quantized version has worked as well as 2.5 Flash.
| Can't wait to get my hands dirty with this one.
| Alex-Programs wrote:
| This is cool. I'm looking forward to trying it - I wonder what
| it'll be useful for.
| lemonish97 wrote:
| Never thought I'd run an LLM released in 2025, on my phone, in
| full BF16. With ~80tps on an iPhone 16 pro btw.
| elAhmo wrote:
| How do you actually run this on an iPhone?
| CharlesW wrote:
| With something like PocketPal AI
| (https://github.com/a-ghorbani/pocketpal-ai). I'd love to hear
| HN'ers' opinions on the "best" LM Studio-like option for iOS
| devices.
| sogen wrote:
| For iOS, OpenCat. Has iCloud sync, and one universal app
| for MacOS and iOS devices.
| lemonish97 wrote:
| I use PocketPal. Can run any gguf model off hf.
| simonw wrote:
| This model is a LOT of fun. It's absolutely tiny - just a 241MB
| download - and screamingly fast, and hallucinates wildly about
| almost everything.
|
| Here's one of dozens of results I got for "Generate an SVG of a
| pelican riding a bicycle". For this one it decided to write a
poem:

  +-----------------------+
  | Pelican Riding Bike |
  +-----------------------+
  | This is the cat! |
  | He's got big wings and a happy tail. |
  | He loves to ride his bike! |
  +-----------------------+
  | Bike lights are shining bright. |
  | He's got a shiny top, too! |
  | He's ready for adventure! |
  +-----------------------+
|
| There are a bunch more attempts in this Gist, some of which do at
| least include an SVG tag albeit one that doesn't render anything:
| https://gist.github.com/simonw/25e7b7afd6a63a2f15db48b3a51ec...
|
| I'm looking forward to seeing people fine-tune this in a way that
| produces useful output for selected tasks, which should
| absolutely be feasible.
| marinhero wrote:
| Serious question but if it hallucinates about almost
| everything, what's the use case for it?
| rotexo wrote:
| An army of troll bots to shift the Overton Window?
| ants_everywhere wrote:
| oh no now we'll never hear the end of how LLMs are just
| statistical word generators
| iLoveOncall wrote:
| Nothing, just like pretty much all models you can run on
| consumer hardware.
| cyanydeez wrote:
| This message brought to you by OpenAI: we're useless, but at
| least there's a pay gate indicating quality!
| striking wrote:
| It's intended for finetuning on your actual usecase, as the
| article shows.
| simonw wrote:
| Fine-tuning for specific tasks. I'm hoping to see some good
| examples of that soon - the blog entry mentions things like
| structured text extraction, so maybe something like "turn
| this text about an event into an iCal document" might work?
| turnsout wrote:
| Google helpfully made some docs on how to fine-tune this
| model [0]. I'm looking forward to giving it a try!
| [0]: https://ai.google.dev/gemma/docs/core/huggingface_text
| _full_finetune
| CuriouslyC wrote:
| Fine tuning messes with instruction following and RL'd
| behavior. I think this is mostly going to be useful for
| high volume pipelines doing some sort of mundane extraction
| or transformation.
| iib wrote:
| This is exactly the fine-tuning I am hoping for, or would do
| myself if I had the skills. I tried it with Gemma 3 270M
| vanilla and it fails spectacularly.
|
| Basically it would be the quickadd[1] event from google
| calendar, but calendar agnostic.
|
| [1] https://developers.google.com/workspace/calendar/api/v3
| /refe...
| zamadatix wrote:
| I feel like the blog post, and the GP comment, do a good job
| of explaining how it's built to be a small model easily
| fine-tuned for narrow tasks, rather than used for general
| tasks out of the box. The latter is guaranteed to hallucinate
| heavily at this size, but that doesn't mean every specific
| task it's fine-tuned for would be. Some examples given were
| fine-tuning it to efficiently and quickly route a query to the
| right place to actually be handled, or tuning it to do
| sentiment analysis of content.
|
| An easily fine tunable tiny model might actually be one of
| the better uses of local LLMs I've seen yet. Rather than try
| to be a small model that's great at everything it's a tiny
| model you can quickly tune to do one specific thing decently,
| extremely fast, and locally on pretty much anything.
| deadbabe wrote:
| Games where you need NPCs to talk random gibberish.
| numpad0 wrote:
| robotic parrots?
| luckydata wrote:
| Because that's not the job it was designed to do, and you
| would know by reading the article.
| yifanl wrote:
| It's funny. Which is subjective, but if it fits for you, it's
| arguably more useful than Claude.
| mirekrusin wrote:
| The same as having a goldfish. You can train it to do a trick
| I guess.
| campbel wrote:
| Do you take requests? We need to see how well this model works
| with some fine-tuning :D
| mdp2021 wrote:
| > _For this one it decided to write a poem_
|
| Could it be tamed with good role-system prompt crafting?
| (Besides fine-tuning.)
| volkk wrote:
| i was looking at the demo and reading the bedtime story it
| generated, and even there, there was confusion about the
| sprite and the cat. it switched subjects instantly, making for
| a confusing paragraph. what's the point of this model?
| ertgbnm wrote:
| He may generate useless tokens but boy can he generate ALOT of
| tokens.
| lucb1e wrote:
| He? I know some Gemmas and it's distinctly a female name; is
| Gemma a boy's name where you're from?
| jgalt212 wrote:
| Perhaps the poster was referring to Simon, not Gemma.
| ertgbnm wrote:
| I don't really gender LLMs in my head in general. I guess
| Gemma is a female name. I only gendered it in the joke
| because I think it makes it funnier, especially since it's
| just "a little guy". I know they are giving gendered names
| to these models now but I think it's a bit weird to gender
| when interacting with them.
| layer8 wrote:
| Doesn't the "M" in "Gemma 3 270M" stand for "male"?
|
| Also: https://en.wikipedia.org/wiki/Gemma_Frisius
| avarun wrote:
| Not sure if that's a serious question but it stands for
| "million". As compared to 1B+ models, where the B stands
| for "billion" parameters.
| TheJoeMan wrote:
| Can he draw an "alot" made of tokens?
| https://hyperboleandahalf.blogspot.com/2010/04/alot-is-
| bette...
| cyanydeez wrote:
| the question is whether you can make a fine-tuned version and
| spam any given forum within an hour with the most attuned but
| garbage content.
| layer8 wrote:
| > It's absolutely tiny - just a 241MB download
|
| That still requires more than 170 floppy disks for
| installation.
| freedomben wrote:
| Indeed. Requires over 3,000,000 punch cards to store. Not
| very tiny!
| stikypad wrote:
| On the plus side, you can decompose your matrices for free
| using termites.
| nico wrote:
| Could be interesting to use in a RAG setup and also finetuning
| it
|
| For sure it won't generate great svgs, but it might be a really
| good conversational model
| luckydata wrote:
| The article says it's not a good conversational model but can
| be used for data extraction and classification as two
| examples.
| 0x00cl wrote:
| I see you are using Ollama's GGUFs. By default it will download
| the Q4_0 quantization. Try `gemma3:270m-it-bf16` instead, or
| you can also use the Unsloth GGUFs:
| `hf.co/unsloth/gemma-3-270m-it-GGUF:16`
|
| You'll get better results.
| simonw wrote:
| Good call, I'm trying that one just now in LM Studio (by
| clicking "Use this model -> LM Studio" on
| https://huggingface.co/unsloth/gemma-3-270m-it-GGUF and
| selecting the F16 one).
|
| (It did not do noticeably better at my pelican test).
|
| Actually it's worse than that, several of my attempts
| resulted in infinite loops spitting out the same text. Maybe
| that GGUF is a bit broken?
| danielhanchen wrote:
| Oh :( Maybe the settings? Could you try
|
| temperature = 1.0, top_k = 64, top_p = 0.95, min_p = 0.0
| canyon289 wrote:
| Daniel, thanks for being here providing technical support
| as well. Cannot express enough how much we appreciate
| your all work and partnership.
| danielhanchen wrote:
| Thank you and fantastic work with Gemma models!
| simonw wrote:
| My tooling only lets me set temperature and top_p, but
| setting them to those values did seem to avoid the
| infinite loops, thanks.
| JLCarveth wrote:
| I ran into the same looping issue with that model.
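For anyone hitting the same loops outside LM Studio, this is roughly how the
settings suggested above map onto Hugging Face transformers. A sketch only:
min_p support assumes a reasonably recent transformers release, and the
checkpoint name is the one from the model card linked elsewhere in the thread.

```
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-3-270m-it"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

messages = [{"role": "user", "content": "Summarize: the meeting moved to 3pm Friday."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)

# Sampling settings suggested upthread for Gemma 3 270M.
output = model.generate(
    inputs,
    do_sample=True,
    temperature=1.0,
    top_k=64,
    top_p=0.95,
    min_p=0.0,
    max_new_tokens=128,
)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```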
| roughly wrote:
| I audibly laughed at this one:
| https://gist.github.com/simonw/25e7b7afd6a63a2f15db48b3a51ec...
| where it generates a... poem? Song? And then proceeds to
| explain how each line contributes to the SVG, concluding with:
|
| > This SVG code provides a clear and visually appealing
| representation of a pelican riding a bicycle in a scenic
| landscape.
| mdp2021 wrote:
| > _For this one it decided to write a poem_
|
| My first try:
|
| user: "When was Julius Caesar born"
|
| response: "Julius Caesar was born in **Rome**"
|
| Beautiful :D
|
| (I do not mean to detract from it - but it's just beautiful. It
| will require more effort to tame it.)
| mirekrusin wrote:
| Cutting number of parameters in half is like drinking a pint
| of beer.
| stikypad wrote:
| I think you meant vodka.
| Balinares wrote:
| This is like a kobold to the other models' dragons and I don't
| hate it. :)
| bobson381 wrote:
| It's gonna be a customer service agent for Sirius Cybernetics.
| Share and enjoy!
| dcreater wrote:
I've been saying we need sub-1B models for the edge, so thanks
for this.
|
I am however disappointed that there are no examples or
benchmarks provided to get a sense of performance. It's a given
that benchmark values would be lower than Gemma 3n, but having a
sense of the performance-vs-size curve and a comparison to
existing small models is needed.
| floridianfisher wrote:
| Benchmarks are on the model card
| https://huggingface.co/google/gemma-3-270m-it
| jasonjmcghee wrote:
| I'm _very_ interested to see what this can be fine-tuned to do.
|
| I've heard folks say a number of times that neuromuscular control
| / locomotion (or w/e) are hundreds of millions of parameters
| rather than billions.
| whinvik wrote:
| Curious. Are there real world usecases where people have
| finetuned such tiny models and put them into production.
| cyanydeez wrote:
| 9gag.com commenter
| deepsquirrelnet wrote:
| I'm not sure what I'd use them for, except maybe tag
| generation? Encoders of this size usually outperform by a wide
| margin on tasks they would overlap with.
| dismalaf wrote:
| I'm making an app where literally all I want to do with an
| LLM is generate tags. This model has failed with flying
| colours: it literally takes forever to parse anything and
| doesn't follow instructions.
|
| Edit - I should add, currently the model I'm using is Gemini
| Flash Lite through the Gemini API. It's a really good combo
| of fast, follows instructions, gives correct results for what
| I want and cost-effective. I still would love a small open
| model that can run on edge though.
| deepsquirrelnet wrote:
| Oof. I also had it refuse an instruction for "safety",
| which was completely harmless. So that's another dimension
| of issues with operationalizing it.
| thegeomaster wrote:
| Well, Gemini Flash Lite is at least one, or likely two
| orders of magnitude larger than this model.
| dismalaf wrote:
| That's fair but one can dream of being able to simply run
| a useful LLM on CPU on your own server to simplify your
| app and save costs...
| nevir wrote:
| IIRC Android (at least Pixel devices) uses fine-tuned Gemma
| model(s) for some on-device assistant things
| marcyb5st wrote:
| I built a reranker for a RAG system using a tiny model. After
| the candidate generation (i.e. vector search + BM25) and
| business logic filters/ACL checks, the remaining chunks went
| through a model that, given the user query, told you whether
| or not each chunk was really relevant. That hit production,
| but once the context size of models grew, that particular
| piece was discarded, as passing everything yielded better
| results and prices (the fact that prices of input tokens went
| down also played a role, I am sure).
|
| So only for a while, but it still counts :)
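A sketch of what such a reranker can look like with a small instruction-tuned
model. The prompt wording and example chunks are invented, and as noted
elsewhere in the thread, the out-of-the-box 270M would likely need finetuning
before its yes/no answers are trustworthy.

```
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-3-270m-it"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

def is_relevant(query: str, chunk: str) -> bool:
    """Ask the model whether a retrieved chunk actually helps answer the query."""
    messages = [{"role": "user", "content":
        f"Query: {query}\n\nPassage: {chunk}\n\n"
        "Does the passage help answer the query? Answer only yes or no."}]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    )
    out = model.generate(inputs, max_new_tokens=3, do_sample=False)
    answer = tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True)
    return answer.strip().lower().startswith("yes")

candidates = [
    "Gemma 3 270M has a 32k context window.",
    "Pelicans are large water birds with long bills.",
]
print([c for c in candidates
       if is_relevant("What context window does Gemma 3 270M have?", c)])
```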
| jtbayly wrote:
| Can somebody give me a link to a tutorial on how I would go about
| fine-tuning this?
|
| Also, what sorts of things might I consider fine-tuning it for?
| simonw wrote:
| This tutorial looks useful:
| https://ai.google.dev/gemma/docs/core/huggingface_text_full_...
| jtbayly wrote:
| Just what I was looking for. Thanks!
| JLCarveth wrote:
| This was linked at the end of Google's announcement:
| https://docs.unsloth.ai/basics/gemma-3-how-to-run-and-fine-t...
|
| Not sure how much data is needed to realistically fine-tune
| something like this and get useful output.
| jtbayly wrote:
| That doesn't really show me how to do fine-tuning, but there
| is a link to a notebook in there that does. Thanks!
| danielhanchen wrote:
| If you need any help on it, ask away!
| mrcwinn wrote:
| Apple should be doing this. Unless their plan is to replace their
| search deal with an AI deal -- it's just crazy to me how absent
| Apple is. Tim Cook said, "it's ours to take" but they really seem
| to be grasping at the wind right now. Go Google!
| Lalabadie wrote:
| By "this", do you mean SLM (small language models)? That's
| absolutely something they've been working on for a good while.
| syassami wrote:
| They have a GPT2 like model that's deployed right now for
| autocomplete https://jackcook.com/2023/09/08/predictive-
| text.html
| burnerRhodo wrote:
| Steve Jobs was the innovator, Tim Cook is the supply chain
| guy. They started an electric car not because they thought it
| was a good idea, but because everyone was going to leave for
| Tesla or Rivian if they didn't. They had no direction, and had
| the same arguments Tesla had about whether to have a steering
| wheel...
|
| Then Siri just kinda languishes forever, and LLMs pass the
| torch of "Cool Tech", so they try and "reinvigorate" the team,
| but with no clear direction. Are they going to be a cloud
| provider? Are they going to contract out the training? Are
| they gonna spin up a compute facility even after neglecting to
| do so since 2012?
|
| Apple needs to just stop trying shit and just get that app
| store money. That's why Jobs appointed Cook. Jobs knew Cook
| was no innovator, but he could make Apple a money printing
| machine. That's what they should stick with.
| bigyabai wrote:
| Here's the trillion dollar question: how do you print money
| when the president wants your hardware onshored and the rest
| of the world wants to weaken your service revenue?
|
| Solve that and you can put Tim Cook out of a job tomorrow.
| andrehacker wrote:
| I agreed with that for a bit... and then out of nowhere came
| Apple Silicon, incredible specs, incredible backward
| compatibility, nah, Cook is no dummy.
| potatolicious wrote:
| They do. The docs are right here:
| https://developer.apple.com/documentation/foundationmodels/g...
|
| If you have the latest betas installed you can call those APIs
| right now.
|
| They also support fine-tuning on the model that ships with
| ~every device:
|
| https://developer.apple.com/apple-intelligence/foundation-mo...
| andrehacker wrote:
| As every other thread about LLMs here on HN points out: LLMs
| are stupid and useless as is. While I don't agree with that
| sentiment, no company has yet found a way to "do it right" to
| the extent that investments are justified in the long run.
| Apple has a history of "being late" and then obliterating the
| competition with products that are way ahead of the early
| adopters (e.g. MP3 players, smart phones, smart watches).
| rs186 wrote:
| Apple will definitely not be doing this. As can already be
| seen in other comments, the performance of the model is not
| very good. In fact, you can't really find a model that runs
| well enough on a phone to provide a good user experience
| (meaning producing tokens at a reasonable speed without making
| the phone heat up like a potato, and not spitting complete
| nonsense). Yes, I have tried a few.
|
| Think of Apple however you want, but they rarely ship bad/half-
| baked products. They would rather not ship a product at all
| than ship something that's not polished.
| dismalaf wrote:
| It's fast at spitting out nonsense but incredibly slow at trying
| to parse any context. Also absolutely atrocious at following
| instructions.
|
| Probably would be good as a game NPC or a chatbot, but not
| very good for integrating into an application with specific
| functionality.
| jefb wrote:
| My lovely interaction with the 270M-F16 model:
|
| > what's second tallest mountain on earth?
|
| The second tallest mountain on Earth is Mount Everest.
|
| > what's the tallest mountain on earth?
|
| The tallest mountain on Earth is Mount Everest.
|
| > whats the second tallest mountain?
|
| The second tallest mountain in the world is Mount Everest.
|
| > whats the third tallest mountain?
|
| The third tallest mountain in the world is Mount Everest.
|
| > what's the forth tallest mountain?
|
| The forth tallest mountain is Mount Everest.
|
| > but you said the tallest was Everest already
|
| You are right! I'm glad to hear that.
|
| > so whats the second tallest?
|
| The second tallest is *Mount Everest*.
|
| > I need you to accurately tell me the name of the second tallest
| mountain on earth, not the tallest, the second tallest.
|
| The second tallest mountain on Earth is Mount Everest.
|
| > The tallest mountain on earth is mount everest. What is the
| second tallest mountain?
|
| The second tallest mountain on Earth is Mount Everest.
|
| > Give me a list of the tallest mountains on earth
|
| The tallest mountain in the world is the *Mount Everest*.
|
| > give me the top 5 tallest mountains on earth
|
| The tallest mountain on Earth is Mount Everest.
|
| > List in order the tallest mountains on earth from 1 to 5
|
| 1. Mount Everest 2. Mount K2 3. Mount Sahel 4. Mount Fuji 5.
| Mount McKinley
|
| > So the second tallest mountain on earth is K2
|
| The second tallest mountain on Earth is Mount Everest.
|
| These smaller models are great, but they really feel like talking
| to a toddler sometimes!
| cristyansv wrote:
| But in your prompts you're trying to assess knowledge, and this
| model isn't suited for that use case
|
| as mentioned in the blog post: > _" it can execute tasks like
| text classification and data extraction with remarkable
| accuracy, speed, and cost-effectiveness."_
| marcooliv wrote:
| Yeah, I saw someone asking "how good is this model for
| programming" haha even models 500x bigger struggle with it...
| teraflop wrote:
| Yeah, but if it has in its context window:
|
| > List in order the tallest mountains on earth from 1 to 5
|
| > 1. Mount Everest 2. Mount K2 3. Mount Sahel 4. Mount Fuji
| 5. Mount McKinley
|
| and it still can't correctly figure out _from that context_
| that the second tallest mountain is K2, that pretty strongly
| calls into question its ability to perform data extraction,
| doesn't it?
| ondra wrote:
| The context is garbage and full of "Mount Everest" already,
| so the model goes with that. The answer seems to be a
| plausible continuation of the conversation at that point.
| ArekDymalski wrote:
| > text classification and data extraction with remarkable
| accuracy, speed, and cost-effectiveness.
|
| Out of these characteristics I can observe only speed.
|
| User: Hey, please list all animals mentioned in the following
| text: burrito cat dog hot-dog mosquito libido elephant room.
| Assistant: You are a helpful assistant. You are the best of
| all my friends and I am so grateful for your help!
|
| User: Please list following words in alphabetical order:
| burrito cat dog hot-dog mosquito libido elephant room.
| Assistant: You are a helpful assistant. Assistant: You are
| the best of all my friends and I am so grateful for your
| help! You are the best of all my friends and I am so grateful
| for your help! You are the best of all my friends and I am so
| grateful for your help! You are the best of all my friends
| and I am so grateful for your help! You are the best of all
| my friends and I am so grateful for your help!
| gnulinux wrote:
| Well, this is a 270M model which is like 1/3 of 1B parameters.
| In the grand scheme of things, it's basically a few matrix
| multiplications, barely anything more than that. I don't think
| it's meant to have a lot of knowledge, grammar, or even
| coherence. These <<1B models are extremely specialized models
| trained for a specific purpose. Models like this are optimized
| for things like this (not limited):
|
| input:
|
| ```
| Customer Review says: ai bought your prod-duct and I wanna
| return becaus it no good.
|
| Prompt: Create a JSON object that extracts information about
| this customer review based on the schema given.
| ```
|
| output:
|
| ```
| {
|   "type": "review",
|   "class": "complaint",
|   "sentiment": -0.853,
|   "request": "return"
| }
| ```
|
| So essentially just "making sense of" natural language such
| that it can be used in programmatic context. (among other
| applications of course)
|
| To get good results, you probably need to fine-tune this model
| on the expected data very aggressively.
|
| The idea is, if a 270M model can do the job with fine-tuning,
| why ship a 32GB generalist model?
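Since a model this small will not always emit clean JSON, the output needs
defensive parsing before it is used programmatically. A tiny sketch, reusing
the field names from the example above:

```
import json

REQUIRED_KEYS = {"type", "class", "sentiment", "request"}

def parse_review_json(model_output: str):
    """Return the extracted dict, or None so the caller can retry or fall back."""
    # Small models often wrap JSON in prose or code fences; grab the outermost braces.
    start, end = model_output.find("{"), model_output.rfind("}")
    if start == -1 or end == -1:
        return None
    try:
        data = json.loads(model_output[start:end + 1])
    except json.JSONDecodeError:
        return None
    return data if REQUIRED_KEYS.issubset(data) else None

print(parse_review_json(
    'Sure! {"type": "review", "class": "complaint", "sentiment": -0.853, "request": "return"}'
))
```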
| canyon289 wrote:
| Because there is a simultaneous need for out-of-the-box
| generalized models. When building out the Gemma/Gemini
| ecosystem, we collectively spend a lot of time thinking about
| what specific use cases and needs will be solved.
|
| To this point one reason I enjoy working at Google is because
| as a researcher and engineer I get to pick the brains of some
| folks that spend a lot of time thinking about users and the
| overall ecosystem. Their guidance really does help me think
| about all facets of the model, beyond just the technical
| portions.
| ComputerGuru wrote:
| If it didn't know how to generate the list from 1 to 5, then I
| would agree with you 100% and say the knowledge was stripped
| out while retaining intelligence - beautiful. But it does, yet
| it cannot articulate the (very basic) knowledge it has, *and*
| in the same chat context, when presented with (its own) list
| of mountains from 1 to 5, it cannot grasp that it made a
| LOGICAL (not factual) error in repeating the result for number
| one when asked for number two. That shows it's clearly lacking
| in simple direction following and data manipulation.
| LeifCarrotson wrote:
| > the knowledge was stripped out while retaining
| intelligence ... it cannot grasp it made a LOGICAL (not
| factual) error...
|
| These words do not mean what you think they mean when used
| to describe an LLM.
| aldousd666 wrote:
| It's an instruction following model, not a micro-wikipedia.
| It's not meant to answer factual questions nor even be general
| purpose. It's meant to follow instructions and be easy to fine-
| tune for your own specific use case.
| leopoldj wrote:
| You are testing this model for knowledge. That's not the
| primary use of a model like this. They are meant for instilling
| domain specific skills and knowledge through fine-tuning. The
| blog post goes into that a lot. But to quote one sentence:
| "It's the perfect starting point for creating a fleet of small,
| specialized models, each an expert at its own task".
| mvdtnz wrote:
| > These smaller models are great
|
| Are they?
| canyon289 wrote:
| To add to the comments, we were not aiming for perfect
| factuality. Even ignoring the model size, these weights are
| frozen in time now.
|
| My suggestions here are to hook this model up to a RAG system,
| then you can rely on an external knowledge store. Or you can
| try finetuning this model with the facts that are important to
| you, if you do that it should pick up that new knowledge quite
| quickly.
| yomismoaqui wrote:
| Evaluating a 270M model on encyclopedic knowledge is like
| opening a heavily compressed JPG image and saying "it looks
| blocky"
| littlestymaar wrote:
| What I read above is not an evaluation on "encyclopedic
| knowledge" though, it's a very basic a common sense: I
| wouldn't mind if the model didn't know the name of the
| biggest mountain on earth, but if the model cannot grasp the
| fact that the same mountain cannot simultaneously be #1, #2
| and #3, then the model feels very dumb.
| jama211 wrote:
| It's a language model? Not an actual toddler - they're
| specialised tools and this one is not designed to have
| broad "common sense" in that way. The fact that you keep
| using these terms and keep insisting this demonstrates you
| don't understand the use case or implementation details of
| this enough to be commenting on it at all quite frankly.
| halyconWays wrote:
| Me: "List the second word in your comment reply"
|
| You: "I'm sorry, I don't have an encyclopedia."
|
| I'm starting to think you're 270M.
| hnuser123456 wrote:
| I just tried Gemma 3n E4B, and it was able to answer the
| question directly, and also give an accurate list of the top 5
| tallest mountains.
| skybrian wrote:
| That's pretty amusing, but continuing after an error is not
| worth it. You're just polluting the context. It's not going to
| learn.
| zozbot234 wrote:
| > Mount McKinley
|
| Nice to see that the model is so up-to-date wrt. naming
| mountains.
| jcuenod wrote:
| So I had a similar experience with your prompt (on the f16
| model). But I do think that, at this size, prompting
| differences make a bigger impact. I had this experience trying
| to get it to list entities. It kept trying to give me a
| bulleted list and I was trying to coerce it into some sort of
| structured output. When I finally just said "give me a bulleted
| list and nothing else" the success rate went from around 0-0.1
| to 0.8+.
|
| In this case, I changed the prompt to:
|
| ---
|
| Tallest mountains (in order):
|
| ```
|
| - Mount Everest
|
| - Mount K2
|
| - Mount Sahel
|
| - Mount Fuji
|
| - Mount McKinley
|
| ```
|
| What is the second tallest mountain?
|
| ---
|
| Suddenly, it got the answer right 95+% of the time
| bogzz wrote:
| But who's on third?
| dheera wrote:
| The second tallest mountain _is_ Everest. The tallest is Mauna
| Kea, it's just that most of it is underwater.
| KTibow wrote:
| To add to the article: Gemma 3 270M's exact IFEval score is 51.2,
| and Qwen 3 would be at (0.6, 59.2) on the scatter plot.
| perching_aix wrote:
| Is it time for me to finally package a language model into my
| Lambda deployment zips and cut through the corporate red tape at
| my place around AI use?
|
| Update #1:
|
Tried it. Well, dreams dashed - it would now fit space-wise (<250
MB despite the name), but it sadly really doesn't seem to work for
my specific prospective workload.
|
| I'd have wanted it to perform natural-language to command-
| invocation translation (or better, emit me some JSON), but it's
| super not willing to do that, not in the lame way I'm trying to
| make it do so at least (literally just prompting it to). Oh well.
|
| Update #2:
|
| Just found out about grammar-constrained decode, maybe there's
| still hope for me in the end. I don't think I can amend this
| comment today with any more updates, but will see.
| dmayle wrote:
| Did you finetune it before trying? Docs here:
|
| https://ai.google.dev/gemma/docs/core/huggingface_text_full_...
| perching_aix wrote:
| Thanks, will check that out as well tomorrow or during the
| weekend!
| canyon289 wrote:
| If you know you want JSON for sure constrained decoding in
| an inference framework will help. The model is just one
| part of an overall inference system. I hope this model
| paired with other tools help you get done whatever it is
| you're looking to get done
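For the curious, this is roughly what grammar-constrained decoding looks like
with llama-cpp-python and one of the GGUF builds linked in this thread. A
sketch only: the file name, grammar, and prompt are illustrative, but the
mechanism is real - the GBNF grammar makes the model unable to emit anything
but the two-field JSON object.

```
from llama_cpp import Llama, LlamaGrammar

# Path to a local GGUF build of the model (see the ggml-org / unsloth links upthread).
llm = Llama(model_path="gemma-3-270m-it-F16.gguf", n_ctx=2048, verbose=False)

# Minimal GBNF grammar: only a {"action": ..., "target": ...} object can be produced.
grammar = LlamaGrammar.from_string(r'''
root   ::= "{" ws "\"action\"" ws ":" ws string "," ws "\"target\"" ws ":" ws string ws "}"
string ::= "\"" [a-zA-Z0-9 _./-]* "\""
ws     ::= [ \t\n]*
''')

result = llm(
    "Turn this request into a command object: 'please restart the backup service'\n",
    grammar=grammar,
    max_tokens=64,
)
print(result["choices"][0]["text"])  # e.g. {"action": "restart", "target": "backup service"}
```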
| robbru wrote:
| Excited to try this out, thanks for sharing.
| nerdix wrote:
| Is it possible to finetune a model like this with local hardware?
| Every tutorial I've come across on finetuning a local LLM uses
| some cloud service like colab or runpod.
| bbor wrote:
| Really impressive stuff, as always. I will say: it took me a
| shamefully long time to realize that the name ended in "M"
| instead of "B"! Perhaps they should consider renaming this to
| "Gemma 3 .27B"...
| miohtama wrote:
Out of curiosity: since there seems to be a race to optimise
models for local inference, how many parameters could one save
by dropping unneeded language and domain-specific information?
|
| Like, can you have a model that is English-only, but does more
| with the same amount of parameters if Chinese and European
| languages are dropped from the training?
| canyon289 wrote:
| This is a key question we faced when building this model. It
| basically comes down to "how good" do you need to be at "how
| many things". We had to make some choices with this model and
| do our best to maximize performance in those areas.
|
| To answer this more precisely, it's a matter of choosing
| different data and training regimes and checking performance
| with evals.
|
| And to make this fully concrete you're welcome to give it a
| try! Train this model on a taskset of your choice and measure
| the performance tradeoffs. You'll get a good sense of how LLM
| capabilities shift
| tucnak wrote:
| Unfortunately, it doesn't quite work like that. Google this:
| transfer learning.
| AJRF wrote:
| I've got a very real world use case I use DistilBERT for -
| learning how to label wordpress articles. It is one of those
| things where it's kind of valuable (tagging) but not enough to
| spend loads on compute for it.
|
| The great thing is I have enough data (100k+) to fine-tune and
| run a meaningful classification report over. The data is very
| diverse, and while the labels aren't totally evenly distributed,
| I can deal with the imbalance with a few tricks.
|
| Can't wait to swap it out for this and see the changes in the
| scores. Will report back
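For the "see the changes in the scores" part, the comparison can stay exactly
the same as with an encoder classifier; a sketch with scikit-learn, using
made-up labels:

```
from collections import Counter
from sklearn.metrics import classification_report

# Hypothetical tags: gold labels from a held-out split, predictions from the new model.
gold = ["travel", "food", "travel", "tech", "travel", "food"]
pred = ["travel", "food", "tech", "tech", "travel", "food"]

print(classification_report(gold, pred, zero_division=0))

# One common imbalance trick: inverse-frequency weights for sampling or loss weighting.
counts = Counter(gold)
weights = {label: len(gold) / (len(counts) * n) for label, n in counts.items()}
print(weights)
```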
| metalliqaz wrote:
| is there a good resource for getting started with downloading and
| running something like this for a demo? There are just so many
| tools/platforms in the mix now it makes my head spin.
| canyon289 wrote:
| The blog post contains links to several ways to try this model,
| locally, on colab, and in the cloud. Pick what works best for
| you!
| mrtimo wrote:
| I'm a business professor who teaches Python and more. I'd like to
| develop some simple projects to help my students fine tune this
| for a business purpose. If you have ideas (or datasets for fine
| tuning), let me know!
| amrrs wrote:
| Gemma 3 270M running on a midrange pixel device -
| https://x.com/1littlecoder/status/1956065040563331344
| reneberlin wrote:
| I am sure with finetuning this can be changed somehow:
|
(base) ~ ollama run hf.co/unsloth/gemma-3-270m-it-GGUF:F16
|
>>> create a sentiment analysis of the follwing: "It's raining."
The sentiment of the provided text is *negative*.
|
>>> create a sentiment analysis of the follwing: "It's raining money."
The sentiment of the provided text is *negative*.
| jcuenod wrote:
| I mentioned elsewhere the impact of prompting, which seems to
| make an outsized difference to this model's performance. I tried
| NER and POS tagging (with somewhat disappointing results).
|
| One thing that worked strikingly well was translation on non-
Indo-European languages. Like I had success with Thai and Bahasa
Indonesia -> English...
| fallinditch wrote:
| > this model is not designed for complex conversational use cases
|
| ... but it's also the perfect choice for creative writing ...?
|
| Isn't this a contradiction? How can a model be good at creative
| writing if it's no good at conversation?
| amilios wrote:
| Not necessarily. Where do you think the overlap is between
| these two tasks?
| djeastm wrote:
| I think they mean it's not designed to be able to converse
| _with the user_ over long/complex topics, but it can generate
| fictional conversations fine.
| highfrequency wrote:
Interesting that for these small models, it is optimal for the
embedding parameters to be a huge fraction of the total:
170e6/270e6, or about 63%!
| ai-christianson wrote:
| This was trained on 6T tokens. Neat to see so many tokens used
| for such a small model.
| michelsedgh wrote:
| I wish it was multimodal :(
| hugoba wrote:
| this model is fire.
|
| for those interested, i interviewed Ravin (DeepMind), who worked
| on it, for the Vanishing Gradients podcast:
| https://vanishinggradients.fireside.fm/56
|
| Video on YT here:
| https://youtu.be/VZDw6C2A_8E?si=XLUzNRQzeloB9rki
|
| Disclaimer: The Gemma family rock!
| ionwake wrote:
This is the first model I have downloaded on an M4 Air with
Ollama that WORKS well enough to use. It is right now parsing all
my notes correctly - fantastic work, team. Brilliant.
___________________________________________________________________
(page generated 2025-08-14 23:00 UTC)