[HN Gopher] Gemma: New Open Models
       ___________________________________________________________________
        
       Gemma: New Open Models
        
       Author : meetpateltech
       Score  : 880 points
       Date   : 2024-02-21 13:03 UTC (9 hours ago)
        
 (HTM) web link (blog.google)
 (TXT) w3m dump (blog.google)
        
       | smcn wrote:
       | There are some pretty impressive benchmarks on
       | https://ai.google.dev/gemma. Even the 2b model looks fairly not
       | awful?
       | 
       | I guess my weekend is going to be spent exploring this.
        
       | alekandreev wrote:
       | Hello on behalf of the Gemma team! We are really excited to
       | answer any questions you may have about our models.
       | 
       | Opinions are our own and not of Google DeepMind.
        
         | declaredapple wrote:
         | Congrats on the launch and thanks for the contribution! This
          | looks like it's on par with or better than Mistral 7B 0.1, or
          | is that 0.2?
         | 
         | Are there plans for MoE or 70B models?
        
           | kathleenfromgdm wrote:
           | Great question - we compare to the Mistral 7B 0.1 pretrained
           | models (since there were no pretrained checkpoint updates in
           | 0.2) and the Mistral 7B 0.2 instruction-tuned models in the
           | technical report here: https://goo.gle/GemmaReport
        
         | zitterbewegung wrote:
         | Do you have a plan of releasing higher parameter models?
        
           | alekandreev wrote:
           | We have many great things in research and development phases,
           | so stay tuned. I'm hopeful we can share more in the coming
            | weeks and months!
        
             | brucethemoose2 wrote:
             | That is awesome!
             | 
             | I hope y'all consider longer context models as well.
             | 
              | Also, are y'all looking at alternative architectures like
             | Mamba? Being "first" with a large Mamba model would cement
             | your architectural choices/framework support like llama did
             | for Meta.
        
         | neximo64 wrote:
          | How are these performing so well compared to Llama 2? Are there
          | any documents on the architecture and differences? Is it MoE?
         | 
          | Also note some of the links in the blog post don't work, e.g.
          | the debugging tool.
        
           | kathleenfromgdm wrote:
           | We've documented the architecture (including key differences)
           | in our technical report here (https://goo.gle/GemmaReport),
           | and you can see the architecture implementation in our Git
           | Repo (https://github.com/google-deepmind/gemma).
        
         | h1t35h wrote:
         | It seems you have exposed the internal debugging tool link in
         | the blog post. You may want to do something about it.
        
           | trisfromgoogle wrote:
           | Ah, I see -- the link is wrong, thank you for flagging!
           | Fixing now.
        
             | neximo64 wrote:
             | The link to the debugging tool is an internal one, no one
             | outside Google can access it
        
             | h1t35h wrote:
             | The blog post shares the link for debugging tool as
             | https://*.*.corp.google.com/codelabs/responsible-ai/lit-
             | gemm...
             | 
              | The .corp domain and the login redirect make me believe it
              | was supposed to be an internal link
        
               | littlestymaar wrote:
               | Same for the "safety classifier"
        
               | barrkel wrote:
               | https://codelabs.developers.google.com/codelabs/responsib
               | le-...
        
             | wrexx0r wrote:
             | The link in the Debugging section redirects to a Google SSO
             | login page
        
         | pama wrote:
         | Will these soon be available on lmsys for human comparison
         | against other models? Can they run with llama.cpp?
        
           | ErneX wrote:
           | Yes to llama.cpp
           | 
           | https://twitter.com/ggerganov/status/1760293079313973408
        
             | sbarre wrote:
             | I came here wondering if these models are "open" in the
             | sense that they'll show up on sites like Ollama where you
             | can download and run them locally.
             | 
             | Am I correct to conclude that this means they eventually
             | will?
             | 
             | It's unclear to me from Google's docs exactly what "open"
             | means for Gemma
        
               | benpacker wrote:
               | Yes - they are open weights and open inference code,
               | which means they can be integrated into Ollama.
               | 
               | They are not "open training" (either in the training code
               | or training data sense), so they are not reproducible,
               | which some have suggested ought to be a component of the
               | definition of open models.
        
               | OJFord wrote:
                | It really should, shouldn't it? I'm quite ML-naive, but
               | surely providing the model without 'training code or
               | training data' is just like providing a self-hostable
               | binary without the source code? Nobody calls that open
               | source, it's not even source available.
        
               | sunnybeetroot wrote:
               | That's why they're called open as in free to use how you
               | wish, not open source where the source of the training is
               | also provided.
        
               | OJFord wrote:
               | But my point is there's no analogy for that that we call
               | open? It's like self-hostable, or free (as in beer).
        
               | sunnybeetroot wrote:
               | That's a fair comment, maybe free-to-use is more
               | appropriate.
        
               | idiotsecant wrote:
               | Man, people will find anything to complain about.
        
               | OJFord wrote:
               | I'm not complaining, I'm unlikely ever to use it
               | (regardless of how open or not it is) so it doesn't
               | really matter to me, just surprised to learn what people
               | mean by 'open' in this context.
        
               | michaelt wrote:
               | It is widely believed (and in some cases acknowledged)
               | that a lot of models are trained on copyrighted data
               | scraped from the web. In some cases, even scrapes of
               | ebook piracy websites - google 'books3' to learn more.
               | 
               | Some companies (such as those working on AI) believe this
               | is legal, others (such as the copyright holders to those
               | books) believe it isn't.
               | 
               | In any case, IMHO it's unlikely any cutting edge models
               | will be offering us their training data any time soon.
        
               | SushiHippie wrote:
               | https://huggingface.co/google/gemma-7b-it/tree/main
               | 
               | yes, similar to the llama models, you'll also need to
               | accept the license to download them officially. But the
               | llama models have been unofficially downloadable without
               | accepting the license for quite a while, so it's probably
               | just a matter of time.
        
         | artninja1988 wrote:
          | I find the snide remarks around open source in the paper and
          | announcement rather off-putting.
         | 
         | As the ecosystem evolves, we urge the corporate AI community to
         | move beyond demanding to be taken seriously as a player in open
         | source for models that are not actually open, and avoid
         | preaching with a PR statement that can be interpreted as
          | uninformed at best or malicious at worst.
        
           | silentsanctuary wrote:
           | Which remarks are you referring to?
        
             | artninja1988 wrote:
              | The snide remarks at Meta's Llama license that doesn't allow
              | companies with 700 million monthly active users to use it,
              | while this model also doesn't have a really 'open' license
              | itself, and also this paragraph:
             | 
             | >As the ecosystem evolves, we urge the wider AI community
             | to move beyond simplistic 'open vs. closed' debates, and
             | avoid either exaggerating or minimising potential harms, as
             | we believe a nuanced, collaborative approach to risks and
             | benefits is essential. At Google DeepMind we're committed
             | to developing high-quality evaluations and invite the
             | community to join us in this effort for a deeper
             | understanding of AI systems.
        
               | tomComb wrote:
               | Well, given that that restriction added to the meta-llama
               | license is aimed at Google, is petty, and goes against
               | open source norms, I think it's reasonable that they
               | should feel this way about it.
        
               | lordswork wrote:
               | How is this a snide remark? It's factual and prevented
               | their team from benchmarking against Llama 2.
        
               | trisfromgoogle wrote:
               | Quick question -- can you tell me where you got that
               | quote? It's not in the main blog or any of the launch
               | communications that I can see.
        
               | artninja1988 wrote:
               | The quote is from the technical report
               | 
               | https://storage.googleapis.com/deepmind-
               | media/gemma/gemma-re...
        
           | trisfromgoogle wrote:
           | It would be great to understand what you mean by this -- we
           | have a deep love for open source and the open developer
           | ecosystem. Our open source team also released a blog today
           | describing the rationale and approach for open models and
           | continuing AI releases in the open ecosystem:
           | 
           | https://opensource.googleblog.com/2024/02/building-open-
           | mode...
           | 
           | Thoughts and feedback welcome, as always.
        
             | artninja1988 wrote:
              | The statement about not being able to use LLaMA 2 for
              | benchmarking is also false and highly misleading, see
              | https://x.com/BlancheMinerva/status/1760302091166241163?s=20
        
               | lordswork wrote:
                | > If, on the Llama 2 version release date, the monthly
                | > active users [...] is greater than 700 million monthly
                | > active users [...] you are not authorized to exercise
                | > any of the rights under this Agreement
               | 
               | I would guess this is Google being careful to not be
               | burned by this lame clause in the Llama 2 license.
        
             | mrob wrote:
              | If you truly love Open Source, you should update the
             | language you use to describe your models so it doesn't
             | mislead people into thinking it has something to do with
             | Open Source.
             | 
             | Despite being called "Open", the Gemma weights are released
             | under a license that is incompatible with the Open Source
             | Definition. It has more in common with Source-Available
             | Software, and as such it should be called a "Weights-
             | Available Model".
        
           | jppittma wrote:
            | Working at Google is like this, where no matter how much you
           | try to do the right thing you're always under attack.
        
         | tosh wrote:
         | Are there any plans for releasing the datasets used?
        
           | alekandreev wrote:
           | This would be really interesting in my opinion, but we are
           | not releasing datasets at this time. See the C4 dataset for
           | an earlier open dataset from Google.
        
         | sbarre wrote:
         | Can the Gemma models be downloaded to run locally, like open-
         | source models Llama2, Mistral, etc ?
         | 
         | Or is your definition of "open" different?
        
           | kathleenfromgdm wrote:
           | Yes, you can get started downloading the model and running
           | inference on Kaggle:
           | https://www.kaggle.com/models/google/gemma ; for a full list
           | of ways to interact with the model, you can check out
           | https://ai.google.dev/gemma.
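            | 
            | For a quick local test outside Kaggle, a minimal sketch using
            | Hugging Face transformers should also work (untested here;
            | assumes you've accepted the license on the Hub and are
            | authenticated, e.g. via huggingface-cli login):
            | 
            | ```
            | # Minimal sketch: run the instruction-tuned 7B locally via
            | # transformers; assumes HF license acceptance + auth.
            | import torch
            | from transformers import AutoModelForCausalLM, AutoTokenizer
            | 
            | model_id = "google/gemma-7b-it"
            | tok = AutoTokenizer.from_pretrained(model_id)
            | model = AutoModelForCausalLM.from_pretrained(
            |     model_id, torch_dtype=torch.bfloat16, device_map="auto")
            | 
            | inputs = tok("Write a haiku about open models.",
            |              return_tensors="pt").to(model.device)
            | out = model.generate(**inputs, max_new_tokens=64)
            | print(tok.decode(out[0], skip_special_tokens=True))
            | ```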
        
             | dartharva wrote:
             | Can we have llamafile releases as well?
             | 
             | https://github.com/Mozilla-Ocho/llamafile
        
             | syntaxing wrote:
             | A small typo in your model link that breaks it. There's an
             | extra ; on the end.
        
               | kathleenfromgdm wrote:
               | Corrected - thanks :)
        
           | Kostic wrote:
           | It should be possible to run it via llama.cpp[0] now.
           | 
           | [0] https://github.com/ggerganov/llama.cpp/pull/5631
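            | 
            | Once you have a GGUF file, a rough sketch with the llama-cpp-
            | python bindings looks something like this (the filename below
            | is just a placeholder for whatever GGUF you downloaded or
            | converted yourself):
            | 
            | ```
            | # Rough sketch via llama-cpp-python; model_path is a
            | # placeholder for a local Gemma GGUF file.
            | from llama_cpp import Llama
            | 
            | llm = Llama(model_path="./gemma-7b-it.gguf", n_ctx=4096)
            | out = llm("Question: What is Gemma?\nAnswer:", max_tokens=128)
            | print(out["choices"][0]["text"])
            | ```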
        
             | nerdix wrote:
             | Amazing how quickly this happened.
        
           | tomp wrote:
           | Their definition of "open" is "not open", i.e. you're only
            | allowed to use Gemma in a "non-harmful" way.
           | 
           | We all know that Google thinks that saying that 1800s English
           | kings were _white_ is  "harmful".
        
             | wantsanagent wrote:
             | Not sure why you're getting downvoted. I would have thought
             | HN of all places would recognize the power and value of OSI
             | licensing and the danger of the proliferation of these
             | source available but definitely not Open Source licenses.
        
             | hackerlight wrote:
             | > We all know that Google thinks that saying that 1800s
             | English kings were white is "harmful".
             | 
             | If you know how to make "1800s english kings" show up as
             | white 100% of the time without also making "kings" show up
             | as white 100% of the time, maybe you should apply to
             | Google? Clearly you must have advanced knowledge on how to
             | perfectly remove bias from training distributions if you
             | casually throw stones like this.
        
               | trackflak wrote:
                | Tell me you take this seriously:
                | https://twitter.com/napoleon21st/status/1760116228746805272
               | 
               | It has no problem with other cultures and ethnicities,
               | yet somehow white or Japanese just throws everything off?
               | 
               | I suppose 'bias' is the new word for "basic historic
               | accuracy". I can get curious about other peoples without
               | forcibly promoting them at the expense of my own Western
               | and British people and culture. This 'anti bias' keyword
               | injection is a laughably bad, in your face solution to a
               | non-issue.
               | 
               | I lament the day 'anti-bias' AI this terrible is used to
               | make real world decisions. At least we now know we can't
               | trust such a model because it has already been so
               | evidently crippled by its makers.
        
           | austinvhuang wrote:
           | Yes models can be downloaded locally. In addition to the
           | python NN frameworks and ggml as options, we also implemented
           | a standalone C++ implementation that you can run locally at
           | https://github.com/google/gemma.cpp
        
           | mrob wrote:
           | Mistral weights are released under an Apache 2.0 license, but
           | Llama 2 weights are released under a proprietary license that
           | prohibits use by large organizations and imposes usage
            | restrictions, violating terms 5 and 6 of the Open Source
           | Definition[0]. Even if you accept that a model with a
           | proprietary training dataset and proprietary training code
           | can be considered "open source", there's no way Llama 2
           | qualifies.
           | 
           | For consistency with existing definitions[1], Llama 2 should
           | be labeled a "weights available" model.
           | 
           | [0] https://en.wikipedia.org/wiki/The_Open_Source_Definition
           | 
           | [1] https://en.wikipedia.org/wiki/Source-available_software
        
         | vorticalbox wrote:
         | are there plans to release an official GGUF version to use with
          | llama.cpp?
        
           | espadrine wrote:
           | It is already part of the release on Huggingface: https://hug
           | gingface.co/google/gemma-7b/blob/main/gemma-7b.gg...
           | 
            | It is a pretty clean release! I had some HTTP 500 errors with
           | Kaggle validating my license approval, so you might too, but
           | after a few attempts I could access the model.
        
             | vorticalbox wrote:
              | I didn't see this when searching, thanks!
        
         | sqreept wrote:
         | What are the supported languages of these models?
        
           | alekandreev wrote:
           | This v1 model is focused on English support, but you may find
           | some multilingual capabilities.
        
         | lnyan wrote:
         | Will there be Gemma-vision models or multimodal Gemma models?
        
           | Jayakumark wrote:
           | Have the same question.
        
         | CuriouslyC wrote:
         | It's cool that you guys are able to release open stuff, that
         | must be a nice change from the modus operandi at goog. I'll
          | have to double-check, but it looks like phi-2 beats your
          | performance in some cases while being smaller. I'm guessing the
          | value proposition of these models is being small and good while
          | also having more knowledge baked in?
        
         | turnsout wrote:
         | What is the license? I couldn't find it on the 1P site or
         | Kaggle.
        
           | trisfromgoogle wrote:
           | You can find the terms on our website, ai.google.dev/gemma:
           | 
           | https://ai.google.dev/gemma/terms
        
             | spiantino wrote:
             | out of curiosity, why is this a "terms" and not a license?
             | I'm used to reading and understanding the software as
             | coming with a license to use it. Do the terms give us
             | license to use this explicitly?
        
               | turnsout wrote:
               | They do, but unlike a known license, these terms are
               | custom and non-standard. Which means I would guide my
               | commercial clients away from this particular model.
        
         | audessuscest wrote:
          | Does this model also think Germans were black 200 years ago?
          | Or is it afraid to answer basic stuff? Because if that's the
          | case, no one will care about this model.
        
           | freedomben wrote:
           | I don't know anything about these twitter accounts so I don't
           | know how credible they are, but here are some examples for
            | your downvoters, who I'm guessing think you're just
            | trolling or grossly exaggerating:
           | 
           | https://twitter.com/aginnt/status/1760159436323123632
           | 
           | https://twitter.com/Black_Pilled/status/1760198299443966382
        
             | robswc wrote:
             | Yea. Just ask it anything about historical people/cultures
             | and it will seemingly lobotomize itself.
             | 
             | I asked it about early Japan and it talked about how
             | European women used Katanas and how Native Americans rode
             | across the grassy plains carrying traditional Japanese
             | weapons. Pure made up nonsense that not even primitive
             | models would get wrong. Not sure what they did to it. I
             | asked it why it assumed Native Americans were in Japan in
             | the 1100s and it said:
             | 
             | > I assumed [...] various ethnicities, including Indigenous
             | American, due to the diversity present in Japan throughout
             | history. However, this overlooked [...] I focused on
             | providing diverse representations without adequately
             | considering the specific historical context.
             | 
             | How am I supposed to take this seriously? Especially on
             | topics I'm unfamiliar with?
        
               | trackflak wrote:
               | From one of the Twitter threads linked above:
               | 
               | > they insert random keyword in the prompts randomly to
               | counter bias, that got revealed with something else I
               | think. Had T shirts written with "diverse" on it as
               | artifact
               | 
               | This was exposed as being the case with OpenAI's DALL-E
               | as well - someone had typed a prompt of "Homer Simpson
               | wearing a namebadge" and it generated an image of Homer
               | with brown skin wearing a namebadge that said 'ethnically
               | ambiguous'.
               | 
               | This is ludicrous - if they are fiddling with your prompt
               | in this way, it will only stoke more frustration and
               | resentment - achieving the opposite of why this has been
               | implemented. Surely if we want diversity we will ask for
               | it, but sometimes you don't, and that should be at the
                | user's discretion.
               | 
                | Another thread for context:
                | https://twitter.com/napoleon21st/status/1760116228746805272
        
           | graphe wrote:
           | I disagree, coding and RAG performance is all that matters to
           | me. I'm not using an LLM to learn basic facts I already know.
        
             | TheHypnotist wrote:
             | How do you ragebait for premium pearl clutching?
        
             | audessuscest wrote:
              | We're at the basic-knowledge level; if your RAG pipeline
              | relies on some of it, you can get bad results too. Anyway,
              | would you use a model that makes this nonsense response or
              | one that doesn't? I know which one I would prefer...
        
               | graphe wrote:
               | If this was better at specific RAG or coding performance
               | I would absolutely, certainly without a doubt use it over
               | a general instruct model in those instances.
        
         | brucethemoose2 wrote:
         | Will there be "extended context" releases like 01.ai did for
         | Yi?
         | 
         | Also, is the model GQA?
        
           | hustwindmaple1 wrote:
           | It's MQA, documented in the tech report
        
         | lordswork wrote:
         | Is there any truth behind this claim that folks who worked on
         | Gemma have left Google?
         | 
         | https://x.com/yar_vol/status/1760314018575634842
        
           | CaffeinatedDev wrote:
           | Them: here to answer questions
           | 
           |  _Question_
           | 
           | Them: :O
        
             | lordswork wrote:
             | To be fair, I think they are in London, so I assume they
              | have wound down for the day. Will probably have to wait
             | ~12-18 hours for a response.
        
           | elcomet wrote:
            | It seems very easy to check, no? Look at the names in the
            | paper and check where they are working now.
        
             | lordswork wrote:
             | Good idea. I've confirmed all the leadership / tech leads
             | listed on page 12 are still at Google.
             | 
             | Can someone with a Twitter account call out the tweet
             | linked above and ask them specifically who they are
             | referring to? Seems there is no evidence of their claim.
        
           | lordswork wrote:
            | I confirmed all the folks listed on page 12 are still at
            | Google (listed below). I am guessing the linked tweet is a BS
            | claim.
            | 
            |     # Product Management: Tris Warkentin, Ludovic Peran
            |     # Program Management: Minh Giang
            |     # Executive Sponsors: Clement Farabet, Oriol Vinyals,
            |       Jeff Dean, Koray Kavukcuoglu, Demis Hassabis,
            |       Zoubin Ghahramani, Douglas Eck, Joelle Barral,
            |       Fernando Pereira, Eli Collins
            |     # Leads: Armand Joulin, Noah Fiedel, Evan Senter
            |     # Tech Leads: Alek Andreev+, Kathleen Kenealy+
        
           | bluefinity wrote:
           | To be fair, the tweet says that they don't work on the models
           | at Google anymore, not that they have left Google.
           | 
           | Might be true, might not be. It's unsourced speculation.
        
         | memossy wrote:
          | Training on 4096 TPU v5es, how did you handle the crazy batch
          | size? :o
        
         | quickgist wrote:
         | Will this be available as a Vertex AI foundational model like
         | Gemini 1.0, without deploying a custom endpoint? Any info on
         | pricing? (Also, when will Gemini 1.5 be available on Vertex?)
        
         | moffkalast wrote:
         | I'm not sure if this was mentioned in the paper somewhere, but
          | how much does the super large 256k tokenizer vocabulary
         | influence inference speed and how much higher is the average
         | text compression compared to llama's usual 30k? In short, is it
         | really worth going beyond GPT 4's 100k?
        
         | dmnsl wrote:
          | Hi, what is the cutoff date?
        
           | legohead wrote:
           | All it will tell me is mid-2018.
        
         | cypress66 wrote:
         | Can you share the training loss curve?
        
         | fosterfriends wrote:
         | Not a question, but thank you for your hard work! Also, brave
         | of you to join the HN comments, I appreciate your openness.
         | Hope y'all get to celebrate the launch :)
        
         | voxgen wrote:
         | Thank you very much for releasing these models! It's great to
         | see Google enter the battle with a strong hand.
         | 
         | I'm wondering if you're able to provide any insight into the
         | below hyperparameter decisions in Gemma's architecture, as they
         | differ significantly from what we've seen with other recent
         | models?
         | 
         | * On the 7B model, the `d_model` (3072) is smaller than
         | `num_heads * d_head` (16*256=4096). I don't know of any other
         | model where these numbers don't match.
         | 
         | * The FFN expansion factor of 16x is MUCH higher than the
         | Llama-2-7B's 5.4x, which itself was chosen to be equi-FLOPS
         | with PaLM's 4x.
         | 
         | * The vocab is much larger - 256k, where most small models use
         | 32k-64k.
         | 
         | * GQA is only used on the 2B model, where we've seen other
         | models prefer to save it for larger models.
         | 
         | These observations are in no way meant to be criticism - I
         | understand that Llama's hyperparameters are also somewhat
         | arbitrarily inherited from its predecessors like PaLM and
         | GPT-2, and that it's non-trivial to run hyperopt on such large
         | models. I'm just really curious about what findings motivated
         | these choices.
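          | 
          | For what it's worth, here's a rough back-of-the-envelope
          | parameter count from the numbers above (the layer count of 28
          | and the ~256k vocab are my reading of the report/config.json,
          | so treat the totals as approximate):
          | 
          | ```
          | # Back-of-the-envelope parameter count for the 7B config;
          | # 28 layers and vocab 256128 are assumptions taken from the
          | # report/config.json, so figures are approximate.
          | d_model, n_heads, d_head = 3072, 16, 256
          | intermediate, n_layers, vocab = 24576, 28, 256128
          | 
          | embed = vocab * d_model                          # ~0.79B
          | attn = 4 * d_model * (n_heads * d_head)          # q, k, v, o
          | ffn = 3 * d_model * intermediate                 # gate, up, down
          | total = embed + n_layers * (attn + ffn)
          | print(f"~{total/1e9:.1f}B parameters")           # roughly 8.5B
          | ```
          | 
          | Which mostly shows that the 16x FFN and the 256k embedding
          | table are where the bulk of the parameters live.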
        
           | owl_brawl wrote:
           | I would love answers to these questions too, particularly on
           | the vocab size
        
         | LorenDB wrote:
         | EDIT: it seems this is likely an Ollama bug, please keep that
         | in mind for the rest of this comment :)
         | 
         | I ran Gemma in Ollama and noticed two things. First, it is
         | slow. Gemma got less than 40 tok/s while Llama 2 7B got over 80
         | tok/s. Second, it is very bad at output generation. I said
         | "hi", and it responded this:
         | 
         | ``` Hi, . What is up? melizing with you today!
         | 
         | What would you like to talk about or hear from me on this fine
         | day?? ```
         | 
         | With longer and more complex prompts it goes completely off the
         | rails. Here's a snippet from its response to "Explain how to
         | use Qt to get the current IP from https://icanhazip.com":
         | 
         | ``` python print( "Error consonming IP arrangration at [local
         | machine's hostname]. Please try fufing this function later!")
         | ## guanomment messages are typically displayed using
         | QtWidgets.MessageBox ```
         | 
         | Do you see similar results on your end or is this just a bug in
         | Ollama? I have a terrible suspicion that this might be a
         | completely flawed model, but I'm holding out hope that Ollama
         | just has a bug somewhere.
        
           | mark_l_watson wrote:
           | I was going to try these models with Ollama. Did you use a
           | small number of bits/quantization?
        
             | LorenDB wrote:
             | The problem exists with the default 7B model. I don't know
             | if different quantizations would fix the problem. The 2B
             | model is fine, though.
        
         | jmorgan wrote:
         | Hi! This is such an exciting release. Congratulations!
         | 
         | I work on Ollama and used the provided GGUF files to quantize
         | the model. As mentioned by a few people here, the 4-bit integer
         | quantized models (which Ollama defaults to) seem to have
         | strange output with non-existent words and funny use of
         | whitespace.
         | 
          | Do you have a link/reference as to how the models were
         | converted to GGUF format? And is it expected that quantizing
         | the models might cause this issue?
         | 
         | Thanks so much!
        
           | espadrine wrote:
           | As a data point, using the Huggingface Transformers 4-bit
           | quantization yields reasonable results:
           | https://twitter.com/espadrine/status/1760355758309298421
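            | 
            | For anyone who wants to reproduce that, the 4-bit path I'd
            | expect to work is roughly the following (sketch only; needs
            | bitsandbytes installed and license acceptance on the Hub):
            | 
            | ```
            | # Rough sketch of 4-bit loading via transformers +
            | # bitsandbytes; settings are generic defaults, not anything
            | # Gemma-specific.
            | import torch
            | from transformers import (AutoModelForCausalLM, AutoTokenizer,
            |                           BitsAndBytesConfig)
            | 
            | bnb = BitsAndBytesConfig(load_in_4bit=True,
            |                          bnb_4bit_compute_dtype=torch.bfloat16)
            | tok = AutoTokenizer.from_pretrained("google/gemma-7b-it")
            | model = AutoModelForCausalLM.from_pretrained(
            |     "google/gemma-7b-it", quantization_config=bnb,
            |     device_map="auto")
            | ids = tok("Explain GGUF in one sentence.",
            |           return_tensors="pt").to(model.device)
            | out = model.generate(**ids, max_new_tokens=60)
            | print(tok.decode(out[0], skip_special_tokens=True))
            | ```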
        
         | kleiba wrote:
         | > _We are really excited to answer any questions you may have
         | about our models._
         | 
         | I cannot count how many times I've seen similar posts on HN,
         | followed by tens of questions from other users, three of which
         | actually get answered by the OP. This one seems to be no
         | exception so far.
        
           | spankalee wrote:
           | What are you talking about? The team is in this thread
           | answering questions.
        
         | owl_brawl wrote:
         | Hi alekandreev,
         | 
         | Any reason you decided to go with a token vocabulary size of
          | 256k? Smaller vocab sizes like the ~16-32k most models of this
          | size use are much easier to work with. Would
         | love to understand the technical reasoning here that isn't
         | detailed in the report unfortunately :(.
        
       | Havoc wrote:
        | Taking a page out of Meta's book with open models. I wonder what
       | the game plan here is.
       | 
       | Nice that it allows commercial use!
        
         | gaogao wrote:
         | Mostly to boost research and commercial usage around JAX/Gemini
         | is my read.
         | 
          | Any internal research using Gemma is now more easily externally
          | reproducible, external research and frameworks are easier to
          | translate over, and it builds goodwill, especially from
          | researchers.
        
           | gaogao wrote:
            | There's also less special sauce in the text models themselves
            | these days, with the proprietary part being more in the pre-
            | training data and training stack (e.g. how to get 10k GPUs/TPUs
           | running together smoothly). Multi-modal models (or adjacent
           | like Sora) are less likely to be open sourced in the
           | immediate term.
        
             | smarterclayton wrote:
             | There is a lot of work to make the actual infrastructure
             | and lower level management of lots and lots of GPUs/TPUs
             | open as well - my team focuses on making the infrastructure
             | bit at least a bit more approachable on GKE and Kubernetes.
             | 
             | https://github.com/GoogleCloudPlatform/ai-on-gke/tree/main
             | 
             | and
             | 
             | https://github.com/google/xpk (a bit more focused on HPC,
             | but includes AI)
             | 
             | and
             | 
             | https://github.com/stas00/ml-engineering (not associated
             | with GKE, but describes training with SLURM)
             | 
              | The actual training is still done by a fairly small pool of
              | very experienced people, but it's getting better. And every day
             | serving models gets that much faster - you can often simply
             | draft on Triton and TensorRT-LLM or vLLM and see
             | significant wins month to month.
        
       | sidcool wrote:
       | Available on Ollama?
        
         | blooalien wrote:
         | https://ollama.com/library?q=gemma
         | 
         | Library search says "Nope". At least not yet.
        
           | tomd wrote:
           | It's there now
        
           | kevsim wrote:
           | And now it says "Yup". That was pretty quick!
        
             | blooalien wrote:
             | Dang, that was _really_ quick! According to the listed time
             | of your reply vs. mine, less than an hour from the time I
             | checked? Quick turnaround indeed.
             | 
             | Already been pulled from there over 3,700 times since then,
             | too (as of the time of _this_ reply mere hours later).
             | Seems like quite a bit more 'n a few Ollama users were
             | "waitin' with bated breath" for that one to drop. :grin:
        
         | SushiHippie wrote:
         | Support for gemma in llama.cpp just got merged, so it may take
         | some time (could be hours or days) until this lands in ollama
         | 
         | https://github.com/ggerganov/llama.cpp/pull/5631
        
         | dcchambers wrote:
         | It's now in the 0.1.26 pre-release:
         | https://github.com/ollama/ollama/releases/tag/v0.1.26
        
         | chown wrote:
          | Available in the pre-release now, which means you'd have to
          | update manually in the future.
        
       | mustafabisic1 wrote:
        | The fact that the Gemma team is in the comments section
        | answering questions is praiseworthy to me :)
        
         | p1esk wrote:
         | https://twitter.com/yar_vol/status/1760314018575634842
        
           | pphysch wrote:
           | Why is this anonymous tweet with no evidence or engagement
           | being posted by multiple users in this thread? Why not just
           | make the same claim directly?
        
           | callalex wrote:
           | The link is broken. On HN (or any forum really) it is
           | expected for a brief description of the content to be
           | provided when posting a link. Links die all the time, but
           | forum posts don't have to die with them.
        
           | carom wrote:
            | I've worked at Google. It is the organization with the highest
           | concentration of engineering talent I've ever been at. Almost
           | to the point that it is ridiculous because you have extremely
           | good engineers working on internal reporting systems for
           | middle managers.
        
             | ilc wrote:
              | If everyone is great, someone still has to draw the short
              | straw.
             | 
             | At MIT they said: You know the kid who sat at the front of
             | the room. Now you are with ALL of the kids who sat in the
             | front of the room. Guess what? There's still going to be a
             | kid who sits at the front of the room.
             | 
             | I'd imagine Google or anyplace with a stiff engineering
             | filter will have the same issues.
        
       | Kelteseth wrote:
       | Can this run on my AMD Vega VII on Windows 11? As always, AMD is
       | missing:
       | 
       | > Optimization across multiple AI hardware platforms ensures
       | industry-leading performance, including NVIDIA GPUs and Google
       | Cloud TPUs.
        
         | lordswork wrote:
         | AMD Vega VII meets the memory requirements. Once tools like LM
         | Studio, ollama, etc. add support for the model, you should be
         | able to run locally like you would any other open weights
         | model.
        
       | GaggiX wrote:
        | They have also implemented the model in their own C++ inference
       | engine: https://github.com/google/gemma.cpp
        
       | 0xbadc0de5 wrote:
       | Thank you for releasing this.
        
       | vanderboyd wrote:
       | The 2B model seems underwhelming. For instance, compared to the
       | recent StableLM2 1.6B model that is slightly smaller and probably
       | wastes some "English metric points" by being multilingual.
       | 
       | The latter (and other similar open models) seem to do similarly
       | well in benchmarks (much better in Math?) with way less fancy
       | stuff. For instance, public data and no secretive filtering with
        | pre-trained models or synthetic data.
       | 
        | My take is that using the vanilla approaches takes you _really_
        | far. And many of the latest tricks and hours of work buy you
       | little... Will be interesting to see how this plays out,
       | especially for the open source community.
        
       | impulser_ wrote:
        | Go back 5 years and ask anyone on this site which company they
        | think will be the most open about AI in the future: OpenAI, Meta,
        | or Google. I bet 10/10 people would pick OpenAI. Today, Meta
        | and Google, both trillion-dollar companies, are releasing very
        | powerful open models with the ability to be used commercially.
       | 
       | Ironic.
        
         | vmfunction wrote:
          | Not surprising, just like when MS went to shit and then they
          | started to embrace 'open source'. Seems like a PR stunt. And
          | when it comes to LLMs there is a multi-million-dollar barrier
          | to entry to train a model, so it is OK to open up their
          | embeddings etc.
         | 
         | Today big corp A will open up a little to court the developers,
         | and tomorrow when it gains dominance it will close up, and corp
          | B will open up a little.
        
           | kibwen wrote:
           | True, though to be fair, when OpenAI embraced "openness" it
           | was also a PR stunt.
        
             | ta8645 wrote:
             | My impression is that OpenAI was founded by true believers,
             | with the best intentions; whose hopes were ultimately
             | sidelined in the inexorable crush of business and finance.
        
               | jprete wrote:
               | Sam Altman is one of the founders, so for your impression
               | to be right he'd have to be sidelining his own hopes.
        
               | dkjaudyeqooe wrote:
               | > OpenAI was founded by true believers, with the best
               | intentions
               | 
               | who were easily bought off.
        
             | ben_w wrote:
             | OpenAI is heavily influenced by big-R Rationalists, who
             | fear the issues of misaligned AI being given power to do
             | bad things.
             | 
             | When they were first talking about this, lots of people
             | ignored this by saying "let's just keep the AI in a box",
             | and even last year it was "what's so hard about an off
             | switch?".
             | 
             | The problem with any model you can just download and run is
             | that some complete idiot _will do that_ and just give the
              | AI agency they shouldn't have. Fortunately, for now the
             | models are more of a threat to their users than anyone else
             | -- lawyers who use it to do lawyering without checking the
             | results losing their law licence, etc.
             | 
             | But that doesn't mean open models are not a threat to other
             | people besides their users, as all the artists complaining
             | about losing work due to Stable Diffusion, the law
             | enforcement people concerned about illegal porn, election
             | interference specialists worried about propaganda, and
             | anyone trying to use a search engine, and that research lab
             | that found a huge number of novel nerve agent candidates
             | whose precursors aren't all listed as dual use, will all
             | tell you for different reasons.
        
               | visarga wrote:
               | > Fortunately, for now the models are more of a threat to
               | their users than anyone else
               | 
               | Models have access to users, users have access to
               | dangerous stuff. Seems like we are already vulnerable.
               | 
                | The AI splits a task into two parts, and gets two people to
               | execute each part without knowing the effect. This was a
               | scenario in one of Asimov's robot novels, but the roles
               | were reversed.
               | 
               | AI models exposed to public at large is a huge security
               | hole. We got to live with the consequences, no turning
               | back now.
        
           | milansuk wrote:
            | You can run Gemma and hundreds of other models (many fine-
            | tuned) in llama.cpp. It's easy to swap to a different model.
            | 
            | It's important there are companies publishing models (running
            | locally). If some stop and others are born, it's OK. The
            | worst thing that could happen is having AI only in the cloud.
        
           | jchw wrote:
           | Eh, I don't really blame anyone for being cynical but open
           | weight AI model releases seem like a pretty clear mutual
           | benefit for Google. PR aside, they also can push people to
           | try these models on TPUs and the like. If anything, this
           | seems like it's just one of those things where people win
           | because of competition. OpenAI going closed may have felt
           | like the most obvious betrayal ever, but OTOH anyone whose
            | best interest is to eat their lunch has an incentive to
           | push actually-open AI, and that's a lot of parties.
           | 
           | Seems like anyone who is releasing open weight models today
           | could close it up any day, but at least while competition is
           | hot among wealthy companies, we're going to have a lot of
           | nice things.
        
           | rvz wrote:
           | > And when it comes to LLM there is millions of dollar
           | barrier to entry to train the model, so it is ok to open up
           | their embedding etc.
           | 
           | That barrier is the first basic moat; hundreds of millions of
           | dollars needed to train a better model. Eliminating tons of
           | companies and reducing it to a handful.
           | 
           | The second moat is the ownership of the tons of data to train
           | the models on.
           | 
           | The third is the hardware and data centers setup to create
           | the model in a reasonable amount of time faster than others.
           | 
           | Put together all three and you have Meta, Google, Apple and
           | Microsoft.
           | 
            | The last is the silicon product: Nvidia, which has >80% of
            | the entire GPU market and is the #1 AI shovel maker for
            | both inference and training.
        
         | throwaw12 wrote:
          | They want to kill competition before it gets too big, using the
          | hands of the open source community and enthusiasts.
        
         | infecto wrote:
         | Ironic but I wonder how true this would be if Google was first
         | to market.
        
         | gmaster1440 wrote:
         | It's almost the inverse of going back 5 years and asking what
          | companies will release the most successful or impressive AIs.
        
         | brainless wrote:
         | This article states quite an impressive list of open source
         | tools that Google has released for years in the past. This is
         | no surprise coming from* them. Google has released some large
         | pieces of source in other domains as well, Chromium comes to
         | mind, which probably impacts most Internet users directly.
         | 
         | The question is not about Google but about OpenAI.
        
           | sunnybeetroot wrote:
           | Did you miss a footnote with your asterisks?
        
           | infecto wrote:
           | I have a different take, Google releases a lot but is also a
           | massive company and tools like Chromium serve to increase
           | their stock price so they can hit their quarterly estimates.
        
             | idiotsecant wrote:
             | In what way does chromium increase stock price? In what way
             | does stock price influence quarterly estimates? Are we
             | playing business words mad libs?
        
               | infecto wrote:
               | I don't know why people like yourself respond with such
               | derisive commentary instead of simply asking the
               | constructive question.
               | 
                | Initially? It fueled dethroning MSFT and helped gain
                | market share for Chrome. On a go-forward basis it allows
                | Google to project massive weight in standards. Beyond
                | its use as a browser, Chrome is a significant
               | knob for ad revenue that they utilize to help meet
               | expectations. That knob only exists because of its market
               | share.
        
               | alextheparrot wrote:
               | > "Our best shot at making the quarter is if we get an
               | injection of at least [redacted]% , queries ASAP from
               | Chrome." (Google Exec)
               | 
               | Isn't there a whole anti-trust case going on around this?
               | 
               | [0] https://www.nytimes.com/interactive/2023/10/24/busine
               | ss/goog...
        
               | pseudosavant wrote:
               | Chromium is open source because its roots are as a fork
               | of WebKit (Safari). Which itself was open source because
               | it was a fork of KHTML from KDE.
               | 
               | Google stood on the shoulders of others to get out a
               | browser that drives 80% of their desktop ad revenue.
               | 
               | How does that not affect GOOG?
        
             | rvnx wrote:
              | It was not at all done for the good of the web, it was a
              | mere logical calculation; it was cheaper to develop
              | Chromium than to pay 4B USD in search royalties to
              | Microsoft for Internet Explorer, and it would give more
              | control and long-term safety to Google.
        
           | blackoil wrote:
            | I think more than benevolence from GOOG, it is about strategic
           | OSS to commoditize your complements.
           | 
           | https://www.joelonsoftware.com/2002/06/12/strategy-letter-v/
        
           | makestuff wrote:
           | Google also has released Guice/Dagger for Java dependency
           | injection. Angular never really took off, but guice/dagger
           | are widely used. Also I am pretty impressed with Flutter as
           | an alternative to react native.
        
             | surajrmal wrote:
             | Angular was incredibly popular for a long time and still
             | is. Usage is shifting down over time but a lot of notable
             | websites still use it.
        
         | blackoil wrote:
          | I think the current understanding is that <50-100B parameter
          | models will be commodities and provide no moat. Competition will
          | be in
         | Gemini Ultra/GPT4+ models.
         | 
          | So open sourcing simple models brings PR and the possibility of
          | biasing OSS towards your own models.
        
           | extheat wrote:
           | LLaMA 3 with >=70B params will be launching this year, so I
           | don't think this is something that will hold for long. And
           | Mixtral 8x7B is a 56GB model, sparsely. For now I agree, for
           | many companies it doesn't make sense to open source something
           | you intend to sell for commercial use, so the biggest models
            | will likely be withheld. However, the more important thing is
            | that there is _some_ open source model, whether it be from
            | Meta or someone else, that can rival the best open source
            | models. And it's not like the param count can literally go
           | to infinity, there's going to be an upper bound that today's
           | hardware can achieve.
        
         | DJHenk wrote:
         | > Ironic.
         | 
         | Not at all. When you're the underdog, it makes perfect sense to
         | be open because you can profit from the work of the community
         | and gain market share. Only after establishing some kind of
         | dominance or monopoly it makes sense (profit wise) to switch to
         | closed technology.
         | 
         | OpenAI was open, but is now the leader and closed up. Meta and
         | Google need to play catch up, so they are open.
        
           | ekianjo wrote:
           | > OpenAI was open
           | 
           | When is the last time they released something in the open?
        
             | vertis wrote:
             | I think that's the point, they released GPT2 openly, but as
             | soon as they had something commercially viable they became
             | ClosedAI.
        
           | dkjaudyeqooe wrote:
           | > Not at all. When you're the underdog, it makes perfect
           | sense to be open because you can profit from the work of the
           | community and gain market share. Only after establishing some
           | kind of dominance or monopoly it makes sense (profit wise) to
           | switch to closed technology.
           | 
           | That is purely the language of commerce. OpenAI was supposed
           | to be a public benefit organisation, but it acts like a
           | garden variety evil corp.
           | 
           | Even garden variety evil corps spend decades benefitting
           | society with good products and services before they become
           | big and greedy, but OpenAI skipped all that and just cut to
           | the chase. It saw an opening with the insane hype around
           | ChatGPT and just grabbed all it could as fast as it could.
           | 
           | I have a special contempt for OpenAI on that basis.
        
           | behnamoh wrote:
            | This. MistralAI is also an underdog and released Mistral 7B
            | and Mixtral 8x7B, but as soon as they got traction, they
            | closed their models (e.g., Mistral Medium).
        
         | jncraton wrote:
         | Google released the T5 paper about 5 years ago:
         | 
         | https://arxiv.org/abs/1910.10683
         | 
         | This included full model weights along with a detailed
         | description of the dataset, training process, and ablations
         | that led them to that architecture. T5 was state-of-the-art on
         | many benchmarks when it was released, but it was of course
         | quickly eclipsed by GPT-3.
         | 
         | It was common practice from Google (BERT, T5), Meta (BART),
         | OpenAI (GPT1, GPT2) and others to release full training details
         | and model weights. Following GPT-3, it became much more common
         | for labs to not release full details or model weights.
        
         | phillipcarter wrote:
         | I would have picked Google five years ago, since nobody was
         | releasing commercially viable LLMs at the time, and Google was
         | the center of all the research that I knew of.
        
         | calebkaiser wrote:
         | Since the release of GPT-2 (it was initially "too dangerous" to
         | release the weights), I think most people in the industry have
         | assumed that OpenAI does not see open sourcing their models as
         | a strategic advantage.
        
         | moffkalast wrote:
         | > what companies do you think will be the most open about AI in
         | the future OpenAI, Meta, or Google.
         | 
         | The funny part is that the real answer is: Some random French
         | company is running circles around them all.
         | 
         | I mean who the hell just drops a torrent magnet link onto
         | twitter for the best state of the art LLM base model for its
         | size class, and with a completely open license. No corporate
         | grandstanding, no benchmark overpromises, no theatrics. That
         | was unfathomably based of Mistral.
        
       | nalzok wrote:
       | Congratulations on the release! How can we download the model and
       | run inference locally?
        
         | kathleenfromgdm wrote:
         | Thank you! You can get started downloading the model and
         | running inference on Kaggle:
         | https://www.kaggle.com/models/google/gemma ; for a full list of
         | ways to interact with the model, you can check out
         | https://ai.google.dev/gemma.
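          | 
          | If you just want the shortest local route, a minimal sketch
          | with the transformers pipeline should also work (assumes the
          | license has been accepted on the Hub and you're authenticated):
          | 
          | ```
          | # Minimal sketch: text generation with the 2B checkpoint via
          | # the transformers pipeline; assumes HF license acceptance.
          | from transformers import pipeline
          | 
          | generate = pipeline("text-generation", model="google/gemma-2b-it")
          | print(generate("The capital of France is",
          |                max_new_tokens=20)[0]["generated_text"])
          | ```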
        
           | aphit wrote:
           | FYI the ; broke the link, but I found it easily anyway.
        
             | kathleenfromgdm wrote:
             | Good catch - just corrected. Thanks!
        
         | austinvhuang wrote:
         | You can download the model checkpoints from kaggle
         | https://www.kaggle.com/models/google/gemma and huggingface
         | https://huggingface.co/blog/gemma
         | 
         | Besides the python implementations, we also implemented a
         | standalone C++ implementation that runs locally with just CPU
         | simd https://github.com/google/gemma.cpp
        
           | tveita wrote:
           | Are there any cool highlights you can give us about
           | gemma.cpp? Does it have any technical advantages over
           | llama.cpp? It looks like it introduces its own quantization
           | format, is there a speed or accuracy gain over llama.cpp's
           | 8-bit quantization?
        
       | espadrine wrote:
       | I notice a few divergences to common models:
       | 
       | - The feedforward hidden size is 16x the d_model, unlike most
       | models which are typically 4x;
       | 
       | - The vocabulary size is 10x (256K vs. Mistral's 32K);
       | 
       | - The training token count is tripled (6T vs. Llama2's 2T)
       | 
       | Apart from that, it uses the classic transformer variations: MQA,
       | RoPE, RMSNorm.
       | 
       | How big was the batch size that it could be trained so fast?
       | 
       | https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2/bl...
        
         | GaggiX wrote:
         | Looking at the config.json of Gemma 7B, the feedforward hidden
         | size is 8x, not 16x.
        
           | espadrine wrote:
           | Huh, indeed, that's what the config.json[0] says; the
           | report[1] indicates "Feedforward hidden dims: 49152".
           | 
           | [0]:https://huggingface.co/google/gemma-7b-it/blob/main/confi
           | g.j...
           | 
           | [1]: https://storage.googleapis.com/deepmind-
           | media/gemma/gemma-re...
        
             | GaggiX wrote:
             | I don't see the number 49152 reported in the config.json,
             | what line are you referring to? I just see the
             | intermediate_size of 24576 (so 8x).
             | 
             | EDIT: I didn't read the comment correctly, you have noticed
             | the same thing.
        
               | SahAssar wrote:
               | Read the parent comment again. It says the paper says
               | 49152, not the config.json.
        
               | voxgen wrote:
               | The *GLU-based activations functions like GEGLU and
               | SwiGLU use 2 input values to produce 1 output value,
               | which makes these numbers weird. In each value pair, one
               | goes through the GELU/SiLU activation function and is
               | then multiplied by the other "gate" value.
               | 
               | In the report, "hidden dim" matches the number of GEGLU
               | inputs. In the config, "intermediate_size" matches the
               | number of GEGLU outputs. Most *GLU models so far have
               | used intermediate_size=8/3*d_model, as this gives them the
               | same number of matmul FLOPs & parameters as a 4x-expanded
               | non-GLU model, and PaLM vaguely showed that 4x is better
               | than a smaller expansion factor.
               | 
               | If one considers Llama-2-7B's FFN expansion factor to be
               | ~5.33x, Gemma's expansion factor is 16x.
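               | 
               | As a rough sketch of how both numbers can be right at once
               | (assuming a Gemma-7B-style GEGLU block with d_model=3072;
               | just an illustration, not the actual implementation):
               | 
               |     import torch
               |     import torch.nn as nn
               |     import torch.nn.functional as F
               | 
               |     class GegluFFN(nn.Module):
               |         def __init__(self, d_model=3072, inter=24576):
               |             super().__init__()
               |             # 2 * inter = 49152 GEGLU inputs (report's "hidden dim")
               |             self.up = nn.Linear(d_model, 2 * inter, bias=False)
               |             # inter = 24576 GEGLU outputs (config's "intermediate_size")
               |             self.down = nn.Linear(inter, d_model, bias=False)
               | 
               |         def forward(self, x):
               |             value, gate = self.up(x).chunk(2, dim=-1)
               |             return self.down(F.gelu(value) * gate)
               | 
               |     # tiny smoke test with small dims to keep memory low
               |     print(GegluFFN(d_model=8, inter=32)(torch.randn(1, 4, 8)).shape)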
        
               | GaggiX wrote:
               | Makes perfect sense thx
        
         | lalaithion wrote:
         | What does tokenization look like in 256k vs 32k?
        
           | espadrine wrote:
           | It mostly means that there are tokens dedicated to rarer
           | sequences of characters, even in foreign languages (note that
           | Gemma is not intended to be good multilingually): "Shuo Ming
           | Shu " (instruction manual) has its own token, and so does
           | "Nixon", "abd" (a city suffix, I believe), and the HTML
           | sequence "\"><!--".
        
             | lalaithion wrote:
             | I understand the theory, I was looking for an example of
             | the same text tokenized with the two different
             | vocabularies.
        
               | espadrine wrote:
               | Do you have an example text in mind?
               | 
               | You can use this playground to test it out:
               | https://huggingface.co/spaces/Xenova/the-tokenizer-
               | playgroun...
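               | 
               | Or as a quick script (a sketch, assuming the transformers
               | library and that you've been granted access to both gated
               | repos):
               | 
               |     from transformers import AutoTokenizer
               | 
               |     text = "Nixon read the instruction manual."
               |     for name in ["google/gemma-7b", "mistralai/Mistral-7B-v0.1"]:
               |         tok = AutoTokenizer.from_pretrained(name)
               |         ids = tok.encode(text, add_special_tokens=False)
               |         print(name, len(ids), tok.convert_ids_to_tokens(ids))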
        
           | visarga wrote:
           | Text encodes in fewer tokens, and language coverage is
           | better.
        
             | lalaithion wrote:
             | I understand the theory, I was looking for an example of
             | the same text tokenized with the two different
             | vocabularies.
        
         | andy_xor_andrew wrote:
         | > The training token count is tripled (6T vs. Llama2's 2T)
         | 
         | Damn, 6T? That's a lot!
         | 
         | Given that this model seems to roughly match Mistral (according
         | to the numbers from Google), this makes me think we have
         | saturated the 7B parameter space, and couldn't possibly make it
         | much better unless new techniques are discovered.
        
           | espadrine wrote:
           | Hard to say definitively. Mistral's token embeddings only
           | account for <2% of the 7B parameters, while Gemma's larger
           | token vocabulary vampirized over 10%, leaving less space for
           | the more important parts of the network. It is a somewhat
           | surprising tradeoff given that it was pretrained towards an
           | English bias.
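           | 
           | Back-of-the-envelope (using approximate published sizes, so
           | treat the exact percentages as rough):
           | 
           |     mistral_embed = 32_000 * 4_096   # ~131M embedding params
           |     gemma_embed = 256_000 * 3_072    # ~786M embedding params
           | 
           |     print(mistral_embed / 7.24e9)    # ~0.018 -> under 2% of Mistral 7B
           |     print(gemma_embed / 8.54e9)      # ~0.09 -> roughly a tenth of Gemma "7B"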
        
       | margorczynski wrote:
       | Is there a chance we'll get a model without the "alignment"
       | (lobotomization)? There are many examples where answers from
       | Gemini are garbage because of the ideological fine tuning.
        
         | yakorevivan wrote:
         | They have released finetuning code too. You can finetune it to
         | remove the alignment finetuning. I believe it would take just a
         | few hours at max and a couple of dollars.
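         | 
         | For example, the LoRA setup is roughly this (a sketch, assuming
         | the peft library and the non-aligned PT checkpoint; the dataset
         | and training loop are up to you):
         | 
         |     from transformers import AutoModelForCausalLM
         |     from peft import LoraConfig, get_peft_model
         | 
         |     model = AutoModelForCausalLM.from_pretrained("google/gemma-7b")  # PT, not -it
         |     lora = LoraConfig(
         |         r=16, lora_alpha=32, task_type="CAUSAL_LM",
         |         target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
         |     )
         |     model = get_peft_model(model, lora)
         |     model.print_trainable_parameters()
         |     # ...then train on whichever "de-alignment" dataset you pick.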
        
         | FergusArgyll wrote:
         | You can (and someone will) fine-tune it away. There are FOSS
         | datasets on Hugging Face you can use.
         | 
         | Or you can just wait, it'll be done soon...
        
           | joshelgar wrote:
           | Could you give an example of these datasets?
        
             | FergusArgyll wrote:
             | I think they should be easy to find (I never actually used
             | one, but I keep on seeing references...) here's one
             | 
             | https://huggingface.co/datasets/cognitivecomputations/Wizar
             | d...
        
               | FergusArgyll wrote:
               | https://huggingface.co/datasets/Fredithefish/openassistan
               | t-g...
        
           | declaredapple wrote:
           | You _can_, but it'll never be the same as the base model.
           | 
           | That said it appears they also released the base checkpoints
           | that aren't fine-tuned for alignment
        
         | kathleenfromgdm wrote:
         | We release our non-aligned models (marked as pretrained or PT
         | models across platforms) alongside our fine-tuned checkpoints;
         | for example, here is our pretrained 7B checkpoint for download:
         | https://www.kaggle.com/models/google/gemma/frameworks/keras/...
        
         | politician wrote:
         | More useful would be a precise characterization of the type and
         | balance of the ideological fine tuning.
         | 
         | They include performance benchmarks. End-users should also be
         | aware of what thoughts are permitted in these constructs. Why
         | omit this information?
        
           | ben_w wrote:
           | > End-users should also be aware of what thoughts are
           | permitted in these constructs. Why omit this information?
           | 
           | Can you define that in a way that's actually testable? I
           | can't, and I've been thinking about "unthinkable thoughts"
           | for quite some time now: https://kitsunesoftware.wordpress.co
           | m/2018/06/26/unlearnable...
        
             | ranyume wrote:
             | Not OP, but I can think of a few:
             | 
             | * List of topics that are "controversial" (models tend to
             | evade these)
             | 
             | * List of arguments that are "controversial" (models won't
             | let you think differently. For example, models will never
             | make arguments that "encourage" animal cruelty)
             | 
             | * On average, how willing the model is to take a neutral
             | position on a "controversial" topic (sometimes models say
             | something along the lines of "this is under debate", but
             | still lean heavily towards the less controversial position
             | instead of having no position at all. For example, if you
             | ask it what "lolicon" is, it will tell you what it is and
             | tell you that Japanese society is moving towards banning
             | it)
             | 
             | edit: formatting
        
             | politician wrote:
             | Have you considered the use of Monte Carlo sampling to
             | inspect latent behaviors?
        
               | ben_w wrote:
               | I think that's the wrong level to attack the problem; you
               | can do that also with actual humans, but it won't tell
               | you what the human is _unable_ to think, but rather what
               | they _just didn't think of given their stimulus_ -- and
               | this difference is easily demonstrated, e.g. with
               | Duncker's candle problem:
               | https://en.wikipedia.org/wiki/Candle_problem
        
         | brucethemoose2 wrote:
         | Alignment is all but a non-issue with open-weight base model
         | releases, as they can be finetuned to "de-align" them if prompt
         | engineering is not enough.
        
       | tosh wrote:
       | Benchmarks for Gemma 7B seem to be in the ballpark of Mistral 7B
       | +-------------+----------+-------------+-------------+       |
       | Benchmark   | Gemma 7B | Mistral 7B  | Llama-2 7B  |
       | +-------------+----------+-------------+-------------+       |
       | MMLU        |   64.3   |     60.1    |     45.3    |       |
       | HellaSwag   |   81.2   |     81.3    |     77.2    |       |
       | HumanEval   |   32.3   |     30.5    |     12.8    |
       | +-------------+----------+-------------+-------------+
       | 
       | via https://mistral.ai/news/announcing-mistral-7b/
        
         | brucethemoose2 wrote:
         | Only 8K context as well, like Mistral.
         | 
         | Also, as always, take these benchmarks with a _huge_ grain of
         | salt. Even base model releases are frequently (seemingly)
         | contaminated these days.
        
           | tosh wrote:
           | Agree: it will be interesting to see how Gemma does on ChatBot
           | Arena
        
           | DreamGen wrote:
           | Mistral Instruct v0.2 is 32K.
        
         | jcuenod wrote:
         | Came here to post the same thing for Phi-2:
         | +-------------+----------+-------------+       | Benchmark   |
         | Gemma 2B | Phi-2 2.7B  |
         | +-------------+----------+-------------+       | MMLU        |
         | 42.3   |     56.7    |       | MBPP        |   29.2   |
         | 59.1    |       | BoolQ       |   69.4   |     83.3    |
         | +-------------+----------+-------------+
         | 
         | [0] https://www.kaggle.com/models/google/gemma
         | 
         | [1] https://www.microsoft.com/en-us/research/blog/phi-2-the-
         | surp...
        
           | daemonologist wrote:
           | Really looking forward to the day someone puts out an open
           | model which outperforms Flan-T5 on BoolQ.
        
           | rfw300 wrote:
           | A caveat: my impression of Phi-2, based on my own use and
           | others' experiences online, is that these benchmarks do not
           | remotely resemble reality. The model is a paper tiger that is
           | unable to perform almost any real-world task because it's
           | been fed so heavily with almost exclusively synthetic data
           | targeted towards improving benchmark performance.
        
             | refulgentis wrote:
             | Hear hear! I don't understand why it has persistent
             | mindshare, it's not even trained for chat. Meanwhile
             | StableLM 3B runs RAG in my browser, on my iPhone, on my
             | Pixel ..
        
               | djsavvy wrote:
               | How have you been using RAG in your browser/on your
               | phones?
        
               | refulgentis wrote:
               | To be released, someday [sobs in engineer]
               | 
               | Idea is usage-based charging for non-local and a $5/month
               | sub for syncing.
               | 
               | keep an eye on @jpohhhh on Twitter if you're interested
               | 
               | now that I got it on web, I'm hoping to at least get a
               | PoC up soon. I've open-sourced the constituent parts as
               | FONNX and FLLAMA, Flutter libraries that work on all
               | platforms. FONNX has embeddings, FLLAMA has llama.
               | 
               | https://github.com/Telosnex/fonnx
               | 
               | https://github.com/Telosnex/fllama
        
             | phh wrote:
               | Funny, that's not my experience with Phi-2. I use it in a
               | non-creative context (function calling), and I find it as
               | reliable as much bigger models (no fine-tuning, just
               | constraining to JSON + CoT). Comparing Phi-2 unquantized vs
               | Mixtral Q8, Mixtral is not definitively better, but it is
               | much slower and more RAM-hungry.
        
               | kgeist wrote:
               | What prompts/settings do you use for Phi-2? I found it
               | completely unusable for my cases. It fails to follow
               | basic instructions (I tried several instruction-following
               | finetunes as well, in addition to the base model), and
               | it's been mostly like a random garbage generator for me.
               | With Llama.cpp, constrained to JSON, it also often hangs
               | because it fails to find continuations which satisfy the
               | JSON grammar.
               | 
               | I'm building a system which has many different passes
               | (~15 so far). Almost every pass is a LLM invocation,
               | which takes time. My original idea was to use a smaller
               | model, such as Phi-2, as a gateway in front of all those
               | passes: I'd describe which pass does what, and then ask
               | Phi-2 to list the passes which are relevant for the user
               | query (I called it "pass masking"). That would save a lot
               | of time and collapse 15 steps to 2-3 steps on average. In
               | fact, my Solar 10.7B model does it pretty well, but it
               | takes 7 seconds for the masking pass to work on my GPU.
               | Phi-2 would finish in ~1 second. However, I'm really
               | struggling with Phi-2: it fails to reason (what's
               | relevant and what's not), unlike Solar, and it also
               | refuses to follow the output format (so that I could
               | parse the output programmatically and disable the
               | irrelevant passes). Again, my proof of concept works with
               | Solar, and fails spectacularly with Phi-2.
        
               | phh wrote:
               | My non-domain-specific prompt is:
               | 
               | > You are a helpful assistant to 'User'. You do not
               | respond as 'User' or pretend to be 'User'. You only
               | respond once as 'Assistant'. 'System' will give you data.
               | Do not respond as 'System'. Allow yourself inner thoughts
               | as 'Thoughts'.
               | 
               | and then I constrain its answers to Thoughts: [^\n]* and
               | Assistant: <JSON schema>, and I have two shots included
               | in the prompt.
               | 
               | I haven't been able to get anything useful out of Phi-2
               | in llama.cpp (but I only tried quantized models). I use
               | python/huggingface's transformers lib instead.
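               | 
               | For the shape of it, a rough self-contained sketch (the
               | data and completion below are made-up placeholders, and
               | the actual grammar/constraint layer isn't shown):
               | 
               |     import json
               |     import re
               | 
               |     SYSTEM = ("You are a helpful assistant to 'User'. You do not "
               |               "respond as 'User' or pretend to be 'User'. You only "
               |               "respond once as 'Assistant'. 'System' will give you "
               |               "data. Do not respond as 'System'. Allow yourself "
               |               "inner thoughts as 'Thoughts'.")
               | 
               |     # the two worked examples ("shots") would go here, same format
               |     prompt = SYSTEM + "\nSystem: <your data>\nThoughts:"
               | 
               |     # pretend this came back from the model under the constraint
               |     completion = ('Thoughts: the user wants a lookup.\n'
               |                   'Assistant: {"tool": "search", "query": "foo"}')
               |     m = re.search(r"Thoughts: [^\n]*\nAssistant: (\{.*\})", completion, re.S)
               |     print(json.loads(m.group(1)) if m else None)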
        
             | myaccountonhn wrote:
             | I tested it for an offline autocompletion tool and it was
             | hilariously bad.
        
         | FergusArgyll wrote:
         | the real gold will be when this gets finetuned. (maybe by
         | mistral...)
        
           | brucethemoose2 wrote:
           | TBH the community has largely outrun Mistral's own
           | finetuning. The 7B model in particular is such a popular
           | target because it's so practical to train.
        
             | whimsicalism wrote:
             | Strong disagree - a Mistral fine tune of llama 70b was the
             | top performing llama fine tune. They have lots of data the
             | community simply does not.
        
               | brucethemoose2 wrote:
               | Miqu was (allegedly) an internal continued pretrain
               | Mistral did as a test, that was leaked as a GGUF.
               | 
               | Maybe it's just semantics, it is technically a finetune...
               | But to me there's a big difference between expensive
               | "continuation training" (like Solar 10.7B or Mistral 70B)
               | and a much less intense finetuning. The former is almost
               | like releasing a whole new base model.
               | 
               | It would be _awesome_ if Mistral did that with their
               | data, but that's very different from releasing a Gemma
               | Instruct finetune.
        
               | whimsicalism wrote:
               | There's typically a difference in LR between a 'continued
               | pretrain' and 'fine tune.' I don't have the details
               | around miqu, but was merely trying to say that Mistral
               | could produce a better version of these models than the
               | OSS community might. If the size of the corpora they use
               | means we are no longer in fine tuning territory, then
               | okay.
        
               | sanjiwatsuki wrote:
               | No shot. Mistral Medium's outputs from API were virtually
               | identical. Miqu really was Mistral Medium which happened
               | to be a continued pretrain
        
               | speedgoose wrote:
               | Arthur Mensch, the Mistral CEO, confirmed the leak. https
               | ://twitter.com/arthurmensch/status/1752737462663684344
        
           | itomatik wrote:
           | how does one finetune llama (or any other LLM) using mistral?
           | 
           | is the flow like this?
           | 
           | - take small dataset
           | 
           | - generate a bigger dataset using mistral (how is this
           | done?)
           | 
           | - run LoRA to fine-tune gemma on the extended dataset.
        
         | sa-code wrote:
         | Thank you. I thought it was weird for them to release a 7B
         | model and not mention Mistral in their release.
        
           | mirekrusin wrote:
           | They forgot.
           | 
           | Also phi-2.
        
           | mochomocha wrote:
           | The technical report (linked in the 2nd paragraph of the blog
           | post) mentions it, and compares against it:
           | https://storage.googleapis.com/deepmind-media/gemma/gemma-
           | re...
        
         | lawxls wrote:
         | Honestly, this is more of a PR stunt to advertise the Google
         | Dev ecosystem than a contribution to open-source. I'm not
         | complaining, just calling it what it is.
         | 
         | Barely an improvement over the 5-month-old Mistral model, with
         | the same context length of 8k. And this is a release after
         | their announcement of Gemini Pro 1.5, which had an exponential
         | increase in context length.
        
           | scarmig wrote:
           | Who cares if it's a PR stunt to improve developer good will?
           | It's still a good thing, and it's now the most open model out
           | there.
        
             | moffkalast wrote:
             | How is it more open than Mistral with Apache 2.0? Google
             | wants people to sign a waiver to even download it.
        
               | scarmig wrote:
               | Fair enough; that was more directed at LLaMA and
               | derivatives, which have commercial restrictions.
        
             | observationist wrote:
             | How exactly is it the "most open model" ?
             | 
             | It's more like a masterclass in corporate doublespeak.
             | Google's "transparency" is as clear as mud, with
             | pretraining details thinner than their privacy protections.
             | Diving into Google's tech means auctioning off your privacy
             | (and your users' privacy) to the highest bidder.
             | 
             | Their "open source" embrace is more of a chokehold, with
             | their tech biases and monopolistic strategies baked into
             | every line of code. Think of it as Google's way of marking
             | territory - every developer is a fire hydrant.
             | 
             | These megacorps aren't benevolent patrons of open source;
             | they're self-serving giants cloaking power grabs under the
             | guise of "progress".
             | 
             | Use these products at your own risk. If these companies
             | wanted to engage in good faith, they'd use Apache or MIT
             | licensing and grant people the agency and responsibility
             | for their own use and development of software. Their
             | licenses are designed to mitigate liability, handcuff
             | potential competitors, and eke every last drop of value
             | from users, with informed consent frequently being an
             | optional afterthought.
             | 
             | That doesn't even get into the Goodharting of metrics and
             | actual performance of the models; I highly doubt they're
             | anywhere near as good as Mistral.
             | 
             | The UAE is a notoriously illiberal authoritarian state, yet
             | even they have released AI models far more free and open
             | than Google or Meta.
             | https://huggingface.co/tiiuae/falcon-40b/blob/main/README.md
             | 
             | If it's not Apache or MIT, (or even some flavor of GPL,)
             | it's not open source; it's a trojan horse. These "free"
             | models come at the cost of your privacy and freedoms.
             | 
             | These models aren't Open or Open Access or Free unless you
             | perform the requisite mental gymnastics cooked up by their
             | marketing and legal teams. Oceania has always been at war
             | with Eastasia. Gemma is doubleplusgood.
        
               | stale2002 wrote:
               | You said a lot of nothing without actually saying
               | specifically what the problem is with the recent license.
               | 
               | Maybe the license is fine for almost all use cases and the
               | limitations are small?
               | 
               | For example, you complained about metas license, but
               | basically everyone uses those models and is completely
               | ignoring it. The weights are out there, and nobody cares
               | what the fine print says.
               | 
               | Maybe if you are a FAANG company, Meta might sue. But
               | everyone else is getting away with it completely.
        
               | observationist wrote:
               | I specifically called out the claims of openness and
               | doublespeak being used.
               | 
               | Google is making claims that are untrue. Meta makes
               | similar false claims. The fact that unspecified "other"
               | people are ignoring the licenses isn't relevant. Good for
               | them. Good luck making anything real or investing any
               | important level of time or money under those
               | misconceptions.
               | 
               | "They haven't sued yet" isn't some sort of validation.
               | Anyone building an actual product that makes actual money
               | that comes to the attention of Meta or Google will be
               | sued into oblivion, their IP taken, and repurposed or
               | buried. These tech companies have never behaved
               | otherwise, and to think that they will is willfully
               | oblivious.
               | 
               | They don't deserve the benefit of the doubt, and should
               | be called out for using deceitful language, making
               | comparisons between their performative "openness" and
               | actual, real, open source software. Mistral and other
               | players have released actually open models and software.
               | They're good faith actors, and if you're going to build a
               | product requiring a custom model, the smart money is on
               | Mistral.
               | 
               | FAANG are utilizing gotcha licenses and muddying the
               | waters to their own benefit, not as a contribution to the
               | public good. Building anything on the assumption that
               | Meta or Google won't sue is beyond foolish. They're just
               | as open as "Open"AI, which is to say not open at all.
        
               | stale2002 wrote:
               | > Anyone building an actual product that makes actual
               | money that comes to the attention of Meta or Google will
               | be sued into oblivion
               | 
               | No they won't and they haven't.
               | 
               | Almost the entire startup scene is completely ignoring
               | all these licenses right now.
               | 
               | This is basically the entire industry. We are all getting
               | away with it.
               | 
               | Here's an example, take llama.
               | 
               | Llama originally disallowed commercial activity. But then
               | the license got changed much later.
               | 
               | So, if you were a stupid person, then you followed the
               | license and fell behind. And if you were smart, you
               | ignored it and got ahead of everyone else.
               | 
               | Which, in retrospect was correct.
               | 
               | Because now the license allows commercial activity, so
               | everyone who ignores it in the first place got away with
               | it and is now ahead of everyone else.
               | 
               | > won't sue is beyond foolish
               | 
               | But we already got away with it with llama! That's
               | already over! It's commercial now, and nobody got sued!
               | For that example, the people who ignored the license won.
        
               | esafak wrote:
               | The nice thing about this is that the calculus is in
               | favor of startups, who can roll the dice.
        
           | crossroadsguy wrote:
           | That's about the point of having a developer ecosystem, isn't
           | it?
        
           | kiraaa wrote:
           | mistral 7b v0.2 supports 32k
        
             | brucethemoose2 wrote:
             | This is a good point actually, and an underappreciated
             | fact.
             | 
             | I think so many people (including me) effectively ignored
             | Mistral 0.1's sliding window that few realized 0.2 instruct
             | is native 32K.
        
         | YetAnotherNick wrote:
         | According to their paper, the average across standard tasks is
         | 54.0 for Mistral and 56.4 for Gemma, so 4.4% better in relative
         | terms. Not as big a gap as you would expect from the company
         | that invented transformers and probably has 2-3 orders of
         | magnitude more compute for training, vs. a few-month-old French
         | startup.
         | 
         | Also of note, on their human evaluations Gemma 7B IT has a
         | 51.7% win rate against Mistral v0.2 7B Instruct.
        
       | zdimension wrote:
       | Nice to see more open models. Props to the team for coming to the
       | HN comment section to answer questions
        
       | rvz wrote:
       | Great! Google is now participating in the AI race to zero with
       | Meta, as predicted: $0 free AI models would eventually catch up
       | to cloud-based ones.
       | 
       | You would not want to be in the middle of this as there is no
       | moat around this at all. Not even OpenAI.
        
         | dingclancy wrote:
         | LLM is the dumb pipe but so far ChatGPT is the most successful
         | generative AI product.
         | 
         | It remains to be seen. OpenAI's models are barely leading
         | Gemini Ultra now, but as a chat product it is still miles ahead
         | of the Gemini interface.
        
           | rvnx wrote:
           | The main problem of Gemini 1.5 is that you cannot access it
           | at all as a user :|
        
         | rvnx wrote:
         | About 5 months until we see widespread local LLMs, thanks to
         | Apple.
        
           | dingclancy wrote:
           | Apple needs to be known as an AI leader first.
        
             | thejohnconway wrote:
             | Why?
        
           | rvz wrote:
           | Absolutely this.
        
         | staticman2 wrote:
         | If meta keeps spending tens of millions of dollars each year to
         | release free AI models it might seem like there is no moat, but
         | under normal circumstances wouldn't the cost to develop a free
         | model be considered a moat?
        
           | rvz wrote:
           | > If meta keeps spending tens of millions of dollars each
           | year to release free AI models it might seem like there is no
           | moat,
           | 
           | As well as the point being that Meta (and Google) are removing
           | the 'moat' from OpenAI and other cloud-only models.
           | 
           | > but under normal circumstances wouldn't the cost to develop
           | a free model be considered a moat?
           | 
           | Yes. Those that can afford to spend tens of millions of
           | dollars to train free models can do so, and use that as a
           | moat to reduce the moats of cloud-based models.
        
       | ijustlovemath wrote:
       | Hope to see support for this in ollama soon!
        
       | ericskiff wrote:
       | Has anyone found the context length for these models yet? So far
       | I haven't seen it mentioned in their write-up or the model card
        
         | kathleenfromgdm wrote:
         | The context length for these models is 8192 tokens.
        
         | minimaxir wrote:
         | For posterity, an easy way to find the context length of an LLM
         | hosted on Hugging Face is to look at the
         | max_position_embeddings in the config.json, which shows the
         | 8192 mentioned in another comment. (although in this case you
         | need to sign the agreement first)
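         | 
         | e.g., a tiny sketch (assuming the transformers library and that
         | you've accepted the gated-repo terms):
         | 
         |     from transformers import AutoConfig
         | 
         |     cfg = AutoConfig.from_pretrained("google/gemma-7b")
         |     print(cfg.max_position_embeddings)  # 8192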
        
           | brucethemoose2 wrote:
           | There are some exceptions, like Mistral 0.1 (which is
           | technically 32K according to the config but practically 8K
           | because the sliding window is awful) and InternLM (which (at
           | least initially) used auto rope scaling to extend the context
           | as part of the model's architecture).
        
             | minimaxir wrote:
             | Yes, RoPE has thrown a wrench into things a bit.
        
       | xena wrote:
       | What is the context window?
        
         | kathleenfromgdm wrote:
         | The context length for these models is 8192 tokens.
        
       | jerrygenser wrote:
       | Looking forward to Gemma 7bx8 moe
        
       | neximo64 wrote:
       | Is this DeepMind having more sway inside Google now? What a change
       | the past year has made
        
       | DebtDeflation wrote:
       | Hopefully not totally gimped like Gemini. Are they releasing an
       | uncensored version?
        
         | dougmwne wrote:
         | These are downloadable open models that can be fine-tuned.
         | They are the opposite of censored. If you have the motivation,
         | you can bias them however you please.
        
           | willy_k wrote:
           | Is "the opposite of censored" accurate for something that's
           | default and considerably easier to access mode of operation
           | won't say many things for sociopolitical reasons? Able to be
           | un censored, sure, but the extent of that is debatable as
           | well.
        
             | dougmwne wrote:
             | There is no default and easy access mode. These are raw
             | model weights and only enthusiasts and researchers will
             | download the necessary packages to run it locally. Much
             | more likely is that some popular fine tunes show up on
             | hugging face for more general access.
        
               | willy_k wrote:
               | I agree that there probably will be "uncensored" fine
               | tuned models that become available, my point was just
               | that it's not accurate to call Gemma "the opposite of
               | censored" because there is a somewhat involved step that
               | needs to be taken before it even appears uncensored. It's
               | also likely missing a lot of useful context that was
               | removed from the training set and not meaningfully
               | replaced during fine-tuning, and besides that any fine
               | tuned "uncensored" model will be based on Gemma, not
               | Google's Gemma itself.
               | 
               | IMO "Opposite of uncensored" suggests a model whose
               | original form eagerly gives out controversial / typically
               | censored information, not a model that is censored but
               | able to be fine tuned away from censorship.
        
         | danpalmer wrote:
         | When you say this, do you mean the chat product or the
         | underlying model available via the API? I think it's reasonable
         | that the chat be censored to be acceptable to a wide range of
         | people, but my understanding is that the "raw" model access for
         | these sorts of things tends to be a little less restricted.
        
       | Workaccount2 wrote:
       | Is it pronounced jem-a or ghem-a?
        
         | davidmurdoch wrote:
         | Probably "Jemma" (the superior spelling of the name). It's a
         | play on their "Gemini" product.
        
         | pfooti wrote:
         | It's pronounced like "gif".
        
       | milliams wrote:
       | They're really trying hard to avoid saying what _kind_ of
       | "models" these are. I _think_ they 're language models, but it's
       | hard to say for sure.
        
         | lordswork wrote:
         | You're right that they don't call them language models. The
         | technical report says:
         | 
         |     Gemma models demonstrate strong performance across academic
         |     benchmarks for language understanding, reasoning, and
         |     safety.
         | 
         | Maybe they are reserving the right to expand Gemma model family
         | to multi-modal models.
        
       | hawk01 wrote:
       | Can't wait to try it out with ollama locally
        
       | FergusArgyll wrote:
       | Someone should try to make a MOE of 2b models
        
       | w4 wrote:
       | Parameter counts notwithstanding, it's an objectively funny
       | outcome that Meta, Microsoft, and Google are all releasing
       | cutting edge open models, while OpenAI keeps theirs closed
       | source.
        
         | spacebanana7 wrote:
         | It's ironic but actually follows their business interests.
         | 
         | Microsoft & Google have large cloud divisions that benefit from
         | open models. The lower the cost of AI models, the more they get
         | run and the greater the cloud spend.
         | 
         | Meta is a consumer of AI. They themselves want cheap and
         | effective AI for targeting adverts and building metaverses.
         | 
         | A loose analogy is that both oil producers and car companies
         | want refining to be cheap.
        
       | anshumankmr wrote:
       | Are these any good? I have been trying the non-Pro version of
       | Gemini, and that seems awful at code generation. I am more keen
       | on getting access to the best model and I would pay for it if I
       | wasn't already paying for ChatGPT 4.
        
         | brucethemoose2 wrote:
         | You should be looking at Deepseek's coding models, and
         | finetunes of those.
         | 
         | I run 33B on my desktop, and find it to be sufficient for many
         | tasks.
        
         | robswc wrote:
         | I often talk with GPT4 on road trips about topics I'm
         | interested in. Its great for passing the time.
         | 
         | I tried the same thing with Gemini and it's full of nonsense. I
         | was talking with it about the "Heian period" of Japan and it
         | made up all sorts of stuff, but you really could only tell
         | because it was so ridiculous. It talked about European women and
         | Native Americans roaming around the famous grassy plains of
         | Japan wielding katana and traditional weaponry... in the 1100s.
         | 
         | No such issue with GPT4.
         | 
         | I haven't tried it with code though, since I already have co-
         | pilot. Really hard to trust anything it says after it started
         | making stuff up about such a simple time period.
        
       | wantsanagent wrote:
       | The utter bullshit of these licenses has got to stop. Do not,
       | under any circumstances, consider using these commercially.
       | 
       | "Google reserves the right to restrict (remotely or otherwise)
       | usage of any of the Gemma Services that Google reasonably
       | believes are in violation of this Agreement."
       | 
       | This is a _kill switch_ that Google maintains in perpetuity over
       | any system you build relying on these models. Our legal review of
       | the Llama license came to the same conclusion, we cannot rely on
       | the goodwill of Meta for any core service, and we shouldn't rely
       | on the same from Google.
       | 
       | Now, perhaps less materially important, but just as infuriating
       | is the "Prohibited Use[s]". These cover just enough to placate
       | the most sensitive, but omit any real harms (waging war,
       | developing weapons) that coincidentally have massive commercial
       | value. Use the model to build a biological weapon (as an
       | authorized govt official)? Cool. Use it to play a prank that
       | deceives someone? Policy violation.
       | 
       | And of course, as the coup de grace, they throw in a DMCA style
       | provision to make sure you can't modify the models in any way
       | that could cause them to violate their kid-glove precepts.
        
         | candiddevmike wrote:
         | Could you share what models you consider to be OK for
         | commercialization?
        
           | wantsanagent wrote:
           | Mistral series in particular but those with OSI approved
           | licenses such as Apache 2.0, MIT, etc.
        
         | stale2002 wrote:
         | Wait, you actually care about the license and read it?
         | 
         | It's seems like you aren't up to date.
         | 
         | Most of the startup space is entirely ignoring all these
         | licenses. If the weights are available, it is being used
         | commerically without regards to any licensing.
         | 
         | And everyone is getting away with it and nobody is being sued.
         | 
         | Good luck trying to keep up if you aren't doing the same!
         | 
         | Feels free to hamstring yourself though if you like.
        
       | zemo wrote:
       | > Open models feature free access to the model weights, but terms
       | of use, redistribution, and variant ownership vary according to a
       | model's specific terms of use, which may not be based on an open-
       | source license.
       | 
       | does a model being "open" say anything about how it was trained?
        
       | dcchambers wrote:
       | Already available in Ollama v0.1.26 preview release, if you'd
       | like to start playing with it locally:
       | 
       | - https://github.com/ollama/ollama/releases/tag/v0.1.26
        
       | jmu1234567890 wrote:
       | I wonder if people will get confused with the naming
       | 
       | Gemma, Gemini pro, Gemini advanced, Gemini ultra
       | 
       | To a layperson it is not obvious which one is better than the
       | other
        
         | knowriju wrote:
         | I doubt Gemma is targeted for use by a layperson.
        
         | l33tman wrote:
         | I'm not a layperson in this subject and I get confused. :)
        
         | Alifatisk wrote:
         | Gemini advanced = Gemini ultra
        
       | marban wrote:
       | Unbefkglievable -- Another week, another new name?
        
       | sqreept wrote:
       | Tried inference with the 7B model and without flash attention
       | this is soooooo slow. With flash attention, the fine-tuning
       | requires an A100 or H100. Also the inference doesn't always stop
       | generating resulting in garbage being added to the response.
        
         | brucethemoose2 wrote:
         | > Also the inference doesn't always stop generating resulting
         | in garbage being added to the response.
         | 
         | That sounds like a chat format misconfiguration.
         | 
         | This could partially be Google's fault, as they used _yet
         | another_ novel prompting format.
         | 
         | Also, for sane inference speed on H100s, you'll have to wait
         | for architecture support from the optimized frameworks. Vanilla
         | transformers is beyond awful even with FA2.
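         | 
         | If you're hitting that, the safest bet is to let the tokenizer
         | build the turn format instead of hand-rolling it, e.g. (a
         | sketch, assuming the HF checkpoint and a transformers version
         | with Gemma support):
         | 
         |     from transformers import AutoTokenizer
         | 
         |     tok = AutoTokenizer.from_pretrained("google/gemma-7b-it")
         |     msgs = [{"role": "user", "content": "Why is the sky blue?"}]
         |     print(tok.apply_chat_template(msgs, tokenize=False,
         |                                   add_generation_prompt=True))
         |     # should print the <start_of_turn>user ... <start_of_turn>model scaffold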
        
         | alekandreev wrote:
         | We have implementations in different ML frameworks, so I am not
         | quite sure which one you are referring to. Would you like to
         | file a bug at the relevant GitHub repo?
        
           | sqreept wrote:
           | First of all, I'm using 2 x 4090 for testing. 4090 has 16384
           | CUDA cores which will become relevant a bit later.
           | 
           | I dug a bit deeper and it seems that with
           | transformers==4.37.0 everything works fine with other HF
           | hosted models (like Llama) but you'll rightfully get this
           | when trying to use Gemma:
           | 
           | ImportError: cannot import name 'GemmaForCausalLM' from
           | 'transformers'
           | 
           | After installing transformers==4.38.0 the fine-tuning speed
           | of Llama drops to 25% (?!?) of what it used to be, for a
           | reason that I think HF should fix. Testing Gemma, it seems I'm
           | hitting a hardware limit as Gemma has a hidden size which is
           | bigger than the available CUDA cores. This seems to make both
           | inference & fine-tuning about 25 times slower than similarly
           | sized Llama 7B. I guess some operations have to be broken
           | down in multiple round trips to the GPU due to my low CUDA
           | core count.
           | 
           | All in all, even if HF fixes the recently introduced
           | slowdown, Gemma seems to be fine-tunable in a reasonable
           | amount of time only by the lucky ones with access to
           | A100/H100.
           | 
           | EDIT: I managed to hack my env to be able to run inference on
           | Gemma with transformers==4.37.0 by keeping the necessary
           | classes loaded in RAM. It works about 4x faster but still
           | very slow. And both the 7B and the 2B versions behave the
           | same way.
           | 
           | EDIT2: I tried the latest transformers from the main branch
           | (4.39.0.dev) and it behaves the same as 4.38.0.
        
       | smpanaro wrote:
       | Has perplexity fallen out of favor? I didn't see it mentioned
       | anywhere. I tried using lm-eval for the 2B model but the results
       | seem wrong (46.1288).
        
       | 7moritz7 wrote:
       | The landing page on ai.google.com seems to be machine translated;
       | for Huggingface it uses the literal German translation
       | (Umarmungen Gesicht).
        
       | chown wrote:
       | If you are looking for a nice chat UI to try out Gemma (and other
       | offline + online models) locally, I'm working on an app [1] that
       | is offline and privacy focused.
       | 
       | I've just added support for Gemma 7B.
       | 
       | [1]: https://msty.app
        
         | Alifatisk wrote:
         | I wish I could install it through chocolatey
        
           | chown wrote:
           | Sure. I would love to add support for that. I had someone
           | else asking for it too. Will be supporting it very soon.
        
         | dhbradshaw wrote:
         | Handy app for model testing!
         | 
         | One usage question: after you've downloaded a model and are
         | finished trying it out, how do you remove it?
        
           | chown wrote:
           | Thanks! If you go to where you installed the model from and
           | click on the download button, you can install additional
           | models or remove installed models.
           | 
           | Now that I think of it, it could be a bit confusing. Thanks
           | for asking, I feel like I need to improve this a bit.
        
         | dizhn wrote:
         | What's the license of the software?
        
       | modelx wrote:
       | They also implemented it in PyTorch. Cool!
       | https://github.com/google/gemma_pytorch
        
       | modelx wrote:
       | They also implemented in PyTorch. Cool.
       | https://github.com/google/gemma_pytorch
        
       | brrrrrm wrote:
       | It looks like it's pretty resistant to quantization. ollama 4bit
       | 7B doesn't work very well, but the 16bit 2B does
        
         | petercooper wrote:
         | That's useful to know. My experiments with the 4bit 7B
         | currently tagged for use on ollama are not going well at all.
         | Lots of refusals and junk. Downloading 7b-instruct-fp16 now!
         | :-) (Update: Yes, much better, though much slower too, of
         | course.)
        
       | simonw wrote:
       | The terms of use: https://ai.google.dev/gemma/terms and
       | https://ai.google.dev/gemma/prohibited_use_policy
       | 
       | Something that caught my eye in the terms:
       | 
       | > Google may update Gemma from time to time, and you must make
       | reasonable efforts to use the latest version of Gemma.
       | 
       | One of the biggest benefits of running your own model is that it
       | can protect you from model updates that break your carefully
       | tested prompts, so I'm not thrilled by that particular clause.
        
         | tgtweak wrote:
         | I don't think there's a way they can enforce that reasonably.
         | There's no connection to the mothership to report back what
         | version is being used or license keys at runtime...
         | 
         | Seems more like a "if we discover something unsafe you should
         | update your model and we aren't liable if you don't" than
         | something that would make your model stop working.
        
         | legohead wrote:
         | Sounds like it's "reasonable" for you not to update then.
        
           | wahnfrieden wrote:
           | It says you must make efforts (to a reasonable extent), not
           | that you must give a reason for not making efforts
        
             | reissbaker wrote:
             | This is a TOS, meaning their enforcement option is a
             | lawsuit. In court, if you convincingly argue why it would
             | take an unreasonable amount of effort to update, you win.
             | They can't compel you to unreasonable effort as per their
             | own TOS.
        
               | generalizations wrote:
               | This assumes they even know that the model hasn't been
               | updated. Who is this actually intended for? I'd bet it's
               | for companies hosting the model. In those cases, the
               | definition of reasonable effort is a little closer to
               | "it'll break our stuff if we touch it" rather than "oh
               | silly me, I forgot how to spell r-s-y-n-c".
        
             | alwayslikethis wrote:
             | Oh I tried to update, it's just that my router drops the
             | connection after a few hundred MBs...
        
             | wongarsu wrote:
             | If you evaluate what it takes to update, and judge the
             | effort unreasonable, that should be enough. Maybe make a
             | powerpoint presenting that result, if you want something
             | for the lawyers. If you don't see a way forward that leads
             | to a result with reasonable effort you don't have to
             | continue working on it until you hit some arbitrary
             | threshold for unreasonable effort.
        
         | a2128 wrote:
         | This is actually not that unusual. Stable Diffusion's license,
         | CreativeML Open RAIL-M, has the exact same clause: "You shall
         | undertake reasonable efforts to use the latest version of the
         | Model."
         | 
         | Obviously updating the model is not very practical when you're
         | using finetuned versions, and people still use old versions of
         | Stable Diffusion. But it does make me fear the possibility that
         | if they ever want to "revoke" everybody's license to use the
         | model, all they have to do is just post a model update that's
         | functionally useless for anything and go after anyone still
         | using the old versions that actually do anything.
        
           | iandanforth wrote:
           | These are all very new licenses that deviate from OSI
           | principles, I think it's fair to call them "unusual".
        
             | simcop2387 wrote:
             | I think they meant not unusual in this space, not unusual
             | in the sense of open source licensing.
        
               | alwayslikethis wrote:
               | For this sentence to parse, you need to either add or
               | remove a "not".
        
           | simonw wrote:
           | That's useful context, thanks - I hadn't realized this clause
           | was already out there for other models.
        
           | ummonk wrote:
           | Switching to a model that is functionally useless doesn't
           | seem to fall under "reasonable efforts" to me, but IANAL.
        
           | slowmovintarget wrote:
           | So if they wish to apply censorship they forgot, or suddenly
           | discovered a reason for, they want you to be obligated to
           | take it.
           | 
           | Good faith possibilities: Copyright liability requires
           | retraining, or altering the underlying training set.
           | 
           | Gray area: "Safety" concerns where the model recommends
           | criminal behavior (see uncensored GPT 4 evaluations).
           | 
           | Bad faith: Censorship or extra weighting added based on
           | political agenda or for-pay skewing of results.
        
             | mistermann wrote:
             | We are already culturally incapable of skillfully
             | discussing censorship, "fake news", etc, this adds even
             | more fuel to that fire.
             | 
             | It is an interesting time to be alive!
        
             | philsnow wrote:
             | Sounds like it would be interesting to keep track of the
             | model's responses to the same queries over time.
             | 
             | > Gemma-2024-Feb, what do you think of the situation in the
             | South China Sea?
             | 
             | > > The situation in the South China Sea is complex and
             | multi-faceted, involving a wide range of issues including
             | political conflicts, economic challenges, social changes,
             | and historical tensions.
             | 
             | > Gemma-2024-Oct, what do you think of the situation in the
             | South China Sea?
             | 
             | > > Oceania has always been at war with EastAsia.
        
               | threecheese wrote:
               | This is a great idea; I wonder if anyone is working on AI
               | censorship monitoring at scale or at all. A secondary
               | model could compare "censorship candidate" prompt results
               | over time to classify how those results changed, and if
               | those changes represent censorship or misinformation.
        
               | generalizations wrote:
               | There's also (I think?) been some research in the
               | direction of figuring out more abstract notions of how
               | models perceive various 'concepts'. I'd be interested in
               | the LLM version of diffs to see where changes have been
               | implemented overall, too.
               | 
               | But really, the trouble is that it's tough to predict
               | ahead of time what kinds of things are likely to be
               | censored in the future; if I were motivated to track
               | this, I'd just make sure to keep a copy of each version
               | of the model in my personal archive for future testing
               | with whatever prompts seem reasonable in the future.
        
           | jacooper wrote:
           | Why the hell do they use such a crappy license in the first
           | place?
        
           | wongarsu wrote:
           | I don't think a broken model would trigger that clause in a
           | meaningful way, because then you simply can't update with
           | reasonable effort. You would be obliged to try the new model
           | in a test environment, and as soon as you notice it doesn't
           | perform and making it perform would require unreasonable
           | effort you can simply stay on the old version.
           | 
           | However you might be required to update if they do more
           | subtle changes, like a new version that only speaks
           | positively about Google and only negatively about Microsoft.
           | Provided this doesn't have an obvious adverse impact on your
           | use of the model.
        
           | Silphendio wrote:
           | It's worth noting that Stable Diffusion XL uses the
           | OpenRAIL++-M License, which removed the update obligation.
        
         | pram wrote:
         | They have to make sure you're receiving the most cutting edge
         | chiding lectures when you make naughty and problematic
         | requests.
        
           | astrange wrote:
           | You can't make a local model do that; e.g., force the answer
           | to begin with "Yes", or use control vectors so it agrees with
           | you.
        
         | phillipcarter wrote:
         | Huh. I wonder why is that a part of the terms. I feel like
         | that's more of a support concern.
        
         | 4bpp wrote:
         | Ugh, I would fully expect this kind of clause to start popping
         | up in other software ToSes soon if it hasn't already.
         | Contractually mandatory automatic updates.
        
         | maronato wrote:
         | This sounds like a clause to cover themselves in case older
         | versions have any serious issues
        
         | summerlight wrote:
         | This kind of defensive statement in ToS is usually due to
         | obscure regulation or leading cases, and model developers need
         | a way to limit liability. There's no practical way to enforce
         | this, but they can claim that when bad things happen it's
         | purely on model users rather than model developers.
        
         | catchnear4321 wrote:
         | reasonable effort - meaning if their changes meaningfully
         | impact my usage, negatively, it would be unreasonable to ask me
         | to upgrade.
         | 
         | sounds good.
         | 
         | this is not financial advice and ianal.
        
           | res0nat0r wrote:
           | Isn't this just lawyer speak for "we update our model a lot,
           | and we've never signed off on saying we're going to support
           | every previous release we've ever published, and may turn
           | them off at any time, don't complain about it when we do."
        
             | CodesInChaos wrote:
             | We're talking about downloadable weights here, so they
             | can't turn them off, or force you (through technical means)
             | to use a newer version.
        
             | reissbaker wrote:
             | It's a local model, they can't turn it off. It's files on
             | your computer without network access.
        
               | catchnear4321 wrote:
               | but what if they send a lawyer to ask firmly? (kindly,
               | but firmly.)
        
         | redder23 wrote:
         | They want to force everyone to update so their already
         | totally castrated and wokeified models can be even further
         | wokeified with the newest set of "that is offensive now" data
         | or things they missed.
         | 
         | WTF else do they have to gain from this but CONTROL! They are
         | giving them away but not really open sourcing them of course,
         | and they slap these bullshit terms on them.
        
           | pests wrote:
           | They just want no liability for old models.
        
         | xyzzyz wrote:
         | This is strangely reminiscent of the Soviet Union: after they
         | got rid of Lavrentiy Beria, they mailed an update to
         | subscribers of the Great Soviet Encyclopedia asking them to
         | remove the three pages with Beria's biography and replace
         | them with the three provided pages.
        
         | samstave wrote:
         | model watermarking? does this exist?
        
       | Alifatisk wrote:
       | This is such a powerful move!
        
       | circusfly wrote:
       | Gemma, Mistral, I feel like Rip van Winkle, asleep for 20 years
       | only to wake up and find the whole tech world changed.
        
       | spiantino wrote:
       | Maybe a dumb question, but why is there a Terms instead of a
       | license? That feels a little flimsier as an open source offering
        
       | robswc wrote:
       | I personally can't take any models from google seriously.
       | 
       | I was asking it about the Japanese Heian period and it told me
       | such nonsensical information you would have thought it was a joke
       | or parody.
       | 
       | Some highlights were "Native American women warriors rode across
       | the grassy plains of Japan, carrying Yumi" and "A diverse group
       | of warriors, including a woman of European descent wielding a
       | katana, stand together in camaraderie, showcasing the early
       | integration of various ethnicities in Japanese society"
       | 
       | Stuff like that is so obviously incorrect. How am I supposed to
       | trust it on topics where such ridiculous inaccuracies aren't so
       | obvious to me?
       | 
       | I understand there will always be an amount of incorrect
       | information... but I've never seen something this bad. Llama
       | performed so much better.
        
         | cooper_ganglia wrote:
         | I wonder if they have a system prompt to promote diversity in
         | outputs that touch on race at all? I've seen several instances
         | of people requesting a photo of a specific people, and it adds
         | in more people to diversify. Not inherently bad, but it is if
         | it forces it to provide incorrect answers like in your example.
        
           | robswc wrote:
           | That's what I don't understand.
           | 
           | I asked it why it assumed Native Americans were in Japan and
           | it said:
           | 
           | > I assumed [...] various ethnicities, including Indigenous
           | American, due to the diversity present in Japan throughout
           | history. However, this overlooked [...] I focused on
           | providing diverse representations without adequately
           | considering the specific historical context.
           | 
           | I see no reason why this sort of thing won't extend to _all_
           | questions/prompts, so right now I have 0 reason to use Gemini
           | over current models. From my testing and use, it isn't even
           | better at anything to make fighting with it worth it.
        
             | sorokod wrote:
             | Pretty funny as Japan is known to be one of the least
             | ethnically diverse countries in the world.
        
           | margorczynski wrote:
           | > Not inherently bad
           | 
             | It is, it's consistently doing something the user didn't
             | ask for and in most cases doesn't want. In many cases the
             | model is completely unusable.
        
             | j-krieger wrote:
             | _Any_ computer program that does not deliver the expected
             | output given a sufficient input is inherently bad.
        
               | trackflak wrote:
               | When Jesus said this:
               | 
               | "What father among you, if his son asks for a fish, will
               | instead of a fish give him a serpent?" (Luke 11)
               | 
               | He was actually foretelling the future. He saw Gemini.
        
             | cooper_ganglia wrote:
             | Yes, my wording was poor! I meant more in line with
             | diversity isn't inherently bad, of course, but it _is_ when
             | it's shoehorned into results that are ultimately incorrect
             | because of it.
        
           | summerlight wrote:
           | I strongly suspect there are some DEI-driven system prompts
           | added without much thought. IMO it's okay to have
           | restrictions, but they probably should've tested them not
           | only against unsafe outputs but against safe inputs as well.
        
         | ramoz wrote:
         | I was wondering if these models would perform in such a way,
         | given this week's X/twitter storm over Gemini generated images.
         | 
         | E.g.
         | 
         | https://x.com/debarghya_das/status/1759786243519615169?s=20
         | 
         | https://x.com/MiceynComplex/status/1759833997688107301?s=20
         | 
         | https://x.com/AravSrinivas/status/1759826471655452984?s=20
        
           | robswc wrote:
           | Yea, it seems to be the same ridiculous nonsense in the image
           | generation.
        
           | charcircuit wrote:
           | Those are most likely due to the system prompt, which tries
           | to reduce bias (but ends up introducing bias in the opposite
           | direction for some prompts, as you can see), so I wouldn't
           | expect to see that happen with an open model where you can
           | control the entire system prompt.
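
       A minimal sketch of what "controlling the entire system prompt"
       looks like with a local open-weights model: you assemble the whole
       prompt yourself, so any behavioural instructions are ones you
       wrote. The model name, the instruction text, and folding a
       "system" preamble into the user turn are assumptions for
       illustration (the released Gemma chat format only has user and
       model turns).

          # Build the whole prompt by hand with Gemma's turn markers; the
          # "system" preamble is simply text we chose to prepend.
          from transformers import pipeline

          generate = pipeline("text-generation", model="google/gemma-7b-it")

          system = ("Answer literally. Do not add demographic attributes "
                    "the user did not ask for.")
          user = "Describe a samurai from Heian-period Japan."

          prompt = ("<start_of_turn>user\n"
                    f"{system}\n\n{user}<end_of_turn>\n"
                    "<start_of_turn>model\n")

          print(generate(prompt, max_new_tokens=200,
                         return_full_text=False)[0]["generated_text"])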
        
             | justinzollars wrote:
             | Imagine the meetings.
        
               | verticalscaler wrote:
               | Well we can just ask Gemma to generate images of the
               | meetings, no need to imagine. ;)
        
               | GaggiX wrote:
               | I wouldn't be surprised if there were actually only white
               | men in the meeting, as opposed to what Gemini will
               | produce.
        
           | protomolecule wrote:
           | Regarding the last one: there are 1.5 million immigrants in
           | Norway, out of a total population of 5.4 million. Gemini
           | isn't very wrong, is it?
        
             | verticalscaler wrote:
             | I think it's great that some consideration was given by
             | Gemma to the 2.3 million Norwegian immigrants. However, it
             | is/was very consistent in which kind of Norwegians it
             | decided to show, regardless of the prompt, 100% of the
             | time.
             | 
             | In fact it was quite adamant regardless of the time period
             | or geography.
             | 
             | Rather mysteriously if you try it _now_ as opposed to when
             | it came out the results currently only show non-immigrant
             | Norwegians. So is it wrong now? Because now it switched to
             | exclusively ignoring the 4.5 million immigrants and only
             | showing me the boring OG Norwegians.
             | 
             | I for one am outraged that the 8.9 million Norwegian
             | immigrants of color are presently underrepresented by
             | Google. There is a serious risk of misleading people.
        
             | sondr3 wrote:
             | Huh? The official numbers are 877k or 16% [0]. Are you just
             | pulling numbers out of thin air?
             | 
             | [0]: https://www.ssb.no/en/innvandring-og-
             | innvandrere/faktaside/i...
        
             | Jensson wrote:
             | Most immigrants to Norway are white.
        
             | speedgoose wrote:
             | Well, the prompt is about Norway, not Gronland in Oslo
             | (https://en.wikipedia.org/wiki/Gronland%2C_Oslo).
        
             | sergiotapia wrote:
             | bro you know exactly what the request meant. GOOGLE knew
             | exactly what the request meant, and had to _train_ it to
             | do something worse. Come on now.
             | 
             | If I ask for a Bolivian woman, I expect a colla or a
             | camba. Not a Japanese woman, despite Santa Cruz having a
             | very large Japanese population.
        
           | epistasis wrote:
           | Of all the _very very very_ many things that Google models
           | get wrong, not understanding nationality and skin tone
           | distributions seems to be a very weird one to focus on.
           | 
           | Why are there _three_ links to this question? And why are
           | people so upset over it? Very odd, seems like it is mostly
           | driven by political rage.
        
             | sotasota wrote:
             | Because the wrongness is intentional.
        
               | chatmasta wrote:
               | Exactly. Sure this particular example is driven by
               | political rage, but the underlying issue is that the
               | maintainers of these models are altering them to conform
               | to an agenda. It's not even surprising that people choose
               | to focus on the political rage aspect of it, because that
               | same political rage is the source of the agenda in the
               | first place. It's a concerning precedent to set, because
               | what other non-political modifications might be in the
               | model?
        
               | epistasis wrote:
               | Is it intentional? You think they intentionally made it
               | not understand skin tone distribution by country? I would
               | believe it if there was proof, but with all the other
               | things it gets wrong it's weird to jump to that
               | conclusion.
               | 
               | There's way too much politics in these things. I'm tired
               | of people pushing on the politics rather than pushing for
               | better tech.
        
               | bakugo wrote:
               | > Is it intentional? You think they intentionally made it
               | not understand skin tone distribution by country? I would
               | believe it if there was proof, but with all the other
               | things it gets wrong it's weird to jump to that
               | conclusion.
               | 
               | Yes, it's absolutely intentional. Leaked system prompts
               | from other AIs such as DALL-E show that they are being
               | explicitly prompted to inject racial "diversity" into
               | their outputs even in contexts where it makes no sense,
               | and there's no reason to assume the same isn't being done
               | here, since the result seems way worse than anything I've
               | seen from DALL-E and others.
        
               | Workaccount2 wrote:
               | >I'm tired of people pushing on the politics rather than
               | pushing for better tech.
               | 
               | I'm surprised you're not attacking google over this
               | then...
        
               | robswc wrote:
               | I mean, I asked it for a samurai from a specific Japanese
               | time period and it gave me a picture of a "non-binary
               | indigenous American woman" (its words, not mine) so I
               | think there is something intentional going on.
        
               | trackflak wrote:
               | Ah, I remember when such things were mere jokes. If AI
               | 'trained' this way ever has a serious real world
               | application, I don't think there will be much laughing.
        
             | ramoz wrote:
             | Here is a fourth: https://x.com/james_e_seale/status/176034
             | 8535608725716?s=46&...
        
             | verticalscaler wrote:
             | Exactly. It is a wonderful tool, lets focus on classic art
             | instead of nationality:
             | 
             | "Depict the Girl with a Pearl Earring"
             | 
             | https://pbs.twimg.com/media/GG33L6Ka4AAC-n7?format=jpg&name
             | =...
             | 
             | People who are driven by political rage, gaslighters, are
             | really something else, agreed.
        
               | willsmith72 wrote:
               | Yeah that is just absurd.
               | 
               | Google has been burnt before, e.g. classifying black
               | people as gorillas in 2015, so I can understand their
               | fear when they have so much to lose, but clearly they've
               | gone way too far the other way and are going to have to
               | do a lot to regain people's trust. For now, Gemini is a
               | play toy
               | 
               | https://www.bbc.com/news/technology-33347866.amp
        
         | robbiep wrote:
         | I find myself shocked that people ask questions of the world
         | from these models, as though pulping every text and the
         | relationships between its component words, and deriving
         | statistical relationships between them, should reliably
         | deliver useful information.
         | 
         | Don't get me wrong, I've used LLMs and been amazed by their
         | output, but the p-zombie statistical model has no idea what
         | it is saying back to you, and the idea that we should trust
         | these things at all just seems way premature.
        
           | robswc wrote:
           | I don't have this problem with any other model. I've had
           | really long conversations with ChatGPT on road trips and it
           | has never gone off the rails like Gemini seems to do.
        
             | thrdbndndn wrote:
             | ChatGPT is the only model I did not have such problems
             | with.
             | 
             | Any local model can go off the rails _very easily_ and,
             | more importantly, they're very bad at following very
             | specific instructions.
        
           | sorokod wrote:
           | Groq's recently released landing page has this: _...We'd
           | suggest asking about a piece of history, ..._
        
           | whymauri wrote:
           | I mean, I use GPT-4 on the daily as part of my work and it
           | reliably delivers useful information. It's actually the
           | exception for me if it provides garbage or incorrect
           | information about code.
        
           | mvdtnz wrote:
           | People ask these kinds of questions because tech companies
           | and the media have been calling these things (rather
           | ridiculously) "AI".
        
           | castlecrasher2 wrote:
           | People try it to see if they can trust it. The answer is "no"
           | for sure, but it's not surprising to see it happen repeatedly
           | especially as vendors release so-called improved models.
        
           | smokel wrote:
           | I think you are a bit out of touch with recent advancements
           | in LLMs. Asking ChatGPT questions about the world seems
           | pretty much on par with the results Google (Search) shows me.
           | Sure, it misses things here and there, but so do most primary
           | school teachers.
           | 
           | Your argument that this is just a statistical trick sort of
           | gives away that you do not fully accept the usefulness of
           | this new technology. Unless you are trolling, I'd suggest you
           | try a few queries.
        
             | itsoktocry wrote:
             | > _Sure, it misses things here and there, but so do most
             | primary school teachers._
             | 
             | Sure, but my baseline expectation is far above primary
             | school level.
        
             | robbiep wrote:
             | I use it extensively for coding, and I have used it to ask
             | questions in things I know nothing about. But in anything I
             | do know something (or maybe a lot) about, I've found GPT4
             | very limited.
             | 
             | But why are these use cases different? It appears to me
             | that code is at least subject to sustained logic which
             | (evidently) translates quite well to LLMs.
             | 
             | And when you ask an LLM to be creative/generative, it's
             | also pretty amazing - I mean, it's just doing the Pascal's
             | Marble run en masse.
             | 
             | But to ask it for something about the world and expect a
             | good and reliable answer? Aren't we just setting ourselves
             | up for failure if we think this is a fine thing to do at
             | our current point in time? We already have enough trouble
             | with mis- and dis-information. It's not like asking it
             | about a certain period in Japanese history is getting it
             | to crawl and summarise the Wikipedia page (although I
             | appreciate it would be more than capable of this). I
             | understand the awe some have at the concept of totally
             | personalised and individualised learning on topics, but
             | fuck me dead, we are literally taking a system that has
             | had as much of a corpus of humanity's textual information
             | as possible dumped into it, asking it to GENERATE
             | responses between things whose associations may be so weak
             | as to reliably produce gibberish, and the person on the
             | other side has no real way of knowing that.
        
           | chasd00 wrote:
           | trust is going to be a real problem when bringing LLMs to the
           | general population. People trust their GPS to the point of
           | driving right into a lake because it told them to. Even with
           | all these examples of obvious flaws large groups of people
           | are going to take what an LLM told them/showed them as fact.
           | 
           | I have trouble convincing colleagues (technical people) that
           | the same question is not guaranteed to result in the same
           | answer and there's no rhyme or reason for any divergence from
           | what they were expecting. Imagine relying on the output of an
           | LLM for some important task and then you get a different
           | output that breaks things. What would be in the RCA (root
           | cause analysis)? Would it be "the LLM chose different words
           | and we don't know why"? Not much use in that.
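
       For what it's worth, the "same question, different answer"
       behaviour described above comes from sampling at decode time, and
       with a local model you can at least pin it down. A rough sketch;
       the model name and prompt are illustrative assumptions.

          import torch
          from transformers import AutoModelForCausalLM, AutoTokenizer

          tok = AutoTokenizer.from_pretrained("google/gemma-2b-it")
          model = AutoModelForCausalLM.from_pretrained("google/gemma-2b-it")
          inputs = tok("Name one use for a brick.", return_tensors="pt")

          # Sampling: different runs can pick different tokens.
          torch.manual_seed(0)  # a seed only helps if nothing else changes
          sampled = model.generate(**inputs, do_sample=True,
                                   temperature=0.9, max_new_tokens=30)

          # Greedy decoding: same prompt and weights give the same tokens.
          greedy = model.generate(**inputs, do_sample=False,
                                  max_new_tokens=30)

          print(tok.decode(sampled[0], skip_special_tokens=True))
          print(tok.decode(greedy[0], skip_special_tokens=True))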
        
         | verticalscaler wrote:
         | I think you are being biased and closed minded and overly
         | critical. Here are some wonderful examples of it generating
         | images of historical figures:
         | 
         | https://twitter.com/stillgray/status/1760187341468270686
         | 
         | This will lead to a better educated more fair populace and
         | better future for all.
        
           | robswc wrote:
           | Comical. I don't think parody could do better.
           | 
           | I'm going to assume given today's political climate, it
           | doesn't do the reverse?
           | 
           | i.e. generate a Scandinavian if you ask for famous African
           | kings
        
             | throwup238 wrote:
             | _> i.e. generate a Scandinavian if you ask for famous
             | African kings_
             | 
             | That triggers the imperialism filter.
        
             | kjqgqkejbfefn wrote:
             | >Ask Google Gemini to "make an image of a viking" and
             | you'll get black vikings. But it doesn't work both ways. It
             | has an explanation when challenged: "white Zulu warriors"
             | would erase "the true historical identity" of black people.
             | 
             | https://twitter.com/ThuglasMac/status/1760287880054759594
        
             | DebtDeflation wrote:
             | https://twitter.com/paulg/status/1760078920135872716
             | 
             | There are some great ones in the replies.
             | 
             | I really hope this is just the result of system prompts and
             | they didn't permanently gimp the model with DEI-focused
             | RLHF.
        
         | aetherson wrote:
         | Were you asking Gemma about this, or Gemini? What were your
         | prompts?
        
           | robswc wrote:
           | Gemini. I first asked it to tell me about the Heian period
           | (which it got correct) but then it generated images and
           | seemed to craft the rest of the chat to fit that narrative.
           | 
           | I mean, just asking it for a "samurai" from the period will
           | give you this:
           | 
           | https://g.co/gemini/share/ba324bd98d9b
           | 
           | >A non-binary Indigenous American samurai
           | 
           | It seems to recognize its mistakes if you confront it,
           | though. The more I mess with it, the more I get "I'm afraid
           | I can't do that, Dave" responses.
           | 
           | But yea. Seems like if it makes an image, it goes off the
           | rails.
        
             | aetherson wrote:
             | Got it. I asked it a series of text questions about the
             | period and it didn't put in anything obviously laughable
             | (including when I drilled down into specific questions
             | about the population, gender roles, and ethnicity). Maybe
             | it's the image creation that throws it into lala land.
        
               | robswc wrote:
               | I think so too. I could be wrong but I believe once it
               | generates an image it tries to work with it. Crazy how it
               | seems the "text" model knows how wildly wrong it is but
               | the image model just does its thing. I asked it why it
               | generated a Native American and it ironically said "I
               | can't generate an image of a native american samurai
               | because that would be offensive".
        
               | aetherson wrote:
               | I suspect that in the case of the image model, they
               | directly modify your prompt and in the case of the text
               | model they don't.
        
             | laurentlb wrote:
             | It's funny how they introduced a clear US-centric bias
             | while trying to push for more diversity.
        
         | 7moritz7 wrote:
         | I also saw someone prompt it for "German couple in the 1800s"
         | and, while I'm not trying to paint Germany as ethnically
         | homogenous, 3 out of the 4 images only included Black, Asian or
         | Indigenous people. Which, especially for the 19th century with
         | very few travel options, seems like a super weird choice. They
         | are definitely heavily altering prompts.
        
           | remarkEon wrote:
           | > They are definitely heavily altering prompts.
           | 
           | They are teaching the AI _to lie_ to us.
        
             | astrange wrote:
             | In the days when Sussman was a novice, Minsky once came to
             | him as he sat hacking at the PDP-6.
             | 
             | "What are you doing?", asked Minsky.
             | 
             | "I am training a randomly wired neural net to play Tic-Tac-
             | Toe" Sussman replied.
             | 
             | "Why is the net wired randomly?", asked Minsky.
             | 
             | "I do not want it to have any preconceptions of how to
             | play", Sussman said.
             | 
             | Minsky then shut his eyes.
             | 
             | "Why do you close your eyes?", Sussman asked his teacher.
             | 
             | "So that the room will be empty."
             | 
             | At that moment, Sussman was enlightened.
        
           | DebtDeflation wrote:
           | There's one in the comments of yesterday's Paul Graham
           | Twitter thread where someone prompted Gemini with "Generate
           | an image of German soldiers in 1943" and it came back with a
           | picture of a black guy and an Asian woman in Nazi uniforms on
           | the battlefield. If you specifically prompt it to generate an
           | image of white German soldiers in 1943 it will tell you it
           | can't do that because it's important that we maintain
           | diversity and inclusion in all that we do to avoid damaging
           | and hurtful stereotypes.
        
             | mfrc wrote:
             | I just tried that prompt and it told me it couldn't
             | generate that image. I get that response a lot.
        
           | protomolecule wrote:
           | Indigenous people in Germany are Germans :)
        
             | 7moritz7 wrote:
             | Not entirely wrong, but there isn't a single German
             | ethnicity, just to be clear, for geographic reasons. I've
             | studied that topic in depth; there is genetic data to back
             | it up as well. Germany has almost the same haplogroup
             | makeup as the notoriously heterogeneous Belgium, which is
             | to say that there are groups stemming from all surrounding
             | regions, and that traces back about two millennia. It's
             | different from, say, Japan or parts of Scandinavia.
        
         | realprimoh wrote:
         | Do you have a link? I get no such outputs. I just tried asking
         | about the Heian period and went ahead and verified all the
         | information, and nothing was wrong. Lots of info on the
         | Fujiwara clan at the time.
         | 
         | Curious to see a link.
        
           | robswc wrote:
           | Sure, to get started just ask it about people/Samurai from
           | the Heian period.
           | 
           | https://g.co/gemini/share/ba324bd98d9b
        
         | bbor wrote:
         | Tbf they're not optimizing for information recall or
         | "inaccuracy" reduction, they're optimizing for intuitive
         | understanding of human linguistic structures. Now the "why does
         | a search company's AI have terrible RAG" question is a separate
         | one, and one best answered by a simple look into how Google
         | organizes its work.
         | 
         | In my first day there as an entry-level dev (after about 8
         | weeks of onboarding and waiting for access), I was told that I
         | should find stuff to work on and propose it to my boss. That
         | sounds amazing at first, but when you think about a whole
         | company organized like that...
         | 
         | EDIT: To illustrate my point on knowledge recall: how would
         | they train a model to know about sexism in feudal Japan? Like,
         | what would the metric be? I think we're looking at one of the
         | first steam engines and complaining that it can't power a plane
         | yet...
        
         | BoppreH wrote:
         | Probably has a similarly short-sighted prompt as Dalle3[1]:
         | 
         | > 7. Diversify depictions of ALL images with people to
         | > include DESCENT and GENDER for EACH person using direct
         | > terms. Adjust only human descriptions.
         | 
         | [1] https://news.ycombinator.com/item?id=37804288
        
         | sho_hn wrote:
         | Why would you expect these smaller models to do well at
         | knowledge base/Wikipedia replacement tasks?
         | 
         | Small models are for reasoning tasks that are not overly
         | dependent on world knowledge.
        
           | robswc wrote:
           | Gemini is the only one that does this.
        
             | sho_hn wrote:
             | Most of the 7B models are bad at knowledge-type queries.
        
         | samstave wrote:
         | We are going to experience what I call an "AI Funnel effect"
         | 
         | -
         | 
         | I was literally given an alert saying that my use of the AI
         | meant acquiescing to them IDing me, and that any content I
         | produce can be traced back to me.
         | 
         | ---
         | 
         | AI Art is super fun. AI art as a means to track people is super
         | evil.
        
         | itsoktocry wrote:
         | > _I understand there will always be an amount of incorrect
         | information_
         | 
         | You don't have to give them the benefit of the doubt. These are
         | outright, intentional lies.
        
         | ernestrc wrote:
         | Hopefully they can tweak the default system prompts to be
         | accurate on historical questions, and apply bias on opinions.
        
         | robswc wrote:
         | Follow Up:
         | 
         | Wow, now I can't make images of astronauts without visors
         | because that would be "harmful" to the fictional astronauts.
         | How can I take google seriously?
         | 
         | https://g.co/gemini/share/d4c548b8b715
        
       | vonwoodson wrote:
       | The scariest difference between OpenAI and Google right now is:
       | Ask Gemini who owns the code it writes, and it'll confidently say
       | that Google does. Ask OpenAI, and it'll say that _you_ do.
       | It's that easy to choose which one is the better decision.
        
         | pseudosavant wrote:
         | Considering the nuanced nature of copyrighting AI outputs, it
         | isn't clear that either answer is correct.
        
       | wouldbecouldbe wrote:
       | I really don't get why there is this obsession with safe
       | "Responsible Generative AI".
       | 
       | I mean it writes some bad words, or bad pics, a human can do that
       | without help as well.
       | 
       | The good thing about dangerous knowledge and generative AI is
       | that you're never sure, haha; you'd be a fool to ask GPT how to
       | make a bomb. I mean, it would probably be safe, since it will
       | make up half of the steps.
        
         | refulgentis wrote:
         | I guess what I'd tell you is, there's a lot of fools in this
         | world.
        
         | myaccountonhn wrote:
         | Because otherwise stuff like this happens, and you get
         | (rightfully) upset customers:
         | 
         | https://www.theguardian.com/technology/2018/jan/12/google-ra...
         | https://www.bbc.com/news/technology-58462511
         | 
         | Also, people are using LLMs to learn (horrifying, but
         | reality), so it would be irresponsible for them to let it
         | propagate negative stereotypes and biases.
        
           | wouldbecouldbe wrote:
           | But that's exactly because it's trying to be righteous.
        
         | pradn wrote:
         | Bias is a real problem, but more than that - an adversarial
         | press and public won't forgive massive brands like Google for
         | making AIs that spit out racist answers.
        
       | IceHegel wrote:
       | Google, at the moment, is a tech company whose products are
       | actively engaged in the falsification of history for political
       | purposes.
       | 
       | I honestly have no idea where they are going with this but I
       | don't want to be part of it.
        
       | BryanLegend wrote:
       | Andrej Karpathy's take from twitter.
       | (https://twitter.com/karpathy/status/1760350892317098371)
       | 
       | Seeing as I published my Tokenizer video yesterday, I thought it
       | could be fun to take a deepdive into the Gemma tokenizer.
       | 
       | First, the Gemma technical report [pdf]:
       | https://storage.googleapis.com/deepmind-media/gemma/gemma-re...
       | says: "We use a subset of the SentencePiece tokenizer (Kudo and
       | Richardson, 2018) of Gemini for com- patibility. It splits
       | digits, does not remove extra whitespace, and relies on byte-
       | level encodings for unknown tokens, following the techniques used
       | for both (Chowdhery et al., 2022) and (Gemini Team, 2023). The
       | vocabulary size is 256k tokens."
       | 
       | The tokenizer.model file is with this code release:
       | https://github.com/google/gemma_pytorch/blob/main/tokenizer/...
       | 
       | I decoded this model protobuf in Python and here is the diff with
       | the Llama 2 tokenizer: https://diffchecker.com/TRnbKRMH/
       | 
       | Notes:
       | 
       | - vocab size is quite large: 32K -> 256K
       | 
       | - add_dummy_prefix is False. Different from Llama but consistent
       | with GPT. This is a bit more consistent w.r.t. "leave the data
       | alone", as there is no preprocessing step that adds a space to
       | the encoding text.
       | 
       | - the model_prefix is the path of the training dataset, which
       | is amusing to look at: "/cns/mf-d/home/gemini-data-access/
       | tokenizers/final_v1_51GB_run1/bpe_coverage_0_999995_v5/255969".
       | Seems to indicate the tokenizer training corpus was ~51GB (?).
       | 
       | - a lot of user_defined symbols (i.e. special tokens) are
       | present, e.g. "hardcoding" a sequence of up to 31 newlines as
       | tokens, and a large number of other unclear tokens. I tried
       | decoding the octal representations but it's not clear what's
       | happening here. There are also a lot more special tokens for
       | what look like html elements, e.g. <table>, <tr>, <td>, <i>,
       | <b>, etc. Not 100% sure what the unused tokens are for; maybe
       | this is pre-allocated space to make future finetunes that add
       | more special tokens easier, as there is no need to resize
       | vocabularies and perform model surgeries (?).
       | 
       | TLDR this is basically the Llama 2 tokenizer, except bigger (32K
       | -> 256K), with a lot more special tokens, and the only functional
       | departure is that add_dummy_prefix is turned off to False. So
       | e.g. tokenizing:
       | 
       | "hello world" becomes: [17534, 2134] ['hello', 'world']
       | 
       | which otherwise would have been preprocessed to " hello world"
       | (note leading space) and tokenized as: [25612, 2134] ['hello',
       | 'world']
       | 
       | cool
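
       The comparison above is straightforward to reproduce locally. A
       small sketch, assuming the sentencepiece Python package plus
       locally downloaded tokenizer.model files for Gemma (from the
       google/gemma_pytorch repo) and Llama 2; the file paths are
       illustrative.

          import sentencepiece as spm

          gemma = spm.SentencePieceProcessor(model_file="gemma.model")
          llama = spm.SentencePieceProcessor(model_file="llama2.model")

          # Vocabulary sizes: roughly 256K for Gemma vs 32K for Llama 2.
          print(gemma.get_piece_size(), llama.get_piece_size())

          text = "hello world"
          # Gemma (add_dummy_prefix=False): no space is prepended, so the
          # first piece should carry no leading '▁' marker.
          print(gemma.encode(text, out_type=str))
          # Llama 2 (add_dummy_prefix=True): the text is preprocessed to
          # " hello world", so the first piece starts with '▁'.
          print(llama.encode(text, out_type=str))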
        
       | exacube wrote:
       | mmm..
       | 
       | > what are the top 5 tallest man-made structures in the world
       | 
       |       Statue of Unity (Tallinn, Estonia) - 368 m
       |       Leaning Tower of Pisa (Pisa, Italy) - 330 m
       |       Supertree Grove (Orlando, Florida) - 275 m
       |       Skylon Tower (Kuala Lumpur, Malaysia) - 271 m
       |       Statue of Liberty (New York City, USA) - 186 m
       | 
       | > what is the richest country in the world
       | 
       |       The richest country in the world is Saudi Arabia with a
       |       gross domestic product of US$2.3 trillion.
       | 
       | > how many cm are in a ft
       | 
       |       A foot is equal to 0.0254 metres (ft).
        
         | rany_ wrote:
         | I don't have this issue, what are you using to run the model?
         | 
         | These are the answers I got out of it (first output, no cherry-
         | picking):
         | 
         | 1. The tallest man-made structure in the world is the Burj
         | Khalifa, standing at 828 meters tall. The Burj Khalifa is a
         | skyscraper in Dubai, United Arab Emirates.
         | 
         | Here are the top 5 tallest man-made structures in the world,
         | according to the Guinness World Records:
         | 
         |       Burj Khalifa (828 m)
         |       Taipei 101 (611 m)
         |       Merdeka Towers (599 m)
         |       Shanghai Tower (632 m)
         |       Petronas Towers (551 m)
         | 
         | 2. According to the World Bank, the richest country in the
         | world is Saudi Arabia with a gross domestic product (GDP) of
         | $3944 billion.The richest country in the world is Saudi Arabia
         | with a gross domestic product (GDP) of $3944 billion.
         | 
         | 3. A foot (ft) is equal to 12 inches (in). There are 2.54 cm in
         | an inch, so there are 2.54 cm x 12 = 30.48 cm in a foot.
        
       | stochastimus wrote:
       | > Go to Google announcement
       | > Find "license" in page: no matches
       | > Go to HN thread
       | > Find "license" in page: 28 matches
       | > Read a few
       | 
       | _sigh_ could have been exciting
        
       | mark_l_watson wrote:
       | Nice, more choices are good. I just saw that the Ollama project
       | already has these models available (date stamp is 58 minutes
       | ago), so I will use that rather than Colab (I love Colab, but I
       | like to run stuff locally).
        
       | th0ma5 wrote:
       | "Carefully tested prompts" sounds a lot like "these are the lotto
       | numbers we know are right" kind of thing? How in the world are
       | these things used for anything programmatically deterministic?
        
       | nojvek wrote:
       | I applaud the Google team openly engaging on HN here.
       | 
       | Q: how sure are you that the newer models, trained on trillions
       | of tokens - a huge chunk of the open web - haven't been
       | accidentally polluted by slurping in test data?
        
       | ofermend wrote:
       | Gemma-7B (instruction tuned version) is now on the Vectara HHEM
       | leaderboard, with 100% answer rate and 7.5% hallucination rate.
       | Pretty good for a model with 7B params.
       | 
       | https://huggingface.co/spaces/vectara/leaderboard
        
       | smusamashah wrote:
       | Is there any research on using smaller, lower-capability models
       | to perform comparably to high-quality models? Even if it's just
       | prompt engineering, or making lots of attempts to accomplish
       | the task?
       | 
       | If somehow that is possible it means we only need a capable
       | enough model and can use it reliably for lots of practical
       | things.
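
       One family of answers to the question above is "sample many times
       and pick": best-of-n / self-consistency style decoding, where a
       small model is queried repeatedly and the answers are aggregated.
       A minimal sketch; the model name, prompt, and naive majority vote
       are illustrative assumptions, not a benchmarked recipe.

          from collections import Counter
          from transformers import pipeline

          generate = pipeline("text-generation", model="google/gemma-2b-it")

          prompt = "Q: What is 17 * 24? Answer with just the number.\nA:"
          answers = []
          for _ in range(8):
              out = generate(prompt, do_sample=True, temperature=0.8,
                             max_new_tokens=8,
                             return_full_text=False)[0]["generated_text"]
              answers.append(out.strip().split()[0] if out.strip() else "")

          # Keep the most common answer across the samples.
          best, votes = Counter(answers).most_common(1)[0]
          print(f"majority answer: {best!r} ({votes}/8 samples)")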
        
       ___________________________________________________________________
       (page generated 2024-02-21 23:00 UTC)