[HN Gopher] Hello OLMo: A truly open LLM
___________________________________________________________________
Hello OLMo: A truly open LLM
Author : tosh
Score : 366 points
Date   : 2024-04-08 22:26 UTC (1 day ago)
(HTM) web link (blog.allenai.org)
(TXT) w3m dump (blog.allenai.org)
| blackeyeblitzar wrote:
| This is the only LLM that is exciting to me. Clearly, LLMs are
| powerful tools that may end up replacing search and even go much
| further than simple searches by performing the research for you
| and producing final answers. Closed models like those from
| OpenAI (ironically) or Anthropic cannot be audited. When most
| users end up blindly hitting Microsoft's Copilot button, which
| Microsoft is forcing OEMs to adopt, who's to say how the
| information a user gets is being curated or manipulated by
| OpenAI or Microsoft or whoever?
|
| We've already seen real world examples of severe bias injected
| into LLMs. For example, Google's Gemini had secret meta prompts
| that biased it towards certain types of answers and also caused
| it to produce hallucinated images that were funny but also
| dystopian (https://arstechnica.com/information-
| technology/2024/02/googl...). I don't think we can just let
| closed AI systems take over society when they can easily be
| manipulated by the model owners without transparency.
|
| What I like about AI2's approach with OLMo is that they are
| _actually open_, not just trading on the marketing benefits of
| the word "open". Most "open" models are just open _weights_, not
| open _source_. That's like sharing an executable and not the
| source code. In my view, being open means that others must be
| able to reproduce the final product (the model) if they want to
| and have the means (in terms of training hardware). It also means
| that they should be able to use whatever is provided freely for
| any purpose, rather than being subject to proprietary licensing.
| AI2 shares the training source code, training data, evaluation
| suite, and the model weights that they've produced by running the
| training process. It all uses the Apache license. And it's also
| interesting that they used AMD hardware to train this LLM rather
| than Nvidia/CUDA.
|
| Open weight models like Llama keep catching up to the
| best closed models from OpenAI or Anthropic or others. My hope is
| that truly open models like OLMo keep developing quickly enough
| to also keep up. Lastly, I hope that regulation does not block
| open source private development of AI systems. These systems will
| be the vehicle for speech for much of society in the future, so
| blocking private AI systems is a lot like restricting speech. But
| leaving that aside, open development will also drive innovation,
| while reduced competitive pressure would hurt it.
| blackeyeblitzar wrote:
| One thing I wanted to add and call attention to is the
| importance of licensing in open models. This is often
| overlooked when we blindly accept the vague branding of models
| as "open", but I am noticing that many open weight models are
| actually using encumbered proprietary licenses rather than
| standard open source licenses that are OSI approved
| (https://opensource.org/licenses). As an example, Databricks's
| DBRX model has a proprietary license that forces adherence to
| their highly restrictive Acceptable Use Policy _by referencing
| a live website hosting their AUP_
| (https://github.com/databricks/dbrx/blob/main/LICENSE), which
| means as they change their AUP, you may be further restricted
| in the future. Meta's Llama is similar
| (https://github.com/meta-llama/llama/blob/main/LICENSE). I'm
| not sure who can depend on these models given this flaw.
| idle_zealot wrote:
| Do we even know if these licenses are binding? AFAIK we have
| no ruling on whether model weights are even eligible for
| copyright. They're machine-produced derivatives of other
| work, so there's no guarantee that copyright protects them.
| blackeyeblitzar wrote:
| That's a great point and I hope more people speak up to
| treat models as just numerical derivative works so they
| aren't automatically granted these protections. It's better
| if society meaningfully debates this and chooses the right
| approach.
| gremlinunderway wrote:
| > For example, Google's Gemini had secret meta prompts that
| biased it towards certain types of answers and also caused it
| to produce hallucinated images that were funny but also
| dystopian (https://arstechnica.com/information-
| technology/2024/02/googl...).
|
| Such a bizarre take to call this "dystopian".
|
| The model happened to create some out-there pictures. I mean,
| it's no more outlandish than generating giant dragons and
| snakes and such, yet the thought of a person of color appearing
| in a historically inaccurate setting triggers this massive
| outcry about revisionism? Who cares?
|
| Besides, the article identifies the probable goal which was to
| eliminate very known biases in existing models (i.e. when
| generating "angry person" you mainly got black people). Clearly
| this one wasn't tuned well for that goal, but the objective is
| not only noble but absolutely should be required for anyone
| producing LLMs.
| blackeyeblitzar wrote:
| If I may explain: the dystopian part to me is the lack of
| transparency around training code, training data sources,
| tuning, meta prompting, and so forth. In Google's case,
| they're a large corporation that controls how much of society
| accesses information. If they're secretly curating what that
| information is, rather than presenting it as neutrally as
| they can, it does feel dystopian to me. I'd like transparency
| as a consumer of information, so I know, to the extent
| possible, what the sources of information were or how I am
| being manipulated by choices the humans building these
| systems made.
|
| I appreciate the issue you're drawing attention to in the
| example you shared about images of an angry person. I think I
| agree that focused tuning for situations like that might be
| noble and I would be okay with a model correcting for that
| specific example you shared. But I also struggle with how to
| _clearly_ draw that line where such tuning may go too far,
| which is why I favor less manual biasing. But I disagree that
| such tuning should be _required_, if you meant required _by
| the law_. Like with speech or art in general, I think anyone
| should be able to produce software systems that generate
| controversial or offensive speech or art. Individual
| consumers can choose what they want to interact with, and
| reject LLMs that don't meet their personal standards.
| lynx23 wrote:
| Right, "who cares" about the truth in our dystopian world?
| 1984 is apparently too long ago for people to remember the
| Ministry of Truth...
| simonw wrote:
| Pet peeve: Google's Gemini LLM was not to blame for the image
| generation weirdness.
|
| That would be like blaming DALL-E weirdness on GPT-4.
|
| Unfortunately, Google marketing decided to slap the "Gemini"
| brand on both the end-user interface used to interact with the
| model AND the actual model itself, hence people constantly
| calling out Gemini-the-model for weird decisions made as part
| of Gemini-the-user-interface.
| yk wrote:
| Did anybody manage to get the entire prompt out of gemini, or
| what are you basing your claim on?
| simonw wrote:
| That's my point. The system prompt isn't part of the model
| - it's part of the UI system that wraps the model.
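|
| In code terms, the distinction is roughly this (a hypothetical
| sketch, not Google's actual stack; every name and the prompt
| text here are made up for illustration):
|
|     # The "model" is just a text-in/text-out function.
|     def gemini_model(prompt: str) -> str:
|         return f"<completion of {len(prompt)} prompt chars>"  # inference stub
|
|     # The "product" silently prepends instructions the user never sees.
|     HIDDEN_SYSTEM_PROMPT = "<product-level rules, e.g. image diversity>"
|
|     def gemini_app(user_message: str) -> str:
|         return gemini_model(HIDDEN_SYSTEM_PROMPT + "\n\n" + user_message)
|
| The weirdness people saw lives in the app layer, not in the
| weights.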
| michaelt wrote:
| _> That would be like blaming DALL-E weirdness on GPT-4._
|
| Actually when you trigger DALL-E through GPT-4 (i.e. with the
| LLM generating the prompt to give the diffusion model then
| returning the resulting image to the user) the LLM's system
| instructions [1] say _" 7. Diversify depictions of ALL images
| with people to always include always DESCENT and GENDER for
| EACH person using direct terms."_ and a bunch of stuff along
| those lines.
|
| In OpenAI's system this doesn't always trigger; if the user
| asks for an image of trash being collected, the user hasn't
| explicitly asked for any people to be depicted, so the LLM
| doesn't find anything in the prompt that needs diversity
| added. The trash-being-collected prompt gets passed to DALL-E
| unmodified, and the resulting image has all male workers.
|
| [1] https://raw.githubusercontent.com/spdustin/ChatGPT-
| AutoExper...
| simonw wrote:
| Yeah, I wrote about that last year:
| https://simonwillison.net/2023/Oct/26/add-a-
| walrus/#diversif...
|
| Again, that's not a GPT-4 thing: that's a ChatGPT interface
| running GPT-4 with DALL-E as a tool thing.
| espadrine wrote:
| > _Google's Gemini LLM was not to blame for the image
| generation weirdness. That would be like blaming DALL-E
| weirdness on GPT-4._
|
| The way I read the Gemini technical report, it seemed like,
| unlike GPT-4 vs DALL-E, Gemini was pretrained with multimodal
| outputs. Is that not the case?
| simonw wrote:
| Is that right? I didn't think Gemini was generating images
| directly, I assumed it was using a separate image
| generation tool.
|
| The paper here https://arxiv.org/pdf/2403.05530.pdf has a
| model card for Gemini 1.5 Pro that says:
| Output(s): Generated text in response to the input
| (e.g., an answer to the question, a summary of
| multiple documents, comparing documents/videos).
| espadrine wrote:
| Huh, that is true in both the model cards of Gemini 1.5
| Pro and Gemini 1.0.
|
| That feels like it runs counter to this statement from
| the Gemini 1.0 technical report[0]:
|
| > _Gemini models are trained to accommodate textual input
| interleaved with a wide variety of audio and visual
| inputs, such as natural images, charts, screenshots,
| PDFs, and videos, and they can produce text and image
| outputs_
|
| [0]: https://arxiv.org/pdf/2312.11805.pdf
| simonw wrote:
| Yeah, what does that bit about "image outputs" mean, I
| wonder?
| theshackleford wrote:
| > Open weight models like Llama keep catching up to the best
| closed models from OpenAI or Anthropic or others.
|
| Since when? I've had the complete opposite experience.
| timmg wrote:
| Has their site been hugged-to-death or is it my hotel wifi?
| Havoc wrote:
| Notably "The Pile" doesn't seem to be part of the training data.
| So this might be more legally sound than many other "open" LLMs.
| sgu999 wrote:
| For those also wondering: https://pile.eleuther.ai
|
| > The Pile is a 825 GiB diverse, open source language modelling
| data set that consists of 22 smaller, high-quality datasets
| combined together.
|
| But what's the legal complication with it?
| blackeyeblitzar wrote:
| It received DMCA takedowns:
| https://en.wikipedia.org/wiki/The_Pile_(dataset)
|
| > The Books3 component of the dataset contains copyrighted
| material compiled from Bibliotik, a pirate website. In July
| 2023, the Rights Alliance took copies of The Pile down
| through DMCA notices. Users responded by creating copies of
| The Pile with the offending content removed.
| simonw wrote:
| It is absolutely packed with unlicensed, copyrighted data.
|
| Books3 is the most notable example - nearly 200,000 pirated
| ebooks - but a lot of the rest of it is (unlicensed) scraped
| web data.
|
| The legal questions over whether this is a problem are
| currently still unresolved. Many people are also bothered by
| the ethical implications, which is a separate issue from the
| legal questions.
| 23B1 wrote:
| Ironic that even our everyday governance has little
| 'Alignment' between ethics and law.
| jacobn wrote:
| Ethics are a lot more nuanced and change a lot faster
| than laws.
|
| Heck, a large fraction of ethical norms seem to be so
| fickle that they're subject to potential revision by every
| generation.
|
| In fact, I'd argue that those revisions are a significant
| portion of how one generation distinguishes itself from
| its parents.
|
| Yet strangely every generation feels like they have
| arrived at a set of "universal laws" in their ethics.
| KarlKemp wrote:
| In this case, both ethics and the law are murky.
|
| Pretty excellent alignment, for once?
| ben_w wrote:
| We wouldn't need lawyers if all the rules could be
| expressed as "be ethical".
| 23B1 wrote:
| The lawyers certainly agree with you on that!
| codazoda wrote:
| I took a quick peek at this last time it was mentioned and it
| had dozens of my own repos of unlicensed source code in it.
| All of that was published on GitHub and made public, but much
| of it has no license specified.
| mysteria wrote:
| Is this one of the first LLMs of note that was successfully
| trained on AMD GPUs? I wonder how seamless the process was and if
| they faced any issues there.
| sanxiyn wrote:
| Databricks (who also participated in OLMo; it's probably the
| same codebase) trained on AMD before; see their 2023 post
| https://www.databricks.com/blog/amd-mi250. It was probably
| seamless, as any issues were fixed by Databricks in 2023.
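|
| For what it's worth, PyTorch on ROCm is mostly transparent
| these days (a small sketch; on ROCm builds, AMD GPUs show up
| through the same "cuda" device API, so most training code runs
| unchanged):
|
|     import torch
|
|     print(torch.version.hip)          # ROCm/HIP version on AMD builds, else None
|     print(torch.cuda.is_available())  # True for a usable AMD or NVIDIA GPU
|
|     dev = "cuda" if torch.cuda.is_available() else "cpu"
|     x = torch.randn(8, 8, device=dev)
|     print((x @ x).device)             # matmul runs on the GPU if present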
| otuutti wrote:
| https://huggingface.co/LumiOpen/Poro-34B Also fully trained on
| LUMI.
|
| (more models here: https://huggingface.co/LumiOpen)
| lostmsu wrote:
| Too bad they did not put any comparison tables into the blog
| post.
| mysteria wrote:
| They're on Hugging Face. Interestingly enough they don't
| compare it against Mistral 7B.
|
| https://huggingface.co/allenai/OLMo-7B
| polygamous_bat wrote:
| I commented this somewhere else, but word in the ether is
| that OLMo is not actually that good of a model given its size
| and compute budget. I am not entirely sure why, and it's
| still good to have the full recipe for at least one model out
| in the open, but the current OLMo definitely is a cautionary
| tale for people training their own model.
| refulgentis wrote:
| This is 2 months old.
| btbuildem wrote:
| And yet it's topical and relevant.
| timsuchanek wrote:
| Great to see e2e openness. One of the only true OSS models out
| there, vs. most models releasing only the binaries (weights).
| Surprised that they didn't mention Mistral 7b in the comparisons.
| sanxiyn wrote:
| Falcon also released an open dataset.
| vjeux wrote:
| If I read the license correctly, it seems that if you want to use
| the LLM, you need to tell the authors what you are doing with it.
|
| Am I reading this correctly? https://allenai.org/licenses/impact-
| mr
|
| "Derivative Impact Reports. AI2 seeks to encourage transparency
| around Derivatives through the use of Derivative Impact Reports,
| available here. Before releasing a Model Derivative or Data
| Derivative, You will share with AI2 the intended use(s) of Your
| Derivative by completing a Derivative Impact Report or otherwise
| providing AI2 with substantially similar information in writing.
| You agree that AI2 may publish, post, or make available such
| information about Your Derivative for review by the general
| public.
|
| You will use good faith efforts to be transparent about the
| intended use(s) of Your Derivatives by making the information
| freely available to others who may access or use Your
| Derivatives. You acknowledge that Derivative Impact Reports are
| not intended to penalize any good faith disclosures about
| Derivatives. Accordingly, if You initiate or participate in any
| lawsuit or other legal action against a Third Party based on
| information in such Third Party's Derivative Impact Report, then
| this MR Agreement will terminate immediately as of the date such
| lawsuit or legal action is filed or commenced."
| blackeyeblitzar wrote:
| Interesting. I recall seeing Apache licenses in their official
| repositories. I wonder how these additional restrictions get
| pulled in.
| mkl wrote:
| Does that apply to this model? On huggingface it says "License:
| The code and model are released under Apache 2.0."
| whimsicalism wrote:
| no, this is Apache-licensed. yes it is confusing that AI2 has
| custom licenses but they aren't using them here
| lolinder wrote:
| It looks like the weights [0] and code [1] are Apache
| licensed, but the training data [2] is using the license that
| OP is quoting from.
|
| [0] https://huggingface.co/allenai/OLMo-7B
|
| [1] https://github.com/allenai/OLMo
|
| [2] https://huggingface.co/datasets/allenai/dolma
| 6gvONxR4sf7o wrote:
| Is the license not transitive? Like could your impact
| report be "i want to remove this part of the license?"
| gardnr wrote:
| I like the way you think but 2b might prevent that.
| Chris2048 wrote:
| > if You initiate or participate in any lawsuit or other legal
| action ... this MR Agreement will terminate immediately
|
| Is this legal? Restricting legal options by making an agreement
| dependent on it?
| jrm4 wrote:
| Weird. So even if these things are well-intentioned, it seems
| like they don't have any teeth.
|
| Are there any out there that have licenses which are (dare I
| say) simpler, like the GPL?
| kikoreis wrote:
| What does the risk classification applied to the dataset actually
| mean? The licensing page [1] AI2 provides for their datasets is
| really nice but it doesn't really explain [2] what risk means in
| this context.
|
| Does it mean "risk that the items contained in this set are
| licensed in a manner incompatible with its use in a training
| dataset"?
|
| [1] https://allenai.org/impact-license
|
| [2] "the AI2 ImpACT Licenses are artifact-agnostic and are
| instead structured according to the risk level we've assigned a
| given artifact"
| pksebben wrote:
| It's odd. Running inference on this (and other models in its
| class), I keep running into a "repeating token" situation with
| moderate-to-long context windows.
|
| It feels almost as if, during inference, the model hits some
| form of local minimum that it careens around, and while
| temperature _seems_ to affect this, it doesn't really _fix_ it.
|
| at temp 0.2:
|
| > [{'generated_text': 'What follows is a transcript of a talk
| between a mysterious man and an agent of a bureau dedicated to
| investigating things which is typically referred to by some
| assortment of letters in the alphabet. The identity, origins, and
| motivations of the man were not known then and remain so. This
| transcript is not meant to scare, but provided simply to
| enlighten the concerned citizen of all the various and sundry
| things that may or may not go bump in the night. AGENT: Please
| state your name for the record. MYSTERIOUS STRANGER: I am the
| man. AGENT: Thank you. I am an agent of the Bureau of
| Investigation. I am here to investigate the following: 1. The
| following: 2. The following: 3. The following: 4. The following:
| 5. The following: 6. The following: 7. The following: 8. The
| following: 9. The following: 10. The following: 11. The
| following: 12. The following: 13. The following: 14. The
| following: 15. The following: 16. The following: 17. The
| following: 18. The following: 19. The following: 20. The
| following: 21. The following: 22. The following: 23. The
| following: 24. The following'}]
|
| ...and at temp 0.4:
|
| > [{'generated_text': 'What follows is a transcript of a talk
| between a mysterious man and an agent of a bureau dedicated to
| investigating things which is typically referred to by some
| assortment of letters in the alphabet. The identity, origins, and
| motivations of the man were not known then and remain so. This
| transcript is not meant to scare, but provided simply to
| enlighten the concerned citizen of all the various and sundry
| things that may or may not go bump in the night. AGENT: Please
| state your name for the record. MYSTERIOUS STRANGER: My name is
| not important. AGENT: My name is Agent Cyanide. MYSTERIOUS
| STRANGER: Agent Cyanide. AGENT: I am an agent of the Bureau of
| Investigations. MYSTERIOUS STRANGER: The Bureau of
| Investigations. AGENT: The Bureau of Investigations. MYSTERIOUS
| STRANGER: The Bureau of Investigations. AGENT: The Bureau of
| Investigations. MYSTERIOUS STRANGER: The Bureau of
| Investigations. AGENT: The Bureau of Investigations. MYSTERIOUS
| STRANGER: The Bureau of Investigations. AGENT: The Bureau of
| Investigations. MYSTERIOUS STRANGER: The Bureau of
| Investigations. AGENT: The Bureau of Investigations. MYSTERIOUS
| STRANGER: The Bureau of Investigations'}]
| pksebben wrote:
| ... this can get a little goofy even with do_sample=False and
| no temp:
|
| | [{'generated_text': "DAUGHTER: tell me a story FATHER: but
| it's late DAUGHTER: please? FATHER: okay, once upon a time
| there was a little girl who lived in a little house with her
| mother and father and her brother and sister and her dog and
| her cat and her hamster and her fish and her bird and her
| rabbit and her horse and her cow and her sheep and her goat and
| her pig and her chicken and her duck and her turkey and her
| goose and her llama and her alpaca and her camel and her zebra
| and her giraffe and her elephant and her hippopotamus and her
| rhinoceros and her kangaroo and her koala and her panda and her
| bear and her wolf and her fox and her cat and her dog and her
| bird and her fish and her hamster and her cat and her dog and
| her bird and her fish and her hamster and her cat and her dog
| and her bird and her fish and her hamster and her cat and her
| dog and her bird and her fish and her hamster and"}]
| gpderetta wrote:
| That seems like a perfect story to put a little child to bed :D.
|
| I have used a similar recursive story in the past. My son
| still jokes about it.
| fho wrote:
| There actually was a podcast around that concept when (I
| think) GPT-2 was current.
|
| Basically one generated story per day. Absurd in places.
| polygamous_bat wrote:
| From what I heard through the grapevine, OLMo is not nearly the
| best model for its size or compute budget. Apparently something
| didn't quite go right and AI2 didn't have the money to train
| until they got it right.
| ein0p wrote:
| Seems to be surprisingly fast at smaller sizes, too.
| wg0 wrote:
| The hype around LLMs won't last past 2030, I suppose. LLMs are
| a statistical inference soup that goes stale like stagnant pond
| water, becoming less accurate with each passing day.
|
| I am curious how long the hype wave lasts. The most recent one
| I saw was K8s. It settled down and won, TBH.
| Grimblewald wrote:
| I think the hype dies down and they'll become part of a bigger
| thing, like dense neural networks.
| michaelmior wrote:
| The transformer architecture probably won't last and we might
| start calling them something else, but I can't see something
| that could reasonably be called an LLM going away any time
| soon.
| margorczynski wrote:
| > 1. No biases. Following LLaMA, PaLM, and others, we exclude all
| bias terms from our architecture in order to improve training
| stability.
|
| What does this mean? What is a "bias term"?
| polygamous_bat wrote:
| Think of the term b in y = Wx + b. W is called the weight, b is
| called the bias.
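|
| Concretely, in PyTorch terms (a minimal sketch; LLaMA-style "no
| bias" architectures just construct their linear layers with
| bias=False):
|
|     import torch
|     import torch.nn as nn
|
|     d_model = 4096
|
|     # Standard layer: y = Wx + b, with both W and b learned.
|     with_bias = nn.Linear(d_model, d_model, bias=True)
|
|     # OLMo/LLaMA-style layer: y = Wx; the bias term is dropped,
|     # which reportedly improves training stability at scale.
|     no_bias = nn.Linear(d_model, d_model, bias=False)
|
|     x = torch.randn(1, d_model)
|     print(with_bias(x).shape, no_bias(x).shape)  # both (1, 4096)
|     assert no_bias.bias is None  # no bias parameter exists at all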
| arcza wrote:
| sToP bLOgGinG wITh Medium!
| egKYzyXeIL wrote:
| Why shouldn't people use Medium? I'm probably out of the loop.
| arcza wrote:
| The nags, the dark patterns, the horrific UI, the soft
| paywalls, and the tracking, to name a few reasons.
| gadflyinyoureye wrote:
| They often require log in to see the whole article. Later
| they cap your access to articles to N per some period of
| time. The only way around that is to purchase a subscription.
| Given the weak offering of Medium, it's seldom worth the
| $/month cost of a subscription for the few jewels that might
| appear.
| barfbagginus wrote:
| sToP bLOgGinG wITh Medium!
| flotzam wrote:
| https://scribe.rip/hello-olmo-a-truly-open-llm-43f7e7359222
___________________________________________________________________
(page generated 2024-04-09 23:02 UTC)