[HN Gopher] The Open Source AI Definition RC1 Is Available for C...
___________________________________________________________________
The Open Source AI Definition RC1 Is Available for Comments
Author : foxbee
Score : 36 points
Date : 2024-10-09 19:00 UTC (4 hours ago)
(HTM) web link (opensource.org)
(TXT) w3m dump (opensource.org)
| swyx wrote:
| D.O.A. without adoption from the major model labs (including the
| "opener" ones like AI2 and, let's say, Together/Eleuther). I
| don't like the open source old guard feeling like they have any
| say in defining things when they don't have skin in the game.
| (And yes, this is coming from a fan of their current work
| defending the "open source" term in traditional dev tools.) A
| good way to ensure decline into irrelevance is to do a lot of
| busywork without ensuring a credible quorum of the major players
| at the table.
|
| Please don't let me discourage you, though; I think this could
| be important work, but if and only if it gets endorsement from
| >1 large model lab producing interesting work.
| blackeyeblitzar wrote:
| Why should the "old guard" not have a say when they came up with
| the idea of open source? It is misleading to adopt terminology
| with a well-known definition and abuse it. Companies like Meta
| are free to use some other term than "open source" to describe
| their models, which I cannot reproduce because they've released
| nothing except weights and inference code.
| sigh_again wrote:
| > they have any say in defining things when they don't have skin
| in the game.
|
| Then maybe don't go around stealing and bastardizing the "open
| source" concept when absolutely none of the serious AI research
| is open source or reproducible. Just because you read a fancy
| word online once and think you can use it doesn't mean you're
| right.
| jszymborski wrote:
| > D.O.A without adoption from the major model labs
|
| I definitely disagree. Adoption of open licenses has
| historically been "bottom-up", starting with academia and
| hobbyists and then eventually used by big names. I have zero
| idea why that can't be the case here.
|
| I know I'll be releasing my models under an open license once
| finalized.
| tananaev wrote:
| The definition is good, because currently many call their
| open-weight models open "source". But I suspect most companies
| will still call their models open source even when they're not.
| datascientist wrote:
| also see https://gradientflow.com/open-source-principles-in-
| foundatio...
| exac wrote:
| > The aim of Open Source is not and has never been to enable
| reproducible software.
|
| Okay, well, just because you have the domain name
| "opensource.org" doesn't mean you get to speak for the
| community, or for the community's understanding of the term.
|
| opensource.org is irrelevant.
| FrustratedMonky wrote:
| I agree.
|
| > never been to enable reproducible software
|
| I'd say "never" is a big word.
|
| Having open code that everyone can read and run was partly about
| enabling reproducibility. In the closed world, how does anybody
| reproduce anything? Being open does enable that.
| saurik wrote:
| The article seems to cover this nuance in the next
| paragraphs?
| saurik wrote:
| I mean, I've never understood "open source" to require
| reproducibility. The concept barely even existed as something
| people strove for until about 15 years ago, a lot of software
| still only barely supports it, and there are tons of tradeoffs
| that come with it: you effectively inherit your entire toolchain
| as vendor-maintained, and many projects that attempt it end up
| with awkward binaries, since almost no one reproduces entirely
| from a small bit of bootstrapped Lisp.
| blackeyeblitzar wrote:
| A reinforcement of definitions is needed. Open weights is NOT
| open source. But companies like Meta are rampantly open-washing
| their work. The point of open source is that you can recreate
| the product yourself, for example by compiling the source code.
| The equivalent for an LLM is clearly being able to retrain the
| model to produce the weights. Yes, I realize this is impractical
| without access to the hardware, but the transparency is still
| important, so we know how these models are designed and how they
| may be influencing us through biases or censorship.
|
| The only actually open source model I am aware of is AI2's OLMo
| (https://blog.allenai.org/olmo-open-language-
| model-87ccfc95f5...), which includes training data, training
| code, evaluation code, fine tuning code, etc.
|
| The license also matters. A burdened license that restricts what
| you can do with the software is not really open source.
|
| I do have concerns about where OSI is going with all this. For
| example, why are they now saying that reproducibility is not a
| part of the definition? These two paragraphs below contradict
| each other - what does it mean to be able to "meaningfully fork"
| something and be able to make it more useful if you don't have
| the ingredients to reproduce it in the first place?
|
| > The aim of Open Source is not and has never been to enable
| reproducible software. The same is true for Open Source AI:
| reproducibility of AI science is not the objective. Open Source's
| role is merely not to be an impediment to reproducibility. In
| other words, one can always add more requirements on top of Open
| Source, just like the Reproducible Builds effort does.
|
| > Open Source means giving anyone the ability to meaningfully
| "fork" (study and modify) a system, without requiring additional
| permissions, to make it more useful for themselves and also for
| everyone.
| MichaelNolan wrote:
| > what does it mean to be able to "meaningfully fork" something
| and be able to make it more useful if you don't have the
| ingredients to reproduce it in the first place?
|
| I could be misunderstanding them, but my takeaway is that exact
| bit-for-bit reproducibility is not required. Most software,
| including open source software, is not bit-for-bit reproducible;
| exact reproducibility is a fairly new concept. Even with all the
| training data and all the code, you are unlikely to get exactly
| the same model as before.
|
| Though if that is what they mean, they should be more explicit
| about it.
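| A toy sketch of that nondeterminism (a made-up one-parameter
| model, not any real training pipeline): two runs over the same
| code and the same data distribution, differing only in sampling
| order, land on different weights that are nonetheless
| functionally close.

```python
import random

def train(seed: int, steps: int = 1000) -> float:
    # Toy 1-parameter "model" fit by SGD on noisy samples of y = 3x.
    # Identical code and data distribution; only the sampling order
    # (the seed) differs between runs.
    rng = random.Random(seed)
    w = 0.0
    for _ in range(steps):
        x = rng.uniform(-1, 1)
        y = 3.0 * x + rng.gauss(0, 0.1)   # noisy label
        grad = 2 * (w * x - y) * x        # d/dw of squared error
        w -= 0.1 * grad
    return w

w_a, w_b = train(seed=0), train(seed=1)
assert w_a != w_b                      # not bit-for-bit reproducible
assert abs(w_a - 3.0) < 0.2            # ...but both runs are
assert abs(w_b - 3.0) < 0.2            # functionally equivalent
```

| So the same ingredients get you an equivalent model, not a
| bit-for-bit identical one.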
| glkanb wrote:
| Ok, decent first steps. Now approve a BSD license with an
| additional clause that prohibits use for "AI" training.
|
| Just like a free grazing field would allow living animals, but
| not a combine harvester. The old rules of "for any purpose" no
| longer apply.
| godelski wrote:
| I don't think this makes sense, nor is it consistent with
| itself, let alone its other definition[0]:
|
| > The aim of Open Source is not and has never been to enable
| reproducible software.
|
| ...
|
| > Open Source means giving anyone the ability to meaningfully
| "fork" (study and modify) a system, without requiring additional
| permissions, to make it more useful for themselves and also for
| everyone.
|
| ...
|
| > Forking in the machine learning context has the same meaning
| as with software: having the ability and the rights to build a
| system that behaves differently than its original status. Things
| that a fork may achieve are: fixing security issues, improving
| behavior, removing bias.
|
| Achieving those things requires exactly what most people are
| asking for: training details.
|
| So far companies are just releasing checkpoints and
| architecture. That is better than nothing, and it is a great
| step (especially given how entrenched these businesses are[1]).
| But if we really want to do things like fix security issues or
| remove bias, we have to be able to understand both the data the
| model was originally trained on AND the training procedures.
| Both introduce biases (in the statistical sense, which is more
| general). These issues can't all be solved by tuning, and the
| ability to tune is itself significantly influenced by these
| decisions.
|
| The reason we care about reproducible builds is that they matter
| for things like security, where we want to know that what we're
| looking at is the same thing that's in the actual program. It is
| fair to say that the "aim" isn't reproducible software, but
| reproducibility is a direct consequence of software being open
| source. Trust matters, but the saying is "trust, but verify".
| Sure, you can also fix vulns and bugs in closed source software;
| hell, you can even edit it or build on top of it. But we don't
| call those things open source (or source available) for a
| reason.
|
| If we're going to be consistent in our definitions, we need to
| understand what these things are at least at a minimal level of
| abstraction. And frankly, as an ML researcher, I just don't see
| it.
|
| That said, I'm generally fine with "source available" and, like
| most people, use it synonymously with "open source". But if
| you're going to go around telling everyone they're wrong about
| the OSS definition, at least be consistent and stick to your
| values.
|
| [0] https://opensource.org/osd
|
| [1] Businesses whose entire model depends on OSS (by OSI's
| definition) and freely available research
| ensignavenger wrote:
| "Reproducible build" is a term used to refer to getting an
| exact binary match out of a build. This is outside the scope of
| the OSD. I am not certain, but it sounds like this is what they
| are talking about here. Just because you run the build yourself
| doesn't mean you will get an exact match of what the original
| producer built. Something as simple as a random number
| generator or using a timestamp in the build will result in a
| mismatch.
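| A minimal sketch of that mismatch (the "build" here is
| hypothetical, just source bytes plus an embedded timestamp, the
| way many compilers and packagers stamp binaries by default):

```python
import hashlib
import time

def build_artifact(source: bytes, timestamp: float) -> bytes:
    # Naive "build": concatenate the source with an embedded build
    # timestamp, so the output depends on *when* you build, not just
    # on *what* you build.
    return source + f"built at {timestamp}".encode()

source = b"print('hello')"

# Two builds of identical source, run at different times, hash
# differently, so they fail a bit-for-bit comparison.
first = hashlib.sha256(build_artifact(source, time.time())).hexdigest()
second = hashlib.sha256(build_artifact(source, time.time() + 1)).hexdigest()
assert first != second

# Pinning the timestamp (cf. SOURCE_DATE_EPOCH in the Reproducible
# Builds effort) restores a bit-for-bit identical artifact.
pinned = 1700000000.0
assert build_artifact(source, pinned) == build_artifact(source, pinned)
```

| This is the kind of input the Reproducible Builds project pins
| (e.g. via SOURCE_DATE_EPOCH) so that two independent builds
| hash identically.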
| wmf wrote:
| Various organizations are willing to release open weights but
| not weights that are open source by this definition, so this is
| going to be a no-op. Open source already existed before the OSI
| codified it, but now they're trying to will open source AI into
| existence against strong incentives to the contrary.
___________________________________________________________________
(page generated 2024-10-09 23:01 UTC)