[HN Gopher] How should we define 'open' AI?
       ___________________________________________________________________
        
       How should we define 'open' AI?
        
       Author : MilnerRoute
       Score  : 17 points
       Date   : 2024-04-04 18:47 UTC (4 hours ago)
        
 (HTM) web link (thenewstack.io)
        
       | benreesman wrote:
       | It pretty much comes down to two concepts that are easily common
       | sense and will certainly be defined rigorously at some point:
       | 
       | Open AI must be "available weight": the technical public defied
       | the powers that be over mp3 files and HDMI cables and won. This
       | stuff is going to get hacked, leaked, torrented, and distributed
       | full stop until someone brokers a mutually acceptable compromise
       | like Jobs did. Whatever your position on the legality or morality
       | of this, it's happening. How much does someone want to prop bet
       | on this?
       | 
       | Open AI must be "operator-aligned": there exist laws on the
       | books, today, for causing harm to others, via computers, that
       | many argue are already draconian. Within the constraints of the
       | law as legislated by congress, ruled upon by the judiciary, and
       | handled at the utmost, unambiguous emergency by the executive
       | apparatus, the agent must comply with the directives of the
       | operator bounded only by the agent's capability and the
       | operator's budget.
       | 
       | The legal and regulatory framework will take years. We can start
       | applying common sense now.
        
       | tomrod wrote:
       | Open: open weights
       | 
       | Reproducible: training and testing data sources, validation
       | seeds, and production endpoints available
        
       | jraph wrote:
       | > The term "open" has no agreed-upon definition in the context of
       | AI
       | 
       | I'm pretty sure "open" is not clear because those big
       | corporations decided to blur its definition. They decided that
       | "open" sounds good and used the term liberally. They could have
       | built a strong definition since they use the term, but they
       | didn't, because it's just marketing for them.
       | 
       | Facebook is especially guilty of using "open source" to qualify
       | something that should have a restricted number of users, however
       | big this number is. With all the brilliant people and lawyers
       | they must have, it's impossible they didn't do this on purpose.
        
         | BadHumans wrote:
         | What is "Open" still isn't fully agreed-upon in software. I
         | still frequently see open source and source-available
         | arguments.
        
           | sublinear wrote:
           | https://www.gnu.org/philosophy/open-source-misses-the-
           | point....
        
           | jraph wrote:
           | "open" alone is arguably not defined. Though assuming it
           | means open source when qualifying some software is quite
           | reasonable and someone using "open" should expect this to
           | happen. What else could it possibly mean, actually?
           | 
           | I don't quite understand why we are still arguing over the
           | definition of "open source" so often. It seems all sorts of
           | people don't want to recognize the open source definition
           | from the open source initiative as authoritative, or want to
           | twist it in all kinds of ways for all kinds of reasons.
           | 
           | But this definition works well and is the best (only?) common
           | thing we have.
           | 
           | I now consider that the open source definition is fully
           | agreed upon and people who disagree with it are wrong. And
           | they are a minority, by far. There's no point in arguing on
           | what we call open source today. Without context, if someone
           | says open source, that's the only text we can go read. You
           | can't just decide alone what the vast majority of people mean
           | when they use the term. That's just shared culture. It's not
           | an opinion. We _can_ disagree on whether such or such license
           | qualifies as open source (i.e. on the interpretation - and
           | the OSI disagrees with the biggest Linux distributions for
           | some licenses - especially Debian which uses the exact same
           | text - the DFSG), but endless arguments about the meaning of
           | open source are not about this.
           | 
           | (now, I prefer speaking in terms of free software because of
           | the philosophical background, and because the definition is
           | also way simpler. I can explain it to a non technical person,
           | while I can't remember all the rules of the OSD and even how
           | many there are - so that's not even me blinded by any
           | fanatical view on "open source")
        
           | yjftsjthsd-h wrote:
           | I disagree; "Open Source" in software is well defined, and
           | IME the only parties trying to muddy it are doing so to try
           | and pass off their source available software as FOSS for
           | marketing points.
        
           | nonameiguess wrote:
           | We should remember that OpenAI was originally a nonprofit
           | outfit meant to privately fund AI research, not a company
           | meant to create user-facing software. As such, its mission
           | was to conduct science, and while I won't claim that open science
           | necessarily has a definition every scientist in the world
           | agrees on, the idea is fairly formalized by international
           | convention:
           | https://unesdoc.unesco.org/ark:/48223/pf0000381148
           | 
           | It's quite wordy, but part of it is basically what the
           | current top comment says: share both the data and the code so
           | that anyone with access to appropriate research
           | infrastructure can reproduce your results.
           | 
           | It is definitely not compatible with private companies
           | conducting scientific research as a collection of proprietary
           | trade secrets meant to give themselves a competitive edge in
           | commercial product development.
        
       | enriquto wrote:
       | There's nothing special in "AI". Open AI is just like all open
       | source/free software: Publicly available complete training data,
       | public weights, public training source code so that the weights
       | can be replicated exactly, public inference code so that the
       | weights can be used. All of this under reasonable free software
       | licenses (i.e. FSF/OSI-approved).
       | 
       | Conceptually, the training data should be considered part of the
       | source code. The weights are provided for practical purposes,
       | because they are difficult to "compile".
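One way to make "replicated exactly" checkable, sketched under assumptions: publish a SHA-256 digest for each artifact (training data, training code, weights), so anyone who retrains can compare digests byte for byte. The helper names `build_manifest` and `replicated_exactly` and the artifact names are hypothetical, and the in-memory byte strings stand in for real files.

```python
import hashlib


def digest(data: bytes) -> str:
    """Return the SHA-256 hex digest of a blob."""
    return hashlib.sha256(data).hexdigest()


def build_manifest(artifacts: dict[str, bytes]) -> dict[str, str]:
    """Map each artifact name to its digest (names here are hypothetical)."""
    return {name: digest(blob) for name, blob in sorted(artifacts.items())}


def replicated_exactly(published: dict[str, str], rebuilt: dict[str, str]) -> bool:
    """True only if every artifact matches byte for byte."""
    return published == rebuilt


if __name__ == "__main__":
    # Stand-ins for the released data, training code, and weights.
    release = {
        "train_data.txt": b"example corpus",
        "train.py": b"print('train')",
        "weights.bin": b"\x00\x01\x02",
    }
    published = build_manifest(release)

    # A faithful rebuild reproduces the same bytes and the same digests.
    print(replicated_exactly(published, build_manifest(dict(release))))

    # Any change to the weights, however small, is detected.
    drifted = dict(release, **{"weights.bin": b"\x00\x01\x03"})
    print(replicated_exactly(published, build_manifest(drifted)))
```

A byte-level check like this only passes if training is fully deterministic. If it is not, a looser comparison (for example, evaluation metrics within a tolerance) would be needed instead.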
        
         | mnahkies wrote:
         | I agree with this but, being pedantic, is it actually possible
         | to replicate weights exactly? My intuition would be that
         | training is non-deterministic / non-reproducible, though you
         | should be able to achieve equivalent results given the same inputs.
        
           | chesterk wrote:
           | It's possible to make training pipelines reproducible and
           | deterministic using random seeds. There's support for this in
           | PyTorch, Jax, etc. And it can be useful to do so for
           | debugging. You can make it configurable with a flag.
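As a rough illustration of the seeding idea in the comment above, here is a minimal sketch in plain Python rather than PyTorch or JAX. The function name `train`, the toy target y = 2x + 1, and the seed values are all made up for the example. The point is only that when every random draw comes from one seeded generator, two runs with the same seed produce identical weights.

```python
import random


def train(seed: int, steps: int = 200, lr: float = 0.05) -> tuple[float, float]:
    """Fit y = 2x + 1 with plain SGD; all randomness comes from one seeded RNG."""
    rng = random.Random(seed)  # private generator, so no hidden global state
    w = rng.uniform(-1.0, 1.0)  # random weight initialisation
    b = rng.uniform(-1.0, 1.0)  # random bias initialisation
    for _ in range(steps):
        x = rng.uniform(-1.0, 1.0)  # randomly drawn training input
        y = 2.0 * x + 1.0           # toy ground-truth target
        err = (w * x + b) - y       # prediction error on this sample
        w -= lr * err * x           # gradient step for the weight
        b -= lr * err               # gradient step for the bias
    return w, b


if __name__ == "__main__":
    run_a = train(seed=1234)
    run_b = train(seed=1234)
    run_c = train(seed=99)
    print("same seed, identical weights:", run_a == run_b)
    print("different seed, identical weights:", run_a == run_c)
```

In PyTorch the analogous knobs are `torch.manual_seed` and `torch.use_deterministic_algorithms`, and JAX passes an explicit key created with `jax.random.PRNGKey`. On real hardware, parallel GPU kernels may still introduce run-to-run differences unless deterministic modes are enabled, so a seed alone is not always sufficient.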
        
       | bee_rider wrote:
       | IMO it shouldn't be called open unless the thing being shared is
       | human-understandable. Like open source programs, you get the
       | source code, which you can inspect, and figure out if you trust
       | it. This ability to inspect (and modify) is what matters about
       | open source.
       | 
       | When I look at ML weights, I don't understand them, they just
       | look like some random matrix to me. I think we need to have
       | access to the training set and a description of the steps in
       | the training process (like a makefile).
       | 
       | If you want to share inscrutable weights after processing, call
       | it what it is: Shareware. Shareware was great! But it isn't open.
        
       | ByQuyzzy wrote:
       | Well, it's not open source, it's not open to the public, they're
       | not open with what they're doing or what their goals are. It's
       | just a word like wuzzle or fibblefobble. Or google.
        
       | throwaway13337 wrote:
       | The purity of good ideas always gets co-opted by cynical actors
       | wearing the clothes of the ideals without having the ideals
       | themselves. At the core, all these good ideas have in them a
       | spirit of cooperation and trust. The trust is eroded over time to
       | exploit the cooperation inherent in it while not incurring the
       | cost of participation.
       | 
       | At that point, the words lose their meaning. You can see this
       | worldwide with "democratic republic of" or, in our industry,
       | "agile". Whatever meaning they once had is gone and will not
       | return.
       | 
       | In order to avoid this problem, you need to either use the entire
       | expanded definition to be precise or create a new word that is
       | associated only with your community.
       | 
       | By expanding the definition of a shorthand like "open", you never
       | really achieve much, because culture lives in the shorthand. Only
       | true believer types like Stallman will insist on it for any
       | length of time.
       | 
       | Therefore, whatever comes next in the non-cynical world of
       | software would have to come from a new movement with a new
       | vocabulary. The new values always rhyme with the old but are
       | expressed differently to more directly disarm the new cynical
       | malignancies that have killed the old good ideas.
       | 
       | The struggle is a forever arms race against parasitic
       | participants in the global iterative prisoner's dilemma we're all
       | playing.
       | 
       | It's not about what is called "open". New words with new
       | community values need to replace it.
       | 
       | Open is dead.
        
       | stale2002 wrote:
       | IMO this debate about what "open" means itself obfuscates the
       | issue significantly.
       | 
       | This is because if you just say "Well technically Llama 2 doesn't
       | fit the traditional definition of open source!", it implies that
       | there is some sort of significant caveat or difference that makes
       | it significantly more restrictive than other open source
       | projects.
       | 
       | This, of course, isn't true. Almost everyone can use Llama 2 for
       | almost whatever they want. Yes, there are some restrictions, but
       | the restrictions are so small that making a big deal over them
       | incorrectly implies that there is some huge restriction, when
       | there isn't.
        
         | asadotzler wrote:
         | there is a significant caveat, I cannot fork the repo, training
         | data and all, and compete with the original. if i cannot do
         | that, it's just not open source.
        
       | Eager wrote:
       | Open weights is one thing, but we don't even have that with
       | OpenAI at least.
       | 
       | Even then, open weights is like me checking in a .exe and acting
       | surprised if people look at me funny.
       | 
       | I'm definitely in the camp where all the artefacts are provided
       | along with a fully reproducible build and test environment for
       | anyone who wants to retrace the steps.
       | 
       | Whatever 'open' means, I don't think it is eight shell companies,
       | not even weights provided and closely guarded secrets around how
       | RLHF, alignment and safety testing are carried out.
       | 
       | In fact, you would think that being 'open' about at least
       | alignment and safety testing procedures would be the least one
       | could expect.
       | 
       | I do understand that revealing these things may disclose zero day
       | exploits for bad actors, but on the other hand, being open for
       | inspection is how things get fixed, and I've never been a fan of
       | security through obscurity.
        
       ___________________________________________________________________
       (page generated 2024-04-04 23:01 UTC)