[HN Gopher] Open source AI is the path forward
___________________________________________________________________
Open source AI is the path forward
Author : atgctg
Score : 1018 points
Date : 2024-07-23 15:08 UTC (7 hours ago)
(HTM) web link (about.fb.com)
(TXT) w3m dump (about.fb.com)
| amusingimpala75 wrote:
| Sure, but under what license? Slapping "open source" on the
| model doesn't make it open source if it isn't actually licensed
| that way. The 3.1 license still contains their commercial-use
| restriction (for companies with over 700M users) and requires
| derivatives, whether fine-tunes or models trained on generated
| data, to use the Llama name.
| redleader55 wrote:
| "Use it for whatever you want(conditions apply), but not if you
| are Google, Amazon, etc. If you become big enough talk to us."
| That's how I read the license, but obviously I might be missing
| some nuance.
| mesebrec wrote:
| You also can't use it for training or improving other models.
|
| You also can't use it if you're the government of India.
|
| Neither can sex workers use it. (Do you know if your
| customers are sex workers?)
|
| There are also very vague restrictions for things like
| discrimination, racism etc.
| war321 wrote:
| They're actually updating their license to allow Llama
| outputs for training!
|
| https://x.com/AIatMeta/status/1815766335219249513
| aliljet wrote:
| And this is happening RIGHT as a new potential leader is emerging
| in Llama 3.1. I'm really curious about how this is going to match
| up on the leaderboards...
| kart23 wrote:
| > This is how we've managed security on our social networks - our
| more robust AI systems identify and stop threats from less
| sophisticated actors who often use smaller scale AI systems.
|
| Ok, first of all, has this really worked? AI moderators still
| can't catch the mass of obvious spam/bots on all their
| platforms, Threads included. Second, AI detection doesn't work,
| and with how much better the systems are getting, it probably
| never will, unless you keep the best models for yourself, and
| it's clear from the rest of the note that that's not Zuck's
| intention.
|
| > As long as everyone has access to similar generations of models
| - which open source promotes - then governments and institutions
| with more compute resources will be able to check bad actors with
| less compute.
|
| This just doesn't make sense. How are you going to prevent AI
| spam, AI deepfakes from causing harm with more compute? What are
| you gonna do with more compute about nonconsensual deepfakes?
| People are already using AI to bypass identity verification on
| your social media networks, and pump out loads of spam.
| OpenComment wrote:
| Interesting quotes. _Less sophisticated actors_ just means
| humans who were already writing in 2020 what the NYT wrote in
| early 2022 to prepare for Biden's State of the Union 180-degree
| policy reversals (manufacturing consent).
|
| FB was notorious for censorship. Anyway, what is with the
| "actions/actors" terminology? This is straightforward
| totalitarian language.
| simonw wrote:
| "AI detection doesn't work, and with how much better the
| systems are getting, it's probably never going to, unless you
| keep the best models for yourself"
|
| I don't think that's true. I don't think even the best
| privately held models will be able to detect AI text reliably
| enough for that to be worthwhile.
| blackeyeblitzar wrote:
| Only if it is truly open source (open data sets, transparent
| curation/moderation/censorship of data sets, open training source
| code, open evaluation suites, and an OSI approved open source
| license).
|
| Open weights (and open inference code) are NOT open source, just
| weak open-washing marketing.
|
| The model that comes closest to being TRULY open is AI2's OLMo.
| See their blog post on their approach:
|
| https://blog.allenai.org/hello-olmo-a-truly-open-llm-43f7e73...
|
| I think the only thing they're not open about is how they've
| curated/censored their "Dolma" training data set, as I don't
| think they explicitly share each decision made or the original
| uncensored dataset:
|
| https://blog.allenai.org/dolma-3-trillion-tokens-open-llm-co...
|
| By the way, OSI is working on defining open source for AI. They
| post weekly updates to their blog. Example:
|
| https://opensource.org/blog/open-source-ai-definition-weekly...
| JumpCrisscross wrote:
| > _Only if it is truly open source (open data sets, transparent
| curation /moderation/censorship of data sets, open training
| source code, open evaluation suites, and an OSI approved open
| source license)_
|
| You're missing a then to your if. What happens if it's "truly"
| open per your definition versus not?
| blackeyeblitzar wrote:
| I think you are asking what the benefits are? The main
| benefit is that we can better trust what these systems are
| doing. Or we can self-host them. If we just take the
| weights, then it is unclear how these systems might be lying
| to us or manipulating us.
|
| Another benefit is that we can learn from how the training
| and other steps actually work. We can change them to suit our
| needs (although costs are impractical today). Etc. It's all
| the usual open source benefits.
| haolez wrote:
| There is also the risk of companies like Meta introducing ads
| in the training itself, instead of inference time.
| itissid wrote:
| Yeah, though I do wonder, for a big model like the 405B,
| whether the original training recipe really matters for where
| models are heading practically, which is smaller and more
| specific.
|
| I imagine its main use would be to train other models by
| distilling them down with LoRA/quantization etc. (assuming we
| have a tokenizer), or to use them to generate training data
| for smaller models directly.
|
| But, I do think there is always a way to share without
| disclosing too many specifics, like this[1] lecture from this
| year's spring course at Stanford. You can always say, for
| example:
|
| - The most common technique for filtering was using voting LLMs
| ( _without disclosing said llms or quantity of data_ ).
|
| - We built on top of a filtering technique for removing poor
| code using ____ by ____ authors ( _without disclosing or
| handwaving how you exactly filtered, but saying that you had to
| filter_ ).
|
| - We mixed a certain proportion of this data with that data to
| make it better ( _without saying what proportion_ ).
|
| [1]
| https://www.youtube.com/watch?v=jm2hyJLFfN8&list=PLoROMvodv4...
| JumpCrisscross wrote:
| "The Heavy Press Program was a Cold War-era program of the United
| States Air Force to build the largest forging presses and
| extrusion presses in the world." This "program began in 1944 and
| concluded in 1957 after construction of four forging presses and
| six extruders, at an overall cost of $279 million. Six of them
| are still in operation today, manufacturing structural parts for
| military and commercial aircraft" [1].
|
| $279mm in 1957 dollars is about $3.2bn today [2]. A public
| cluster of GPUs provided for free to American universities,
| companies and non-profits might not be a bad idea.
|
| [1] https://en.m.wikipedia.org/wiki/Heavy_Press_Program
|
| [2] https://data.bls.gov/cgi-
| bin/cpicalc.pl?cost1=279&year1=1957...
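|
| To sanity-check that arithmetic, a minimal sketch (the CPI
| index values here are rounded assumptions, not the exact BLS
| series the calculator uses):
|
|     cpi_1957, cpi_2024 = 28.1, 314.0      # approximate CPI-U levels
|     cost_1957 = 279e6                     # program cost in 1957 dollars
|     cost_today = cost_1957 * (cpi_2024 / cpi_1957)
|     print(f"${cost_today / 1e9:.1f}bn")   # ~$3.1bn, consistent with ~$3.2bn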
| CardenB wrote:
| Doubtful that GPUs purchased today would be in use for a
| similar time scale. Govt investment would also drive the cost
| of GPUs up a great deal.
|
| Not sure why a publicly accessible GPU cluster would be a
| better solution than the current system of research grants.
| JumpCrisscross wrote:
| > _Doubtful that GPUs purchased today would be in use for a
| similar time scale_
|
| Totally agree. That doesn't mean it can't generate massive
| ROI.
|
| > _Govt investment would also drive the cost of GPUs up a
| great deal_
|
| Difficult to say this _ex ante_. On its own, yes. But it
| would displace some demand. And it could help boost chip
| production in the long run.
|
| > _Not sure why a publicly accessible GPU cluster would be a
| better solution than the current system of research grants_
|
| Those receiving the grants have to pay a private owner of the
| GPUs. That gatekeeping might be both problematic, if there is
| a conflict of interests, and inefficient. (Consider why the
| government runs its own supercomputers versus contracting
| everything to Oracle and IBM.)
| rvnx wrote:
| It would be better if the government removed IP protections on
| such technology for public use, the way drugs get generics.
|
| That way the government pays 2,500 USD per card, not 40,000
| USD or whatever the absurd figure is.
| JumpCrisscross wrote:
| > _better that the government removes IP on such
| technology for public use, like drugs got generics_
|
| You want to punish NVIDIA for calling its shots
| correctly? You don't see the many ways that backfires?
| gpm wrote:
| No. But I do want to limit the amount we reward NVIDIA
| for calling the shots correctly to maximize the benefit
| to society. For instance by reducing the duration of the
| government granted monopolies on chip technology that is
| obsolete well before the default duration of 20 years is
| over.
|
| That said, it strikes me that the actual limiting factor
| is fab capacity, not Nvidia's designs, and we probably
| need to lift the monopolies preventing competition there
| if we want to reduce prices.
| JumpCrisscross wrote:
| > _reducing the duration of the government granted
| monopolies on chip technology that is obsolete well
| before the default duration of 20 years is over_
|
| Why do you think these private entities are willing to
| invest the massive capital it takes to keep the frontier
| advancing at that rate?
|
| > _I do want to limit the amount we reward NVIDIA for
| calling the shots correctly to maximize the benefit to
| society_
|
| Why wouldn't NVIDIA be a solid steward of that capital
| given their track record?
| gpm wrote:
| > Why do you think these private entities are willing to
| invest the massive capital it takes to keep the frontier
| advancing at that rate?
|
| Because whether they make 100x or 200x they make a
| shitload of money.
|
| > Why wouldn't NVIDIA be a solid steward of that capital
| given their track record?
|
| The problem isn't who is the steward of the capital. The
| problem is that the economically efficient thing to do for
| a single company (given sufficient fab capacity and a
| monopoly) is to raise prices to extract a greater share of
| the pie at the expense of shrinking the size of the pie.
| I'm not worried about who takes the profit, I'm worried
| about the size of the pie.
| whimsicalism wrote:
| > Because whether they make 100x or 200x they make a
| shitload of money.
|
| It's not a certainty that they 'make a shitload of
| money'. Reducing the right tail payoffs absolutely
| reduces the capital allocated to solve problems - many of
| which are _risky bets_.
|
| Your solution absolutely decreases capital investment at
| the margin, this is indisputable and basic economics.
| Even worse when the taking is not due to some pre-
| existing law, so companies have to deal with the
| additional uncertainty of whether & when future people
| will decide in retrospect that they got too large a
| payoff and arbitrarily decide to take it from them.
| gpm wrote:
| You can't just look at the costs to an action, you also
| have to look at the benefits.
|
| Of course I agree I'm going to stop marginal investments
| into research on patentable technologies by reducing the
| expected profit. But I'm going to do so _very slightly_
| because I'm not shifting the expected value by very much.
| Meanwhile I'm going to
| greatly increase the investment into the existing
| technology we already have, and allow many more people to
| try to improve upon it, and I'm going to argue the
| benefits greatly outweigh the costs.
|
| Whether I'm right or wrong about the net benefit, the
| basic economics here is that there are both costs and
| benefits to my proposed action.
|
| And yes I'm going to marginally reduce future investments
| because the same might happen in the future and that
| reduces expected value. In fact if I was in charge the
| same _would_ happen in the future. And the trade-off I
| get for this is that society gets the benefit of the same
| _actually_ happening in the future and us not being
| hamstrung by unbreachable monopolies.
| JumpCrisscross wrote:
| > _I'm going to do so very slightly because I'm not
| shifting the expected value by very much_
|
| You're massively increasing uncertainty.
|
| > _the same would happen in the future. And the trade-off
| I get for this is that society gets the benefit_
|
| Why would you expect it would ever happen again? What you
| want is an unrealized capital gains tax. Not to nuke our
| semiconductor industry.
| whimsicalism wrote:
| > But I'm going to do so very slightly because I'm not
| shifting the expected value by very much
|
| I think you're shifting it by a lot. If the government
| can post-hoc decide to invalidate patents because the
| holder is getting too successful, you are introducing a
| substantial impact on expectations and uncertainty. Your
| action is not taken in a vacuum.
|
| > Meanwhile I'm going to greatly increase the investment
| into the existing technology we already have, and allow
| many more people to try to improve upon it, and I'm going
| to argue the benefits greatly outweigh the costs.
|
| I think this is a much more speculative impact. Why will
| people even fund the improvements if the government might
| just decide they've gotten too large a slice of the pie
| later on down the road?
|
| > the trade-off I get for this is that society gets the
| benefit of the same actually happening in the future and
| us not being hamstrung by unbreachable monopolies.
|
| No the trade-off is that materially less is produced.
| These incentive effects are not small. Take for instance,
| drug price controls - a similar post-facto taking because
| we feel that the profits from R&D are too high.
| Introducing proposed price controls leads to hundreds of
| fewer drugs over the next decade [0] - and likely
| millions of premature deaths downstream of these
| incentive effects. And that's with a policy with a clear
| path towards short-term upside (cheaper drug prices).
| Discounted GPUs by invalidating nvidia's patents has a
| much more tenuous upside and clear downside.
|
| [0]: https://bpb-
| us-w2.wpmucdn.com/voices.uchicago.edu/dist/d/312...
| hluska wrote:
| You have proposed state ownership of all successful IP.
| That is a massive change and yet you have demonstrated
| zero understanding of the possible costs.
|
| Your claim that removing a profit motivation will
| increase investment is flat out wrong. Everything else
| crumbles from there.
| gpm wrote:
| No, I've proposed removing or reducing IP protections,
| not transferring them to the state. Allowing competitors
| to enter the market will obviously increase investment in
| competitors...
| IG_Semmelweiss wrote:
| This is already happening - it's called China. There's a
| reason they don't innovate in anything, and they are
| always playing catch-up, except in the art of copying
| (stealing) from others.
|
| I do think there are some serious IP issues, as IP rules
| can be hijacked in the US, but that means you fix those
| problems, not blow up IP that was rightfully earned
| whimsicalism wrote:
| there is no such thing as a lump-sum transfer, this will
| shift expectations and incentives going forward and make
| future large capital projects an increasingly uphill
| battle
| hluska wrote:
| So, if a private company is successful, you will
| nationalize its IP under some guise of maximizing the
| benefit to society? That form of government was tried
| once. It failed miserably.
|
| Under your idea, we'll try a badly broken economic
| philosophy again. And while we're at it, we will
| completely stifle investment in innovation.
| Teever wrote:
| There was a post[0] on here recently about how the US
| went from producing woefully insufficient numbers of
| aircraft to producing 300k by the end of World War 2.
|
| One of the things that the post mentioned was the meager
| profit margin that the companies made during this time.
|
| But the thing is that this set the American auto and
| aviation industries up to rule the world for decades.
|
| A government going to a company and saying 'we need you
| to produce this product for us at a lower margin than
| you'd like to' isn't the end of the world.
|
| I don't know if this is one of those scenarios but they
| exist.
|
| [0] https://www.construction-physics.com/p/how-to-
| build-300000-a...
| rvnx wrote:
| In the case of NVIDIA it's even more sneaky.
|
| They are an intellectual property company holding the
| rights to designs for making graphics cards, not a company
| actually making graphics cards.
|
| The government could launch an initiative "OpenGPU" or
| "OpenAI Accelerator", where the government orders GPUs
| from TSMC directly, without the middleman.
|
| It may require some tweaking of the law to allow an
| exception to intellectual property in the "public interest".
| whimsicalism wrote:
| y'all really don't understand how these actions would
| seriously harm capital markets and make it difficult for
| private capital formation to produce innovations going
| forward.
| freeone3000 wrote:
| If we have public capital formation, we don't necessarily
| need private capital. Private innovation in weather
| modelling isn't outpacing government work by leaps and
| bounds, for instance.
| whimsicalism wrote:
| because it is extremely challenging to capture the
| additional value that is being produced by better weather
| forecasts and generally the forecasts we have right now
| are pretty good.
|
| private capital is absolutely the driving force for the
| vast majority of innovations since the beginning of the
| 20th century. public capital may be involved, but it is
| dwarfed by private capital markets.
| freeone3000 wrote:
| It's challenging to capture the additional value and the
| forecasts are pretty good _because_ of _continual_ large-
| scale government investment into weather forecasting.
| NOAA is launching satellites! it's a big deal!
|
| Private nuclear research is heavily dependent on
| governmental contracts to function. Solar was subsidized
| to heck and back for years. Public investment does work,
| and does make a difference.
|
| I would even say governmental involvement is sometimes
| the deciding factor in whether research is
| worth pursuing. Some major capital investors have decided
| AI models cannot possibly gain enough money to pay for
| their training costs. So what do we do when we believe
| something is a net good for society, but isn't going to
| be profitable?
| inetknght wrote:
| > _y 'all really don't understand how these actions would
| seriously harm capital markets and make it difficult for
| private capital_
|
| Reflexively, I count that harm as a feature. I don't like
| private capital markets because I've been screwed by
| private capital on multiple occasions.
|
| But you are right: I don't understand how these actions
| would harm. So please do expand your concerns.
| panarky wrote:
| To the extent these are incremental units that wouldn't
| have been sold absent the government program, it's
| difficult to see how NVIDIA is "harmed".
| kube-system wrote:
| > It would be better that the government removes IP on
| such technology for public use, like drugs got generics.
|
| 20-25 year old drugs are a lot more useful than 20-25
| year old GPUs, and the manufacturing supply chain is not
| a bottleneck.
|
| There's no generics for the latest and greatest drugs,
| and a fancy gene therapy might run a _lot_ more than
| $40k.
| ygjb wrote:
| Of course they won't. The investment in the Heavy Press
| Program was the initial build, and just citing one example,
| the Alcoa 50,000 ton forging press was built in 1955,
| operated until 2008, and needed ~$100M to get it operational
| again in 2012.
|
| The investment was made to build the press, which created
| significant jobs and capital investment. The press, and
| others like it, were subsequently operated by and then sold
| to a private operator, which in turn enabled the massive
| expansion of both military manufacturing, and commercial
| aviation and other manufacturing.
|
| The Heavy Press Program was a strategic investment that paid
| dividends by both advancing the state of the art in
| manufacturing at the time it was built, and improving
| manufacturing capacity.
|
| A GPU cluster might not be the correct investment, but a
| strategic investment in increasing, for example, the
| availability of training data, or interoperability of tools,
| or ease of use for building, training, and distributing
| models would probably pay big dividends.
| JumpCrisscross wrote:
| > _A GPU cluster might not be the correct investment, but a
| strategic investment in increasing, for example, the
| availability of training data, or interoperability of
| tools, or ease of use for building, training, and
| distributing models would probably pay big dividends_
|
| Would you mind expanding on these options? Universal
| training data sounds intriguing.
| ygjb wrote:
| Sure. Just on the training front: build and maintain a broad
| corpus of properly managed training data with metadata that
| provides attribution (for example, whether content is known to
| be human generated rather than model generated, and what the
| source is for datasets such as weather data, census data,
| etc.), and that also captures any licensing encumbrance, so
| that consumers of the training data can be confident in their
| ability to use it without risk of legal challenge.
|
| Much of this is already available to private sector
| entities, but having a publicly funded organization
| responsible for curating and publishing this would enable
| new entrants to quickly and easily get a foundation
| without having to scrape the internet again, especially
| given how rapidly model generated content is being
| published.
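|
| A sketch of what a single record in such a corpus might carry
| (the field names and values are hypothetical, purely to
| illustrate the attribution/licensing metadata idea):
|
|     record = {
|         "id": "noaa-ghcn-2023-000187",   # hypothetical identifier
|         "source": "NOAA GHCN daily weather observations",
|         "provenance": "instrument/human generated, not model output",
|         "license": "US public domain",
|         "encumbrances": [],              # e.g. attribution or share-alike terms
|     }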
| mnahkies wrote:
| I think the EPC (energy performance certificate) dataset
| in the UK is a nice example of this. Anyone can download
| a full dataset of EPC data from
| https://epc.opendatacommunities.org/
|
| Admittedly it hasn't been cleaned all that much - you
| still need to put a bit of effort into that (newer
| certificates tend to be better quality), but it's very
| low friction overall. I'd love to see them do this with
| more datasets
| dmix wrote:
| I don't think there's a shortage of capital for AI...
| probably the opposite
|
| Of all the things to expand the scope of government
| spending why would they choose AI, or more specifically
| GPUs?
| devmor wrote:
| There may however, be a shortage of capital for _open
| source_ AI, which is the subject under consideration.
|
| As for the why... because there's no shortage of capital
| for AI. It sounds like the government would like to
| encourage redirecting that capital to something that's
| good for the economy at large, rather than good for the
| investors of a handful of Silicon Valley firms interested
| only in their own short term gains.
| hluska wrote:
| Look at it from the perspective of an elected official:
|
| If it succeeds, you were ahead of the curve. If it fails,
| you were prudent enough to fund an investigation early.
| Either way, bleeding edge tech gives you a W.
| whimsicalism wrote:
| there are many things i think are more capital constrained,
| if the government is trying to subsidize things.
| jvanderbot wrote:
| A much better investment would be to (somehow) revolutionize
| production of chips for AI so that it's all cheaper, more
| reliable, and faster to stand up new generations of software
| and hardware codesign. This is probably much closer to the
| program mentioned in the top level comment: It wasn't to
| produce one type of thing, but to allow better production of
| any large thing from lighter alloys.
| light_hue_1 wrote:
| The problem is that any public cluster would be outdated in 2
| years. At the same time, GPUs are massively overpriced.
| Nvidia's profit margins on the H100 are crazy.
|
| Until we get cheaper cards that stand the test of time,
| building a public cluster is just a waste of money. There are
| far better ways to spend $1b in research dollars.
| JumpCrisscross wrote:
| > _any public cluster would be outdated in 2 years_
|
| The private companies buying hundreds of billions of dollars
| of GPUs aren't writing them off in 2 years. They won't be
| cutting edge for long. But that's not the point--they'll
| still be available.
|
| > _Nvidia 's profit margins on the H100 are crazy_
|
| I don't see how the current practice of giving a researcher a
| grant so they can rent time on a Google cluster that runs
| H100s is more efficient. It's just a question of capex or
| opex. As a state, the U.S. has a structural advantage in the
| former.
|
| > _far better ways to spend $1b in research dollars_
|
| One assumes the U.S. government wouldn't be paying list
| price. In any case, the purpose isn't purely research ROI.
| Like the heavy presses, it's in making a prohibitively-
| expensive capital asset generally available.
| ninininino wrote:
| What about dollar cost averaging your purchases of GPUs? So
| that you're always buying a bit of the newest stuff every
| year rather than just a single fixed investment in hardware
| that will become outdated? Say 100 million a year every year
| for 20 years instead of 2 billion in a single year?
| fweimer wrote:
| Don't these public clusters exist today, and have been around
| for decades at this point, with varying architectures? In the
| sense that you submit a proposal, it gets approved, and then
| you get access for your research?
| JumpCrisscross wrote:
| Not--to my knowledge--for the GPUs necessary to train
| cutting-edge LLMs.
| Maxious wrote:
| All of the major cloud providers offer grants for public
| research https://www.amazon.science/research-awards https:/
| /edu.google.com/intl/ALL_us/programs/credits/research
| https://www.microsoft.com/en-us/azure-academic-research/
|
| NVIDIA offers discounts
| https://developer.nvidia.com/education-pricing
|
| eg. for Australia, the National Computing Infrastructure
| allows researchers to reserve time on:
|
| - 160 nodes each containing four Nvidia V100 GPUs and two
| 24-core Intel Xeon Scalable 'Cascade Lake' processors.
|
| - 2 nodes of the NVIDIA DGX A100 system, with 8 A100 GPUs
| per node.
|
| https://nci.org.au/our-systems/hpc-systems
| NewJazz wrote:
| This is the most recent iteration of a national platform.
| They have tons of GPUs (and CPUs, and flash storage) hooked
| up as a Kubernetes cluster, available for teaching and
| research.
|
| https://nationalresearchplatform.org/
| epaulson wrote:
| The National Science Foundation has been doing this for
| decades, starting with the supercomputing centers in the 80s.
| Long before anyone talked about cloud credits, NSF has had a
| bunch of different programs to allocate time on supercomputers
| to researchers at no cost, these days mostly run out of the
| Office of Advanced Cyberinfrastructure. (The office name is from
| the early 00s) - https://new.nsf.gov/cise/oac
|
| (To connect universities to the different supercomputing
| centers, the NSF funded the NSFnet network in the 80s, which
| was basically the backbone of the Internet in the 80s and early
| 90s. The supercomputing funding has really, really paid off for
| the USA)
| JumpCrisscross wrote:
| > _NSF has had a bunch of different programs to allocate time
| on supercomputers to researchers at no cost, these days
| mostly run out of the Office of Advanced Cyberinfrastructure_
|
| This would be the logical place to put such a programme.
| alephnerd wrote:
| The DoE has also been a fairly active purchaser of GPUs for
| almost two decades now thanks to the Exascale Computing
| Project [0] and other predecessor projects.
|
| The DoE helped subsidize development of Kepler, Maxwell,
| Pascal, etc along with the underlying stack like NVLink,
| NGC, CUDA, etc either via purchases or allowing grants to
| be commercialized by Nvidia. They also played matchmaker by
| helping connect private sector research partners with
| Nvidia.
|
| The DoE also did the same thing for AMD and Intel.
|
| [0] - https://www.exascaleproject.org/
| jszymborski wrote:
| As you've rightly pointed out, we have the mechanism, now
| let's fund it properly!
|
| I'm in Canada, and our science funding has likewise fallen
| year after year as a proportion of our GDP. I'm still
| benefiting from A100 clusters funded by tax payer dollars,
| but think of the advantage we'd have over industry if we
| didn't have to fight over resources.
| xena wrote:
| Where do you get access to those as a member of the general
| public?
| cmdrk wrote:
| Yeah, the specific AI/ML-focused program is NAIRR.
|
| https://nairrpilot.org/
|
| Terrible name unless they low-key plan to make AI
| researchers' hair fall out.
| blackeyeblitzar wrote:
| What about distributed training on volunteer hardware? Is that
| feasible?
| oersted wrote:
| It is an exciting concept, there's a huge wealth of gaming
| hardware deployed that is inactive at most hours of the day.
| And I'm sure people are willing to pay well above the
| electricity cost for it.
|
| Unfortunately, the dominant LLM architecture makes it
| relatively infeasible right now.
|
| - Gaming hardware has too limited VRAM for training any kind
| of near-state-of-the-art model. Nvidia is being annoyingly
| smart about this to sell enterprise GPUs at exorbitant
| markups.
|
| - Right now communication between machines seems to be the
| bottleneck, and this is way worse with limited VRAM. Even
| with data-centre-grade interconnect (mostly Infiniband, which
| is also Nvidia, smart-asses), any failed links tend to cause
| big delays in training.
|
| Nevertheless, it is a good direction to push towards, and the
| government could indeed help, but it will take time. We need
| both a more healthy competitive landscape in hardware, and
| research towards model architectures that are easy to train
| in a distributed manner (this was also the key to the success
| of Transformers, but we need to go further).
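|
| To put the VRAM point above into rough numbers, a back-of-the-
| envelope sketch (assumes bf16 weights/gradients and fp32 Adam
| moments, no sharding or offload; the figures are illustrative):
|
|     params = 8e9            # e.g. an 8B-parameter model
|     weights = params * 2    # bf16 weights: 2 bytes/param
|     grads = params * 2      # bf16 gradients
|     adam = params * 4 * 2   # two fp32 optimizer moment tensors
|     total_gb = (weights + grads + adam) / 1e9
|     print(total_gb)         # ~96 GB before activations, vs ~24 GB
|                             # on the largest consumer GPUs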
| codemusings wrote:
| Ever heard of SETI@home?
|
| https://setiathome.berkeley.edu
| ks2048 wrote:
| How about using some of that money to develop CUDA alternatives
| so everyone is not paying the Nvidia tax?
| lukan wrote:
| It would probably be cheaper to negate some IP. There are
| quite a few projects and initiatives to make CUDA code run on
| AMD for example, but as far as I know, they all stopped at
| some point, probably because of fear of being sued into
| oblivion.
| whimsicalism wrote:
| It seems like ROCm is already fully ready for transformer
| inference, so you are just referring to training?
| janalsncm wrote:
| ROCm is buggy and largely undocumented. That's why we don't
| use it.
| belter wrote:
| Please start with the Windows Tax first for Linux users
| buying hardware...and the Apple Tax for Android users...
| zitterbewegung wrote:
| Either you port Tensorflow (Apple)[1] or PyTorch to your
| platform or you allow CUDA to run on your hardware (AMD) [2].
| Companies are incentivized to keep NVIDIA from having a
| monopoly, but the thing is that CUDA is a huge moat due to
| its compatibility with all the frameworks, and everyone knows
| it. Also, all of the cloud and on-premises providers use
| NVIDIA regardless.
|
| [1] https://developer.apple.com/metal/tensorflow-plugin/ [2]
| https://www.xda-developers.com/nvidia-cuda-amd-zluda/
| erickj wrote:
| That's the kind of work that can come out of academia and
| open source communities when societies provide the resources
| required.
| prpl wrote:
| Great idea, too bad the DOE and NSF were there first.
| kjkjadksj wrote:
| The size of the cluster would have to be massive or else your
| job will be in the queue for a year. And even then, what are
| you going to do, downsize the resources requested so you can
| get in earlier? After a certain point it starts to make more
| sense to just buy your own Xeons and run your own cluster.
| Aperocky wrote:
| Imagine if they made a data center with 1957 electronics that
| cost $279 million.
|
| They probably wouldn't be using it now, because the phone in
| your pocket is likely more powerful. Moore's law did end, but
| data center hardware is still evolving orders of magnitude
| faster than forging presses.
| goda90 wrote:
| I'd like to see big programs to increase the amount of cheap,
| clean energy we have. AI compute would be one of many
| beneficiaries of super cheap energy, especially since you
| wouldn't need to chase newer, more efficient hardware just to
| keep costs down.
| Melatonic wrote:
| Yeah, this would be the real equivalent of the program people
| are talking about above. That, and investing in core networking
| infrastructure (like cables) instead of just giving huge
| handouts to certain corporations that then pocket the money...
| BigParm wrote:
| So we'll have the government bypass markets and force the
| working class to buy toys for the owning class?
|
| If anything, allocate compute to citizens.
| _fat_santa wrote:
| > If anything, allocate compute to citizens.
|
| If something like this were to become a reality, I could see
| something like "CitizenCloud" where once you prove that you
| are a US Citizen (or green card holder or some other
| requirement), you can then be allocated a number of credits
| every month for running workloads on the "CitizenCloud".
| Everyone would get a baseline amount, from there if you can
| prove you are a researcher or own a business related to AI
| then you can get more credits.
| aiauthoritydev wrote:
| Overall government doing anything is a bad idea. There are
| cases however where government is the only entity that can do
| certain things. These are things that involve military, law
| enforcement etc. Outside of this we should rely on private
| industry and for-profit industry as much as possible.
| pavlov wrote:
| The American healthcare industry demonstrates the tremendous
| benefits of rigidly applying this mindset.
|
| Why couldn't law enforcement be private too? You call 911,
| several private security squads rush to solve your immediate
| crime issue, and the ones who manage to shoot the suspect
| send you a $20k bill. Seems efficient. If you don't like the
| size of the bill, you can always get private crime insurance.
| sterlind wrote:
| For a further exploration of this particular utopia, see
| Snow Crash by Neal Stephenson.
| chris_wot wrote:
| That's not correct. The American health care system is an
| extreme example of where private organisations fail overall
| society.
| fragmede wrote:
| > Overall government doing anything is a bad idea.
|
| that is so bereft of detail as to just be wrong. There are
| things that government is good for and things that government
| is bad for, but "anything" is just too broad, and reveals an
| anti-government bias which just isn't well thought out.
| goatlover wrote:
| Why are governments a bad idea? Seems the human race has
| opted for governments doing things since the dawn of
| civilization. Building roads, providing defense, enforcing
| rights, providing social safety nets, funding costly
| scientific endeavors.
| varenc wrote:
| I just watched this 1950s DoD video on the heavy press program
| and highly recommend it:
| https://www.youtube.com/watch?v=iZ50nZU3oG8
| spullara wrote:
| It makes much more sense to invest in a next-generation fab for
| GPUs than to buy GPUs, and it more closely matches this kind of
| project.
| maxdo wrote:
| So that North Korea can create small call centers more cheaply,
| since they can get these models for free?
| HanClinto wrote:
| The article argues that the threat of foreign espionage is not
| solved by closing models.
|
| > Some people argue that we must close our models to prevent
| China from gaining access to them, but my view is that this
| will not work and will only disadvantage the US and its allies.
| Our adversaries are great at espionage, stealing models that
| fit on a thumb drive is relatively easy, and most tech
| companies are far from operating in a way that would make this
| more difficult. It seems most likely that a world of only
| closed models results in a small number of big companies plus
| our geopolitical adversaries having access to leading models,
| while startups, universities, and small businesses miss out on
| opportunities.
| tempfile wrote:
| This argument implies that cheap phones are bad since
| telemarketers can use them.
| mrfinn wrote:
| You guys really need to get over your bellicose view of the
| world. Actually, before it destroys you. Really, it's not
| necessary. Most people in the world just want to live in
| peace and see their children grow up happily. For each data
| center NK would create, there will be a thousand peaceful,
| kind, and well-intentioned AI projects going on. Or maybe more.
| the8thbit wrote:
| "Eventually though, open source Linux gained popularity -
| initially because it allowed developers to modify its code
| however they wanted ..."
|
| I find the language around "open source AI" to be confusing. With
| "open source" there's usually "source" to open, right? As in,
| there is human legible code that can be read and modified by the
| user? If so, then how can current ML models be open source?
| They're very large matrices that are, for the most part,
| inscrutable to the user. They seem akin to binaries, which, yes,
| can be modified by the user, but are extremely obscured to the
| user, and require enormous effort to understand and effectively
| modify.
|
| "Open source" code is not just code that isn't executed remotely
| over an API, and it seems like maybe it's being conflated with
| that here?
| orthoxerox wrote:
| Open training dataset + open steps sufficient to train exactly
| the same model.
| the8thbit wrote:
| This isn't what Meta releases with their models, though I
| would like to see more public training data. However, I still
| don't think that would qualify as "open source". Something
| isn't open source just because it's reproducible from
| composable parts. If one very critical and system-defining
| part is a binary (or similar) without publicly available
| source code, then I don't think it can be said to be "open
| source". That would be like saying that Windows 11 is open
| source because Windows Calculator is open source and it's a
| component of Windows.
| blackeyeblitzar wrote:
| Here's one list of what is needed to be actually open
| source:
|
| https://blog.allenai.org/hello-olmo-a-truly-open-
| llm-43f7e73...
| orthoxerox wrote:
| That's what I meant by "open steps", I guess I wasn't clear
| enough.
| the8thbit wrote:
| Is that what you meant? I don't think releasing the
| sequence of steps required to produce the model satisfies
| "open source", which is how I interpreted you, because
| there is still no source code for the model.
| Yizahi wrote:
| They can't release the training dataset if it was illegally
| scraped from all over the web without permission :) (taps head)
| bilsbie wrote:
| Can't you do fine tuning on those binaries? That's a
| modification.
| the8thbit wrote:
| You can fine tune the models, and you can modify binaries.
| However, there is no human readable "source" to open in
| either case. The act of "fine tuning" is essentially brute
| forcing the system to gradually alter the weights such that
| loss is reduced against a new training set. This limits what
| you can actually do with the model vs an actual open source
| system where you can understand how the system is working and
| modify specific functionality.
|
| Additionally, models can be (and are) fine-tuned via APIs, so
| if that is the threshold required for a system to be "open
| source", then that would also make the GPT-4 family and other
| such API-only models which allow fine-tuning open source.
| whimsicalism wrote:
| I don't find this argument super convincing.
|
| There's a pretty clear difference between the 'finetuning'
| offered via API by GPT4 and the ability to do whatever sort
| of finetuning you want and get the weights at the end that
| you can do with open weights models.
|
| "Brute forcing" is not the correct language to use for
| describing fine-tuning. It is not as if you are trying
| weights randomly and seeing which ones work on your dataset
| - you are following a gradient.
| the8thbit wrote:
| "There's a pretty clear difference between the
| 'finetuning' offered via API by GPT4 and the ability to
| do whatever sort of finetuning you want and get the
| weights at the end that you can do with open weights
| models."
|
| Yes, the difference is that one is provided over a remote
| API, and the provider of the API can restrict how you
| interact with it, while the other is performed directly
| by the user. One is a SaaS solution, the other is a
| compiled solution, and neither are open source.
|
| ""Brute forcing" is not the correct language to use for
| describing fine-tuning. It is not as if you are trying
| weights randomly and seeing which ones work on your
| dataset - you are following a gradient."
|
| Whatever you want to call it, this doesn't sound like
| modifying functionality in source code. When I modify
| source code, I might make a change, check what that does,
| change the same functionality again, check the new
| change, etc... up to maybe a couple dozen times. What I
| don't do is have a very simple routine make very small
| modifications to all of the system's functionality, then
| check the result of that small change across the broad
| spectrum of functionality, and repeat millions of times.
| Kubuxu wrote:
| The gap between fine-tuning API and weights-available is
| much more significant than you give it credit for.
|
| You can take the weights and train LoRAs (which is close
| to fine-tuning), but you can also build custom adapters
| on top (classification heads). You can mix models from
| different fine-tunes or perform model surgery (adding
| additional layers, attention heads, MoE).
|
| You can perform model decomposition and amplify some of
| its characteristics. You can also train multi-modal
| adapters for the model. Prompt tuning requires weights as
| well.
|
| I would even say that having the model is more potent in
| the hands of individual users than having the dataset.
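|
| For anyone unfamiliar, a minimal sketch of the LoRA case using
| the Hugging Face peft library (the model name and the
| hyperparameters are placeholder assumptions, not a recipe):
|
|     import torch
|     from transformers import AutoModelForCausalLM
|     from peft import LoraConfig, get_peft_model
|
|     # load the open weights locally (gated repo; assumes access granted)
|     model = AutoModelForCausalLM.from_pretrained(
|         "meta-llama/Meta-Llama-3.1-8B", torch_dtype=torch.bfloat16)
|
|     # attach low-rank adapters to the attention projections
|     lora = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05,
|                       target_modules=["q_proj", "v_proj"],
|                       task_type="CAUSAL_LM")
|     model = get_peft_model(model, lora)
|     model.print_trainable_parameters()   # only adapter weights train
|
| None of that is possible against a hosted fine-tuning API, where
| you never hold the weights.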
| thayne wrote:
| That still doesn't make it open source.
|
| There is a massive difference between a compiled binary
| that you are allowed to do anything you want with,
| including modifying it, building something else on top or
| even pulling parts of it out and using in something else,
| and a SaaS offering where you can't modify the software
| at all. But that doesn't make the compiled binary open
| source.
| emporas wrote:
| > When I modify source code, I might make a change, check
| what that does, change the same functionality again,
| check the new change, etc... up to maybe a couple dozen
| times.
|
| You can modify individual neurons if you are so inclined.
| That's what Anthropic have done with the Claude family of
| models [1]. You cannot do that using any closed model. So
| "Open Weights" looks very much like "Open Source".
|
| Techniques for introspection of weights are very
| primitive, but I do think new techniques will be
| developed, or even new architectures which will make it
| much easier.
|
| [1] https://www.anthropic.com/news/mapping-mind-language-
| model
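|
| A small sketch of what that weight-level access looks like in
| practice (the module path follows the Llama implementation in
| the transformers library; the neuron index is arbitrary):
|
|     import torch
|     from transformers import AutoModelForCausalLM
|
|     model = AutoModelForCausalLM.from_pretrained(
|         "meta-llama/Meta-Llama-3.1-8B")
|     with torch.no_grad():
|         # ablate one hidden unit of the first MLP block by zeroing
|         # the down-projection column that reads from it
|         model.model.layers[0].mlp.down_proj.weight[:, 42] = 0.0
|
| Crude compared to what Anthropic describe, but it is the kind of
| direct intervention closed APIs simply don't expose.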
| the8thbit wrote:
| "You can modify individual neurons if you are so
| inclined."
|
| You can also modify a binary, but that doesn't mean that
| binaries are open source.
|
| "That's what Anthropic have done with the Claude family
| of models [1]. ... Techniques for introspection of
| weights are very primitive, but i do think new techniques
| will be developed"
|
| Yeah, I don't think what we have now is robust enough
| interpretability to be capable of generating something
| comparable to "source code", but I would like to see us
| get there at some point. It might sound crazy, but a few
| years ago the degree of interpretability we have today
| (thanks in no small part to Anthropic's work) would have
| sounded crazy.
|
| I think getting to open sourcable models is probably
| pretty important for producing models that actually do
| what we want them to do, and as these models become more
| powerful and integrated into our lives and production
| processes the inability to make them do what we actually
| want them to do may become increasingly dangerous.
| Muddling the meaning of open source today to market your
| product, then, can have troubling downstream effects as
| focus in the open source community may be taken away from
| interpretability and put on distributing and tuning public
| weights.
| bilsbie wrote:
| You make a good point but those are also just limitations
| of the technology (or at least our current understanding of
| it)
|
| Maybe an analogy would help. A family spent generations
| breeding the perfect apple tree and they decided to "open
| source" it. What would open sourcing look like?
| the8thbit wrote:
| "You make a good point but those are also just
| limitations of the technology (or at least our current
| understanding of it)"
|
| Yeah, that _is_ my point. Things that don't have source
| code can't be open source.
|
| "Maybe an analogy would help. A family spent generations
| breeding the perfect apple tree and they decided to "open
| source" it. What would open sourcing look like?"
|
| I think we need to be wary of dilemmas without solutions
| here. For example, let's think about another analogy: I
| was in a car accident last week. How can I open source my
| car accident?
|
| I don't think all, or even most things, are actually
| "open sourcable". ML models could be open sourced, but it
| would require a lot of work to interpret the models and
| generate the source code from them.
| jsheard wrote:
| I also think that something like Chromium is a better analogy
| for corporate open source models than a grassroots project like
| Linux is. Chromium is technically open source, but Google has
| absolute control over the direction of its development, and
| realistically it's far too complex to maintain a fork without
| Google's resources, just like Meta has complete control over
| what goes into their open models, and even if they did release
| all the training data and code (which they don't), us mere plebs
| could never afford to train a fork from scratch anyway.
| skybrian wrote:
| I think you're right from the perspective of an individual
| developer. You and I are not about to fork Chromium any time
| soon. If you presume that forking is impractical then sure,
| the right to fork isn't worth much.
|
| But just because a single developer couldn't do it doesn't
| mean it couldn't be done. It means nobody has organized a
| large enough effort yet.
|
| For something like a browser, which is critical for security,
| you need both the organization and the trust. Despite
| frequent criticism, Mozilla (for example) is still considered
| pretty trustworthy in a way that an unknown developer can't
| be.
| Yizahi wrote:
| If Microsoft can't do it, then we can reasonably conclude
| that it can't be done for any practical purpose. Discussing
| infinitesimal possibilities is better left to philosophers.
| skybrian wrote:
| Doesn't Microsoft maintain its own fork of Chromium?
| umbra07 wrote:
| yes - their browser is chromium-based
| candiddevmike wrote:
| None of Meta's models are "open source" in the FOSS sense, even
| the latest Llama 3.1. The license is restrictive. And no one
| has bothered to release their training data either.
|
| This post is an ad and trying to paint these things as
| something they aren't.
| JumpCrisscross wrote:
| > _no one has bothered to release their training data_
|
| If the FOSS community sets this as the benchmark for open
| source in respect of AI, they're going to lose control of the
| term. In most jurisdictions it would be illegal for the likes
| of Meta to release training data.
| exe34 wrote:
| the training data is the source.
| JumpCrisscross wrote:
| > _the training data is the source_
|
| Sure. But that's not going to be released. The term open
| source AI cannot be expected to cover it because it's not
| practical.
| diggan wrote:
| So because it's really hard to do proper Open Source with
| these LLMs, means we need to change the meaning of Open
| Source so it fits with these PR releases?
| JumpCrisscross wrote:
| > _because it 's really hard to do proper Open Source
| with these LLMs, means we need to change the meaning of
| Open Source so it fits with these PR releases?_
|
| Open training data is hard to the point of
| impracticality. It requires excluding private and
| proprietary data.
|
| Meanwhile, the term "open source" is massively popular.
| So it will get used. The question is how.
|
| Meta _et al_ would love for the choice to be between, on
| one hand, open weights only, and, on the other hand, open
| training data, because the latter is impractical. That
| dichotomy guarantees that when someone says open source
| AI they'll mean open weights. (The way open source
| software, today, generally means source available, not
| FOSS.)
| Palomides wrote:
| source available is absolutely not the same as open
| source
|
| you are playing very loosely with terms that have
| specific, widely accepted definitions (e.g.
| https://opensource.org/osd )
|
| I don't get why you think it would be useful to call LLMs
| with published weights "open source"
| JumpCrisscross wrote:
| > _terms that have specific, widely accepted definitions_
|
| The OSI's definition is far from the only one [1].
| Switzerland is currently implementing CH Open's
| definition, the EU another one, _et cetera_.
|
| > _I don't get why you think it would be useful to call
| LLMs with published weights "open source"_
|
| I don't. I'm saying that if the choice is between open
| weights or open weights + open training data, open
| weights will win because the useful definition will
| outcompete the pristine one in a public context.
|
| [1] https://en.wikipedia.org/wiki/Open-
| source_software#Definitio...
| diggan wrote:
| For the EU, I'm guessing you're talking about the EUPL,
| which is FSF/OSI approved and GPL compatible, generally
| considered copyleft.
|
| For the CH Open, I'm not finding anything specific, even
| from Swiss websites, could you help me understand what
| you're referring to here?
|
| I'm guessing that all these definitions have at least
| some points in common, which involves (another guess) at
| least being able to produce the output artifacts/binaries
| by yourself, something that you cannot do with Llama,
| just as an example.
| JumpCrisscross wrote:
| > _For the CH Open, I'm not finding anything specific,
| even from Swiss websites, could you help me understand
| what you're referring to here_
|
| Was on the _HN_ front page earlier [1][2]. The definition
| comes strikingly close to source on request with no use
| restrictions.
|
| > _all these definitions have at least some points in
| common_
|
| Agreed. But they're all different. There isn't an
| accepted definition of open source even when it comes to
| software; there is an accepted set of broad principles.
|
| [1] https://news.ycombinator.com/item?id=41047172
|
| [2] https://joinup.ec.europa.eu/collection/open-source-
| observato...
| diggan wrote:
| > Agreed. But they're all different. There isn't an
| accepted defintiion of open source even when it comes to
| software; there is an accepted set of broad principles.
|
| Agreed, but are we splitting hairs here and is it
| relevant to the claim made earlier?
|
| > (The way open source software, today, generally means
| source available, not FOSS.)
|
| Do any of these principles or definitions from these orgs
| agree/disagree with that?
|
| My hypothesis is that they generally would go against
| that belief and instead argue that open source is
| different from source available. But I haven't looked
| specifically to confirm if that's true or not, just a
| guess.
| JumpCrisscross wrote:
| > _are we splitting hairs here and is it relevant to the
| claim made earlier?_
|
| I don't think so. Take the Swiss definition. Source on
| request, not even available. Yet being branded and
| accepted as open source.
|
| (To be clear, the Swiss example favours FOSS. But it also
| permits source on request and bundles them together under
| the same label.)
| Palomides wrote:
| diluting open source into a marketing term meaning "you
| can download something" would be a sad result
| SquareWheel wrote:
| > specific, widely accepted definitions
|
| Realistically, nobody outside of Hacker News commenters
| have ever cared about the OSD. It's just not how the term
| is used colloquially.
| Palomides wrote:
| who says open source colloquially? ime anyone who doesn't
| care about software licenses will just say free (per free
| beer)
|
| and (strong personal opinion) any software developer
| should have a firm grip on the terminology and details
| for legal reasons
| SquareWheel wrote:
| > who says open source colloquially?
|
| There is a large span of people between gray beard
| programmer and lay person, and many in that span have
| some concept of open-source. It's often used synonymously
| with visible source, free software, or in this case, open
| weights.
|
| It seems unfortunate - though expected - that over half
| of the comments in this thread are debating the OSD for
| the umpteenth time instead of discussing the actual model
| release or accompanying news posts. Meanwhile communities
| like /r/LocalLlama are going hog wild with this release
| and already seeing what it can do.
|
| > any software developer should have a firm grip on the
| terminology and details for legal reasons
|
| They'd simply need to review the terms of the license to
| see if it fits their usage. It doesn't really matter if
| the license satisfies the OSD or not.
| diggan wrote:
| > Open training data is hard to the point of
| impracticality. It requires excluding private and
| proprietary data.
|
| Right, so the onus is on Facebook/Meta to get that right;
| then they could call something Open Source. Until then,
| find another name that doesn't already have a specific
| meaning.
|
| > (The way open source software, today, generally means
| source available, not FOSS.)
|
| No, but it's going that way. Open Source, today, still
| means that the things you need to build a project are
| publicly available for you to download and run on your
| own machine, granted you have the means to do so. What
| you're thinking of is literally called "Source Available"
| which is very different from "Open Source".
|
| The intent of Open Source is for people to be able to
| reproduce the work themselves, with modifications if they
| want to. Is that something you can do today with the
| various Llama models? No, because one core part of the
| projects "source code" (what you need to reproduce it
| from scratch), the training data, is being held back and
| kept private.
| unethical_ban wrote:
| >Meanwhile, the term "open source" is massively popular.
| So it will get used. The question is how.
|
| Here's the source of the disagreement. You're justifying
| the use of the term "open source" by saying it's logical
| for Meta to want to use it for its popularity and layman
| (incorrect) understanding.
|
| Other person is saying it doesn't matter how convenient
| it is or how much Meta wants to use it, that the term
| "open source" is misleading for a product where the
| "source" is the training data, _and_ the final product
| has onerous restrictions on use.
|
| This would be like Adobe giving Photoshop away for free,
| but for personal use only and not for making ads for
| Adobe's competitors. Sure, Adobe likes it and most users
| may be fine with it, but it isn't open source.
|
| >The way open source software, today, generally means
| source available, not FOSS.
|
| I don't agree with that. When a company says "open
| source" but it's not free, the tech community is quick to
| call it "source available" or "open core".
| JumpCrisscross wrote:
| > _You 're justifying the use of the term "open source"
| by saying it's logical for Meta to want to use it for its
| popularity and layman (incorrect) understanding_
|
| I'm actually not a fan of Meta's definition. I'm arguing
| specifically against an unrealistic definition, because
| for practical purposes that cedes the term to Meta.
|
| > _the term "open source" is misleading for a product
| where the "source" is the training data, and the final
| product has onerous restrictions on use_
|
| Agree. I think the focus should be on the use
| restrictions.
|
| > _When a company says "open source" but it's not free,
| the tech community is quick to call it "source available"
| or "open core"_
|
| This isn't consistently applied. It's why we have the
| free vs open vs FOSS fracture.
| plsbenice34 wrote:
| Of course it could be practical - provide the data. The
| fact that society is a dystopian nightmare controlled
| by a few megacorporations that don't want free
| information does not justify outright changing the
| meaning of the language.
| JumpCrisscross wrote:
| > _provide the data_
|
| Who? It's not their data.
| tintor wrote:
| Meta can call it something else other than open source.
|
| Synthetic part of the training data could be released.
| JimDabell wrote:
| I don't think it's that simple. The source is "the
| preferred form of the work for making modifications to
| it" (to use the GPL's wording).
|
| For an LLM, that's not the training data. That's the
| model itself. You don't make changes to an LLM by going
| back to the training data and making changes to it, then
| re-running the training. You update the model itself with
| more training data.
|
| You can't even use the training code and original
| training data to reproduce the existing model. A lot of
| it is non-deterministic, so you'll get different results
| each time anyway.
|
| Another complication is that the object code for normal
| software is a clear derivative work of the source code.
| It's a direct translation from one form to another. This
| isn't the case with LLMs and their training data. The
| models learn from it, but they aren't simply an
| alternative form of it. I don't think you can describe an
| LLM as a derivative work of its training data. It learns
| from it, it isn't a copy of it. This is mostly the reason
| why distributing training data is infeasible - the
| model's creator may not have the license to do so.
|
| Would it be extremely useful to have the original
| training data? Definitely. Is distributing it the same as
| distributing source code for normal software? I don't
| think so.
|
| I think new terminology is needed for open AI models. We
| can't simply re-use what works for human-editable code
| because it's a fundamentally different type of thing with
| different technical and legal constraints.
| root_axis wrote:
| No. It's an asset used in the training process; the
| source code can process arbitrary training data.
| sangnoir wrote:
| We've had a similar debate before, but the last time it was
| about whether Linux device drivers based on non-public
| datasheets under NDA were actually open source. This
| debate occurred again over drivers that interact with
| binary blobs.
|
| I disagree with the purists - if you can _legally_ change
| the source or weights - even without having access to the
| data used by the upstream authors - it 's open enough for
| me. YMMV.
| wrs wrote:
| I don't think even that is true. I conjecture that
| Facebook couldn't reproduce the model weights if they
| started over with the same training data, because I doubt
| such a huge training run is a reproducible deterministic
| process. I don't think _anyone_ has "the" source.
| exe34 wrote:
| numpy.random.seed(1234)
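|
| (A minimal sketch, assuming PyTorch: pinning the usual RNG
| sources looks roughly like the below, though seeding alone
| does not make a large distributed run bit-reproducible,
| since many CUDA kernels are non-deterministic by default.)
|
|     import random
|     import numpy as np
|     import torch
|
|     random.seed(1234)                 # Python's RNG
|     np.random.seed(1234)              # NumPy's RNG
|     torch.manual_seed(1234)           # CPU and CUDA RNGs
|     torch.cuda.manual_seed_all(1234)  # every visible GPU
|     # Ask PyTorch for deterministic kernels where available;
|     # some ops will error out or run slower.
|     torch.use_deterministic_algorithms(True)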
| mesebrec wrote:
| Regardless of the training data, the license even heavily
| restricts how you can use the model.
|
| Please read through their "acceptable use" policy before
| you decide whether this is really in line with open source.
| JumpCrisscross wrote:
| > _Please read through their "acceptable use" policy
| before you decide whether this is really in line with
| open source_
|
| I'm not taking a specific position on this license. I
| haven't read it closely. My broad point is simply that
| open source AI, as a term, cannot practically require the
| training data be made available.
| causal wrote:
| "Open weights" is a more appropriate term but I'll point out
| that these weights are also largely inscrutable to the people
| with the code that trained it. And for licensing reasons, the
| datasets may not be possible to share.
|
| There is still a lot of modifying you can do with a set of
| weights, and they make great foundations for new stuff, but
| yeah we may never see a competitive model that's 100% buildable
| at home.
|
| Edit: mkolodny points out that the model code is shared (under
| the Llama license at least), which is really all you need to run
| training https://github.com/meta-
| llama/llama3/blob/main/llama/model.p...
| aerzen wrote:
| LLAMA is an open-weights model. I like this term, let's use
| that instead of open source.
| stavros wrote:
| "Open weights" means you can use the weights for free (as in
| beer). "Open source" means you get the training dataset and
| the methodology. ~Nobody does open source LLMs.
| _heimdall wrote:
| Why is the dataset required for it to be open source?
|
| If I self host a project that is open sourced rather than
| paying for a hosted version, like Sentry.io for example, I
| don't expect data to come along with the code. Licensing
| rights are always up for debate in open source, but I
| wouldn't expect more than the code to be available and
| reviewable for anything needed to build and run the
| project.
|
| In the case of an LLM I would expect that to mean the code
| run to train the model, the code for the model data
| structure itself, and the control code for querying the
| model should all be available. I'm not actually sure if
| Meta does share all that, but training data is separate
| from open source IMO.
| solarmist wrote:
| The sticking point is you can't build the model. To be
| able to build the model from scratch you need methodology
| and a complete description of the data set.
|
| They only give you a blob of data you can run.
| _heimdall wrote:
| Got it, that makes sense. I still wouldn't expect them to
| have to publicly share the data itself, but if you can't
| take the code they share and run it against your own data
| to build a model, then that wouldn't be open source in my
| understanding of it.
| stavros wrote:
| Data is to models what code is to software.
| gowld wrote:
| https://opensource.org/osd
|
| "The source code must be the preferred form in which a
| programmer would modify the program. Deliberately
| obfuscated source code is not allowed. Intermediate forms
| such as the output of a preprocessor or translator are
| not allowed."
|
| > In the case of an LLM I would expect that to mean the
| code run to train the model, the code for the model data
| structure itself, and the control code for querying the
| model should all be available
|
| The M in LLM is for "Model".
|
| The code you describe is for an LLM _harness_, not for
| an LLM. The code for the _LLM_ is whatever is needed to
| enable a developer to _modify_ the inputs and then build a
| modified output LLM (minus standard generally available
| tools not custom-created for that product).
|
| Training data is one way to provide this. Another way is
| some sort of semantic model editor for an interpretable
| model.
| blackeyeblitzar wrote:
| There is a comment elsewhere claiming there are a few dozen
| fully open source models:
| https://news.ycombinator.com/item?id=41048796
| sigmoid10 wrote:
| >Nobody does open source LLMs.
|
| There are a bunch of independent, fully open source
| foundation models from companies that share everything
| (including all data). AMBER and MAP-NEO for example. But we
| have yet to see one in the 100B+ parameter category.
| stavros wrote:
| Sorry, the tilde before "nobody" is my notation for
| "basically nobody" or "almost nobody". I thought it was
| more common.
| mkolodny wrote:
| Llama's code is open source: https://github.com/meta-
| llama/llama3/blob/main/llama/model.p...
| apsec112 wrote:
| That's not the _training_ code, just the inference code. The
| training code, running on thousands of high-end H100 servers,
| is surely much more complex. They also don 't open-source the
| dataset, or the code they used for data
| scraping/filtering/etc.
| the8thbit wrote:
| "just the inference code"
|
| It's not the "inference code", it's the code that specifies
| the architecture of the model and loads the model. The
| "inference code" is mostly the model, and the model is not
| legible to a human reader.
|
| Maybe someday open source models will be possible, but we
| will need much better interpretability tools so we can
| generate the source code from the model. In most software
| projects you write the source as a specification that is
| then used by the computer to implement the software, but in
| this case the process is reversed.
| blackeyeblitzar wrote:
| That is just the inference code. Not training code or
| evaluation code or whatever pre/post processing they do.
| patrickaljord wrote:
| Is there an LLM with actual open source training code and
| dataset? Besides BLOOM
| https://huggingface.co/bigscience/bloom
| osanseviero wrote:
| Yes, there are a few dozen full open source models
| (license, code, data, models)
| blackeyeblitzar wrote:
| What are some of the other ones? I am aware mainly of
| OLMo (https://blog.allenai.org/olmo-open-language-
| model-87ccfc95f5...)
| navinsylvester wrote:
| Here you go - https://github.com/apple/corenet
| mesebrec wrote:
| This is like saying any python program is open source because
| the python runtime is open source.
|
| Inference code is the runtime; the code that runs the model.
| Not the model itself.
| mkolodny wrote:
| I disagree. The file I linked to, model.py, contains the
| Llama 3 model itself.
|
| You can use that model with open data to train it from
| scratch yourself. Or you can load Meta's open weights and
| have a working LLM.
| causal wrote:
| Yeah, a lot of people here seem to not understand that
| PyTorch really does make model definitions that simple,
| and that file has everything you need to resume back-
| propagation. Not to mention PyTorch itself being open-
| sourced by Meta.
|
| That said, the Llama license doesn't meet strict
| definitions of open source, and I bet they have internal
| tooling for datacenter-scale training that's not
| represented here.
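|
| (A rough sketch of what "resume back-propagation" means in
| practice; this assumes the Hugging Face transformers
| wrappers rather than the raw model.py, plus access to the
| gated meta-llama weights. Names are illustrative only.)
|
|     import torch
|     from transformers import AutoModelForCausalLM, AutoTokenizer
|
|     name = "meta-llama/Meta-Llama-3-8B"
|     tok = AutoTokenizer.from_pretrained(name)
|     model = AutoModelForCausalLM.from_pretrained(
|         name, torch_dtype=torch.bfloat16)
|
|     opt = torch.optim.AdamW(model.parameters(), lr=1e-5)
|     batch = tok("some fine-tuning text", return_tensors="pt")
|     out = model(**batch, labels=batch["input_ids"])
|     out.loss.backward()  # gradients flow through the open weights
|     opt.step()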
| yjftsjthsd-h wrote:
| > The file I linked to, model.py, contains the Llama 3
| model itself.
|
| That makes it source available (
| https://en.wikipedia.org/wiki/Source-available_software
| ), not open source
| macrolime wrote:
| Source available means you can see the source, but not
| modify it. This is kinda the opposite, you can modify the
| model, but you don't see all the details of its creation.
| Flimm wrote:
| No, it's not. The Llama 3 Community License Agreement is not
| an open source license. Open source licenses need to meet the
| criteria of the only widely accepted definition of "open
| source", and that's the one formulated by the OSI [0]. This
| license has multiple restrictions on use and distribution
| which make it not open source. I know Facebook keeps calling
| this stuff open source, maybe in order to get all the good
| will that open source branding gets you, but that doesn't
| make it true. It's like a company calling their candy vegan
| while listing one of its ingredients as pork-based gelatin. No
| matter how many times the company advertises that their
| product is vegan, it's not, because it doesn't meet the
| definition of vegan.
|
| [0] - https://opensource.org/osd
| CamperBob2 wrote:
| _Open source licenses need to meet the criteria of the only
| widely accepted definition of "open source", and that's the
| one formulated by the OSI [0]_
|
| Who died and made OSI God?
| vbarrielle wrote:
| The OSI was created over 25 years ago and defined and
| popularized the term open source. Their definition has
| been widely accepted over that period.
|
| Recently, companies are trying to market things as open
| source when in reality, they fail to adhere to the
| definition.
|
| I think we should not let these companies change the
| meaning of the term, which means it's important to
| explain every time they try to seem more open than they
| are.
|
| I'm afraid the battle is being lost though.
| Suppafly wrote:
| >The OSI was created about 20 years ago and defined and
| popularized the term open source. Their definition has
| been widely accepted over that period.
|
| It was defined and accepted by the community well before
| OSI came around though.
| MaxBarraclough wrote:
| This isn't helpful. The community defers to the OSI's
| definition because it captures what they care about.
|
| We've seen people try to deceptively describe non-OSS
| projects as open source, and no doubt we will continue to
| see it. Thankfully the community (including Hacker News)
| is quick to call it out, and to insist on not cheapening
| the term.
|
| This is one of the topics that just keeps turning up:
|
| * https://news.ycombinator.com/item?id=24483168
|
| * https://news.ycombinator.com/item?id=31203209
|
| * https://news.ycombinator.com/item?id=36591820
| 8note wrote:
| Isn't the MIT license the generally accepted "open source"
| license? It's a community owned term, not OSI owned
| henryfjordan wrote:
| There are more licenses than just MIT that are "open
| source". GPL, BSD, MIT, Apache, some of the Creative
| Commons licenses, etc. MIT has become the defacto default
| though
|
| https://opensource.org/license (linking to OSI for the
| list because it's convenient, not because they get to
| decide)
| yjftsjthsd-h wrote:
| MIT is _a_ permissive open source license, not _the_ open
| source license.
| stale2002 wrote:
| Ok call it Open Weights then if the dictionary definitions
| matter so much to you.
|
| The actual point that matters is that these models are
| available for most people to use for a lot of stuff, and this
| is way way better than what competitors like OpenAI offer.
| the8thbit wrote:
| They don't "[allow] developers to modify its code however
| they want", which is a critical component of "open source",
| and one that Meta is clearly trying to leverage in branding
| around its products. I would like _them_ to start calling
| these "public weight models", because what they're doing now
| is muddying the waters so much that "open source" now just
| means providing an enormous binary and an open source harness
| to run it in, rather than serving access to the same binary
| via an API.
| Voloskaya wrote:
| Feels a bit like you are splitting hairs for the pleasure of
| semantic arguments, to be honest. Yes, there is no source in
| ML, so if we want to be pedantic it shouldn't be called
| open source. But what really matters in the open source
| movement is that we are able to take a program built by
| someone and modify it to do whatever we want with it,
| without having to ask someone for permission or get
| scrutinized or have to pay someone.
|
| The same applies here, you can take those models and modify
| them to do whatever you want (provided you know how to
| train ML models), without having to ask for permission, get
| scrutinized or pay someone.
|
| I personally think using the term open source is fine, as
| it conveys the intent correctly, even if, yes, weights are
| not sources you can read with your eyes.
| wrs wrote:
| Calling that "open source" renders the word "source"
| meaningless. By your definition, I can release a binary
| executable freely and call it "open source" because you
| can modify it to do whatever you want.
|
| Model weights are like a binary that _nobody_ has the
| source for. We need another term.
| Voloskaya wrote:
| No it's not the same as releasing a binary, feels like we
| can't get out of the pedantics. I can in theory modify a
| binary to do whatever I want. In practice it is
| intractably hard to make any significant modification to
| a binary, and even if you could, you would then not be
| legally allowed to e.g. redistribute.
|
| Here, modifying that model is not harder than doing
| regular ML, and I can redistribute.
|
| Meta doesn't have access to some magic higher-level
| abstraction for that model, withheld from release, that
| would make working with it easier.
|
| The sources in ML are the architecture, the training and
| inference code, and a paper describing the training
| procedure. It's all there.
| bornfreddy wrote:
| "Public weight models" sounds about right, thanks for
| coming up with a good term! Hope it catches on.
| input_sh wrote:
| Open Source Initiative (kind of a de-facto authority on what's
| open source and what's not) is spending a whole lot of time
| figuring out what it means for an AI system to be open source.
| In other words, they're basically trying to come up with a new
| license because the existing ones can't easily apply.
|
| I believe this is the current draft:
| https://opensource.org/deepdive/drafts/the-open-source-ai-de...
| downWidOutaFite wrote:
| OSI made themselves the authority because they hated Richard
| Stallman and his Free Software movement. It's just marketing.
| Zambyte wrote:
| > If so, then how can current ML models be open source?
|
| The source of a language model is the text it was trained on.
| Llama models are not open source (contrary to their claims),
| they are open weight.
| thayne wrote:
| I think it would also include the code used to train it
| moffkalast wrote:
| You can find the entire Llama 3.0 pretraining set here:
| https://huggingface.co/datasets/HuggingFaceFW/fineweb
|
| 15T tokens, 45 terabytes. Seems fairly open source to me.
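|
| (For anyone curious, the linked dataset can at least be
| sampled without downloading all 45 terabytes by using the
| Hugging Face datasets library in streaming mode; a small
| sketch:)
|
|     from datasets import load_dataset
|
|     ds = load_dataset("HuggingFaceFW/fineweb",
|                       streaming=True, split="train")
|     row = next(iter(ds))
|     print(row["text"][:200])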
| Zambyte wrote:
| Where has Facebook linked that? I can't find anywhere that
| they actually published that.
| Oras wrote:
| This is obviously good news, but __personally__ I feel the open-
| source models are just trying to catch up with whoever the market
| leader is, based on some benchmarks.
|
| The actual problem is running these models. Very few companies
| can afford the hardware to run these models privately. If you run
| them in the cloud, then I don't see any potential financial gain
| for any company to fine-tune these huge models just to catch up
| with OpenAI or Anthropic, when you can probably get a much better
| deal by fine-tuning the closed-source models.
|
| Also this point:
|
| > We need to protect our data. Many organizations handle
| sensitive data that they need to secure and can't send to closed
| models over cloud APIs.
|
| First, it's ironic that Meta is talking about privacy. Second,
| most companies will run these models in the cloud anyway. You can
| run OpenAI via Azure Enterprise and Anthropic on AWS Bedrock.
| simonw wrote:
| "Very few companies can afford the hardware to run these models
| privately."
|
| I can run Llama 3 70B on my (64GB RAM M2) laptop. I haven't
| tried 3.1 yet but I expect to be able to run that 70B model
| too.
|
| As for the 405B model, the Llama 3.1 announcement says:
|
| > To support large-scale production inference for a model at
| the scale of the 405B, we quantized our models from 16-bit
| (BF16) to 8-bit (FP8) numerics, effectively lowering the
| compute requirements needed and allowing the model to run
| within a single server node.
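|
| Back-of-the-envelope memory math (weights only, ignoring KV
| cache and activations) suggests why FP8 makes the difference:
|
|     params = 405e9
|     bf16_gb = params * 2 / 1e9  # ~810 GB: too big for one
|                                 # 8 x 80 GB H100 node (640 GB)
|     fp8_gb = params * 1 / 1e9   # ~405 GB: fits on one node
|     print(bf16_gb, fp8_gb)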
| InDubioProRubio wrote:
| CrowdStrike just added "Centralized Company Controlled Software
| Ecosystem" to every risk data sheet on the planet. Everything
| futureproof is self-hosted and open source.
| mesebrec wrote:
| Note that Meta's models are not open source in any interpretation
| of the term.
|
| * You can't use them for any purpose. For example, the license
| prohibits using these models to train other models.
|
| * You can't meaningfully modify them, given there is almost no
| information available about the training data, how they were
| trained, or how the training data was processed.
|
| As such, the model itself is not available under an open source
| license and the AI does not comply with the "open source AI"
| definition by OSI.
|
| It's an utter disgrace for Meta to write such a blogpost patting
| themselves on the back while lying about how open these models
| are.
| ChadNauseam wrote:
| > you can't meaningfully modify them given there is almost no
| information available about the training data, how they were
| trained, or how the training data was processed.
|
| I was under the impression that you could still fine-tune the
| models or apply your own RLHF on top of them. My understanding
| is that the training data would mostly be useful for training
| the model yourself from scratch (possibly after modifying the
| training data), which would be extremely expensive and out of
| reach for most people
| mesebrec wrote:
| Indeed, fine-tuning is still possible, but you can only go so
| far with fine-tuning before you need to completely retrain
| the model.
|
| This is why Silo AI, for example, had to start from scratch
| to get better support for small European languages.
| chasd00 wrote:
| From what I understand, the training data and careful curation
| of it is the hard part. Everyone wants training data sets to
| train their own models instead of producing their own.
| causal wrote:
| You are definitely allowed to train other models with these
| models, you just have to give credit in the name, per the
| license:
|
| > If you use the Llama Materials or any outputs or results of
| the Llama Materials to create, train, fine tune, or otherwise
| improve an AI model, which is distributed or made available,
| you shall also include "Llama" at the beginning of any such AI
| model name.
| mesebrec wrote:
| Indeed, this is something they changed in the 3.1 version of
| the license.
|
| Regardless, the license [1] still has many restrictions, such
| as the acceptable use policy [2].
|
| [1] https://huggingface.co/meta-llama/Meta-
| Llama-3.1-8B/blob/mai...
|
| [2] https://llama.meta.com/llama3_1/use-policy
| tw04 wrote:
| >In the early days of high-performance computing, the major tech
| companies of the day each invested heavily in developing their
| own closed source versions of Unix.
|
| Because they sold the resultant code and systems built on it for
| money... this is the gold miner saying that all shovels and jeans
| should be free.
|
| Am I happy Facebook open sources some of their code? Sure, I
| think it's good for everyone. Do I think they're talking out of
| both sides of their mouth? Absolutely.
|
| Let me know when Facebook opens up the entirety of their Ad and
| Tracking platforms and we can start talking about how it's silly
| for companies to keep software closed.
|
| I can say with 100% confidence if Facebook were selling their AI
| advances instead of selling the output it produces, they wouldn't
| be advocating for everyone else to open source their stacks.
| JumpCrisscross wrote:
| > _if Facebook were selling their AI advances instead of
| selling the output it produces, they wouldn 't be advocating
| for everyone else to open source their stack_
|
| You're acting as if commoditizing one's complements is either
| new or reprehensible [1].
|
| [1] https://gwern.net/complement
| tw04 wrote:
| >You're acting as if commoditizing one's complements is
| either new or reprehensible [1].
|
| I'm saying that calling on other companies to open source
| their core product, just because it's a complement for you,
| while acting as if it's for the benefit of mankind, is
| disingenuous, which it is.
| stale2002 wrote:
| > as if it's for the benefit of mankind
|
| But it does benefit mankind.
|
| More free tech products is good for the world.
|
| This is a good thing. When people or companies do good
| things, they should get the credit for doing good things.
| JumpCrisscross wrote:
| > _acting as if it 's for the benefit of mankind is
| disingenuous, which it is_
|
| Is it bad for mankind that Meta publishes its weights?
| Mutually beneficial is a valid game state--there is no
| moral law that requires anything good be made as a
| sacrifice.
| rvnx wrote:
| The source code of the ad-tracking platform is useless to users.
|
| In the end, it's actually Facebook doing the right thing
| (though they are known for being evil).
|
| It's a bit of an irony.
|
| The supposedly "good" and "open" people like Google or OpenAI,
| haven't given their model weights.
|
| A bit like Microsoft became the company that actually supports
| the whole open-source ecosystem with GitHub.
| tw04 wrote:
| >The source-code to Ad tracking platform is useless to users.
|
| It's absolutely not useless for developers looking to build a
| competing project.
|
| >The supposedly "good" and "open" people like Google or
| OpenAI, haven't given their model weights.
|
| Because they're monetizing it... the only reason Facebook is
| giving it away is because it's a complement to their core
| product of selling ads. If they were monetizing it, it would
| be closed source. Just like their Ads platform...
| abetusk wrote:
| Another case of "open-washing". Llama is not available open
| source, under the common definition of open source, as the
| license doesn't allow for commercial re-use by default [0].
|
| They provide their model, with weights and code, as "source
| available" and it looks like they allow for commercial use until
| a 700M monthly active user cap is surpassed. They also don't allow
| you to train other AI models with their model:
|
| """ ... v. You will not use the Llama Materials or any output or
| results of the Llama Materials to improve any other large
| language model (excluding Meta Llama 3 or derivative works
| thereof). ... """
|
| [0] https://github.com/meta-llama/llama3/blob/main/LICENSE
| sillysaurusx wrote:
| They cannot legally enforce this, because they don't have the
| rights to the content they trained it on. Whoever's willing to
| fund that court battle would likely win.
|
| There's a legal precedent that says hard work alone isn't
| enough to guarantee copyright, i.e. it doesn't matter that it
| took millions of dollars to train.
| whimsicalism wrote:
| i think these clauses are unenforceable. it's telling that OAI
| hasn't tried a similar suit despite multiple extremely well-
| known cases of competitors training on OAI outputs
| nuz wrote:
| Everyone complaining about not having data access: Remember that
| without Meta you would have OpenAI and Anthropic, and that's it.
| I'm really thankful they're releasing this, and the reason they
| can't release the data is obvious.
| mesebrec wrote:
| Without Meta, you would still have Mistral, Silo AI, and the
| many other companies and labs producing much more open models
| with similar performance.
| Invictus0 wrote:
| The irony of this letter being written by Mark Zuckerberg at
| Meta, while OpenAI continues to be anything but open, is richer
| than anyone could have imagined.
| 1024core wrote:
| "open source AI" ... "open" ... "open" ....
|
| And you can't even try it without an FB/IG account.
|
| Zuck will never change.
| causal wrote:
| I think you can use an HF account as well
| https://huggingface.co/meta-llama
| Gracana wrote:
| You can also wait a bit for someone to upload quantized
| variants, finetunes, etc, and download those. FWIW I'm not
| making a claim about the legality of that, just saying it's
| an easy way around needing to sign the agreement.
| CamperBob2 wrote:
| It doesn't require an account. You do have to fill in your name
| and email (and birthdate, although it seems to accept whatever
| you feed it.)
| mvkel wrote:
| It's a real shame that we're still calling Llama "open source"
| when at best it's "open weights."
|
| Not that anyone would go buy 100,000 H100s to train their own
| Llama, but words matter. Definitions matter.
| sidcool wrote:
| Honest question. As far as LLMs are concerned, isn't open
| weights the same as open source?
| mesebrec wrote:
| Open source requires, at the very least, that you can use it
| for any purpose. This is not the case with Llama.
|
| The Llama license has a lot of restrictions, based on user
| base size, type of use, etc.
|
| For example you're not allowed to use Llama to train or
| improve other models.
|
| But it goes much further than that. The government of India
| can't use Llama because they're too large. Sex workers are
| not allowed to use Llama due to the acceptable use policy of
| the license. Then there is also the vague language
| prohibiting discrimination, racism, etc. Good luck getting
| something like that approved by your legal team.
| aloe_falsa wrote:
| GPL defines the "source code" of a work as the preferred form
| of the work for making modifications to it. If Meta released
| a petabyte of raw training data, would that really be easier
| to extend and adapt (as opposed to fine-tuning the weights)?
| paulhilbert wrote:
| No, I would argue that from the three main ingredients -
| training data, model source code and weights - weights are
| the furthest away from something akin to source code.
|
| They're more like obfuscated binaries. When it comes to fine-
| tuning only however things shift a little bit, yes.
| lolinder wrote:
| Source versus weights seems like a really pedantic distinction
| to make. As you say, the training code and training data would
| be worthless to anyone who doesn't have compute on the level
| that Meta does. Arguably, the weights are source code
| interpreted by an inference engine, and realistically it's the
| weights that someone is going to want to modify through fine-
| tuning, not the original training code and data.
|
| The far more important distinction is "open" versus "not open",
| and I disagree that we should cede that distinction while
| trying to fight for "source". The Llama license is restrictive
| in a number of ways (it incorporates an entire acceptable use
| policy) that make it most definitely not "open" in the
| customary sense.
| mvkel wrote:
| > training code and training data would be worthless to
| anyone who doesn't have compute on the level that Meta does
|
| I don't fully agree.
|
| Isn't that like saying *nix being open source is worthless
| unless you're planning to ship your own Linux distro?
|
| Knowing how the sausage is made is important if you're an
| animal rights activist.
| JamesBarney wrote:
| https://llama.meta.com/llama3_1/use-policy/
|
| The acceptable use policy seems fine. Don't use it to
| break the law, solicit sex, kill people, or lie.
| lolinder wrote:
| It's fine in that I'm happy to use it and don't think I'll
| be breaking the terms anytime soon. It's not fine in that
| one of the primary things that makes open source open is
| that an open source license doesn't restrict groups of
| people or whole fields from usage of the software. The
| policy has a number of such blanket bans on industries,
| which, while reasonable, make the license not truly open.
| rybosworld wrote:
| Huge companies like Facebook will often argue for solutions that
| on the surface, seem to be in the public interest.
|
| But I have strong doubts they (or any other company) actually
| believe what they are saying.
|
| Here is the reality:
|
| - Facebook is spending untold billions on GPU hardware.
|
| - Facebook is arguing in favor of open sourcing the models, that
| they spent billions of dollars to generate, for free...?
|
| It follows that companies with much smaller resources (money)
| will not be able to match what Facebook is doing. Seems like an
| attempt to kill off the competition (specifically, smaller
| organizations) before they can take root.
| Salgat wrote:
| The reason for Meta making their model open source is rather
| simple: They receive an unimaginable amount of free labor, and
| their license only excludes their major competitors to ensure
| mass adoption without benefiting their competition (Microsoft,
| Google, Alibaba, etc). Public interest, philanthropy, etc are
| just nice little marketing bonuses as far as they're concerned
| (otherwise they wouldn't be including this licensing
| restriction).
| noiseinvacuum wrote:
| All correct, Meta does obviously benefit.
|
| It's helpful to also look at what do the developers and
| companies (everyone outside of top 5/10 big tech companies)
| get out of this. They get open access to weights of SOTA LLM
| models that take billions of dollars to train and 10s of
| billions a year to run the AI labs that make these. They get
| the freedom to fine tune them, to distill them, and to host
| them on their own hardware in whatever way works best for
| their products and services.
| mattnewton wrote:
| I actually think this is one of the rare times where the small
| guys interests are aligned with Meta. Meta is scared of a world
| where they are locked out of LLM platforms, one where OpenAI
| gets to dictate rules around their use of the platform much
| like Apple and Google dictate rules around advertiser data and
| monetization on their mobile platforms. Small developers should
| be scared of a world where the only competitive LLMs are owned
| by those players too.
|
| Through this lens, Meta's actions make more sense to me. Why
| invest billions in VR/AR? The answer is simple, don't get
| locked out of the next platform, maybe you can own the next
| one. Why invest in LLMs? Again, don't get locked out. Google
| and OpenAi/Microsoft are far larger and ahead of Meta right now
| and Meta genuinely believes the best way to make sure they have
| an LLM they control is to make everyone else have an LLM they
| can control. That way community efforts are unified around
| their standard.
| mupuff1234 wrote:
| Sure, but don't you think the "not getting locked out" is
| just the pre-requisite for their eventual goal of locking
| everyone else out?
| yesco wrote:
| Does it really matter? Attributing goodwill to a company is
| like attributing goodwill to a spider that happens to clean
| up the bugs in your basement. Sure if they had the ability
| to, I'm confident Meta would try something like that, but
| they obviously don't, and will not for the foreseeable
| future.
|
| I have faith they will continue to do what's in their best
| interests and if their best interests happen to align with
| mine, then I will support that. Just like how I don't
| bother killing the spider in my basement because it helps
| clean up the other bugs.
| mupuff1234 wrote:
| But you also know that the spider has been laying eggs so
| you better have an extermination plan ready.
| noiseinvacuum wrote:
| If by "everyone else" here you mean 3 or 4 large players
| trying to create a regulatory moat around themselves then I
| am fine with them getting locked out and not being able to
| create a moat for next 3 decades.
| myaccountonhn wrote:
| > I actually think this is one of the rare times where the
| small guys interests are aligned with Meta
|
| Small guys are the ones being screwed over by AI companies
| and having their text/art/code stolen without any attribution
| or adherence to license. I don't think Meta is on their side
| at all
| MisterPea wrote:
| That's a separate problem which affects small to large
| players alike (e.g. ScarJo).
|
| Small companies interests are aligned with Meta as they are
| now on an equal footing with large incumbent players. They
| can now compete with a similarly sized team at a big tech
| company instead of that team + dozens of AI scientists
| ketzo wrote:
| Meta is, fundamentally, a user-generated-content distribution
| company.
|
| Meta wants to make sure they commoditize their complements:
| they don't want a world where OpenAI captures all the value of
| content generation, they want the cost of producing the best
| content to be as close to free as possible.
| chasd00 wrote:
| I was thinking along the same lines. A lot of content generated
| by LLMs is going to end up on Facebook or Instagram. The easier
| it is to create AI-generated content, the more content ends up
| on those applications.
| Nesco wrote:
| Especially because genAI is a copyright laundering system.
| You can train it on copyrighted material and none of the
| content generated with it is copyrightable, which is
| perfect for social apps.
| KaiserPro wrote:
| The model itself isn't actually that valuable to Facebook.
| The thing that's important is the dataset, the infrastructure
| and the people to make the models.
|
| There is still, just about, a strong ethos (especially in
| the research teams) to chuck loads of stuff over the wall
| into open source (PyTorch, Detectron, SAM, Aria, etc.).
|
| but it's seen internally as a two-part strategy:
|
| 1) strong recruitment tool (come work with us, we've done cool
| things, and you'll be able to write papers)
|
| 2) seeding the research community with a common toolset.
| jorblumesea wrote:
| Cynically I think this position is largely due to how they can
| undercut OpenAI's moat.
| wayeq wrote:
| It's not cynical, it's just an awareness that public companies
| have a fiduciary duty to their shareholders.
| cs702 wrote:
| _> We're releasing Llama 3.1 405B, the first frontier-level open
| source AI model, as well as new and improved Llama 3.1 70B and 8B
| models._
|
| _Bravo!_ While I don 't agree with Zuck's views and actions on
| many fronts, on this occasion I think he and the AI folks at Meta
| deserve our praise and gratitude. With this release, they have
| brought the cost of pretraining a frontier 400B+ parameter model
| to ZERO for pretty much everyone -- well, everyone _except_ Meta
| 's key competitors.[a] THANK YOU ZUCK.
|
| Meanwhile, the business-minded people at Meta surely won't mind
| if the release of these frontier models to the public happens to
| completely mess up the AI plans of competitors like
| OpenAI/Microsoft, Google, Anthropic, etc. Come to think of it,
| the negative impact on such competitors was likely a key
| motivation for releasing the new models.
|
| ---
|
| [a] The license is not open to the handful of companies worldwide
| which have more than 700M users.
| swyx wrote:
| > the AI folks at Meta deserve our praise and gratitude
|
| We interviewed Thomas who led Llama 2 and 3 post training here
| in case you want to hear from someone closer to the ground on
| the models https://www.latent.space/p/llama-3
| throwaway_2494 wrote:
| > We're releasing Llama 3.1 405B
|
| Is it possible to run this with ollama?
| jessechin wrote:
| Sure, if you have an H100 cluster. If you quant it to int4 you
| might get away with using only 4 H100 GPUs!
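|
| (Rough weights-only arithmetic behind that estimate, ignoring
| KV cache and activation memory:)
|
|     params = 405e9
|     int4_gb = params * 0.5 / 1e9  # ~203 GB of weights
|     four_h100_gb = 4 * 80         # 320 GB of HBM across 4 GPUs
|     print(int4_gb, four_h100_gb)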
| sheepscreek wrote:
| Assuming $25k a pop, that's at least $100k in just the GPUs
| alone. Throw in their linking technology (NVLink) and cost
| for the remaining parts, won't be surprised if you're
| looking at $150k for such a cluster. Which is not bad to be
| honest, for something at this scale.
|
| Can anyone share the cost of their pre-built clusters,
| which they've recently started selling? (Sorry, feeling lazy
| to research atm; I might do that later when I have more time.)
| rty32 wrote:
| You can rent H100 GPUs.
| tomp wrote:
| you're about right.
|
| https://smicro.eu/nvidia-
| hgx-h100-640gb-935-24287-0001-000-1
|
| 8x H100 HGX cluster for EUR250k + VAT
| vorticalbox wrote:
| If you have the ram for it.
|
| Ollama will offload as many layers as it can to the gpu then
| the rest will run on the cpu/ram.
| tambourine_man wrote:
| Praising is good. Gratitude is a bit much. They got this big by
| selling user generated content and private info to the highest
| bidder. Often through questionable means.
|
| Also, the underdog always touts Open Source and standards, so
| it's good to remain skeptical when/if tables turn.
| sheepscreek wrote:
| All said and done, it is a very _expensive_ and ballsy way to
| undercut competitors. They've spent > $5B on hardware alone,
| much of which will depreciate in value quickly.
|
| Pretty sure the only reason Meta's managed to do this is
| because of Zuck's iron grip on the board (majority voting
| rights). This is great for Open Source and regular people
| though!
| wrsh07 wrote:
| Zuck made a bet when they provisioned for reels to buy
| enough GPUs to be able to spin up another reels-sized
| service.
|
| Llama is probably just running on spare capacity (I mean,
| sure, they've kept increasing capex, but if they're worried
| about an llm-based fb competitor they sort of have to in
| order to enact their copycat strategy)
| fractalf wrote:
| Well, he didn't do it to be "nice", you can be sure about
| that. Obviously they see a financial gain
| somewhere/sometime
| tambourine_man wrote:
| At Meta's level, spending $5B to stay competitive is not
| ballsy. It's a bargain.
| ricardo81 wrote:
| >selling user generated content and private info to the
| highest bidder
|
| Was always their modus operandi, surely. How else would they
| have survived?
|
| Thanks for returning everyone else's content and never mind
| all the content stealing your platform did.
| jart wrote:
| I'm perfectly happy with them draining the life essence out
| of the people crazy enough to still use Facebook, if they're
| funneling the profits into advancing human progress with AI.
| It's an Alfred Nobel kind of thing to do.
| germinalphrase wrote:
| "Come to think of it, the negative impact on such competitors
| was likely a key motivation for releasing the new models."
|
| "Commoditize Your Complement" is often cited here:
| https://gwern.net/complement
| tintor wrote:
| > they have brought the cost of pretraining a frontier 400B+
| parameter model to ZERO
|
| It is still far from zero.
| cs702 wrote:
| If the model is already pretrained, there's no need to
| pretrain it, so the cost of pretraining is zero.
| moffkalast wrote:
| Yeah but you only have the one model, and so far it seems
| to be only good on paper.
| pwdisswordfishd wrote:
| Makes me wonder why he's really doing this. Zuckerberg being
| Zuckerberg, it can't be out of any genuine sense of altruism.
| Probably just wants to crush all competitors before he
| monetizes the next generation of Meta AI.
| spiralk wrote:
| It's certainly not altruism. Given that Facebook/Meta owns the
| largest user data collection systems, any advancement in AI
| ultimately strengthens their business model (which is still
| mostly collecting private user data, amassing large user
| datasets, and selling targeted ads).
|
| There is a demo video that shows a user wearing a Quest VR
| headset and asks the AI "what do you see" and it interprets
| everything around it. Then, "what goes well with these
| shorts"... You can see where this is going. Wearing headsets
| with AIs monitoring everything the users see and collecting
| even more data is becoming normalized. Imagine the private
| data harvesting capabilities of the internet but anywhere in
| the physical world. People need not even choose to wear a
| Meta headset, simply passing a user with a Meta headset in
| public will be enough to have private data collected. This
| will be the inevitable result of vision models improvements
| integrated into mobile VR/AR headsets.
| goatlover wrote:
| That's very dystopian. It's bad enough having cameras
| everywhere now. I never opted in to being recorded.
| warkdarrior wrote:
| That sounds fantastic. If they make the Meta headset easy
| to wear and somewhat fashionable (closer to eyeglass than
| to a motorcycle helmet), I'd take it everywhere and record
| everything. Give me a retrospective search and
| conferences/meetings will be so much easier (I am terrible
| with names).
| phyrex wrote:
| You can always listen to the investor calls for the
| capitalist point of view. In short, attracting talent,
| building the ecosystem, and making it really easy for users
| to make stuff they want to share on Meta's social networks
| bun_at_work wrote:
| I really think the value of this for Meta is content
| generation. More open models (especially state of the art)
| means more content is being generated, and more content is
| being shared on Meta platforms, so there is more advertising
| revenue for Meta.
| chasd00 wrote:
| All the content generated by LLMs (good or bad) is going to
| end up back in Facebook/Instagram and other social media
| sites. This enables Meta to show growth and therefore demand
| a higher stock price. So it makes sense to get content
| generation tools out there as widely as possible.
| troupo wrote:
| There's nothing open source about it.
|
| It's a proprietary dump of data you can't replicate or verify.
|
| What were the sources? What datasets was it trained on? What
| are the training parameters? And so on and so on.
| advael wrote:
| Look, absolutely zero people in the world should trust any tech
| company when they say they care about or will keep commitments
| to the open-source ecosystem in any capacity. Nevertheless, it
| is occasionally strategic for them to do so, and there can be
| ancillary benefits for said ecosystem in those moments where
| this is the best play for them to harm their competitors
|
| For now, Meta seems to release Llama models in ways that don't
| significantly lock people into their infrastructure. If that
| ever stops being the case, you should fork rather than trust
| their judgment. I say this knowing full well that most of the
| internet is on AWS or GCP, most brick and mortar businesses use
| Windows, and carrying a proprietary smartphone is essentially
| required to participate in many aspects of the modern economy.
| All of this is a mistake. You can't resist all lock-in. The
| players involved effectively run the world. You should still
| try where you can, and we should still be happy when tech
| companies either slip up or make the momentary strategic
| decision to make this easier
| ori_b wrote:
| > _If that ever stops being the case, you should fork rather
| than trust their judgment._
|
| Fork what? The secret sauce is in the training data and
| infrastructure. I don't think either of those is currently
| open.
| quasse wrote:
| I'm just a lowly outsider to the AI space, but calling
| these open source models seems kind of like calling a
| compiled binary open source.
|
| If you don't have a way to replicate what they did to
| create the model, it seems more like freeware than open
| source.
| advael wrote:
| As an ML researcher, I agree. Meta doesn't include
| adequate information to replicate the models, and from
| the perspective of fundamental research, the interest
| that big tech companies have taken in this field has been
| a significant impediment to independent researchers,
| despite the fact that they are undeniably producing
| groundbreaking results in many respects, due to this
| fundamental lack of openness
|
| This should also make everyone very skeptical of any
| claim they are making, from benchmark results to the
| legalities involved in their training process to the
| prospect of future progress on these models. Without
| being able to vet their results against the same datasets
| they're using, there is no way to verify what they're
| saying, and the credulity that otherwise smart people
| have been exhibiting in this space has been baffling to
| me
|
| As a developer, if you have a working Llama model,
| including the source code and weights, and it's crucial
| for something you're building or have already built, it's
| still fundamentally a good thing that Meta isn't gating
| it behind an API and if they went away tomorrow, you
| could still use, self-host, retrain, and study the models
| warkdarrior wrote:
| The model is public, so you can at least verify their
| benchmark claims.
| Nuzzerino wrote:
| Which option would be better?
|
| A) Release the data, and if it ends up causing a privacy
| scandal, at least you can actually call it open this
| time.
|
| B) Neuter the dataset, and the model
|
| All I ever see in these threads is a lot of whining and
| no viable alternative solutions (I'm fine with the idea
| of it being a hard problem, but when I see this attitude
| from "researchers" it makes me less optimistic about the
| future)
|
| > and the credulity that otherwise smart people have been
| exhibiting in this space has been baffling to me
|
| Remove the "otherwise" and you're halfway to
| understanding your error.
| Nuzzerino wrote:
| > it seems more like freeware than open source.
|
| What would you have them do instead? Specifically?
| wongarsu wrote:
| > If you don't have a way to replicate what they did to
| create the model, it seems more like freeware
|
| Isn't that a bit like arguing that a linux kernel driver
| isn't open source if I just give you a bunch of GPL-
| licensed source code that speaks to my device, but no
| documentation of how my device works? If you take away the
| source code you have no way to recreate it. But so far
| that never caused anyone to call the code not open-
| source. The closest is the whole GPL3 Tivoization debate
| and that was very divisive.
|
| The heart of the issue is that open source is kind of
| hard to define for anything that isn't software. As a
| proxy we could look at Stallman's free software
| definition. Free software shares a common history with
| open source, and most open source software is free/libre,
| and the other way around, so this might be a useful proxy.
|
| So checking the four software freedoms:
|
| - The freedom to run the program as you wish, for any
| purpose: For most purposes. There's that 700M user
| restriction, also Meta forbids breaking the law and
| requires you to follow their acceptable use policy.
|
| - The freedom to study how the program works, and change
| it so it does your computing as you wish: yes. You can
| change it by fine tuning it, and the weights allow you to
| figure out how it works. At least as well as anyone knows
| how any large neural network works, but it's not like
| Meta is keeping something from you here
|
| - The freedom to redistribute copies so you can help your
| neighbor: Allowed, no real asterisks
|
| - The freedom to distribute copies of your modified
| versions to others: Yes
|
| So is it Free Software(tm)? Not really, but it is pretty
| close.
| JKCalhoun wrote:
| A good point.
|
| Forgive me, I am AI naive, is there some way to harness
| Llama to train one's own actually-open AI?
| advael wrote:
| Kinda. Since you can self-host the model on a linux
| machine, there's no meaningful way for them to prevent
| you from having the trained weights. You can use this to
| bootstrap other models, or retrain on your own datasets,
| or fine-tune from the starting point of the currently-
| working model. What you can't do is be sure what they
| trained it on
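|
| (Concretely, "self-host" can be as simple as the sketch
| below, assuming the transformers library and that you've
| accepted the license to download the gated weights; the
| model name here is illustrative.)
|
|     from transformers import pipeline
|
|     pipe = pipeline("text-generation",
|                     model="meta-llama/Meta-Llama-3.1-8B-Instruct")
|     print(pipe("Explain open weights in one sentence.",
|                max_new_tokens=40)[0]["generated_text"])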
| QuercusMax wrote:
| How open is it _really_ though? If you 're starting from
| their weights, do you actually have legal permission to
| use derived models for commercial purposes? If it turns
| out that Meta used datasets they didn't have licenses to
| use in order to generate the model, then you might be in
| a big heap of mess.
| ein0p wrote:
| I could be wrong but most "model" licenses prohibit the
| use of the models to improve other models
| logicchains wrote:
| They actually did open source the infrastructure library
| they developed. They don't open source the data but they
| describe how they gathered/filtered it.
| ladzoppelin wrote:
| Is forking really possible with an LLM, or one the size of
| future Llama versions? Have they even released the weights and
| everything? Maybe I am just negative about it because I feel
| Meta is the worst company ever invented and feel this will
| hurt society in the long run just like Facebook.
| lawlessone wrote:
| > have they even released the weights?
|
| Isn't that what the model is? Just a collection of weights?
| pmarreck wrote:
| When you run `ollama pull llama3.1:70b`, which you can
| literally do right now (assuming ollama.com is installed
| and you're not afraid of the terminal), and it downloads a
| 40 gigabyte model, _that is the weights_!
|
| I'd consider the ability to admit when even your most hated
| adversary is doing something right, a hallmark of acting
| smarter.
|
| Now, they haven't released the training data with the model
| weights. THAT plus the training tooling would be "end to
| end open source". Apple actually did _that very thing_
| recently, and it flew under almost everyone's radar for
| some reason:
|
| https://x.com/vaishaal/status/1813956553042711006?s=46&t=qW
| a...
| mym1990 wrote:
| Doing something right vs doing something that seems right
| but has a hidden self interest that is harmful in the
| long run can be vastly different things. Often this kind
| of strategy will allow people to let their guard down,
| and those same people will get steamrolled down the road,
| left wondering where it all went wrong. Get smarter.
| pmarreck wrote:
| How in the heck is an open source model that is free and
| open today going to lock me down, down the line? This is
| nonsense. You can literally run this model forever if you
| use NixOS (or never touch your windows, macos or linux
| install again). Zuck can't come back and molest it. Ever.
|
| The best I can tell is that their self-interest here is
| more about gathering mindshare. That's not a terrible
| motive; in fact, that's a pretty decent one. It's not the
| bully pressing you into their ecosystem with a tit-for-
| tat; it's the nerd showing off his latest and going
| "Here. Try it. Join me. Join us."
| holoduke wrote:
| In tech you can trust the underdogs. Once they turn into
| dominant players they turn evil. 99% of the cases.
| sandworm101 wrote:
| >> Bravo! While I don't agree with Zuck's views and actions on
| many fronts, on this occasion I think he and the AI folks at
| Meta deserve our praise and gratitude.
|
| Nope. Not one bit. Supporting F/OSS when it suits you in one
| area and then being totally dismissive of it in _every other
| area_ should not be lauded. How about open sourcing some of FB
| 's VR efforts?
| y04nn wrote:
| Don't be fooled, it is an "embrace, extend, extinguish"
| strategy. Once they have enough usage and become the default
| standard, they will start to find any possible way to make you
| pay.
| war321 wrote:
| Hasn't really happened with PyTorch or any of their other
| open sourced releases tbh.
| tyler-jn wrote:
| So far, it seems like this release has done ~nothing to the
| stock price for GOOGL/MSFT, which we all know has been propped
| up largely on the basis of their AI plans. So it's probably
| premature to say that this has messed it up for them.
| userabchn wrote:
| Interview with Mark Zuckerberg released today:
| https://www.bloomberg.com/news/videos/2024-07-23/mark-zucker...
| starship006 wrote:
| > Our adversaries are great at espionage, stealing models that
| fit on a thumb drive is relatively easy, and most tech companies
| are far from operating in a way that would make this more
| difficult.
|
| Mostly unrelated to the correctness of the article, but this
| feels like a bad argument. AFAIK, Anthropic/OpenAI/Google are not
| having issues with their weights being leaked (are they?). Why is
| it that Meta's model weights are?
| meowface wrote:
| >AFAIK, Anthropic/OpenAI/Google are not having issues with
| their weights being leaked. Why is it that Meta's model weights
| are?
|
| The main threat actors there would be powerful nation-states,
| in which case they'd be unlikely to leak what they've taken.
|
| It is a bad argument though, because one day possession of AI
| models (and associated resources) might confer great and
| dangerous power, and we can't just throw up our hands and say
| "welp, no point trying to protect this, might as well let
| everyone have it". I don't think that'll happen anytime soon,
| but I am personally somewhat in the AI doomer camp.
| whimsicalism wrote:
| We have no way of knowing whether nation-state level actors
| have access to those weights.
| skybrian wrote:
| I think it's hard to say. We simply don't know much from the
| outside. Microsoft has had some pretty bad security lapses, for
| example around guarding access to Windows source code. I don't
| think we've seen a bad security break-in at Google in quite a
| few years? It would surprise me if Anthropic and OpenAI had
| good security since they're pretty new, and fast-growing
| startups have a lot of organizational challenges.
|
| It seems safe to assume that not all the companies doing
| leading-edge LLM's have good security and that the industry as
| a whole isn't set up to keep secrets for long. Things aren't
| locked down to the level of classified research. And it sounds
| like Zuckerberg doesn't want to play the game that way.
|
| At the state level, China has independent AI research efforts
| and they're going to figure it out. It's largely a matter of
| timing, which could matter a lot.
|
| There's still an argument to be made against making
| proliferation too easy. Just because states have powerful
| weapons doesn't mean you want them in the hands of people on
| the street.
| dfadsadsf wrote:
| We have nationals/citizens of every major US adversary working
| in those companies, with looser security practices than at a
| local warehouse. The security check before hiring is a joke
| (it mostly checks that the resume checks out), laptops can be
| taken home, and internal communications are not segmented on a
| need-to-know basis. Essentially, if China wants weights or
| source code, it will have hundreds of people to choose from who
| can provide it.
| probablybetter wrote:
| I would avoid Facebook and Meta products in general. I do NOT
| trust them. We have approx. 20 years of their record to go on.
| diggan wrote:
| > Today we're taking the next steps towards open source AI
| becoming the industry standard. We're releasing Llama 3.1 405B,
| the first frontier-level open source AI model,
|
| Why do people keep mislabeling this as Open Source? The whole
| point of calling something Open Source is that the "magic sauce"
| of how to build something is publicly available, so I could build
| it myself if I have the means. But without the training data
| publicly available, could I train Llama 3.1 if I had the means?
| No wonder Zuckerberg doesn't start by defining what Open Source
| actually means, as then the blog post would have lost all meaning
| from the get-go.
|
| Just call it "Open Model" or something. As it stands right now,
| the meaning of Open Source is being diluted by all these
| companies pretending to do one thing, while actually doing
| something else.
|
| I initially got very excited seeing the title and the domain,
| but became hopelessly sad after reading through the article and
| realizing they're still trying to pass their artifacts off as
| Open Source projects.
| valine wrote:
| The codebase to do the training is way less valuable than the
| weights for the vast majority of people. Releasing the training
| code would be nice, but it doesn't really help anyone but
| Meta's direct competitors.
|
| If you want to train on top of Llama there's absolutely nothing
| stopping you. Plenty of open source tools to do parameter
| optimization.
| diggan wrote:
| Not just the training code but the training data as well
| should be under a permissive license; otherwise you cannot
| call the project itself Open Source, which Facebook does
| here.
|
| > is way less valuable than the weights for the vast majority
| of people
|
| The same is true for most Open Source projects, most people
| use the distributed binaries or other artifacts from the
| projects, and couldn't care less about the code itself. But
| that doesn't warrant us changing the meaning of Open Source
| just because companies feel like it's free PR.
|
| > If you want to train on top of Llama there's absolutely
| nothing stopping you.
|
| Sure, but in order for the intent of Open Source to be true
| for Llama, I should be able to build this project from
| scratch. Say I have a farm of 100 A100's, could I reproduce
| the Llama model from scratch today?
| unshavedyak wrote:
| > Not just the training code but the training data as well,
| should be under a permissive license, otherwise you cannot
| call the project itself Open Source, which Facebook does
| here.
|
| Does FB even have the capability to do that? I'd assume
| there's a bunch of data that's not theirs and they can't
| even release it. Let alone some data that they might not
| want to admit is in the source.
| bornfreddy wrote:
| If not, it is questionable if they should train on such
| data anyway.
|
| Also, that doesn't matter in this discussion - if you are
| unable to release the source under appropriate licence
| (for whatever reason), you should not call it Open
| Source.
| talldayo wrote:
| I will steelman the idea that a tokenizer and weights are
| all you need for the "source" of an LLM. They are
| components that can be modified, redistributed and when put
| together, reproduce the full experience intended.
|
| If we _insist_ upon the release of training data with Open
| models, you might as well kiss the idea of usable Open LLMs
| goodbye. Most of the content in training datasets like
| The Pile is not licensed for redistribution in any way,
| shape or form. It would jeopardize projects that _do_ use
| transparent training data while not offering anything of
| value to the community compared to the training code.
| Republishing all training data is an absolute trap.
| enriquto wrote:
| > Most of the content in training datasets like The Pile
| are not licensed for redistribution in any way shape or
| form.
|
| But distributing the weights is a "form" of distribution.
| You can recover many items of the dataset (most easily,
| the outliers) by using the weights.
|
| Just because they are codified in a non-readily
| accessible way, does not mean that you are not
| distributing them.
|
| It's scary to think that "training" is becoming a thinly
| veiled way to strip copyright of works.
| talldayo wrote:
| The weights are a transformed, lossy and non-complete
| permutation of the training material. You _cannot_
| recover most of the dataset reliably, which is what stops
| it from being an outright replacement for the work it 's
| trained on.
|
| > does not mean that you are not distributing them.
|
| Except you literally aren't distributing them. It's like
| accusing me of pirating a movie because I sent a
| screenshot or a scene description to my friend.
|
| > It's scary to think that "training" is becoming a
| thinly veiled way to strip copyright of works.
|
| This is the way it's been for years. Google is given Fair
| Use for redistributing incomplete parts of copyrighted
| text materials verbatim, since their application is
| transformative: https://en.wikipedia.org/wiki/Authors_Gui
| ld,_Inc._v._Google,....
|
| Or Corellium, who won their case to use copyrighted Apple
| code in novel and transformative ways: https://www.forbes
| .com/sites/thomasbrewster/2023/12/14/apple...
|
| Copyright has always been a limited power.
| jncfhnb wrote:
| People don't typically modify distributed binaries.
|
| People do typically modify model weights. They are the
| preferred form in which to modify the model.
|
| Saying "build" llama is just a nonsense comparison to
| traditional compiled software. "Building llama" is more
| akin to taking the raw weights as text and putting them
| into a nice pickle file. Or loading it into an inference
| engine.
|
| Demanding that you have everything needed to recreate the
| weights from scratch is like arguing an application cannot
| be open source unless it also includes the user testing
| history and design documents.
|
| And of course some idiots don't understand what a pickled
| weights file is and claim it's as useless as a distributed
| binary for modifying the program, just because it is
| technically compiled; they don't understand that the point of
| the pickled file is "convenience" and that it unpacks back
| to the original form. That's like arguing open source software
| can't be distributed in zip files.
|
| > Say I have a farm of 100 A100's, could I reproduce the
| Llama model from scratch today?
|
| Say you have a piece of paper. Can you reproduce
| `print("hello world")` from scratch?
| vngzs wrote:
| Agreed. The Linux kernel source contains everything you need to
| produce Linux kernel binaries. The llama source does not
| contain what you need to produce llama models. Facebook is
| using sleight of hand to garner favor with open model weights.
|
| Open model weights are still commendable, but it's a far cry
| from open-source (or even _libre_ ) software!
| elromulous wrote:
| 100%. With this licensing model, meta gets to reap the benefits
| of open source (people contributing, social cachet), without
| any of the real detriment (exposing secret sauce).
| hbn wrote:
| Is that even something they keep on hand? Or would WANT to keep
| on hand? I figured they're basically sending a crawler to go
| nuts reading things and discard the data once they've trained
| on it.
|
| If that included, e.g. reading all of Github for code, I
| wouldn't expect them to host an entire separate read-only copy
| of Github because they trained on it and say "this is part of
| our open source model"
| jdminhbg wrote:
| > Why do people keep mislabeling this as Open Source? The whole
| point of calling something Open Source is that the "magic
| sauce" of how to build something is publicly available, so I
| could built it myself if I have the means. But without the
| training data publicly available, could I train Llama 3.1 if I
| had the means?
|
| I don't think not releasing the commit history of a project
| makes it not Open Source, and this seems like that to me. What's
| important is you can download it, run it, modify it, and re-
| release it. Being able to see how the sausage was made would be
| interesting, but I don't think Meta have to show their training
| data any more than they are obligated to release their planning
| meeting notes for React development.
|
| Edit: I think the restrictions in the license itself are good
| cause for saying it shouldn't be called Open Source, fwiw.
| thenoblesunfish wrote:
| You don't need to have the commit history to see "how it
| works". ML that works well does so in huge part due to the
| training data used. The leading models today aren't
| distinguished by the way they're trained, but what they're
| trained on.
| jdminhbg wrote:
| I agree that you need training data to build AI from
| scratch, much like you need lots of really smart developers
| and a mailing list and servers and stuff to build the Linux
| kernel from scratch. But it's not like having the training
| data and training code will get you the same result, in the
| way something like open data in science is about
| replicating results.
| tempfile wrote:
| For the freedom to change to be effective, a user must be
| given the software in a form they can modify. Can you tweak
| an LLM once it's built? (I genuinely don't know the answer)
| jdminhbg wrote:
| Yes, you can finetune Llama:
| https://llama.meta.com/docs/how-to-guides/fine-tuning/
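|
| As a rough, hedged sketch (not Meta's own recipe): one common
| way to tweak the released weights is to attach LoRA adapters
| with the Hugging Face peft library. The model id and the
| hyperparameters below are illustrative assumptions, not
| recommendations.
|
|     # Minimal LoRA fine-tuning setup; only the small adapter
|     # matrices are trained, the base weights stay frozen.
|     from transformers import AutoModelForCausalLM, AutoTokenizer
|     from peft import LoraConfig, get_peft_model
|
|     base = "meta-llama/Meta-Llama-3.1-8B"  # assumed repo id
|     tokenizer = AutoTokenizer.from_pretrained(base)
|     model = AutoModelForCausalLM.from_pretrained(base)
|
|     lora = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
|                       target_modules=["q_proj", "v_proj"])
|     model = get_peft_model(model, lora)
|     model.print_trainable_parameters()  # adapters only
|
| From there you'd run an ordinary training loop (or the
| transformers Trainer) over your own dataset.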
| diggan wrote:
| > I don't think not releasing the commit history of a project
| makes it not Open Source,
|
| Right, I'm not talking about the commit history, but rather
| that anyone (with means) should be able to produce the final
| artifact themselves, if they want. For weights like this,
| that requires at least the training script + the training
| data. Without that, it's very misleading to call the project
| Open Source, when only the result of the training is
| released.
|
| > What's important is you can download it, run it, modify it,
| and re-release it
|
| But I literally cannot download the project, build it and run
| it myself? I can only use the binaries (weights) provided by
| Meta. No one can modify how the artifact is produced, only
| modify the already produced artifact.
|
| That's like saying that Slack is Open Source because if I
| want to, I could patch the binary with a hex editor and
| add/remove things as I see fit? No one believes Slack should
| be called Open Source for that.
| jdminhbg wrote:
| > Right, I'm not talking about the commit history, but
| rather that anyone (with means) should be able to produce
| the final artifact themselves, if they want. For weights
| like this, that requires at least the training script + the
| training data.
|
| You cannot produce the final artifact with the training
| script + data. Meta also cannot reproduce the current
| weights with the training script + data. You could produce
| some other set of weights that are just about as good, but
| it's not a deterministic process like compiling source
| code.
|
| > That's like saying that Slack is Open Source because if I
| want to, I could patch the binary with a hex editor and
| add/remove things as I see fit? No one believes Slack
| should be called Open Source for that.
|
| This analogy doesn't work because it's not like Meta can
| "patch" Llama any more than you can. They can only finetune
| it like everyone else, or produce an entirely different LLM
| by training from scratch like everyone else.
|
| The right to release your changes is another difference; if
| you patch Slack with a hex editor to do some useful thing,
| you're not allowed to release that changed Slack to others.
|
| If Slack lost their source code, went out of business, and
| released a decompiled version of the built product into the
| public domain, that would in some sense be "open source,"
| even if not as good as something like Linux. LLMs though do
| not have a source code-like representation that is easily
| and deterministically modifiable like that, no matter who
| the owner is or what the license is.
| unraveller wrote:
| Open-weights is not open-source, for sure, but I don't mind it
| being stated as an aspirational goal; the moment it is legally
| possible to publish the source without shooting themselves in the
| foot, they should do it.
|
| They could release 50% of their best data but that would only
| stop them from attracting the best talent.
| JeremyNT wrote:
| > _Why do people keep mislabeling this as Open Source?_
|
| I guess this is a rhetorical question, but this is a press
| release from Meta itself. It's just a marketing ploy, of
| course.
| blcknight wrote:
| InstructLab and the Granite Models from IBM seem the closest to
| being open source. Certainly more than whatever FB is doing
| here.
|
| (Disclaimer: I work for an IBM subsidiary but not on any of
| these products)
| hubraumhugo wrote:
| The big winners of this: devs and AI startups
|
| - No more vendor lock-in
|
| - Instead of just wrapping proprietary API endpoints, developers
| can now integrate AI deeply into their products in a very cost-
| effective and performant way
|
| - A price race to the bottom, with near-instant LLM responses at
| very low prices, is on the horizon
|
| As a founder, it feels like a very exciting time to build a
| startup as your product automatically becomes better, cheaper,
| and more scalable with every major AI advancement. This leads to
| a powerful flywheel effect: https://www.kadoa.com/blog/ai-
| flywheel
| danielmarkbruce wrote:
| It creates the opposite of a flywheel effect for you. It
| creates a leapfrog effect.
| boringg wrote:
| AI might cannibalize a lot of first-gen AI businesses.
| boringg wrote:
| - A price race to the bottom, with near-instant LLM responses at
| very low prices, is on the horizon
|
| Maybe there will be a big price war while the market majors fight
| it out for positioning, but they still need to make money off
| their investments, so someone is going to have to raise prices at
| some point, and you'll be locked into their system if you build
| on it.
| mav3ri3k wrote:
| I am not deep into LLMs so I ask this. From my understanding,
| their last model was open source in the sense that you could use
| it, but the inner workings were "hidden"/not transparent.
|
| With the new model, I am seeing a lot about how open source they
| are and how they can be built upon. Is it now completely open
| source, or similar to their last models?
| whimsicalism wrote:
| It's intrinsic to transformers that the inner workings are
| largely inscrutable. This is no different, but it does not mean
| they cannot be built upon.
|
| Gradient descent works on these models just like the prior
| ones.
| carimura wrote:
| Looks like you can already try out Llama-3.1-405b on Groq,
| although it's timing out. So. Hugged I guess.
| TechDebtDevin wrote:
| All the big providers should have it up by end of day. They
| just change their API configs (they're just reselling you AWS
| Bedrock).
| jamiedg wrote:
| 405B and the other Llama 3.1 models are working and available
| on Together AI. https://api.together.ai
| mensetmanusman wrote:
| It's easy to support open source AI when the code is 1,000 lines
| and the execution costs $100,000,000 of electricity.
|
| Only the big players can afford to push go, and FB would love to
| see OpenAI's code so they can point it to their proprietary user
| data.
| bun_at_work wrote:
| Meta makes their money off advertising, which means they profit
| from attention.
|
| This means they need content that will grab attention, and
| creating open source models that allow anyone to create any
| content on their own becomes good for Meta. The users of the
| models can post it to their Instagram/FB/Threads account.
|
| Releasing an open model also releases Meta from the burden of
| having to police the content the model generates, once the open
| source community fine-tunes the models.
|
| Overall, this is a sound business move for Meta - the post
| doesn't really talk about the true benefit, instead moralizing
| about open source.
| jklinger410 wrote:
| This is a great point. Eventually, META will only allow LLAMA
| generated visual AI content on its platforms. They'll put a
| little key in the image that clears it with the platform.
|
| Then all other visual AI content will be banned. If that is
| where legislation is heading.
| natural219 wrote:
| AI moderators too would be an enormous boon if they could get
| that right.
| KaiserPro wrote:
| It would be good, but the cost per moderation is still really
| high for it to be practical.
| noiseinvacuum wrote:
| Creating content with AI will surely be helpful for social media
| to some extent, but I think it's not that important in the larger
| scheme of things; there's already a vast sea of content being
| created by humans, and the differentiation is already in
| recommending the right content to the right people at the right
| time.
|
| More important are the products that Meta will be able to make
| if the industry standardizes on Llama. They would have a front
| seat, not just with access to the latest unreleased models but
| also in setting the direction of progress and what the next gen
| of LLMs optimizes for. If you're Twitter or Snap or TikTok or
| compete with Meta on the product, then good luck trying to keep
| up.
| apwell23 wrote:
| I am not sure I follow this.
|
| 1. Is there such a thing as 'attention grabbing AI content' ?
| Most AI content I see is the opposite of 'attention grabbing'.
| Kindle store is flooded with this garbage and none of it is
| particularly 'attention grabbing'.
|
| 2. Why would creation of such content, even if it was truly
| attention grabbing, benefit Meta in particular?
|
| 3. How would proliferation of AI content lead to more ad spend
| in the economy? Ad budgets won't increase because of AI
| content.
|
| To me this is a typical Zuckerberg play. Attach Meta's name to
| whatever is trendy at the moment, like the (now forgotten)
| metaverse, cryptocoins and a bunch of other failed stuff that was
| trendy for a second. Meta is NOT a Gen AI company, as he is
| scamming (more like colluding with) the market to believe. A mere
| distraction from slowing user growth on ALL of Meta's apps.
| bun_at_work wrote:
| Sure - there is plenty of attention grabbing AI content - it
| doesn't have to grab _your_ attention, and it won't work for
| everyone. I have seen people engaging with apps that redo a
| selfie to look like a famous character or put the person in a
| movie scene, for example.
|
| Every piece of content in any feed (good, bad, or otherwise)
| benefits the aggregator (Meta, YouTube, whatever), because
| someone will look at it. Not everything will go viral, but it
| doesn't matter. Scroll whatever on Twitter, YouTube Shorts,
| Reddit, etc. Meta has a massive presence in social media, so
| content being generated is shared there.
|
| More content of any type leads to more engagement on the
| platforms where it's being shared. Every Meta feed serves the
| viewer an ad (for which Meta is paid) every 3 or so posts
| (pieces of content). It doesn't matter if the user doesn't
| like 1/5 posts or whatever, the number of ads still goes up.
| apwell23 wrote:
| > it doesn't have to grab _your_ attention
|
| I am talking about this in general, not me personally. No
| popular content on any website/platform is AI generated.
| Maybe you have examples that lead you to believe that it's
| possible on a mass scale.
|
| > look like a famous character or put the person in a movie
| scene
|
| What attention-grabbing movie used gen AI persons?
| resters wrote:
| This is really good news. Zuck sees the inevitability of it and
| the dystopian regulatory landscape and decided to go all in.
|
| This also has the important effect of neutralizing the critique
| of US Government AI regulation because it will democratize
| "frontier" models and make enforcement nearly impossible. Thank
| you, Zuck, this is an important and historic move.
|
| It also opens up the market to a lot more entry in the area of
| "ancillary services to support the effective use of frontier
| models" (including safety-oriented concerns), which should really
| be the larger market segment.
| passion__desire wrote:
| Probably, Yann Lecun is the Lord Varys here. He has Mark's ear
| and Mark believes in Yann's vision.
| war321 wrote:
| Unfortunately, there are a number of AI safety people that are
| still crowing about how AI models need to be locked down, with
| some of them loudly pivoting to talking about how open source
| models aid China.
|
| Plus there's still the spectre of SB-1047 hanging around.
| amelius wrote:
| > One of my [Mark Zuckerberg, ed.] formative experiences has been
| building our services constrained by what Apple will let us build
| on their platforms. Between the way they tax developers, the
| arbitrary rules they apply, and all the product innovations they
| block from shipping, it's clear that Meta and many other
| companies would be freed up to build much better services for
| people if we could build the best versions of our products and
| competitors were not able to constrain what we could build.
|
| This is hard to disagree with.
| glhaynes wrote:
| I think it's very easy to disagree with!
|
| If Zuckerberg had his way, mobile device OSes would let Meta
| ingest microphone and GPS data 24/7 (just like much of the
| general public already _thinks_ they do because of the
| effectiveness of the other sorts of tracking they are able to
| do).
|
| There are certainly legit innovations that haven't shipped
| because gatekeepers don't allow them. But there've been lots of
| harmful "innovations" blocked, too.
| throwaway1194 wrote:
| I strongly suspect that what AI will end up doing is push
| companies and organizations towards open source, they will
| eventually realize that code is already being shared via AI
| channels, so why not do it legally with open source?
| talldayo wrote:
| > they will eventually realize that code is already being
| shared via AI channels
|
| Private repos are not being reproduced by any modern AI. Their
| source code is safe, although AI arguably lowers the bar to
| compete with them.
| whimsicalism wrote:
| OpenAI needs to release a new model setting a new capabilities
| highpoint. This is existential for them now.
| ChrisArchitect wrote:
| Related:
|
| _Llama 3.1 Official Launch_
|
| https://news.ycombinator.com/item?id=41046540
| m3kw9 wrote:
| The truth is we need both closed and open source, they both have
| their discovery path and advantages and disadvantages, there
| shouldn't be a system where one is eliminated over the other.
| They also seem to be driving each other forward via competition.
| typpo wrote:
| Thanks to Meta for their work on safety, particularly Llama
| Guard. Llama Guard 3 adds defamation, elections, and code
| interpreter abuse as detection categories.
|
| Having run many red teams recently as I build out promptfoo's red
| teaming featureset [0], I've noticed the Llama models punch above
| their weight in terms of accuracy when it comes to safety. People
| hate excessive guardrails and Llama seems to thread the needle.
|
| Very bullish on open source.
|
| [0] https://www.promptfoo.dev/docs/red-team/
| swyx wrote:
| Is there a #2 to Llama Guard? Meta seems curiously alone in
| doing this kind of, let's call it, "practical safety" work.
| enriquto wrote:
| It's alarming that he refers to llama as if it was open source.
|
| The definition of free software (and open source, for that
| matter) is well-established. The same definition applies to all
| programs, whether they are "AI" or not. In any case, if a program
| was built by training against a dataset, the whole dataset is
| part of the source code.
|
| Llama is distributed in binary form, and it was built based on a
| secret dataset. Referring to it as "open source" is not
| ignorance, it's malice.
| Nesco wrote:
| The training data contains most likely insane amounts of
| copyrighted material. That's why virtually none of the "open
| models" come with their training data
| enriquto wrote:
| > The training data contains most likely insane amounts of
| copyrighted material.
|
| If that is the case then the weights must inherit all these
| copyrights. It has been shown (at least in image processing)
| that you can extract many training images from the weights,
| almost verbatim. Hiding the training data does not solve this
| issue.
|
| But regardless of copyright issues, people here are
| complaining about the malicious use of the term "open
| source", to signify a completely different thing (more like
| "open api").
| tempfile wrote:
| > If that is the case then the weights must inherit all
| these copyrights.
|
| Not if it's a fair use (which is obviously the defence
| they're hoping for)
| jdminhbg wrote:
| > In any case, if a program was built by training against a
| dataset, the whole dataset is part of the source code.
|
| I'm not sure why I keep seeing this. What is the equivalent of
| the training data for something like the Linux kernel?
| enriquto wrote:
| > What is the equivalent of the training data for something
| like the Linux kernel?
|
| It's the source code.
|
| For the linux kernel:
| compile(sourcecode) = binary
|
| For llama: train(data) = weights
| jdminhbg wrote:
| That analogy doesn't work. `train` is not a deterministic
| process. Meta has all of the training data and all of the
| supporting source code and they still won't get the same
| `weights` if they re-run the process.
|
| The weights are the result of the development process, like
| the source code of a program is the result of a development
| process.
| indus wrote:
| Is there an argument against Open Source AI?
|
| Not the usual nation-state rhetoric, but something that justifies
| that closed source leads to better user-experience and fewer
| security and privacy issues.
|
| An ecosystem that benefits vendors, customers, and the makers of
| closed source?
|
| Are there historical analogies other than Microsoft Windows or
| Apple iPhone / iOS?
| kjkjadksj wrote:
| Let's take the iPhone. Secured by the industry's best security
| teams, I am sure. Closed source, yet teenagers in Eastern Europe
| have cracked into it dozens of times making jailbreaks. Every
| law enforcement agency can crack into it. Closed source is not
| a security moat, but a trade protection moat.
| finolex1 wrote:
| Replace "Open Source AI" in "is there an argument against xxx"
| with bioweapons or nuclear missiles. We are obviously not at
| that stage yet, but it could be a real, non-trivial concern in
| the near future.
| GaggiX wrote:
| Llama 3.1 405B is on par with GPT-4o and Claude 3.5 Sonnet, the
| 70B model is better than GPT 3.5 turbo, incredible.
| itissid wrote:
| How are smaller models distilled from large models? I know of
| LoRA, quantization and similar techniques; but does distilling
| also mean generating new datasets entirely from the big models,
| for conversing with smaller models on many simpler tasks?
| tintor wrote:
| Smaller models can be trained to match the log probs of the
| larger model. The larger model can also be used to generate
| synthetic data for the smaller model.
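|
| As a minimal sketch of the first idea (matching log probs),
| assuming teacher and student are causal LMs sharing a tokenizer;
| the temperature and loss scaling below are assumptions:
|
|     import torch.nn.functional as F
|
|     def distill_loss(student_logits, teacher_logits, T=2.0):
|         # Soften both distributions, then push the student's
|         # log-probs toward the teacher's probs via KL divergence.
|         s = F.log_softmax(student_logits / T, dim=-1)
|         t = F.softmax(teacher_logits / T, dim=-1)
|         return F.kl_div(s, t, reduction="batchmean") * T ** 2
|
| The synthetic-data route is simpler: sample outputs from the big
| model and fine-tune the small one on them as ordinary text.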
| popcorncowboy wrote:
| > Developers can run inference on Llama 3.1 405B on their own
| infra at roughly 50% the cost of using closed models like GPT-4o
|
| Does anyone have details on exactly what this means or where/how
| this metric gets derived?
| rohansood15 wrote:
| I am guessing these are prices on services like AWS Bedrock
| (their post is down right now).
| PlattypusRex wrote:
| a big chunk of that is probably the fact that you don't need to
| pay someone who is trying to make a profit by running inference
| off-premises.
| wesleyyue wrote:
| Just added Llama 3.1 405B/70B/8B to https://double.bot (VSCode
| coding assistant) if anyone would like to try it.
|
| ---
|
| Some observations:
|
| * The model is much better at trajectory correcting and putting
| out a chain of tangential thoughts than other frontier models
| like Sonnet or GPT-4o. Usually, these models are limited to
| outputting "one thought", no matter how verbose that thought
| might be.
|
| * I remember in Dec of 2022 telling famous "tier 1" VCs that
| frontier models would eventually be like databases: extremely
| hard to build, but the best ones will eventually be open and win
| as it's too important to too many large players. I remember the
| confidence in their ridicule at the time but it seems
| increasingly more likely that this will be true.
| didip wrote:
| Is it really open source though? You can't run these models for
| your company. The license is extremely restrictive and there's NO
| SOURCE CODE.
| jamiedg wrote:
| Looks like it's easy to test out these models now on Together AI
| - https://api.together.ai
| KingOfCoders wrote:
| Open Source AI needs to include training data.
| fsndz wrote:
| Small language models is the path forward
| https://medium.com/thoughts-on-machine-learning/small-langua...
| pja wrote:
| "Commoditise your complement" in action!
| manishrana wrote:
| really useful insights
| bufferoverflow wrote:
| Hard disagree. So far every big important model is closed-source.
| Grok is sort-of the only exception, and it's not even that big
| compared to the (already old) GPT-4.
|
| I don't see open source being able to compete with the cutting-
| edge proprietary models. There's just not enough money. GPT-5
| will take an estimated $1.2 billion to train. MS and OpenAI are
| already talking about building a $100 billion training data
| center.
|
| How can you compete with that if your plan is to give away the
| training result for free?
| sohamgovande wrote:
| Where is the $1.2b number from?
| bufferoverflow wrote:
| There are a few numbers floating around, $1.2B being the
| lowest estimate.
|
| HSBC estimates the training cost for GPT-5 between $1.7B and
| $2.5B.
|
| Vlad Bastion Research estimates $1.25B - 2.25B.
|
| Some people on HN estimate $10B:
|
| https://news.ycombinator.com/item?id=39860293
| smusamashah wrote:
| Meta's article with more details on the new LLAMA 3.1
| https://ai.meta.com/blog/meta-llama-3-1/
| 6gvONxR4sf7o wrote:
| > Third, a key difference between Meta and closed model providers
| is that selling access to AI models isn't our business model.
| That means openly releasing Llama doesn't undercut our revenue,
| sustainability, or ability to invest in research like it does for
| closed providers. (This is one reason several closed providers
| consistently lobby governments against open source.)
|
| The whole thing is interesting, but this part strikes me as
| potentially anticompetitive reasoning. I wonder what the lines
| are that they have to avoid crossing here?
| phkahler wrote:
| >> ...but this part strikes me as potentially anticompetitive
| reasoning.
|
| "Commoditize your complements" is an accepted strategy. And
| while pricing below cost to harm competitors is often illegal,
| the reality is that the marginal cost of software is zero.
| Palomides wrote:
| spending a very quantifiable large amount of money to release
| something your nominal competitors charge for without having
| your own direct business case for it seems a little much
| phkahler wrote:
| Companies spend very large amounts of money on all sorts of
| things that never even get released. Nothing wrong with
| releasing something for free that no longer costs you
| anything. Who knows why they developed it in the first
| place, it makes no difference.
| frabjoused wrote:
| Who knew FB would hold OpenAI's original ideals, and OpenAI now
| holds early FB ideals/integrity.
| boringg wrote:
| FB needed to differentiate drastically. FB is at its best
| creating large data infra.
| jmward01 wrote:
| I never thought I would say this but thanks Meta.
|
| *I reserve the right to remove this praise if they abuse this
| open source model position in the future.
| gooob wrote:
| why do they keep training on publicly available online data, god
| dammit? what the fuck. don't they want to make a good LLM? train
| on the classics, on the essentials reference manuals for
| different technologies, on history books, medical encyclopedias,
| journal notes from the top surgeons and engineers, scientific
| papers of the experiments that back up our fundamental theories.
| we want quality information, not recent information. we already
| have plenty of recent information.
| mmmore wrote:
| I appreciate that Mark Zuckerberg soberly and neutrally talked
| about some of the risks from advances in AI technology. I agree
| with others in this thread that this is more accurately called
| "public weights" instead of open source, and in that vein I
| noticed some issues in the article.
|
| > This is one reason several closed providers consistently lobby
| governments against open source.
|
| Is this substantially true? I've noticed a tendency of those who
| support the general arguments in this post to conflate the
| beliefs of people concerned about AI existential risk, some of
| whom work at the leading AI labs, with the position of the labs
| themselves. In most cases I've seen, the AI labs (especially
| OpenAI) have lobbied against any additional regulation on AI,
| including with SB1047[1] and the EU AI Act[2]. Can anyone provide
| an example of this in the context of actual legislation?
|
| > On this front, open source should be significantly safer since
| the systems are more transparent and can be widely scrutinized.
| Historically, open source software has been more secure for this
| reason.
|
| This may be true if we could actually understand what was
| happening in neural networks, or train them to consistently avoid
| unwanted behaviors. As things are, the public weights are simply
| inscrutable black boxes, and the existence of jailbreaks and
| other strange LLM behaviors show that we don't understand how our
| training processes create models' emergent behaviors. The
| capabilities of these models and their influence are growing
| faster than our understanding of them and our ability to steer
| them to behave precisely how we want, and that will only get
| harder as the models get more powerful.
|
| > At this point, the balance of power will be critical to AI
| safety. I think it will be better to live in a world where AI is
| widely deployed so that larger actors can check the power of
| smaller bad actors.
|
| This paragraph ignores the concept of offense/defense balance.
| It's much easier to cause a pandemic than to stop one, and
| cyberattacks, while not as bad as pandemics, seem to also favor
| the attacker (this one is contingent on how much AI tools can
| improve our ability to write secure code). At the extreme, it
| would clearly be bad if everyone had access to an anti-matter
| weapon large enough to destroy the Earth; at some level of
| capability, we have to limit the commands an advanced AI will
| follow from an arbitrary person.
|
| That said, I'm unsure if limiting public weights at this time
| would be good regulation. They do seem to have some benefits in
| increasing research around alignment/interpretability, and I
| don't know if I buy the argument that public weights are
| significantly more dangerous from a "misaligned ASI" perspective
| than many competing closed companies. I also don't buy the view
| of some in the leading labs that we'll likely have "human level"
| systems by the end of the decade; it seems possible but unlikely.
| But I worry that Zuckerberg's vision of the future does not
| adequately guard against downside risks, and is not compatible
| with the way the technology will actually develop.
|
| [1] https://thebulletin.org/2024/06/california-ai-bill-
| becomes-a...
|
| [2] https://time.com/6288245/openai-eu-lobbying-ai-act/
| btbuildem wrote:
| The "open source" part sounds nice, though we all know there's
| nothing particularly open about the models (or their weights).
| The barriers to entry remain the same - huge upfront investments
| to train your own, and steep ongoing costs for "inference".
|
| Is the vision here to treat LLM-based AI as a "public good", akin
| to a utility provider in a civilized country (taxpayer funded,
| govt maintained, not-for-profit)?
|
| I think we could arguably call this "open source" when all the
| infra blueprints, scripts and configs are freely available for
| anyone to try and duplicate the state-of-the-art (resource and
| grokking requirements notwithstanding)
| brrrrrm wrote:
| check out the paper. it's pretty comprehensive
| https://ai.meta.com/research/publications/the-llama-3-herd-o...
| openrisk wrote:
| Open source "AI" is a proxy for democratising and making (much)
| more widely useful the goodies of high performance computing
| (HPC).
|
| The HPC domain (data and compute intensive applications that
| typically need vector, parallel or other such architectures) have
| been around for the longest time, but confined to academic /
| government tasks.
|
| LLM's with their famous "matrix multiply" at their very core are
| basically demolishing an ossified frontier where a few commercial
| entities (Intel, Microsoft, Apple, Google, Samsung etc) have
| defined for decades what computing looks like _for most people_.
|
| Assuming that the genie is out of the bottle, the question is:
| what is the shape of end-user devices that are optimally designed
| to use compute intensive open source algorithms? The "AI PC" is
| already a marketing gimmick, but could it be that Linux desktops
| and smartphones will suddenly be "AI natives"?
|
| For sure it's a transformational period and the landscape at
| T+10 yrs could be drastically different...
| LarsDu88 wrote:
| Obligatory reminder of why tech companies subsidize open source
| projects: https://www.joelonsoftware.com/2002/06/12/strategy-
| letter-v/
| avivo wrote:
| The FTC also recently put out a statement that is fairly pro-open
| source: https://www.ftc.gov/policy/advocacy-research/tech-at-
| ftc/202...
|
| I think it's interesting to think about this question of open
| source, benefits, risk, and even competition, without all of the
| baggage that Meta brings.
|
| I agree with the FTC, that the benefits of open-weight models are
| significant for competition. _The challenge is in distinguishing
| between good competition and bad competition._
|
| Some kind of competition can harm consumers and critical public
| goods, including democracy itself. For example, competing for
| people's scarce attention or for their food buying, with
| increasingly optimized and addictive innovations. Or competition
| to build the most powerful biological weapons.
|
| Other kinds of competition can massively accelerate valuable
| innovation. The FTC must navigate a tricky balance here --
| leaning into competition that serves consumers and the broader
| public, while being careful about what kind of competition it is
| accelerating that could cause significant risk and harm.
|
| It's also obviously not just "big tech" that cares about the
| risks behind open-weight foundation models. Many people have
| written about these risks even before it became a subject of
| major tech investment. (In other words, A16Z's framing is often
| rather misleading.) There are many non-big tech actors who are
| very concerned about current and potential negative impacts of
| open-weight foundation models.
|
| One approach which can provide the best of both worlds, is for
| cases where there are significant potential risks, to ensure that
| there is at least some period of time where weights are not
| provided openly, in order to learn a bit about the potential
| implications of new models.
|
| Longer-term, there may be a line where models are too risky to
| share openly, and it may be unclear what that line is. In that
| case, it's important that we have governance systems for such
| decisions that are not just profit-driven, and which can help us
| continue to get the best of all worlds. (Plug: my organization,
| the AI & Democracy Foundation; https://ai-dem.org/; is working to
| develop such systems and hiring.)
| whimsicalism wrote:
| making food that people want to buy is good actually
|
| i am not down with this concept of the chattering class
| deciding what are good markets and what are bad, unless it is
| due to broad-based and obvious moral judgements.
| tpurves wrote:
| 405 sounds like a lot of B's! What do you need to practically run
| or host that yourself?
| danielmarkbruce wrote:
| quantize to 0 bit. Run on a potato.
|
| Jokes aside ~ 405b x 2 bytes of memory (FP16), so say 810 gigs,
| maybe 1000 gigs or so required in reality, need maybe 2 aws p5
| instances?
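|
| A quick back-of-the-envelope, counting weights only (KV cache
| and activations come on top), just to show where those numbers
| come from:
|
|     params = 405e9
|     for name, bytes_per_param in [("FP16", 2), ("INT8", 1),
|                                   ("INT4", 0.5)]:
|         gb = params * bytes_per_param / 1e9
|         print(f"{name}: ~{gb:,.0f} GB of weight memory")
|     # FP16 ~810 GB, INT8 ~405 GB, INT4 ~202 GB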
| dang wrote:
| Related ongoing thread:
|
| _Llama 3.1_ - https://news.ycombinator.com/item?id=41046540 -
| July 2024 (114 comments)
| littlestymaar wrote:
| I love how Zuck decided to play a new game called "commoditize
| some other billionaire's business to piss him off". I can't wait
| until this becomes a trend and we get plenty of cool open source
| stuff.
|
| If he really wants to replicate Linux's success against
| proprietary Unices, he needs to release Llama with some kind of
| GPL equivalent, that forces everyone to play the open source
| game.
| Dwedit wrote:
| Without the raw data that trained the model, how is it open
| source?
| suyash wrote:
| Open source is a welcome step but what we really need is complete
| decentralisation so people can run their own private AI Models
| that keep all the data private to them. We need this to happen
| locally on laptops, mobile phones, smart devices etc. Waiting for
| when that will become ubiquitous.
| zoogeny wrote:
| Totally tangential thought, probably doomed to be lost in the
| flood of comments on this very interesting announcement.
|
| I was thinking today about Musk, Zuckerberg and Altman. Each
| claims that the next version of their big LLMs will be the best.
|
| For some reason it reminded me of one apocryphal cause of WW1,
| which was that the kings of Europe were locked in a kind of ego
| driven contest. It made me think about the Nation State as a
| technology. In some sense, the kings were employing the new
| technology which was clearly going to be the basis for the future
| political order. And they were pitting their own implementation
| of this new technology against the other kings.
|
| I feel we are seeing a similar clash of kings playing out. The
| claims that this is all just business or some larger claim about
| the good of humanity seem secondary to the ego stakes of the
| major players. And when it was about who built the biggest
| rocket, it felt less dangerous.
|
| It breaks my heart just a little bit. I feel sympathy in some
| sense for the AIs we will create, especially if they do reach the
| level of AGI. As another tortured analogy, it is like a bunch of
| competitive parents forcing their children into adversarial
| relationships to satisfy the parent's ego.
| light_triad wrote:
| They are positioning themselves as champions of AI open source
| mostly because they were blindsided by OpenAI, are not in the
| infra game, and want to commoditize their complements as much as
| possible.
|
| This is not altruism although it's still great for devs and
| startups. All of FB's GPU investment is primarily for new AI
| products: "friends", recommendations and selling ads.
|
| https://www.joelonsoftware.com/2002/06/12/strategy-letter-v/
| baby wrote:
| Meta does a good thing
|
| HN spends a day figuring out how it's actually bad
| shnock wrote:
| It's not actually bad, OP's point is that it is not motivated
| by altruism. An action can be beneficial to the people
| without that effect being the incentive
| war321 wrote:
| They've been working on AI for a good bit now. Open source
| especially is something they've championed since the mid 2010s
| at least with things like PyTorch, GraphQL, and React. It's not
| something they've suddenly pivoted to since ChatGPT came in
| 2022.
| kertoip_1 wrote:
| They are giving it "for free" because:
|
| * they need LLMs that they can control for features on their
| platforms (Fb/Instagram, but I can see many use cases on VR
| too)
|
| * they cannot sell it. They have no cloud services to offer.
|
| So they would spend this money anyway, but to compensate for
| some of the losses they decided to use it to fix their PR by
| keeping developers content.
| sterlind wrote:
| They also reap the benefits of AI researchers across the
| world using Llama as a base. All their research is
| immediately applicable to their models. It's also likely a
| strategic decision to reduce the moat OpenAI is building
| around itself.
|
| I also think LeCunn opposes OpenAI's gatekeeping at a
| philosophical/political level. He's using his position to
| strengthen open-source AI. Sure, there's strategic business
| considerations, but I wouldn't rule out principled
| motivations too.
| anthomtb wrote:
| > My framework for understanding safety is that we need to
| protect against two categories of harm: unintentional and
| intentional. Unintentional harm is when an AI system may cause
| harm even when it was not the intent of those running it to do
| so. For example, modern AI models may inadvertently give bad
| health advice. Or, in more futuristic scenarios, some worry that
| models may unintentionally self-replicate or hyper-optimize goals
| to the detriment of humanity. Intentional harm is when a bad
| actor uses an AI model with the goal of causing harm.
|
| Okay then Mark. Replace "modern AI models" with "social media"
| and repeat this statement with a straight face.
| j_m_b wrote:
| > We need to protect our data.
|
| This is a very important concern in Health Care because of HIPAA
| compliance. You can't just send your data over the wire to
| someone's proprietary API. You would at least need to de-identify
| your data. This can be a tricky task, especially with
| unstructured text.
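|
| As a toy illustration of the problem (nowhere near real HIPAA
| de-identification, which covers the 18 Safe Harbor identifier
| categories or requires expert determination), even the "easy"
| identifiers in free text take work:
|
|     import re
|
|     # Redact a few obvious identifiers before text leaves
|     # your infrastructure; purely a sketch, not a compliance tool.
|     PATTERNS = {
|         "PHONE": re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b"),
|         "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
|         "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
|     }
|
|     def redact(text: str) -> str:
|         for label, pattern in PATTERNS.items():
|             text = pattern.sub(f"[{label}]", text)
|         return text
|
|     print(redact("Call 555-123-4567 or email jane@example.com"))
|
| Names, dates, addresses, and rare conditions are far harder,
| which is why keeping inference on your own infrastructure is so
| attractive.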
| xpe wrote:
| Zuck needs to get real. They are Open Weights not Open Source.
| Sparkyte wrote:
| The real path forward is recognizing what AI is good at and what
| it is bad at. Focus on making what it is good at even better and
| faster. Open AI will definitely give us that option but it isn't
| a miracle worker.
|
| My impression is that AI if done correctly will be the new way to
| build APIs with large data sets and information. It can't write
| code unless you want to dump billions of dollars into a solution
| with millions of dollars of operational costs. As it stands it
| loses context too quickly to do advanced human tasks. BUT this is
| where it is great at assembling data and information. You know
| what is great at assembling data and information? APIs.
|
| Think of it this way: if we can make it faster and it trains on
| a data lake for a company, it could be used to return information
| faster than a nested micro-service architecture that is just a
| spiderweb of dependencies.
|
| Because AI loses context simple API requests could actually be
| more efficient.
| Bluescreenbuddy wrote:
| >This is how we've managed security on our social networks - our
| more robust AI systems identify and stop threats from less
| sophisticated actors who often use smaller scale AI systems.
|
| So about all the bots and sock puppets on social media..
| pjkundert wrote:
| Deployment of PKI-signed distributed software systems to use
| community-provisioned compute, bandwidth and storage at scale is,
| now quite literally, the future.
|
| We mostly don't all want or need the hardware to run these AIs
| ourselves, all the time. But, when we do, we need lots of it for
| a little while.
|
| This is what Holochain was born to do. We can rent massive
| capacity when we need it, or earn money renting ours when we
| don't.
|
| All running cryptographically trusted software at Internet scale,
| without the knowledge or authorization of commercial or
| government "do-gooders".
|
| Exciting times!
| ayakang31415 wrote:
| Massive props to AI teams at Meta that released this model open
| source
| ceva wrote:
| They have earned so much money from all of their users; this is
| the least they can do to give back to the community, if this can
| be considered that ;)
| animanoir wrote:
| "Says the Meta Inc".
| seydor wrote:
| That assumes LLMs are the path to AI, which is increasingly
| becoming an unpopular opinion
| tmsh wrote:
| Software 2.0 is about open licensing.
|
| I.e., the more important thing - the more "free" thing - is the
| licensing now.
|
| E.g., I play around with different image diffusion models like
| Stable Diffusion and specific fine-tuned variations for
| ControlNet or LoRA that I plug into ComfyUI.
|
| But I can't use it at work because of the licensing. I have to
| use InvokeAI instead of ComfyUI if I want to be careful, and use
| only very specific image diffusion models without the latest and
| greatest fine-tuning. As others have said - the weights
| themselves are rather inscrutable. So we're building on more
| abstract shapes now.
|
| But the key open thing is making sure (1) the tools to modify the
| weights are open and permissive (ComfyUI, related scripts or
| parts of both the training and deployment) and (2) the underlying
| weights of the base models and the tools to recreate them have
| MIT or other generous licensing. As well as the fine-tuned
| variants for specific tasks.
|
| It's not going to be the naive construction in the future where
| you take a base model and as company A you produce company A's
| fine tuned model and you're done.
|
| It's going to be a tree of fine-tuned models as a node-based
| editor like ComfyUI already shows and that whole tree has to be
| open if we're to keep the same hacker spirit where anyone can
| tinker with it and also at some point make money off of it. Or go
| free software the whole way (i.e., LGPL or equivalent the whole
| tree of tools).
|
| In that sense unfortunately Llama has a ways to go to be truly
| open: https://news.ycombinator.com/item?id=36816395
| jameson wrote:
| It's hard to say Llama is an "open source" when their license
| states Meta has full control under certain circumstances
|
| https://raw.githubusercontent.com/meta-llama/llama-models/ma...
|
| > 2. Additional Commercial Terms. If, on the Llama 3.1 version
| release date, the monthly active users of the products or
| services made available by or for Licensee, or Licensee's
| affiliates, is greater than 700 million monthly active users in
| the preceding calendar month, you must request a license from
| Meta, which Meta may grant to you in its sole discretion, and you
| are not authorized to exercise any of the rights under this
| Agreement unless or until Meta otherwise expressly grants you
| such rights.
| __loam wrote:
| It should be transparently clear that this move was taken by
| Meta to drive their competitors out of business in a capital
| intensive space.
| apwell23 wrote:
| not sure how it drives competitors out of business. OpenAI is
| losing money on queries not on model creation. This
| opensource model has no impact of their business model of
| charging users money to run queries.
|
| on a side note OpenAI is losing users on its own. It doesn't
| need meta to put it out of business.
| systemvoltage wrote:
| Tbh, it's incredibly generous.
| nailer wrote:
| Llama isn't open source. The license is at
| https://llama.meta.com/llama3/license/ and includes various
| restrictions on use, which means it falls outside the rules
| created by the https://opensource.org/osd
| war321 wrote:
| Even if it's just open weights and not "true" open source, I'll
| still give Meta the appreciation of being one of the few big AI
| companies actually committed to open models. In an ecosystem
| where groups like Anthropic and OpenAI keep hemming and hawing
| about safety and the necessity of closed AI systems "for our
| sake", they stand out among the rest.
| rednafi wrote:
| How is only sharing the binary artifact open source? There's
| the data aspect of things that they can't share because of
| licensing, and the code itself isn't accessible.
___________________________________________________________________
(page generated 2024-07-23 23:00 UTC)