[HN Gopher] Open source AI is the path forward
___________________________________________________________________
Open source AI is the path forward
Author : atgctg
Score : 2262 points
Date : 2024-07-23 15:08 UTC (1 day ago)
(HTM) web link (about.fb.com)
(TXT) w3m dump (about.fb.com)
| amusingimpala75 wrote:
| Sure, but under what license? Because slapping "open source" on
| the model doesn't make it open source if it's not actually
| licensed that way. The 3.1 license still contains their non-
| commercial clause (over 700m users) and requires derivatives,
| whether fine-tunes or models trained on generated data, to use
| the Llama name.
| redleader55 wrote:
| "Use it for whatever you want(conditions apply), but not if you
| are Google, Amazon, etc. If you become big enough talk to us."
| That's how I read the license, but obviously I might be missing
| some nuance.
| mesebrec wrote:
| You also can't use it for training or improving other models.
|
| You also can't use it if you're the government of India.
|
| Nor can sex workers use it. (Do you know whether your
| customers are sex workers?)
|
| There are also very vague restrictions on things like
| discrimination, racism, etc.
| war321 wrote:
| They're actually updating their license to allow Llama
| outputs to be used for training!
|
| https://x.com/AIatMeta/status/1815766335219249513
| sumedh wrote:
| > You also can't use it if you're the government of India.
|
| Why is that?
| frabcus wrote:
| Also, it isn't source code, it is a binary. You need at least
| the data curation code, and preferably the data itself, for it
| to actually be source code in the practical sense that anyone
| can remake the build.
|
| Meta could change the license on later versions to kill your
| business, and you'd have no recourse, since you neither know
| how they trained it nor have the budget to retrain it yourself.
|
| It's not much more free than binary software.
| aliljet wrote:
| And this is happening RIGHT as a new potential leader is emerging
| in Llama 3.1. I'm really curious about how this is going to match
| up on the leaderboards...
| kart23 wrote:
| > This is how we've managed security on our social networks - our
| more robust AI systems identify and stop threats from less
| sophisticated actors who often use smaller scale AI systems.
|
| Ok, first of all, has this really worked? AI moderators still
| can't capture the mass of obvious spam/bots on all their
| platforms, Threads included. Second, AI detection doesn't work,
| and with how much better the systems are getting, it probably
| never will, unless you keep the best models for yourself, and it
| is clear from the rest of the note that that's not Zuck's
| intention.
|
| > As long as everyone has access to similar generations of models
| - which open source promotes - then governments and institutions
| with more compute resources will be able to check bad actors with
| less compute.
|
| This just doesn't make sense. How are you going to prevent AI
| spam and AI deepfakes from causing harm with more compute? What
| are you going to do with more compute about nonconsensual
| deepfakes? People are already using AI to bypass identity
| verification on your social media networks and to pump out
| loads of spam.
| OpenComment wrote:
| Interesting quotes. _Less sophisticated actors_ just means
| humans who were already writing in 2020 what the NYT wrote in
| early 2022 to prepare for Biden's State of the Union 180-degree
| policy reversals (manufacturing consent).
|
| FB was notorious for censorship. Anyway, what is with the
| "actions/actors" terminology? This is straightforward
| totalitarian language.
| simonw wrote:
| "AI detection doesn't work, and with how much better the
| systems are getting, it's probably never going to, unless you
| keep the best models for yourself"
|
| I don't think that's true. I don't think even the best
| privately held models will be able to detect AI text reliably
| enough for that to be worthwhile.
| zmmmmm wrote:
| I found this dubious as well, especially how it is portrayed as
| a simple game of compute power. For a start, there is an
| enormous asymmetry which is why we have a spam problem in the
| first place. For example a single bot can send out millions of
| emails at almost no cost and we have to expend a lot more
| "energy" to classify each one and decide if it's spam or not.
| So you don't just need more compute power, you need drastically
| more compute power, and as AI models improve and get refined,
| the operation at ten times the scale is probably going to be
| marginally better, not orders of magnitude better.
|
| I still agree with his general take - bad actors will get these
| models or make them themselves, you can't stop it. But the
| logic about compute power is odd.
| blackeyeblitzar wrote:
| Only if it is truly open source (open data sets, transparent
| curation/moderation/censorship of data sets, open training source
| code, open evaluation suites, and an OSI approved open source
| license).
|
| Open weights (and open inference code) are NOT open source,
| just weak open-washing marketing.
|
| The model that comes closest to being TRULY open is AI2's OLMo.
| See their blog post on their approach:
|
| https://blog.allenai.org/hello-olmo-a-truly-open-llm-43f7e73...
|
| I think the only thing they're not open about is how they've
| curated/censored their "Dolma" training data set, as I don't
| think they explicitly share each decision made or the original
| uncensored dataset:
|
| https://blog.allenai.org/dolma-3-trillion-tokens-open-llm-co...
|
| By the way, OSI is working on defining open source for AI. They
| post weekly updates to their blog. Example:
|
| https://opensource.org/blog/open-source-ai-definition-weekly...
| JumpCrisscross wrote:
| > _Only if it is truly open source (open data sets, transparent
| curation /moderation/censorship of data sets, open training
| source code, open evaluation suites, and an OSI approved open
| source license)_
|
| You're missing a "then" to go with your "if". What happens if
| it's "truly" open per your definition versus not?
| blackeyeblitzar wrote:
| I think you are asking what the benefits are? The main benefit
| is that we can better trust what these systems are doing. Or we
| can self-host them. If we just take the weights, then it is
| unclear how these systems might be lying to us or manipulating
| us.
|
| Another benefit is that we can learn from how the training
| and other steps actually work. We can change them to suit our
| needs (although costs are impractical today). Etc. It's all
| the usual open source benefits.
| haolez wrote:
| There is also the risk of companies like Meta introducing ads
| in the training itself, instead of at inference time.
| itissid wrote:
| Yeah, though for a big model like the 405B I do wonder whether
| the original training recipe really matters for where models
| are heading practically, which is smaller and more specific.
|
| I imagine its main use would be to train other models by
| distilling it down with LoRA/quantization, etc. (assuming we
| have a tokenizer), or to use it to generate training data for
| smaller models directly.
|
| But, I do think there is always a way to share without
| disclosing too many specifics, like this[1] lecture from this
| year's spring course at Stanford. You can always say, for
| example:
|
| - The most common technique for filtering was using voting LLMs
| ( _without disclosing said LLMs or the quantity of data_ ).
|
| - We built on top of a filtering technique for removing poor
| code using ____ by ____ authors ( _without disclosing or
| handwaving how exactly you filtered, but saying that you had to
| filter_ ).
|
| - We mixed a certain proportion of this data with that data to
| make it better ( _without saying what proportion_ ).
|
| [1]
| https://www.youtube.com/watch?v=jm2hyJLFfN8&list=PLoROMvodv4...
| JumpCrisscross wrote:
| "The Heavy Press Program was a Cold War-era program of the United
| States Air Force to build the largest forging presses and
| extrusion presses in the world." This "program began in 1944 and
| concluded in 1957 after construction of four forging presses and
| six extruders, at an overall cost of $279 million. Six of them
| are still in operation today, manufacturing structural parts for
| military and commercial aircraft" [1].
|
| $279mm in 1957 dollars is about $3.2bn today [2]. A public
| cluster of GPUs provided for free to American universities,
| companies and non-profits might not be a bad idea.
|
| [1] https://en.m.wikipedia.org/wiki/Heavy_Press_Program
|
| [2] https://data.bls.gov/cgi-
| bin/cpicalc.pl?cost1=279&year1=1957...
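|
| (The conversion is just a CPI ratio. A rough Python
| back-of-the-envelope, using approximate CPI index values
| rather than the official calculator in [2]:)
|
|     # Rough CPI adjustment; index values are approximate.
|     cpi_1957 = 28.1    # CPI-U annual average, 1957 (approx.)
|     cpi_2024 = 313.0   # CPI-U, mid-2024 (approx.)
|     cost_1957 = 279e6  # program cost in 1957 dollars
|     cost_today = cost_1957 * cpi_2024 / cpi_1957
|     print(f"${cost_today / 1e9:.1f}bn")  # roughly $3.1bn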
| CardenB wrote:
| Doubtful that GPUs purchased today would be in use for a
| similar time scale. Govt investment would also drive the cost
| of GPUs up a great deal.
|
| Not sure why a publicly accessible GPU cluster would be a
| better solution than the current system of research grants.
| JumpCrisscross wrote:
| > _Doubtful that GPUs purchased today would be in use for a
| similar time scale_
|
| Totally agree. That doesn't mean it can't generate massive
| ROI.
|
| > _Govt investment would also drive the cost of GPUs up a
| great deal_
|
| Difficult to say this _ex ante_. On its own, yes. But it
| would displace some demand. And it could help boost chip
| production in the long run.
|
| > _Not sure why a publicly accessible GPU cluster would be a
| better solution than the current system of research grants_
|
| Those receiving the grants have to pay a private owner of the
| GPUs. That gatekeeping might be both problematic, if there is
| a conflict of interests, and inefficient. (Consider why the
| government runs its own supercomputers versus contracting
| everything to Oracle and IBM.)
| rvnx wrote:
| It would be better if the government removed IP protections
| on such technology for public use, the way drugs get
| generics.
|
| This way the government pays 2'500 USD per card, not 40'000
| USD or whatever absurd price.
| JumpCrisscross wrote:
| > _better that the government removes IP on such
| technology for public use, like drugs got generics_
|
| You want to punish NVIDIA for calling its shots
| correctly? You don't see the many ways that backfires?
| gpm wrote:
| No. But I do want to limit the amount we reward NVIDIA
| for calling the shots correctly, to maximize the benefit
| to society. For instance, by reducing the duration of the
| government-granted monopolies on chip technology, which
| is obsolete well before the default duration of 20 years
| is over.
|
| That said, it strikes me that the actual limiting factor
| is fab capacity, not Nvidia's designs, and we probably
| need to lift the monopolies preventing competition there
| if we want to reduce prices.
| JumpCrisscross wrote:
| > _reducing the duration of the government granted
| monopolies on chip technology that is obsolete well
| before the default duration of 20 years is over_
|
| Why do you think these private entities are willing to
| invest the massive capital it takes to keep the frontier
| advancing at that rate?
|
| > _I do want to limit the amount we reward NVIDIA for
| calling the shots correctly to maximize the benefit to
| society_
|
| Why wouldn't NVIDIA be a solid steward of that capital
| given their track record?
| gpm wrote:
| > Why do you think these private entities are willing to
| invest the massive capital it takes to keep the frontier
| advancing at that rate?
|
| Because whether they make 100x or 200x they make a
| shitload of money.
|
| > Why wouldn't NVIDIA be a solid steward of that capital
| given their track record?
|
| The problem isn't who is the steward of the capital. The
| problem is that the economically efficient thing for a
| single company to do (given sufficient fab capacity and a
| monopoly) is to raise prices to extract a greater share of
| the pie, at the expense of shrinking the size of the pie.
| I'm not worried about who takes the profit, I'm worried
| about the size of the pie.
| whimsicalism wrote:
| > Because whether they make 100x or 200x they make a
| shitload of money.
|
| It's not a certainty that they 'make a shitload of
| money'. Reducing the right tail payoffs absolutely
| reduces the capital allocated to solve problems - many of
| which are _risky bets_.
|
| Your solution absolutely decreases capital investment at
| the margin, this is indisputable and basic economics.
| Even worse when the taking is not due to some pre-
| existing law, so companies have to deal with the
| additional uncertainty of whether & when future people
| will decide in retrospect that they got too large a
| payoff and arbitrarily decide to take it from them.
| gpm wrote:
| You can't just look at the costs to an action, you also
| have to look at the benefits.
|
| Of course I agree I'm going to stop marginal investments
| from occurring in research into patentable technologies
| by reducing the expected profit. But I'm going to do so
| _very slightly_ because I'm not shifting the expected
| value by very much. Meanwhile I'm going to greatly
| increase the investment into the existing technology we
| already have, and allow many more people to try to
| improve upon it, and I'm going to argue the benefits
| greatly outweigh the costs.
|
| Whether I'm right or wrong about the net benefit, the
| basic economics here is that there are both costs and
| benefits to my proposed action.
|
| And yes I'm going to marginally reduce future investments
| because the same might happen in the future and that
| reduces expected value. In fact if I was in charge the
| same _would_ happen in the future. And the trade-off I
| get for this is that society gets the benefit of the same
| _actually_ happening in the future and us not being
| hamstrung by unbreachable monopolies.
| JumpCrisscross wrote:
| > _I'm going to do so very slightly because I'm not
| shifting the expected value by very much_
|
| You're massively increasing uncertainty.
|
| > _the same would happen in the future. And the trade-off
| I get for this is that society gets the benefit_
|
| Why would you expect it would ever happen again? What you
| want is an unrealized capital gains tax. Not to nuke our
| semiconductor industry.
| whimsicalism wrote:
| > But I'm going to do so very slightly because I'm not
| shifting the expected value by very much
|
| I think you're shifting it by a lot. If the government
| can post-hoc decide to invalidate patents because the
| holder is getting too successful, you are introducing a
| substantial impact on expectations and uncertainty. Your
| action is not taken in a vacuum.
|
| > Meanwhile I'm going to greatly increase the investment
| into the existing technology we already have, and allow
| many more people to try to improve upon it, and I'm going
| to argue the benefits greatly outweigh the costs.
|
| I think this is a much more speculative impact. Why will
| people even fund the improvements if the government might
| just decide they've gotten too large a slice of the pie
| later on down the road?
|
| > the trade-off I get for this is that society gets the
| benefit of the same actually happening in the future and
| us not being hamstrung by unbreachable monopolies.
|
| No, the trade-off is that materially less is produced.
| These incentive effects are not small. Take, for
| instance, drug price controls - a similar post-facto
| taking because we feel that the profits from R&D are too
| high. Introducing the proposed price controls leads to
| hundreds fewer drugs over the next decade [0] - and
| likely millions of premature deaths downstream of these
| incentive effects. And that's with a policy with a clear
| path towards short-term upside (cheaper drug prices).
| Discounting GPUs by invalidating Nvidia's patents has a
| much more tenuous upside and a clear downside.
|
| [0]: https://bpb-
| us-w2.wpmucdn.com/voices.uchicago.edu/dist/d/312...
| hluska wrote:
| You have proposed state ownership of all successful IP.
| That is a massive change and yet you have demonstrated
| zero understanding of the possible costs.
|
| Your claim that removing a profit motivation will
| increase investment is flat out wrong. Everything else
| crumbles from there.
| gpm wrote:
| No, I've proposed removing or reducing IP protections,
| not transferring them to the state. Allowing competitors
| to enter the market will obviously increase investment in
| competitors...
| IG_Semmelweiss wrote:
| This is already happening - it's called China. There's a
| reason they don't innovate in anything, and they are
| always playing catch-up, except in the art of copying
| (stealing) from others.
|
| I do think there are some serious IP issues, as IP rules
| can be hijacked in the US, but that means you fix those
| problems, not blow up IP that was rightfully earned.
| psd1 wrote:
| > they don't innovate in anything
|
| They are leaders in solar and EVs.
|
| Remember how Japan leapfrogged the western car industry,
| and six sigma became required reading for managers in
| every industry?
| hluska wrote:
| Removing IP restrictions transfers them to the state.
| Grow up.
| salawat wrote:
| >Why wouldn't NVIDIA be a solid steward of that capital
| given their track record?
|
| Past performance is not indicative of future results.
| whimsicalism wrote:
| there is no such thing as a lump-sum transfer, this will
| shift expectations and incentives going forward and make
| future large capital projects an increasingly uphill
| battle
| hluska wrote:
| So, if a private company is successful, you will
| nationalize its IP under some guise of maximizing the
| benefit to society? That form of government was tried
| once. It failed miserably.
|
| Under your idea, we'll try a badly broken economic
| philosophy again. And while we're at it, we will
| completely stifle investment in innovation.
| tick_tock_tick wrote:
| > That said, it strikes me that the actual limiting
| factor is fab capacity not nvidia's designs and we
| probably need to lift the monopolies preventing
| competition there if we want to reduce prices.
|
| Lol it's not "monopolies" limiting fab capacity. Existing
| fab companies can barely manage to stand-up a new fab in
| different cities. Fabs are impossibly complex and beyond
| risky to fund.
|
| It's the kind of thing you'd put government money to
| making but it's so risky government really don't want to
| spend billions and fail so they give existing companies
| billions so if they fail it's not the governments fault.
| Teever wrote:
| There was a post[0] on here recently about how the US
| went from producing woefully insufficient numbers of
| aircraft to producing 300k by the end of World War 2.
|
| One of the things that the post mentioned was the meager
| profit margin that the companies made during this time.
|
| But the thing is that this set the American auto and
| aviation industries up to rule the world for decades.
|
| A government going to a company and saying 'we need you
| to produce this product for us at a lower margin than
| you'd like to' isn't the end of the world.
|
| I don't know if this is one of those scenarios, but they
| exist.
|
| [0] https://www.construction-physics.com/p/how-to-
| build-300000-a...
| rvnx wrote:
| In the case of NVIDIA it's even more sneaky.
|
| They are an intellectual property company holding the
| rights to the plans for making graphics cards, not a
| company actually making graphics cards.
|
| The government could launch an initiative, "OpenGPU" or
| "OpenAI Accelerator", where the government orders GPUs
| from TSMC directly, without the middleman.
|
| It may require some tweaking of the law to allow an
| exception to intellectual property for the "public
| interest".
| whimsicalism wrote:
| y'all really don't understand how these actions would
| seriously harm capital markets and make it difficult for
| private capital formation to produce innovations going
| forward.
| freeone3000 wrote:
| If we have public capital formation, we don't necessarily
| need private capital. Private innovation in weather
| modelling isn't outpacing government work by leaps and
| bounds, for instance.
| whimsicalism wrote:
| because it is extremely challenging to capture the
| additional value that is being produced by better weather
| forecasts and generally the forecasts we have right now
| are pretty good.
|
| private capital is absolutely the driving force for the
| vast majority of innovations since the beginning of the
| 20th century. public capital may be involved, but it is
| dwarfed by private capital markets.
| freeone3000 wrote:
| It's challenging to capture the additional value, and the
| forecasts are pretty good, _because_ of _continual_ large-
| scale government investment in weather forecasting. NOAA
| is launching satellites! It's a big deal!
|
| Private nuclear research is heavily dependent on
| governmental contracts to function. Solar was subsidized
| to heck and back for years. Public investment does work,
| and does make a difference.
|
| I would even say governmental involvement is sometimes
| the deciding factor in determining whether research is
| worth pursuing at all. Some major capital investors have
| decided AI models cannot possibly earn enough money to
| pay for their training costs. So what do we do when we
| believe something is a net good for society, but isn't
| going to be profitable?
| inetknght wrote:
| > _y 'all really don't understand how these actions would
| seriously harm capital markets and make it difficult for
| private capital_
|
| Reflexively, I count that harm as a feature. I don't like
| private capital markets because I've been screwed by
| private capital on multiple occasions.
|
| But you are right: I don't understand how these actions
| would harm. So please do expand your concerns.
| panarky wrote:
| To the extent these are incremental units that wouldn't
| have been sold absent the government program, it's
| difficult to see how NVIDIA is "harmed".
| nickpsecurity wrote:
| They said remove legally-enforced monopolies on what they
| produce. Many of these big firms made their tech with
| millions to billions of taxpayer dollars at various
| points in time. If we've given them millions, shouldn't
| we at least get to make independent implementations of
| the tech we already paid for?
| kube-system wrote:
| > It would be better that the government removes IP on
| such technology for public use, like drugs got generics.
|
| 20-25 year old drugs are a lot more useful than 20-25
| year old GPUs, and the manufacturing supply chain is not
| a bottleneck.
|
| There are no generics for the latest and greatest drugs,
| and a fancy gene therapy might run a _lot_ more than
| $40k.
| latchkey wrote:
| > Those receiving the grants have to pay a private owner of
| the GPUs.
|
| Along similar lines, I'm trying to build a developer
| credits program where I get whoever (AMD/Dell) to purchase
| credits on my supercomputers, which we then give away to
| developers to build solutions, which drives more demand for
| our hardware, and we commit to reinvest those credits back
| into more hardware. The idea is to create a win-win-win
| (us, them, you) developer flywheel ecosystem. It isn't a
| new idea at all; Nvidia and the hyperscalers have been doing
| this for ages.
| ygjb wrote:
| Of course they won't. The investment in the Heavy Press
| Program was the initial build, and to cite just one example,
| the Alcoa 50,000 ton forging press was built in 1955,
| operated until 2008, and needed ~$100M to get it operational
| again in 2012.
|
| The investment was made to build the press, which created
| significant jobs and capital investment. The press, and
| others like it, were subsequently operated by and then sold
| to a private operator, which in turn enabled the massive
| expansion of both military manufacturing, and commercial
| aviation and other manufacturing.
|
| The Heavy Press Program was a strategic investment that paid
| dividends by both advancing the state of the art in
| manufacturing at the time it was built, and improving
| manufacturing capacity.
|
| A GPU cluster might not be the correct investment, but a
| strategic investment in increasing, for example, the
| availability of training data, or interoperability of tools,
| or ease of use for building, training, and distributing
| models would probably pay big dividends.
| JumpCrisscross wrote:
| > _A GPU cluster might not be the correct investment, but a
| strategic investment in increasing, for example, the
| availability of training data, or interoperability of
| tools, or ease of use for building, training, and
| distributing models would probably pay big dividends_
|
| Would you mind expanding on these options? Universal
| training data sounds intriguing.
| ygjb wrote:
| Sure. Just on the training front: build and maintain a
| broad corpus of properly managed training data, with
| metadata that provides attribution (for example, whether
| content is known to be human-generated instead of model-
| generated, and what the source is for datasets such as
| weather data, census data, etc.) and that also captures
| any licensing encumbrance, so that consumers of the
| training data can be confident in their ability to use it
| without risk of legal challenge.
|
| Much of this is already available to private sector
| entities, but having a publicly funded organization
| responsible for curating and publishing this would enable
| new entrants to quickly and easily get a foundation
| without having to scrape the internet again, especially
| given how rapidly model-generated content is being
| published.
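|
| To make the metadata idea concrete, here is a minimal
| sketch of what one record in such a corpus might look
| like, written as a Python dataclass (all field names are
| hypothetical, not an existing schema):
|
|     from dataclasses import dataclass
|
|     @dataclass
|     class CorpusRecord:
|         """One hypothetical entry in a publicly curated corpus."""
|         text: str
|         source: str        # e.g. census data, weather data, a URL
|         provenance: str    # "human-generated" vs "model-generated"
|         license: str       # SPDX id or other encumbrance note
|         collected_at: str  # ISO 8601 date the item was collected
|
|     record = CorpusRecord(
|         text="...",
|         source="https://example.gov/some-dataset",
|         provenance="human-generated",
|         license="CC-BY-4.0",
|         collected_at="2024-07-23",
|     )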
| mnahkies wrote:
| I think the EPC (energy performance certificate) dataset
| in the UK is a nice example of this. Anyone can download
| a full dataset of EPC data from
| https://epc.opendatacommunities.org/
|
| Admittedly it hasn't been cleaned all that much - you
| still need to put a bit of effort into that (newer
| certificates tend to be better quality), but it's very
| low friction overall. I'd love to see them do this with
| more datasets
| dmix wrote:
| I don't think there's a shortage of capital for AI...
| probably the opposite
|
| Of all the things to expand the scope of government
| spending why would they choose AI, or more specifically
| GPUs?
| devmor wrote:
| There may however, be a shortage of capital for _open
| source_ AI, which is the subject under consideration.
|
| As for the why... because there's no shortage of capital
| for AI. It sounds like the government would like to
| encourage redirecting that capital to something that's
| good for the economy at large, rather than good for the
| investors of a handful of Silicon Valley firms interested
| only in their own short term gains.
| hluska wrote:
| Look at it from the perspective of an elected official:
|
| If it succeeds, you were ahead of the curve. If it fails,
| you were prudent enough to fund an investigation early.
| Either way, bleeding edge tech gives you a W.
| Geezus_42 wrote:
| Or you wasted a bunch of taxpayer money on some overhyped
| and overfunded nonsense.
| alickz wrote:
| how would you determine that without investigation?
| seunosewa wrote:
| You'll be long gone before they find out.
| ygjb wrote:
| Yeah. There is a lot of overhyped and overfunded nonsense
| that comes out of NASA. Some of it is hype from the
| marketing and press teams; other hype comes from
| misinterpretation of releases.
|
| None of that changes that there have been major technical
| breakthroughs, and entire classes of products and
| services that didn't exist before those investments in
| NASA (see https://en.wikipedia.org/wiki/NASA_spin-
off_technologies for a short list). There are 15
| departments and dozens of agencies that comprise the US
| federal government, many of which make investments in
| science and technology as part of their mandates, and
| most of that is delivered through some structure of
| public-private partnerships.
|
| What you see as over-hyped and over-funded nonsense could
| be the next ground breaking technology, and that is why
| we need both elected leaders who (at least in theory)
| represent the will of the people, and appointed, skilled
| bureaucrats who provide the elected leaders with the
| skills, domain expertise, and experience that the winners
| of the popularity contest probably don't have.
|
| Yep, there will be waste, but at least with public funds
| there is the appearance of accountability that just
| doesn't exist with private sector funds.
| hluska wrote:
| Which happens every single day in every government in the
| world.
| phatfish wrote:
| If it succeeds the idea gets sold to private corporations
| or the technology is made public and everyone thinks the
| corporation with the most popular version created it.
|
| If it fails certain groups ensure everyone knows the
| government "wasted" taxpayer money.
| whimsicalism wrote:
| There are many things I think are more capital-constrained,
| if the government is trying to subsidize something.
| jvanderbot wrote:
| A much better investment would be to (somehow) revolutionize
| production of chips for AI so that it's all cheaper, more
| reliable, and faster to stand up new generations of software
| and hardware codesign. This is probably much closer to the
| program mentioned in the top level comment: It wasn't to
| produce one type of thing, but to allow better production of
| any large thing from lighter alloys.
| photonthug wrote:
| > Not sure why a publicly accessible GPU cluster would be a
| better solution than the current system of research grants.
|
| You mean a better solution than different teams paying AWS
| over and over, potentially spending 10x on rent rather than
| using all that cash as a down payment on actually owning
| hardware? I can't really speak for the total costs of
| depreciation/hardware maintenance but renting forever isn't
| usually a great alternative to buying.
| CardenB wrote:
| Do you have some information to share to support your bias
| against leasing especially with a depreciating asset?
| manux wrote:
| In Canada, all three major AI research centers use
| clusters created with public money. These clusters
| receive regular additional hardware as new generations of
| GPUs become available. Considering how these institutions
| work, I'm pretty confident they've considered the
| alternatives (renting, AWS, etc). So that's one data
| point.
| photonthug wrote:
| sure, I'll hand it over after you spend your own time
| first to show that everything everywhere that's owned
| instead of leased is a poor financial decision.
| vasili111 wrote:
| AWS is not only hardware but also software, documentation,
| support and more.
| light_hue_1 wrote:
| The problem is that any public cluster would be outdated in 2
| years. At the same time, GPUs are massively overpriced.
| Nvidia's profit margins on the H100 are crazy.
|
| Until we get cheaper cards that stand the test of time,
| building a public cluster is just a waste of money. There are
| far better ways to spend $1b in research dollars.
| JumpCrisscross wrote:
| > _any public cluster would be outdated in 2 years_
|
| The private companies buying hundreds of billions of dollars
| of GPUs aren't writing them off in 2 years. They won't be
| cutting edge for long. But that's not the point--they'll
| still be available.
|
| > _Nvidia 's profit margins on the H100 are crazy_
|
| I don't see how the current practice of giving a researcher a
| grant so they can rent time on a Google cluster that runs
| H100s is more efficient. It's just a question of capex or
| opex. As a state, the U.S. has a structural advantage in the
| former.
|
| > _far better ways to spend $1b in research dollars_
|
| One assumes the U.S. government wouldn't be paying list
| price. In any case, the purpose isn't purely research ROI.
| Like the heavy presses, it's in making a prohibitively-
| expensive capital asset generally available.
| ninininino wrote:
| What about dollar cost averaging your purchases of GPUs? So
| that you're always buying a bit of the newest stuff every
| year rather than just a single fixed investment in hardware
| that will become outdated? Say 100 million a year every year
| for 20 years instead of 2 billion in a single year?
| fweimer wrote:
| Don't these public clusters exist today, and have been around
| for decades at this point, with varying architectures? In the
| sense that you submit a proposal, it gets approved, and then
| you get access for your research?
| JumpCrisscross wrote:
| Not--to my knowledge--for the GPUs necessary to train
| cutting-edge LLMs.
| Maxious wrote:
| All of the major cloud providers offer grants for public
| research https://www.amazon.science/research-awards https:/
| /edu.google.com/intl/ALL_us/programs/credits/research
| https://www.microsoft.com/en-us/azure-academic-research/
|
| NVIDIA offers discounts
| https://developer.nvidia.com/education-pricing
|
| eg. for Australia, the National Computing Infrastructure
| allows researchers to reserve time on:
|
| - 160 nodes each containing four Nvidia V100 GPUs and two
| 24-core Intel Xeon Scalable 'Cascade Lake' processors.
|
| - 2 nodes of the NVIDIA DGX A100 system, with 8 A100 GPUs
| per node.
|
| https://nci.org.au/our-systems/hpc-systems
| NewJazz wrote:
| This is the most recent iteration of a national platform.
| They have tons of GPUs (and CPUs, and flash storage) hooked
| up as a Kubernetes cluster, available for teaching and
| research.
|
| https://nationalresearchplatform.org/
| epaulson wrote:
| The National Science Foundation has been doing this for
| decades, starting with the supercomputing centers in the 80s.
| Long before anyone talked about cloud credits, NSF has had a
| bunch of different programs to allocate time on supercomputers
| to researchers at no cost, these days mostly run out of the
| Office of Advanced Cyberinfrastructure. (The office name is from
| the early 00s) - https://new.nsf.gov/cise/oac
|
| (To connect universities to the different supercomputing
| centers, the NSF funded the NSFnet network in the 80s, which
| was basically the backbone of the Internet in the 80s and early
| 90s. The supercomputing funding has really, really paid off for
| the USA)
| JumpCrisscross wrote:
| > _NSF has had a bunch of different programs to allocate time
| on supercomputers to researchers at no cost, these days
| mostly run out of the Office of Advanced Cyberinfrastructure_
|
| This would be the logical place to put such a programme.
| alephnerd wrote:
| The DoE has also been a fairly active purchaser of GPUs for
| almost two decades now thanks to the Exascale Computing
| Project [0] and other predecessor projects.
|
| The DoE helped subsidize development of Kepler, Maxwell,
| Pascal, etc along with the underlying stack like NVLink,
| NGC, CUDA, etc either via purchases or allowing grants to
| be commercialized by Nvidia. They also played matchmaker by
| helping connect private sector research partners with
| Nvidia.
|
| The DoE also did the same thing for AMD and Intel.
|
| [0] - https://www.exascaleproject.org/
| PostOnce wrote:
| The DoE subsidized the development of GPUs, but so did
| Bitcoin.
|
| But before that, it was video games, like quake. Nvidia
| wouldn't be viable if not for games.
|
| But before that, graphics research was subsidized by the
| DoD, back when visualizing things in 3D cost serious
| money.
|
| It's funny how technology advances.
| Retric wrote:
| It was really Ethereum / altcoins, not Bitcoin, that
| caused the GPU demand in 2021.
|
| Bitcoin moved to FPGAs/ASICs very quickly because
| dedicated hardware was vastly more efficient; GPUs were
| only viable from Oct 2010. By 2013, when ASICs came
| online, GPUs only made sense if someone else was paying
| for both the hardware and the electricity.
| jszymborski wrote:
| As you've rightly pointed out, we have the mechanism, now
| let's fund it properly!
|
| I'm in Canada, and our science funding has likewise fallen
| year after year as a proportion of our GDP. I'm still
| benefiting from A100 clusters funded by tax payer dollars,
| but think of the advantage we'd have over industry if we
| didn't have to fight over resources.
| xena wrote:
| Where do you get access to those as a member of the general
| public?
| kiwih wrote:
| In Australia at least, anyone who is enrolled at or works
| at a university can use the taxpayer-subsidised "Gadi"
| HPC which is part of the National Computing
| Infrastructure (https://nci.org.au/our-systems/hpc-
| systems). I also do mean anyone: I have an undergraduate
| student using it right now (for free*) to fine-tune
| several LLMs.
|
| It also says commercial orgs can get access via
| negotiation, I expect a random member of the public would
| be able to go that route as well. I expect that there
| would be some hurdles to cross, it isn't really common
| for random members of the public to be doing the kinds of
| research Gadi was created to benefit. I expect it is the
| same way in this case in Canada. I suppose the argument
| is if there weren't any gatekeeping at all, you might end
| up with all kinds of unsuitable stuff on the cluster,
| e.g. crypto miners and such.
|
| Possibly another way for a true random person to get
| access would be to get some kind of 0-hour academic
| affiliation via someone willing to back you up, or one
| could enrol in a random AI course or something and then
| talk to the lecturer in charge.
|
| *In reality, the (also taxpayer-subsidised) university
| pays some fee for access, but it doesn't come from any of
| our budgets.
| jph00 wrote:
| Australia's peak HPC has a total of: "2 nodes of the
| NVIDIA DGX A100 system, with 8 A100 GPUs per node".
|
| It's pretty meagre pickings!
| FireBeyond wrote:
| Well, one, it has:
|
| > 160 nodes each containing four Nvidia V100 GPUs
|
| and two, well, it's a CPU-based supercomputer.
| mmastrac wrote:
| I'm going to guess it's Compute Canada, which I don't
| think we non-academics have access to.
| jszymborski wrote:
| That's correct (they go by the Digital Research Alliance
| of Canada now... how boring).
|
| I wish that wasn't the case though!
| jszymborski wrote:
| I get my resources through a combination of servers my
| lab bought using a government grant and the Digital
| Research Alliance of Canada (née Compute Canada)
| cluster.
|
| These resources aren't available to the public, but if I
| were king for a day we'd increase science funding such
| that we'd have compute resources available to high-school
| students and the general public (possibly following
| training on how to use it).
|
| Making sure folks didn't use it to mine bitcoin would be
| important, though ;)
| cmdrk wrote:
| Yeah, the specific AI/ML-focused program is NAIRR.
|
| https://nairrpilot.org/
|
| Terrible name unless they low-key plan to make AI
| researchers' hair fall out.
| dastbe wrote:
| The US already pays for 2+ AWS regions for the CIA/DoD. Why
| not pay for a region that is only available to researchers?
| blackeyeblitzar wrote:
| What about distributed training on volunteer hardware? Is that
| feasible?
| oersted wrote:
| It is an exciting concept: there's a huge wealth of gaming
| hardware deployed that sits inactive most hours of the day.
| And I'm sure people are willing to pay well above the
| electricity cost for it.
|
| Unfortunately, the dominant LLM architecture makes it
| relatively infeasible right now.
|
| - Gaming hardware has too limited VRAM for training any kind
| of near-state-of-the-art model. Nvidia is being annoyingly
| smart about this to sell enterprise GPUs at exorbitant
| markups.
|
| - Right now communication between machines seems to be the
| bottleneck, and this is way worse with limited VRAM. Even
| with data-centre-grade interconnect (mostly Infiniband, which
| is also Nvidia, smart-asses), any failed links tend to cause
| big delays in training.
|
| Nevertheless, it is a good direction to push towards, and the
| government could indeed help, but it will take time. We need
| both a more healthy competitive landscape in hardware, and
| research towards model architectures that are easy to train
| in a distributed manner (this was also the key to the success
| of Transformers, but we need to go further).
| sharpshadow wrote:
| Couldn't VRAM be supplemented with SSDs on a lower-end
| machine? It would make it slower, but maybe still useful.
| oersted wrote:
| Perhaps. The landscape has improved a lot in the last
| couple of years; there are lots of implementation tricks
| to improve efficiency on consumer hardware, particularly
| for inference.
|
| Although it is clear that the computing capacity of the
| GPU would be very underutilized with the SSD as the
| bottleneck. Even using RAM instead of VRAM is pretty
| impractical. It might be a bit better for chips like
| Apple's where the CPU, RAM and GPU are all tightly
| connected on the same SoC, and the main RAM is used as
| the VRAM.
|
| Would that performance still be worth more than the
| electricity cost? Would the earnings be high enough for a
| wide population to be motivated to go through the hassle
| of setting up their machine to serve requests?
| codemusings wrote:
| Ever heard of SETI@home?
|
| https://setiathome.berkeley.edu
| tessellated wrote:
| Followed the link and learned two things that were new to me:
| both the project and Drake are dead.
|
| Used to contribute in the early 2000s with my Pentium for a
| while.
|
| Ever got any results?
|
| Also, for training LLMs, I understand there is a huge
| bandwidth problem with this approach.
| ks2048 wrote:
| How about using some of that money to develop CUDA alternatives
| so everyone is not paying the Nvidia tax?
| lukan wrote:
| It would probably be cheaper to negate some IP. There are
| quite a few projects and initiatives to make CUDA code run
| on AMD, for example, but as far as I know they all stopped
| at some point, probably out of fear of being sued into
| oblivion.
| whimsicalism wrote:
| It seems like ROCm is already fully ready for transformer
| inference, so you are just referring to training?
| janalsncm wrote:
| ROCm is buggy and largely undocumented. That's why we don't
| use it.
| latchkey wrote:
| It is actively improving every day.
|
| https://news.ycombinator.com/item?id=41052750
| belter wrote:
| Please start with the Windows Tax first for Linux users
| buying hardware...and the Apple Tax for Android users...
| zitterbewegung wrote:
| Either you port Tensorflow (Apple)[1] or PyTorch to your
| platform or you allow CUDA to run on your hardware (AMD) [2].
| Companies are incentivized to not let NVIDIA have a monopoly,
| but the thing is that CUDA is a huge moat due to its
| compatibility with all frameworks, and everyone knows it.
| Also, all of the cloud and on-premises providers use NVIDIA
| regardless.
|
| [1] https://developer.apple.com/metal/tensorflow-plugin/ [2]
| https://www.xda-developers.com/nvidia-cuda-amd-zluda/
| TuringNYC wrote:
| >> Either you port Tensorflow (Apple)[1] or PyTorch to your
| platform or you allow CUDA to run on your hardware (AMD)
| [2]. Companies are incentivized to not let NVIDIA have a
| monopoly, but the thing is that CUDA is a huge moat due to
| its compatibility with all frameworks, and everyone knows
| it. Also, all of the cloud and on-premises providers use
| NVIDIA regardless.
|
| This never made sense to me -- Apple could easily hire top
| talent to write Apple Silicon bindings for these popular
| libraries. I work at a creative ad agency; we have tons of
| high-end Apple devices, yet the neural cores sit unused most
| of the time.
| jcheng wrote:
| A lot of libraries seem to be working on Apple Silicon
| GPUs but not on ANE. I found this discussion interesting,
| seems like the ANE has a lot of limitations, is not well
| documented, and can only be used indirectly through Core
| ML.
| https://github.com/ggerganov/llama.cpp/discussions/336
| erickj wrote:
| That's the kind of work that can come out of academia and
| open source communities when societies provide the resources
| required.
| latchkey wrote:
| It is being done already...
|
| https://docs.scale-lang.com/
| dogcomplex wrote:
| Or just develop the next wave of chips designed specifically
| for transformer-based architectures (and ternary computing),
| and bypass the need for GPUs and CUDA altogether.
| Zambyte wrote:
| That would be betting against other architectures like
| Mamba, which does not seem like an obviously good bet to
| make yet. Maybe it is though.
| prpl wrote:
| Great idea, too bad the DOE and NSF were there first.
| kjkjadksj wrote:
| The size of the cluster would have to be massive, or else your
| job will sit in the queue for a year. And even then, what are
| you going to do, downsize the resources requested so you can
| get in earlier? After a certain point it starts to make more
| sense to just buy your own Xeons and run your own cluster.
| Aperocky wrote:
| Imagine if they had made a data center with 1957 electronics
| that cost $279 million.
|
| They probably wouldn't be using it now, because the phone in
| your pocket is likely more powerful. Moore's law did end, but
| data center hardware is still evolving orders of magnitude
| faster than forging presses.
| goda90 wrote:
| I'd like to see big programs to increase the amount of cheap,
| clean energy we have. AI compute would be one of many
| beneficiaries of super cheap energy, especially since you
| wouldn't need to chase newer, more efficient hardware just to
| keep costs down.
| Melatonic wrote:
| Yeah, this would be the real equivalent of the program people
| are talking about above. That, and investing in core
| networking infrastructure (like cables) instead of just
| giving huge handouts to certain corporations that then pocket
| the money...
| BigParm wrote:
| So we'll have the government bypass markets and force the
| working class to buy toys for the owning class?
|
| If anything, allocate compute to citizens.
| _fat_santa wrote:
| > If anything, allocate compute to citizens.
|
| If something like this were to become a reality, I could see
| something like "CitizenCloud" where once you prove that you
| are a US Citizen (or green card holder or some other
| requirement), you can then be allocated a number of credits
| every month for running workloads on the "CitizenCloud".
| Everyone would get a baseline amount, from there if you can
| prove you are a researcher or own a business related to AI
| then you can get more credits.
| aiauthoritydev wrote:
| Overall government doing anything is a bad idea. There are
| cases however where government is the only entity that can do
| certain things. These are things that involve military, law
| enforcement etc. Outside of this we should rely on private
| industry and for-profit industry as much as possible.
| pavlov wrote:
| The American healthcare industry demonstrates the tremendous
| benefits of rigidly applying this mindset.
|
| Why couldn't law enforcement be private too? You call 911,
| several private security squads rush to solve your immediate
| crime issue, and the ones who manage to shoot the suspect
| send you a $20k bill. Seems efficient. If you don't like the
| size of the bill, you can always get private crime insurance.
| sterlind wrote:
| For a further exploration of this particular utopia, see
| Snow Crash by Neal Stephenson.
| chris_wot wrote:
| That's not correct. The American health care system is an
| extreme example of where private organisations fail overall
| society.
| fragmede wrote:
| > Overall government doing anything is a bad idea.
|
| That is so bereft of detail as to just be wrong. There are
| things that government is good for and things that government
| is bad for, but "anything" is just too broad, and reveals an
| anti-government bias which just isn't well thought out.
| goatlover wrote:
| Why are governments a bad idea? Seems the human race has
| opted for governments doing things since the dawn of
| civilization. Building roads, providing defense, enforcing
| rights, providing social safety nets, funding costly
| scientific endeavors.
| com2kid wrote:
| Ugh.
|
| Government distorting undeveloped markets that have a lot of
| room for competition to increase efficiencies is a bad thing.
|
| Government agencies running programs that should not be
| profitable, or where the only profit to be left comes at the
| expense of society as a whole, is a good thing.
|
| Lots of basic medicine is the go-to example here: treating
| cancer isn't going to be "profitable", and attempting to make
| it such just leads to dead people.
|
| On the flip side, one can argue that dentistry has seen
| amazing strides in affordability and technological progress
| through the free market, from dental X-rays to improvements
| in dental procedures that make them less painful for
| patients.
|
| Eye surgery is another area where competition has led to
| good consumer outcomes.
|
| But life or death situations where people can't spend time
| researching? The only profit there comes through exploiting
| people.
| Angostura wrote:
| To summarise: There are some things where government action
| is the best solution, however by default see if the private
| sector can sort it first.
| varenc wrote:
| I just watched this 1950s DoD video on the heavy press program
| and highly recommend it:
| https://www.youtube.com/watch?v=iZ50nZU3oG8
| newzisforsukas wrote:
| https://loc.gov/pictures/search/?q=Photograph:%20oh1540&fi=n.
| ..
| spullara wrote:
| It makes much more sense to invest in a next generation fab for
| GPUs than to buy GPUs and more closely matches this kind of
| project.
| epolanski wrote:
| Does it? You're looking at a gargantuan investment in terms
| of money that would also require thousands of staff.
|
| That just doesn't seem a good idea.
| inhumantsar wrote:
| > gargantuan investment
|
| it's a bigger investment, but it's an investment which will
| pay dividends for decades. with a compute cluster, the
| government is taking on an asset in the form of the cluster
| but also liabilities in the form of operations and
| administration.
|
| with a fab, the government takes on either a promise of
| lower taxes for N years or hands over a bag of cash. after
| that they're clear of it. the company operating the fab
| will be responsible for the risks and on-going expenses.
|
| on top of that...
|
| > thousand of staff
|
| the company will employ/attract even more top talent, each
| of whom will pay taxes and eventually go on to found
| related companies or teach the next generation or what have
| you. not to mention the risk reduction that comes with on-
| shoring something as critical to national security and the
| economy as a fab.
|
| a public-access compute cluster isn't a bad idea, but it
| probably makes more sense to fund/operate it in similar PPP
| model. non-profit consortium of universities and business
| pool resources to plan, build, and operate it, government
| recognizes it as a public good and chips in a significant
| amount of money to help.
| rkique wrote:
| Very much in this spirit is the NSF-funded National Deep
| Inference Fabric, which lets researchers run remote experiments
| on foundation models: https://ndif.us. They just announced a
| pilot program for Llama405b!
| cyanydeez wrote:
| A better idea would be to treat various open source packages
| as utilities and put maintainers everywhere, funded as a
| public good.
|
| AI is a fad; the bricks and mortar of the future are open
| source tools.
| fintler wrote:
| For the DoE, take a look at:
|
| https://doeleadershipcomputing.org/
| carschno wrote:
| In the Netherlands, for instance, there is "the national
| supercomputer" Snellius:
| https://www.surf.nl/en/services/snellius-the-national-superc...
| I am not sure about its budget, but my impression as a user is
| that its resources are never fully used. At least, none of my
| jobs ever had to queue. I doubt that it can compete with the
| scale of resources that FAANG companies have available, but
| then again, I also doubt how research would benefit.
|
| Sure, academia could build LLMs, and there is at least one
| large-scale project for that: https://gpt-nl.com/ On the other
| hand, this kind of model still needs to demonstrate specific
| scientific value that goes beyond using a chatbot for
| generating ideas and summarizing documents.
|
| So I fully agree that the research budget cuts in the past
| decades have been catastrophic, and probably have contributed
| to all the disasters the world is currently facing. But I think
| that funding prestigious super-projects is not the best way to
| spend funds.
| teekert wrote:
| Snellius is a nice resource: a powerful Slurm-based HTC
| cluster with different queues for different workloads
| (cpu/genomics, gpu/deep learning).
|
| To access the resource I had to go through EuroCC [0], which
| is a network facilitating access to and exploitation of
| HPC/HTC infra. It is (or can be) a great competing model to
| US cloud providers.
|
| As a small business I got 8 hrs of consultancy and 10k
| compute hours for free. I'm still learning the details, but my
| understanding is that after that the prices are very
| competitive.
|
| [0] https://www.eurocc-access.eu/
| matteocontrini wrote:
| Italy built the Leonardo HPC cluster; it's one of the largest
| in the EU and was created by a consortium of universities.
| After just over a year it's already at full capacity, and
| expansion plans have been brought forward because of this.
| B4CKlash wrote:
| Eric Schmidt advocated for this exact thing in an Op-ed piece
| in the latest MIT Technology Review.
|
| [1] https://www.technologyreview.com/2024/05/13/1092322/why-
| amer...
| maxdo wrote:
| So that North Korea can set up small call centers more
| cheaply, since they can get these models for free?
| HanClinto wrote:
| The article argues that the threat of foreign espionage is not
| solved by closing models.
|
| > Some people argue that we must close our models to prevent
| China from gaining access to them, but my view is that this
| will not work and will only disadvantage the US and its allies.
| Our adversaries are great at espionage, stealing models that
| fit on a thumb drive is relatively easy, and most tech
| companies are far from operating in a way that would make this
| more difficult. It seems most likely that a world of only
| closed models results in a small number of big companies plus
| our geopolitical adversaries having access to leading models,
| while startups, universities, and small businesses miss out on
| opportunities.
| tempfile wrote:
| This argument implies that cheap phones are bad since
| telemarketers can use them.
| mrfinn wrote:
| You guys really need to get over your bellicose view of the
| world. Actually, before it destroys you. Really, it's not
| necessary. Most people in the world just want to live in
| peace and see their children grow up happily. For each data
| center NK would create, there will be a thousand peaceful,
| kind, and well-intentioned AI projects going on. Or maybe more.
| the8thbit wrote:
| "Eventually though, open source Linux gained popularity -
| initially because it allowed developers to modify its code
| however they wanted ..."
|
| I find the language around "open source AI" to be confusing. With
| "open source" there's usually "source" to open, right? As in,
| there is human legible code that can be read and modified by the
| user? If so, then how can current ML models be open source?
| They're very large matrices that are, for the most part,
| inscrutable to the user. They seem akin to binaries, which, yes,
| can be modified by the user, but are extremely obscured to the
| user, and require enormous effort to understand and effectively
| modify.
|
| "Open source" code is not just code that isn't executed remotely
| over an API, and it seems like maybe it's being conflated with
| that here?
| orthoxerox wrote:
| Open training dataset + open steps sufficient to train exactly
| the same model.
| the8thbit wrote:
| This isn't what Meta releases with their models, though I
| would like to see more public training data. However, I still
| don't think that would qualify as "open source". Something
| isn't open source just because its reproducible out of
| composable parts. If one, very critical and system defining
| part is a binary (or similar) without publicly available
| source code, then I don't think it can be said to be "open
| source". That would be like saying that Windows 11 is open
| source because Windows Calculator is open source, and its a
| component of Windows.
| blackeyeblitzar wrote:
| Here's one list of what is needed to be actually open
| source:
|
| https://blog.allenai.org/hello-olmo-a-truly-open-
| llm-43f7e73...
| orthoxerox wrote:
| That's what I meant by "open steps", I guess I wasn't clear
| enough.
| the8thbit wrote:
| Is that what you meant? I don't think releasing the
| sequence of steps required to produce the model satisfies
| "open source", which is how I interpreted you, because
| there is still no source code for the model.
| Yizahi wrote:
| They can't release the training dataset if it was illegally
| scraped all over the web without permission :) (taps head)
| bilsbie wrote:
| Can't you do fine tuning on those binaries? That's a
| modification.
| the8thbit wrote:
| You can fine tune the models, and you can modify binaries.
| However, there is no human readable "source" to open in
| either case. The act of "fine tuning" is essentially brute
| forcing the system to gradually alter the weights such that
| loss is reduced against a new training set. This limits what
| you can actually do with the model vs an actual open source
| system where you can understand how the system is working and
| modify specific functionality.
|
| Additionally, models can be (and are) fine tuned via APIs, so
| if that is the threshold required for a system to be "open
| source", then that would also make the GPT4 family and other
| such API only models which allow finetuning open source.
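|
| For concreteness, here is a rough sketch of what locally
| fine-tuning open weights involves (assuming PyTorch and the
| Hugging Face transformers API; the checkpoint name and the
| tiny corpus are placeholders, not anything Meta ships):
|
|     import torch
|     from transformers import AutoModelForCausalLM, AutoTokenizer
|
|     name = "some-org/open-weights-7b"   # hypothetical checkpoint
|     model = AutoModelForCausalLM.from_pretrained(name)
|     tok = AutoTokenizer.from_pretrained(name)
|     opt = torch.optim.AdamW(model.parameters(), lr=1e-5)
|
|     my_texts = ["an example training document"]  # placeholder corpus
|     model.train()
|     for text in my_texts:
|         batch = tok(text, return_tensors="pt")
|         out = model(**batch, labels=batch["input_ids"])
|         out.loss.backward()   # gradients w.r.t. every weight
|         opt.step()            # nudge all weights slightly
|         opt.zero_grad()
|
|     model.save_pretrained("my-fine-tune")  # the weights stay yours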
| whimsicalism wrote:
| I don't find this argument super convincing.
|
| There's a pretty clear difference between the 'finetuning'
| offered via API by GPT4 and the ability to do whatever sort
| of finetuning you want and get the weights at the end that
| you can do with open weights models.
|
| "Brute forcing" is not the correct language to use for
| describing fine-tuning. It is not as if you are trying
| weights randomly and seeing which ones work on your dataset
| - you are following a gradient.
| the8thbit wrote:
| "There's a pretty clear difference between the
| 'finetuning' offered via API by GPT4 and the ability to
| do whatever sort of finetuning you want and get the
| weights at the end that you can do with open weights
| models."
|
| Yes, the difference is that one is provided over a remote
| API, and the provider of the API can restrict how you
| interact with it, while the other is performed directly
| by the user. One is a SaaS solution, the other is a
| compiled solution, and neither are open source.
|
| ""Brute forcing" is not the correct language to use for
| describing fine-tuning. It is not as if you are trying
| weights randomly and seeing which ones work on your
| dataset - you are following a gradient."
|
| Whatever you want to call it, this doesn't sound like
| modifying functionality in source code. When I modify
| source code, I might make a change, check what that does,
| change the same functionality again, check the new
| change, etc... up to maybe a couple dozen times. What I
| don't do is have a very simple routine make very small
| modifications to all of the system's functionality, then
| check the result of that small change across the broad
| spectrum of functionality, and repeat millions of times.
| Kubuxu wrote:
| The gap between fine-tuning API and weights-available is
| much more significant than you give it credit for.
|
| You can take the weights and train LoRAs (which is close
| to fine-tuning), but you can also build custom adapters
| on top (classification heads). You can mix models from
| different fine-tunes or perform model surgery (adding
| additional layers, attention heads, MoE).
|
| You can perform model decomposition and amplify some of
| its characteristics. You can also train multi-modal
| adapters for the model. Prompt tuning requires weights as
| well.
|
| I would even say that having the model is more potent in
| the hands of individual users than having the dataset.
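|
| As a rough illustration of the "custom adapter" case, a
| classification head bolted onto frozen open weights might look
| something like this (PyTorch sketch; the base checkpoint name
| is a placeholder):
|
|     import torch.nn as nn
|     from transformers import AutoModel
|
|     class ClassifierOnOpenWeights(nn.Module):
|         def __init__(self, base="some-org/open-weights-7b",
|                      n_labels=4):
|             super().__init__()
|             self.backbone = AutoModel.from_pretrained(base)
|             for p in self.backbone.parameters():
|                 p.requires_grad = False   # keep the base frozen
|             d = self.backbone.config.hidden_size
|             self.head = nn.Linear(d, n_labels)  # new, trainable part
|
|         def forward(self, input_ids, attention_mask=None):
|             h = self.backbone(input_ids,
|                               attention_mask=attention_mask)
|             return self.head(h.last_hidden_state[:, -1, :])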
| thayne wrote:
| That still doesn't make it open source.
|
| There is a massive difference between a compiled binary
| that you are allowed to do anything you want with,
| including modifying it, building something else on top or
| even pulling parts of it out and using in something else,
| and a SaaS offering where you can't modify the software
| at all. But that doesn't make the compiled binary open
| source.
| emporas wrote:
| > When I modify source code, I might make a change, check
| what that does, change the same functionality again,
| check the new change, etc... up to maybe a couple dozen
| times.
|
| You can modify individual neurons if you are so inclined.
| That's what Anthropic have done with the Claude family of
| models [1]. You cannot do that using any closed model. So
| "Open Weights" looks very much like "Open Source".
|
| Techniques for introspection of weights are very
| primitive, but I do think new techniques will be
| developed, or even new architectures which will make it
| much easier.
|
| [1] https://www.anthropic.com/news/mapping-mind-language-
| model
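|
| In the crudest sense you can already do that today with open
| weights, e.g. zeroing out a single hidden unit directly (a
| sketch with a made-up layer name, and much blunter than the
| feature-level work in [1]):
|
|     import torch
|
|     sd = torch.load("model.pth", map_location="cpu")
|     # "neuron" 1234 of one MLP layer: a single row of the
|     # up-projection matrix (the key name is hypothetical)
|     sd["layers.10.mlp.up_proj.weight"][1234, :] = 0.0
|     torch.save(sd, "model_edited.pth")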
| the8thbit wrote:
| "You can modify individual neurons if you are so
| inclined."
|
| You can also modify a binary, but that doesn't mean that
| binaries are open source.
|
| "That's what Anthropic have done with the Claude family
| of models [1]. ... Techniques for introspection of
| weights are very primitive, but i do think new techniques
| will be developed"
|
| Yeah, I don't think what we have now is robust enough
| interpretability to be capable of generating something
| comparable to "source code", but I would like to see us
| get there at some point. It might sound crazy, but a few
| years ago the degree of interpretability we have today
| (thanks in no small part to Anthropic's work) would have
| sounded crazy.
|
| I think getting to open sourcable models is probably
| pretty important for producing models that actually do
| what we want them to do, and as these models become more
| powerful and integrated into our lives and production
| processes the inability to make them do what we actually
| want them to do may become increasingly dangerous.
| Muddling the meaning of open source today to market your
| product, then, can have troubling downstream effects as
| focus in the open source community may be taken away from
| interpretability and put instead on distributing and tuning
| public weights.
| bilsbie wrote:
| You make a good point but those are also just limitations
| of the technology (or at least our current understanding of
| it)
|
| Maybe an analogy would help. A family spent generations
| breeding the perfect apple tree and they decided to "open
| source" it. What would open sourcing look like?
| the8thbit wrote:
| "You make a good point but those are also just
| limitations of the technology (or at least our current
| understanding of it)"
|
| Yeah, that _is_ my point. Things that don't have source
| code can't be open source.
|
| "Maybe an analogy would help. A family spent generations
| breeding the perfect apple tree and they decided to "open
| source" it. What would open sourcing look like?"
|
| I think we need to be wary of dilemmas without solutions
| here. For example, let's think about another analogy: I
| was in a car accident last week. How can I open source my
| car accident?
|
| I don't think all, or even most things, are actually
| "open sourcable". ML models could be open sourced, but it
| would require a lot of work to interpret the models and
| generate the source code from them.
| gowld wrote:
| Be charitable and intellectually curious. What would
| "open" look like?
|
| GNU says "The GNU GPL can be used for general data which
| is not software, as long as one can determine what the
| definition of "source code" refers to in the particular
| case. As it turns out, the DSL (see below) also requires
| that you determine what the "source code" is, using
| approximately the same definition that the GPL uses."
|
| and offers these categories, for example:
|
| https://www.gnu.org/licenses/license-
| list.en.html#NonFreeSof...
|
| * Software Licenses
|
| * * GPL-Compatible Free Software Licenses
|
| * * GPL-Incompatible Free Software Licenses
|
| * Licenses For Documentation
|
| * * Free Documentation Licenses
|
| * Licenses for Other Works
|
| * * Licenses for Works of Practical Use besides Software
| and Documentation
|
| * * Licenses for Fonts
|
| * * Licenses for Works stating a Viewpoint (e.g., Opinion
| or Testimony)
|
| * * Licenses for Designs for Physical Objects
| the8thbit wrote:
| "Be charitable and intellectually curious. What would
| "open" look like?"
|
| To really be intellectually curious we need to be open to
| the idea that there is not (yet) a solution to this
| problem. Or in the analogy you laid out, that it is
| simply not possible for the system to be "open source".
|
| Note that most of the licenses listed under the "Licenses
| for Other Works" section say "It is incompatible with the
| GNU GPL. Please don't use it for software or
| documentation, since it is incompatible with the GNU GPL
| and with the GNU FDL." This is because these are not free
| software/open source licenses. They are licenses that the
| FSF endorses because they encourage openness and copyleft
| in non-software mediums, and play nicely with the GPL
| _when used appropriately_ (i.e. not for software).
|
| The GPL _is_ appropriate for many works that we wouldn't
| conventionally view as software, but in those contexts
| the analogy is usually so close to the literal nature of
| software that it stops being an analogy. The major
| difference is public perception. For example, we don't
| generally view jpegs as software. However, jpegs, at
| their heart, are executable binaries with very domain
| specific instructions that are executed in a very much
| non-Turing complete context. The source code for the jpeg
| is the XCF or similar (if it exists) which contains a
| specification (code) for building the binary. The code
| becomes human readable once loaded into an IDE, such as
| GIMP, designed to display and interact with the
| specification. This is code that is most easily
| interacted with using a visual IDE, but that doesn't
| change the fact that it _is_ code.
|
| There are some scenarios where you could identify a
| "source code" but not a "software". For example, a cake
| can be open sourced by releasing the recipe. In such a
| context, though, there is literally source code. It's
| just that the code never produces a binary, and is
| compiled by a human and kitchen instead of a computer.
| There is open source hardware, where the source code is a
| human readable hardware specification which can be easily
| modified, and the hardware is compiled by a human or
| machine using that specification.
|
| The scenario where someone has bred a specific plant,
| however, can not be open source, unless they have also
| deobfuscated the genome, released the genome publicly,
| and there is also some feasible way to convert the
| deobfuscated genome, or a modification of it, into a
| seed.
| jpadkins wrote:
| > vs an actual open source system where you can understand
| how the system is working and modify specific
| functionality.
|
| No one on the planet understands how the model weights work
| exactly, nor can they modify them specifically (i.e. hand
| modifying the weights to get the result they want). This is
| an impossible standard.
|
| The source code is open (sorta, it does have some
| restrictions). The weights are open. The training data is
| closed.
| the8thbit wrote:
| > No one on the planet understands how the model weights
| work exactly
|
| Which is my point. These models aren't open source
| because there is no source code to open. Maybe one day we
| will have strong enough interpretability to generate
| source from these models, and _then_ we could have open
| source models. But today it's not possible, and changing
| the meaning of open source such that it is possible
| probably isn't a great idea.
| jsheard wrote:
| I also think that something like Chromium is a better analogy
| for corporate open source models than a grassroots project like
| Linux is. Chromium is technically open source, but Google has
| absolute control over the direction of its development, and
| realistically it's far too complex to maintain a fork without
| Google's resources, just like Meta has complete control over
| what goes into their open models, and even if they did release
| all the training data and code (which they don't) us mere plebs
| could never afford to train a fork from scratch anyway.
| skybrian wrote:
| I think you're right from the perspective of an individual
| developer. You and I are not about to fork Chromium any time
| soon. If you presume that forking is impractical then sure,
| the right to fork isn't worth much.
|
| But just because a single developer couldn't do it doesn't
| mean it couldn't be done. It means nobody has organized a
| large enough effort yet.
|
| For something like a browser, which is critical for security,
| you need both the organization and the trust. Despite
| frequent criticism, Mozilla (for example) is still considered
| pretty trustworthy in a way that an unknown developer can't
| be.
| Yizahi wrote:
| If Microsoft can't do it, then we can reasonably conclude
| that it can't be done for any practical purpose. Discussing
| infinitesimal possibilities is better left to philosophers.
| skybrian wrote:
| Doesn't Microsoft maintain its own fork of Chromium?
| umbra07 wrote:
| yes - their browser is chromium-based
| candiddevmike wrote:
| None of Meta's models are "open source" in the FOSS sense, even
| the latest Llama 3.1. The license is restrictive. And no one
| has bothered to release their training data either.
|
| This post is an ad and trying to paint these things as
| something they aren't.
| JumpCrisscross wrote:
| > _no one has bothered to release their training data_
|
| If the FOSS community sets this as the benchmark for open
| source in respect of AI, they're going to lose control of the
| term. In most jurisdictions it would be illegal for the likes
| of Meta to release training data.
| exe34 wrote:
| the training data is the source.
| JumpCrisscross wrote:
| > _the training data is the source_
|
| Sure. But that's not going to be released. The term open
| source AI cannot be expected to cover it because it's not
| practical.
| diggan wrote:
| So because it's really hard to do proper Open Source with
| these LLMs, means we need to change the meaning of Open
| Source so it fits with these PR releases?
| JumpCrisscross wrote:
| > _because it's really hard to do proper Open Source
| with these LLMs, means we need to change the meaning of
| Open Source so it fits with these PR releases?_
|
| Open training data is hard to the point of
| impracticality. It requires excluding private and
| proprietary data.
|
| Meanwhile, the term "open source" is massively popular.
| So it will get used. The question is how.
|
| Meta _et al_ would love for the choice to be between, on
| one hand, open weights only, and, on the other hand, open
| training data, because the latter is impractical. That
| dichotomy guarantees that when someone says open source
| AI they'll mean open weights. (The way open source
| software, today, generally means source available, not
| FOSS.)
| Palomides wrote:
| source available is absolutely not the same as open
| source
|
| you are playing very loosely with terms that have
| specific, widely accepted definitions (e.g.
| https://opensource.org/osd )
|
| I don't get why you think it would be useful to call LLMs
| with published weights "open source"
| JumpCrisscross wrote:
| > _terms that have specific, widely accepted definitions_
|
| The OSI's definition is far from the only one [1].
| Switzerland is currently implementing CH Open's
| definition, the EU another one, _et cetera_.
|
| > _I don't get why you think it would be useful to call
| LLMs with published weights "open source"_
|
| I don't. I'm saying that if the choice is between open
| weights or open weights + open training data, open
| weights will win because the useful definition will
| outcompete the pristine one in a public context.
|
| [1] https://en.wikipedia.org/wiki/Open-
| source_software#Definitio...
| diggan wrote:
| For the EU, I'm guessing you're talking about the EUPL,
| which is FSF/OSI approved and GPL compatible, generally
| considered copyleft.
|
| For the CH Open, I'm not finding anything specific, even
| from Swiss websites, could you help me understand what
| you're referring to here?
|
| I'm guessing that all these definitions have at least
| some points in common, which involves (another guess) at
| least being able to produce the output artifacts/binaries
| by yourself, something that you cannot do with Llama,
| just as an example.
| JumpCrisscross wrote:
| > _For the CH Open, I'm not finding anything specific,
| even from Swiss websites, could you help me understand
| what you're referring to here_
|
| Was on the _HN_ front page earlier [1][2]. The definition
| comes strikingly close to source on request with no use
| restrictions.
|
| > _all these definitions have at least some points in
| common_
|
| Agreed. But they're all different. There isn't an
| accepted definition of open source even when it comes to
| software; there is an accepted set of broad principles.
|
| [1] https://news.ycombinator.com/item?id=41047172
|
| [2] https://joinup.ec.europa.eu/collection/open-source-
| observato...
| diggan wrote:
| > Agreed. But they're all different. There isn't an
| accepted definition of open source even when it comes to
| software; there is an accepted set of broad principles.
|
| Agreed, but are we splitting hairs here and is it
| relevant to the claim made earlier?
|
| > (The way open source software, today, generally means
| source available, not FOSS.)
|
| Do any of these principles or definitions from these orgs
| agree/disagree with that?
|
| My hypothesis is that they generally would go against
| that belief and instead argue that open source is
| different from source available. But I haven't looked
| specifically to confirm if that's true or not, just a
| guess.
| JumpCrisscross wrote:
| > _are we splitting hairs here and is it relevant to the
| claim made earlier?_
|
| I don't think so. Take the Swiss definition. Source on
| request, not even available. Yet being branded and
| accepted as open source.
|
| (To be clear, the Swiss example favours FOSS. But it also
| permits source on request and bundles them together under
| the same label.)
| Palomides wrote:
| diluting open source into a marketing term meaning "you
| can download something" would be a sad result
| SquareWheel wrote:
| > specific, widely accepted definitions
|
| Realistically, nobody outside of Hacker News commenters
| has ever cared about the OSD. It's just not how the term
| is used colloquially.
| Palomides wrote:
| who says open source colloquially? ime anyone who doesn't
| care about software licenses will just say free (per free
| beer)
|
| and (strong personal opinion) any software developer
| should have a firm grip on the terminology and details
| for legal reasons
| SquareWheel wrote:
| > who says open source colloquially?
|
| There is a large span of people between gray beard
| programmer and lay person, and many in that span have
| some concept of open-source. It's often used synonymously
| with visible source, free software, or in this case, open
| weights.
|
| It seems unfortunate - though expected - that over half
| of the comments in this thread are debating the OSD for
| the umpteenth time instead of discussing the actual model
| release or accompanying news posts. Meanwhile communities
| like /r/LocalLlama are going hog wild with this release
| and already seeing what it can do.
|
| > any software developer should have a firm grip on the
| terminology and details for legal reasons
|
| They'd simply need to review the terms of the license to
| see if it fits their usage. It doesn't really matter if
| the license satisfies the OSD or not.
| diggan wrote:
| > Open training data is hard to the point of
| impracticality. It requires excluding private and
| proprietary data.
|
| Right, so the onus is on Facebook/Meta to get that right,
| then they could call something Open Source, until then,
| find another name that already doesn't have a specific
| meaning.
|
| > (The way open source software, today, generally means
| source available, not FOSS.)
|
| No, but it's heading that way. Open Source, today, still
| means that the things you need to build a project are
| publicly available for you to download and run on your
| own machine, granted you have the means to do so. What
| you're thinking of is literally called "Source Available"
| which is very different from "Open Source".
|
| The intent of Open Source is for people to be able to
| reproduce the work themselves, with modifications if they
| want to. Is that something you can do today with the
| various Llama models? No, because one core part of the
| project's "source code" (what you need to reproduce it
| from scratch), the training data, is being held back and
| kept private.
| unethical_ban wrote:
| >Meanwhile, the term "open source" is massively popular.
| So it will get used. The question is how.
|
| Here's the source of the disagreement. You're justifying
| the use of the term "open source" by saying it's logical
| for Meta to want to use it for its popularity and layman
| (incorrect) understanding.
|
| Other person is saying it doesn't matter how convenient
| it is or how much Meta wants to use it, that the term
| "open source" is misleading for a product where the
| "source" is the training data, _and_ the final product
| has onerous restrictions on use.
|
| This would be like Adobe giving Photoshop away for free,
| but for personal use only and not for making ads for
| Adobe's competitors. Sure, Adobe likes it and most users
| may be fine with it, but it isn't open source.
|
| >The way open source software, today, generally means
| source available, not FOSS.
|
| I don't agree with that. When a company says "open
| source" but it's not free, the tech community is quick to
| call it "source available" or "open core".
| JumpCrisscross wrote:
| > _You're justifying the use of the term "open source"
| by saying it's logical for Meta to want to use it for its
| popularity and layman (incorrect) understanding_
|
| I'm actually not a fan of Meta's definition. I'm arguing
| specifically against an unrealistic definition, because
| for practical purposes that cedes the term to Meta.
|
| > _the term "open source" is misleading for a product
| where the "source" is the training data, and the final
| product has onerous restrictions on use_
|
| Agree. I think the focus should be on the use
| restrictions.
|
| > _When a company says "open source" but it's not free,
| the tech community is quick to call it "source available"
| or "open core"_
|
| This isn't consistently applied. It's why we have the
| free vs open vs FOSS fracture.
| plsbenice34 wrote:
| Of course it could be practical - provide the data. The
| fact that society is a dystopian nightmare controlled
| by a few megacorporations that don't want free
| information does not justify outright changing the
| meaning of the language.
| JumpCrisscross wrote:
| > _provide the data_
|
| Who? It's not their data.
| exe34 wrote:
| why are they using it?
| guitarlimeo wrote:
| And why legislation allows them to use the data to train
| their LLM and release that, but not release the data?
| tintor wrote:
| Meta can call it something else other than open source.
|
| Synthetic part of the training data could be released.
| JimDabell wrote:
| I don't think it's that simple. The source is "the
| preferred form of the work for making modifications to
| it" (to use the GPL's wording).
|
| For an LLM, that's not the training data. That's the
| model itself. You don't make changes to an LLM by going
| back to the training data and making changes to it, then
| re-running the training. You update the model itself with
| more training data.
|
| You can't even use the training code and original
| training data to reproduce the existing model. A lot of
| it is non-deterministic, so you'll get different results
| each time anyway.
|
| Another complication is that the object code for normal
| software is a clear derivative work of the source code.
| It's a direct translation from one form to another. This
| isn't the case with LLMs and their training data. The
| models learn from it, but they aren't simply an
| alternative form of it. I don't think you can describe an
| LLM as a derivative work of its training data. It learns
| from it, it isn't a copy of it. This is mostly the reason
| why distributing training data is infeasible - the
| model's creator may not have the license to do so.
|
| Would it be extremely useful to have the original
| training data? Definitely. Is distributing it the same as
| distributing source code for normal software? I don't
| think so.
|
| I think new terminology is needed for open AI models. We
| can't simply re-use what works for human-editable code
| because it's a fundamentally different type of thing with
| different technical and legal constraints.
| jononor wrote:
| No, the preferred way to make modifications is using the
| training code. One may also input snapshot weights to
| start from, but the training code is definitely what you
| would modify to make a change.
| exe34 wrote:
| how do you train it in a different language by changing
| the training code?
| jononor wrote:
| By selecting a different dataset. Of course this dataset
| does need to exist. In practice building and curating
| datasets also involves a lot of code.
| exe34 wrote:
| sounds like you need the data to train the model.
| root_axis wrote:
| No. It's an asset used in the training process, the
| source code can process arbitrary training data.
| sangnoir wrote:
| We've had a similar debate before, but the last time it
| was about whether Linux device drivers based on non-public
| datasheets under NDA were actually open source. This
| debate occurred again over drivers that interact with
| binary blobs.
|
| I disagree with the purists - if you can _legally_ change
| the source or weights - even without having access to the
| data used by the upstream authors - it's open enough for
| me. YMMV.
| wrs wrote:
| I don't think even that is true. I conjecture that
| Facebook couldn't reproduce the model weights if they
| started over with the same training data, because I doubt
| such a huge training run is a reproducible deterministic
| process. I don't think _anyone_ has "the" source.
| exe34 wrote:
| numpy.random.seed(1234)
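|
| (Or, more completely, something like the sketch below - though
| even with every seed pinned, large multi-GPU runs still hit
| nondeterministic kernels and data-ordering effects, so this is
| only a partial answer:)
|
|     import random
|     import numpy as np
|     import torch
|
|     random.seed(1234)
|     np.random.seed(1234)
|     torch.manual_seed(1234)
|     # raise an error if a nondeterministic CUDA op is used
|     torch.use_deterministic_algorithms(True)
|     torch.backends.cudnn.benchmark = False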
| mesebrec wrote:
| Regardless of the training data, the license even heavily
| restricts how you can use the model.
|
| Please read through their "acceptable use" policy before
| you decide whether this is really in line with open source.
| JumpCrisscross wrote:
| > _Please read through their "acceptable use" policy
| before you decide whether this is really in line with
| open source_
|
| I'm not taking a specific position on this license. I
| haven't read it closely. My broad point is simply that
| open source AI, as a term, cannot practically require the
| training data be made available.
| guitarlimeo wrote:
| > In most jurisdictions it would be illegal for the likes
| of Meta to release training data.
|
| How come releasing an LLM trained on that data is not
| illegal then? I think it should be.
| blackeyeblitzar wrote:
| AI2 has released training data in their OLMo model:
| https://blog.allenai.org/hello-olmo-a-truly-open-
| llm-43f7e73...
| causal wrote:
| "Open weights" is a more appropriate term but I'll point out
| that these weights are also largely inscrutable to the people
| with the code that trained it. And for licensing reasons, the
| datasets may not be possible to share.
|
| There is still a lot of modifying you can do with a set of
| weights, and they make great foundations for new stuff, but
| yeah we may never see a competitive model that's 100% buildable
| at home.
|
| Edit: mkolodny points out that the model code is shared (under
| llama license at least), which is really all you need to run
| training https://github.com/meta-
| llama/llama3/blob/main/llama/model.p...
| aerzen wrote:
| LLAMA is an open-weights model. I like this term, let's use
| that instead of open source.
| gowld wrote:
| Can a human programmer edit the weights according to some
| semantics?
| sebastiennight wrote:
| It is possible to merge two fine-tunes of models from the
| same family by... wait for it... averaging or combining
| their weights[0].
|
| I am still amazed that we can do that.
|
| [0]: https://arxiv.org/abs/2212.09849
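|
| A rough sketch of the simplest version of such a merge,
| assuming two fine-tunes of the same architecture saved as
| PyTorch state dicts (file names are placeholders):
|
|     import torch
|
|     a = torch.load("finetune_a.pth", map_location="cpu")
|     b = torch.load("finetune_b.pth", map_location="cpu")
|
|     # uniform average; [0] and later work explore fancier
|     # weighted / task-vector combinations
|     merged = {k: (a[k] + b[k]) / 2 for k in a}
|     torch.save(merged, "merged.pth")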
| root_axis wrote:
| Yes. Using fine tuning.
| sitkack wrote:
| Yes, there is the concept of a "frakenmerge" and folks
| have also bolted on vision and audio models to LLMs.
| stavros wrote:
| "Open weights" means you can use the weights for free (as in
| beer). "Open source" means you get the training dataset and
| the methodology. ~Nobody does open source LLMs.
| _heimdall wrote:
| Why is the dataset required for it to be open source?
|
| If I self host a project that is open sourced rather than
| paying for a hosted version, like Sentry.io for example, I
| don't expect data to come along with the code. Licensing
| rights are always up for debate in open source, but I
| wouldn't expect more than the code to be available and
| reviewable for anything needed to build and run the
| project.
|
| In the case of an LLM I would expect that to mean the code
| run to train the model, the code for the model data
| structure itself, and the control code for querying the
| model should all be available. I'm not actually sure if
| Meta does share all that, but training data is separate
| from open source IMO.
| solarmist wrote:
| The sticking point is you can't build the model. To be
| able to build the model from scratch you need methodology
| and a complete description of the data set.
|
| They only give you a blob of data you can run.
| _heimdall wrote:
| Got it, that makes sense. I still wouldn't expect them to
| have to publicly share the data itself, but if you can't
| take the code they share and run it against your own data
| to build a model that wouldn't be open source in my
| understanding of it.
| TeMPOraL wrote:
| Data _is_ the source code here, though. Training code is
| effectively a build script. Data that goes into training
| a model does _not_ function like assets in videogames;
| you can't swap out the training dataset after release
| and get substantially the same thing. If anything, you
| can imagine the weights themselves are the asset - and
| even if the vendor is granting most users a license to
| copy and modify it (unlike with videogames), the _asset
| itself_ isn't open source.
|
| So, the only bit that's actually open-sourced in these
| models is the inference code. But that's a trivial part
| that people can procure equivalents of elsewhere or
| reproduce from published papers. In this sense, even if
| you think calling the models "open source" is correct, it
| doesn't really mean much, because the only parts that
| matter are _not_ open sourced.
| derefr wrote:
| Compare/contrast:
|
| DOOM-the-engine is open source (https://github.com/id-
| Software/DOOM), even though DOOM-the-asset-and-scenario-
| data is not. While you need a copy of DOOM-the-asset-and-
| scenario-data to "use DOOM to run DOOM", you are free to
| build _other games_ using DOOM-the-engine.
| echoangle wrote:
| I think no one would claim that "Doom" is open source
| though, if that's the situation.
| camgunz wrote:
| That's what op is saying, the engine is GPLv2, but the
| assets are copyrighted. There's Freedoom though and it's
| pretty good [0].
|
| [0]: https://freedoom.github.io/
| stavros wrote:
| Data is to models what code is to software.
| _heimdall wrote:
| I don't quite agree there. Based on other comments it
| sounds like Meta doesn't open source the code used to
| train the model, that would make it not open source in my
| book.
|
| The trained model doesn't need to be open source though,
| and frankly I'm not sure what the value there is
| specifically with regards to OSS. I'm not aware of a
| solution to the interpretability problem; even if the model
| is shared we can't understand what's in it.
|
| Microsoft ships obfuscated code with Windows builds, but
| that doesn't make it open source.
| Xelynega wrote:
| Wouldn't the "source code" of the model be closer to the
| source code of a compiler or the runtime library?
|
| IMO a pre-trained model given with the source code used
| to train/run it is analogous to a company shipping a
| compiler and a compiled binary without any of the source,
| which is why I don't think it's "open source" without the
| training data.
| _heimdall wrote:
| You really should be able to train a model on whatever
| data you choose to use though.
|
| Training data isn't source code at all; it's content
| fed into the ingestion side to train a model. As long as
| the source for ingesting and training a model is available,
| which it sounds like isn't the case for Meta, that would
| be open source as best I understand it.
|
| Said a little differently, I would need to be able to
| review all code used to generate a model and all code
| used to query the model for it to be OSS. I don't need
| Meta's training data or their actual model at all, I can
| train my own with code that I can fully audit and modify
| if I choose to.
| gowld wrote:
| https://opensource.org/osd
|
| "The source code must be the preferred form in which a
| programmer would modify the program. Deliberately
| obfuscated source code is not allowed. Intermediate forms
| such as the output of a preprocessor or translator are
| not allowed."
|
| > In the case of an LLM I would expect that to mean the
| code run to train the model, the code for the model data
| structure itself, and the control code for querying the
| model should all be available
|
| The M in LLM is for "Model".
|
| The code you describe is for an LLM _harness_, not for
| an LLM. The code for the _LLM_ is whatever is needed to
| enable a developer to _modify_ the inputs and then build a
| modified output LLM (minus standard generally available
| tools not custom-created for that product).
|
| Training data is one way to provide this. Another way is
| some sort of semantic model editor for an interpretable
| model.
| _heimdall wrote:
| I still don't quite follow. If Meta were to provide all
| code required to train a model (it sounds like they
| don't), and they provided the code needed to query the
| model you train to get answers how is that not open
| source?
|
| > Deliberately obfuscated source code is not allowed.
| Intermediate forms such as the output of a preprocessor
| or translator are not allowed.
|
| This definition actually makes it impossible for _any_
| LLM to be considered open source until the
| interpretability problem is solved. The trained model is
| functionally obfuscated code; it can't be read or
| interpreted by a human.
|
| We may be saying the same thing here, I'm not quite sure
| if you're saying the model must be available or if what
| is missing is the code to train your own model.
| the8thbit wrote:
| I'm not the person you replied directly to so I can't
| speak for them, but I did start this thread, and I just
| wanted to clarify what I meant in my OP, because I see a
| lot of people misinterpreting what I meant.
|
| I did _not_ mean that LLM training data needs to be
| released for the model to be open source. It would be a
| good thing if creators of models did release their
| training data, and I wouldn't even be opposed to
| regulation which encourages or even requires that
| training data be released when models meet certain
| specifications. I don't even think the bar needs to be
| high there- We could require or encourage smaller
| creators to release their training data too and the
| result would be a net positive when it comes to public
| understanding of ML models, control over outputs, safety,
| and probably even capabilities.
|
| Sure, it's possible that training data is being used
| illegally, but I don't think the solution to that is to
| just have everyone hide that and treat it as an open
| secret. We should either change the law, or apply it
| equally.
|
| But that being said, I don't think it has anything to do
| with whether the model is "open source". Training data
| simply isn't source code.
|
| I also don't mean that the license that these models are
| released under is too restrictive to be open source.
| Though that is _also_ true, and if these models had
| source code, that would also prevent them from being open
| source. (Rather, they would be "source available"
| models)
|
| What I mean is "The trained model is functionally
| obfuscated code, it can't be read or interpreted by a
| human." As you point out, it is definitionally impossible
| for any contemporary LLM to be considered open source.
| (Except for maybe some very, very small research models?)
| There's no source code (yet) so there is no source to
| open.
|
| I think it is okay to acknowledge when something is
| technically infeasible, and then proceed to not claim to
| have done that technically infeasible thing. I don't
| think the best response to that situation is to, instead,
| use that as justification for muddying the language to
| such a degree that it's no longer useful. And I don't
| think the distinction is trivial or purely semantic.
| Using the language of open source in this way is
| dangerous for two reason.
|
| The first is that it could conceivably make it more
| challenging for copyleft licenses such as the GPL to
| protect the works licensed with them. If the "public" no
| longer treats software with public binaries and without
| public source code as closed source, then who's to say
| you can't fork the Linux kernel, release the binary, and
| keep the code behind closed doors? Wouldn't that also be
| open source?
|
| The second is that I think convincing a significant
| portion of the open source community that releasing a
| model's weights is sufficient to open source a model will
| cause the community to put more focus on distributing and
| tuning weights, and less time actually figuring out how
| to construct source code for these models. I suspect that
| solving interpretability and generating something
| resembling source code may be necessary to get these
| models to _actually_ do what we want them to do. As ML
| models become increasingly integrated into our lives and
| production processes, and become increasingly
| sophisticated, the danger created by having models
| optimized towards something other than what we would
| actually like them optimized towards increases.
| achrono wrote:
| > not actually sure if Meta does share all that
|
| Meta shares the code for inference but not for training,
| so even if we say it can be open-source without the
| training data, Meta's models are not open-source.
|
| I can appreciate Zuck's enthusiasm for open-source but
| not his willingness to mislead the larger public about
| how open they actually are.
| swatcoder wrote:
| The open source _movement_ , from which the name derives,
| was about the freedom to make bespoke alterations to the
| software you choose to run. Provided you have reasonably
| widespread proficiency in industry standard tools, you
| can take something that's open source, modify that
| source, and rebuild/redeploy/reinterpret/re-whatever to
| make it behave the way that you want or need it to
| behave.
|
| This is in contrast to a compiled binary or obfuscated
| source image, where alteration may be possible with
| extraordinary skill and effort but is not expected and
| possibly even specifically discouraged.
|
| In this sense, weights are entirely like those compiled
| binaries or obfuscated sources rather than the source
| code usually associated with "open source".
|
| To be "open source" we would want LLM's where one might
| be able to manipulate the original training data or
| training algorithm to produce a set of weights more
| suited to one's own desires and needs.
|
| Facebook isn't giving us that yet, and very probably
| can't. They're just trading on the weird boundary state
| of the term "open source" -- it still carries prestige
| and garners good will from its original techno-populist
| ideals, but is so diluted by twenty years of naive
| consumers who just take it to mean "I don't have to pay
| to use this" that the prestige and good will is now
| misplaced.
| llm_trw wrote:
| >The open source movement, from which the name derives,
| was about the freedom to make bespoke alterations to the
| software you choose to run.
|
| The open source movement was a cash grab to make the free
| software movement more palatable to big corp by moving
| away from copyleft licenses. The MIT license is
| perfectly open source and means that you can buy software
| without ever seeing its code.
| Tepix wrote:
| If you obtain open source licensed software you can pass
| it on legally (and freely). With some licenses you also
| have to provide the source code.
| saurik wrote:
| The thing they are pointing at and which is the thing
| people want is the output of the training engine, not the
| inputs. This is like someone saying they have an open
| source kernel, but they only release a compiler and a
| binary... the kernel code is never released, but the
| kernel is the only reason anyone even wants the compiler.
| (For avoidance of anyone being somehow confused: the
| training code is a compiler which takes training data and
| outputs model weights.)
| _heimdall wrote:
| The output of the training engine, i.e. the model itself,
| isn't source code at all though. The best approximation
| would be considering it obfuscated code, and even then
| it's a stretch since it is more similar to compressed
| data.
|
| It sounds like Meta doesn't share source for the training
| logic. That would be necessary for it to really be open
| source, you need to be able to recreate and modify the
| codebase but that has nothing to do with the training
| data or the trained model.
| saurik wrote:
| I didn't claim the output is source code, any more than
| the kernel is. Are you sure you don't simply agree with
| me?
| croemer wrote:
| But surely you wouldn't call it open source if Sentry
| just gave you a binary - and the source code wasn't
| available.
| blackeyeblitzar wrote:
| There is a comment elsewhere claiming there are a few dozen
| fully open source models:
| https://news.ycombinator.com/item?id=41048796
| sigmoid10 wrote:
| >Nobody does open source LLMs.
|
| There are a bunch of independent, fully open source
| foundation models from companies that share everything
| (including all data). AMBER and MAP-NEO for example. But we
| have yet to see one in the 100B+ parameter category.
| stavros wrote:
| Sorry, the tilde before "nobody" is my notation for
| "basically nobody" or "almost nobody". I thought it was
| more common.
| plausibility wrote:
| It is more common when it comes to numbers, I guess. There
| are ~5 ancestors in this comment chain, where I would agree
| roughly 4-6 is acceptable.
| politelemon wrote:
| It's the literal (figurative) nobody rather than the
| literal (literal) nobody.
| larodi wrote:
| Indeed, since when is the deliverable being a jpeg/exe -
| which is roughly what the model file is - considered the
| source? It is more like an open result or a freely available
| VM image that works, but has its core FS scrambled or
| encrypted.
|
| Zuck knows this very well, and it does him no honour to speak
| like this; from his position it amounts to an attempt at
| changing the present semantics of open source. Of course,
| others do that too - using the notion of open source to
| describe something very far from open.
|
| What Meta is doing under his command is better described as
| releasing the resulting... build, so that it can be freely
| poked around and even put to work. But the result cannot be
| effectively reverse engineered.
|
| What's more ridiculous is that it is precisely because the
| result is not the source in its whole form that these
| graphical structures can be made available - only thanks to
| the fact that it is not traceable to the source. That makes
| the whole game not only closed, but... sealed forever. An
| unfair retelling of humanity's knowledge, tossed around in a
| very obscure container that nobody can reverse engineer.
|
| How's that even remotely similar to open source?
| proteal wrote:
| Even if everything was released how you described, what
| good would that really do for an individual without
| access to heaps of compute? Functionally there seems to
| be no difference between open weights and open compute
| because nobody could train a facsimile model.
| Furthermore, all frontier models are inscrutable due to
| their construction. It's wild to me seeing people
| complain about semantics when Meta dropped their model for
| cheap. Now I'm not saying we should suck the zuck for
| this act of charity, but you have to imagine that the other
| frontier labs are not thrilled that Meta has
| invalidated their compute moats with the release of
| Llama. Whether we like it or not, we're on this AI
| rollercoaster and I'm glad that it's not just
| oligopolists dictating the direction forward. I'm happy
| to see meta take this direction, knowing that the
| alternatives are much worse.
| frabcus wrote:
| I'd find knowing what's in the training data hugely
| valuable - can analyse it to understand and predict
| capabilities.
| stavros wrote:
| That's not the discussion. We're talking about what open
| source is, and it's having the weights and the method to
| recreate the model.
|
| If someone gives me an executable that I can run for
| free, and then says "eh why do you want the source, it
| would take you a long time to compile", that doesn't make
| it open source, it just makes it gratis.
| nightski wrote:
| Calling weights an executable is disingenuous and not a
| serious discussion. You can do a lot more with weights
| than you could with a binary executable.
| rizky05 wrote:
| This is debatable; even an executable is a valuable
| artifact. You can also do a lot with an executable in expert
| hands.
| _flux wrote:
| You can do a lot more with an executable as well than
| just execute it. So maybe the _analogy_ is apt, even if
| not exact.
|
| Actually, you can reverse engineer an executable into
| something that could be compiled back into an executable
| with the exact same functionality, which is AFAIK
| impossible to do with "open weights". Still, we don't
| call free executables "open source".
| nine_k wrote:
| Linux is open source and is mostly C code. You cannot run C
| code directly, you have to compile it and produce binaries.
| But it's the C code, not binary form, where the
| collaboration happens.
|
| With LLMs, weights are the binary code: it's how you run
| the model. But to be able to train the model from scratch,
| or to collaborate on new approaches, you have to operate at
| the level of architecture, methods, and training data
| sets. They are the source code.
| verdverm wrote:
| Analogies are always going to fall short. With LLM
| weights, you can modify them (quant, fine-tuning) to get
| something different, which is not something you do with
| compiled binaries. There are ample areas for
| collaboration even without being able to reproduce from
| scratch, which takes $X millions of dollars - a cost
| that a typical binary build does not come with.
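|
| The "quant" part, for example, is something you can only do
| when you hold the weights - a naive int8 sketch over one
| weight matrix (the file name is a placeholder):
|
|     import torch
|
|     w = torch.load("layer_weight.pth")       # one fp32 matrix
|     scale = w.abs().max() / 127.0
|     w_q = torch.round(w / scale).to(torch.int8)  # ~4x smaller
|     w_approx = w_q.float() * scale   # dequantize at load time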
| piperswe wrote:
| You can absolutely modify compiled binaries to get
| something different. That's how lots of video game
| modding and ROM hacks work.
| krisoft wrote:
| And we would absolutely do it more often if compiling
| cost as much as training an LLM costs now.
| verdverm wrote:
| I considered adding "normally" to the binary
| modifications expecting a response like this. The
| concepts are still worlds apart
|
| Weights aren't really a binary in the same sense that a
| compiler produces, they lack instructions and are more
| just a bunch of floating point values. Nor can you run
| model weights without separate code to interpret them
| correctly. In this sense, they are more like a JPEG or 3d
| model
| llm_trw wrote:
| This is bending the definition to the other extreme.
|
| Linux doesn't ship you the compiler you need to build the
| binaries either, that doesn't mean it's closed source.
|
| LLMs are fundamentally different to software and using
| terms from software just muddies the waters.
| saurik wrote:
| Then what is the "source"? If we are to use the term
| "source" then what does that mean here, as distinct from
| it merely being free?
| llm_trw wrote:
| It means nothing because LLMs aren't software.
| Phelinofist wrote:
| Do they not run on a computer?
| llm_trw wrote:
| So does a video. Is a video open source if you're given
| the permissions to edit it? To distribute it? Given the
| files to generate it? What if the files can only be open
| in a proprietary program?
|
| Videos aren't software and neither are llms.
| saurik wrote:
| If a video doesn't have source code, then it can't be
| open source. Likewise, if you feel that an LLM doesn't
| have source code because of some property of what it is
| -- as you claim it isn't software and somehow that means
| that it abstractly removes it from consideration for this
| concept (an idea I think is ridiculous, FWIW: an LLM is
| clearly software that runs in a particularly interesting
| virtual machine defined by the model architecture) --
| then, somewhat trivially, it also can't be open source.
| It is, as the person you are responding to says, at best
| "open weights".
|
| If a video somehow _does_ have source code which can
| "generate it", then the question of what it means for the
| source code to the video to be open even if the only
| program which can read it and generate the video is
| closed source is equivalent to asking if a program
| written in Visual Basic can ever be open source given
| that the Visual Basic compiler is closed source.
| Personally, I can see arguments either way on this issue,
| though _most people_ seem to agree that the program is
| still open source in such a situation.
|
| However, we need not care too much about the answer to
| that specific conundrum, as the moral equivalent of both
| the compiler and the runtime virtual machine are almost
| always open source. What is then important is much
| easier: if you don't provide the source code to the
| project, even if the compiler is open source and even if
| it runs on an open source machine, clearly the project --
| whatever it is that we might try to be discussing,
| including video files -- cannot be open source. The idea
| that a video can be open source when what you mean is the
| video is unencrypted and redistributable but was merely
| intended to be played in an open source video player is
| absurd.
| dns_snek wrote:
| > Is a video open source if you're given the permissions
| to edit it? To distribute it? Given the files to generate
| it?
|
| If you're given the source material and project files to
| continue editing where the original editors finished, and
| you're granted the rights to re-distribute - Yes, that
| would be open source[1].
|
| Much like we have "open source hardware" where the
| "source" consists of original schematics, PCB layouts,
| BOM, etc. [2]
|
| [1] https://en.wikipedia.org/wiki/Open-source_film
|
| [2] https://en.wikipedia.org/wiki/Open-source_hardware
| the8thbit wrote:
| Videos and images are software. They are compiled
| binaries with very domain specific instructions executed
| in a very non-turing complete context. They are generally
| not released as open source, and in many cases the source
| code (the file used to edit the video or image) is lost.
| They are not seen, colloquially, as software, but that
| does not mean that they are not software.
|
| If a video lacks a specification file (the source code)
| which can be used by a human reader to modify specific
| features in the video, then it is software that is simply
| incapable of being open sourced.
| TeMPOraL wrote:
| And LLMs don't ship with a Python distribution.
|
| Linux sources :: dataset that goes into training
|
| Linux sources' build confs and scripts :: training code +
| hyperparameters
|
| GCC :: Python + PyTorch or whatever they use in training
|
| Compiled Linux kernel binary :: model weights
| llm_trw wrote:
| Just because you keep saying it doesn't make it true.
|
| LLMs are not software any more than photographs are.
| the8thbit wrote:
| "LLMs are fundamentally different to software and using
| terms from software just muddies the waters."
|
| They're still software, they just don't have source code
| (yet).
| mattnewton wrote:
| There are plenty of open source LLMs, they just aren't at
| the top of the leaderboards yet. Here's a recent example, I
| think from Apple: https://huggingface.co/apple/DCLM-7B
|
| Using open data and dclm:
| https://github.com/mlfoundations/dclm
| Aeolun wrote:
| I suspect that even if you allowed people to take the data,
| nobody but a FAANG-like organisation could even store it?
| jlokier wrote:
| My impression is the training data for foundation models
| isn't that large. It won't fit on your laptop drive, but
| it will fit comfortably in a few racks of high-density
| SSDs.
| jijji wrote:
| yeah, according to the article [0] about the release of
| Llama 3.1 405B, it was trained on 15 trillion tokens
| using 16,000 Nvidia H100s. Even if they did release the
| training data, I don't think many people would have the
| number of GPUs required to actually do any real training
| to create the model....
|
| [0] https://ai.meta.com/blog/meta-llama-3-1/
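|
| As a rough back-of-envelope check (a sketch using the common
| ~6 x parameters x tokens FLOPs approximation; the per-GPU
| throughput and utilization figures are assumptions):
|
|     params = 405e9            # 405B parameters
|     tokens = 15e12            # 15T training tokens
|     flops = 6 * params * tokens          # ~3.6e25 FLOPs
|
|     h100 = 1e15               # ~1 PFLOP/s peak bf16 (assumed)
|     mfu = 0.4                 # assumed utilization
|     secs = flops / (16_000 * h100 * mfu)
|     print(secs / 86_400)      # on the order of two months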
| WithinReason wrote:
| If weights are not the source, then if they gave you the
| training data and scripts but not the weights, would that
| be "open source"?
| guappa wrote:
| Yes, but they won't do that. Possibly because of extensive
| copyright violations in the training data that they're not
| legally allowed to share.
| sharpshadow wrote:
| If somebody were to leak the training data and they denied
| that it's real - ergo not getting sued - the data would
| be available.
|
| Edit typo.
| guappa wrote:
| It's not available if you can't use it because you don't
| have as many lawyers as facebook and can't ignore laws so
| easily.
| ab5tract wrote:
| If you can't share the dataset, under what twisted reality
| are you fine to share the derivative models based on those
| unsharable datasets?
|
| In a better world, there would be no "I ran some algos on it
| and now it's mine" defense.
| guitarlimeo wrote:
| Yeah was gonna say exactly the same thing. Weird how the
| legislation allows releasing LLMs trained on data that is
| not allowed to be shared otherwise.
| floydnoel wrote:
| Meta might possibly have a license to use (some of) that
| data, but not a license to distribute it. Legislation has
| little to do with it, I imagine.
| yangcheng wrote:
| latest llama 3.1 is in a different repo,
| https://github.com/meta-llama/llama-
| models/blob/main/models/... , but yes, the code is shared.
| It's astonishing that in the software 2.0 era, powerful
| applications like Llama have only hundreds of lines of code,
| with most of the work hidden in the training data. Source
| code alone is no longer as informative as it was in
| Software 1.0.
| twelvechairs wrote:
| Open training data would be great too.
|
| If you have open data and open source code you can reproduce
| the weights
| blharr wrote:
| Not easily for these large scale models, but theoretically
| maybe
| ajxlasA wrote:
| Really? I have to check out the training code again. Last
| time I looked the training and inference code were just
| example toys that were barely usable.
|
| Has that changed?
| danielrhodes wrote:
| For models of this size, the code used to train them is going
| to be very custom to the architecture/cluster they are built
| on. It would be almost useless to anybody outside of Meta.
| The dataset would be a lot more interesting, as it would
| at the very least show everybody how they got it to behave in
| certain ways.
| mkolodny wrote:
| Llama's code is open source: https://github.com/meta-
| llama/llama3/blob/main/llama/model.p...
| apsec112 wrote:
| That's not the _training_ code, just the inference code. The
| training code, running on thousands of high-end H100 servers,
| is surely much more complex. They also don't open-source
| the dataset, or the code they used for data
| scraping/filtering/etc.
| the8thbit wrote:
| "just the inference code"
|
| It's not the "inference code", its the code that specifies
| the architecture of the model and loads the model. The
| "inference code" is mostly the model, and the model is not
| legible to a human reader.
|
| Maybe someday open source models will be possible, but we
| will need much better interpretability tools so we can
| generate the source code from the model. In most software
| projects you write the source as a specification that is
| then used by the computer to implement the software, but in
| this case the process is reversed.
| blackeyeblitzar wrote:
| That is just the inference code. Not training code or
| evaluation code or whatever pre/post processing they do.
| patrickaljord wrote:
| Is there an LLM with actual open source training code and
| dataset? Besides BLOOM
| https://huggingface.co/bigscience/bloom
| osanseviero wrote:
| Yes, there are a few dozen full open source models
| (license, code, data, models)
| blackeyeblitzar wrote:
| What are some of the other ones? I am aware mainly of
| OLMo (https://blog.allenai.org/olmo-open-language-
| model-87ccfc95f5...)
| navinsylvester wrote:
| Here you go - https://github.com/apple/corenet
| mesebrec wrote:
| This is like saying any python program is open source because
| the python runtime is open source.
|
| Inference code is the runtime; the code that runs the model.
| Not the model itself.
| mkolodny wrote:
| I disagree. The file I linked to, model.py, contains the
| Llama 3 model itself.
|
| You can use that model with open data to train it from
| scratch yourself. Or you can load Meta's open weights and
| have a working LLM.
| causal wrote:
| Yeah a lot of people here seem to not understand that
| PyTorch really does make model definitions that simple,
| and that has everything you need to resume back-
| propagation. Not to mention PyTorch itself being open-
| sourced by Meta.
|
| That said, the Llama license doesn't meet strict
| definitions of OS, and I bet they have internal tooling
| for datacenter-scale training that's not represented
| here.
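|
| A minimal sketch of what "the definition plus the released
| weights is enough to resume training" looks like (a toy
| stand-in module, not Meta's actual model.py; the weights
| file name is hypothetical):
|
|     import torch
|     import torch.nn as nn
|
|     class TinyLM(nn.Module):            # stand-in for model.py
|         def __init__(self, vocab=32_000, dim=64):
|             super().__init__()
|             self.emb = nn.Embedding(vocab, dim)
|             self.out = nn.Linear(dim, vocab)
|         def forward(self, ids):
|             return self.out(self.emb(ids))
|
|     model = TinyLM()
|     # model.load_state_dict(torch.load("weights.pt"))  # released weights
|     opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
|     ids = torch.randint(0, 32_000, (2, 16))
|     loss = nn.functional.cross_entropy(
|         model(ids[:, :-1]).flatten(0, 1), ids[:, 1:].flatten())
|     loss.backward()                     # gradients flow again
|     opt.step()                          # back-propagation resumed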
| yjftsjthsd-h wrote:
| > The file I linked to, model.py, contains the Llama 3
| model itself.
|
| That makes it source available (
| https://en.wikipedia.org/wiki/Source-available_software
| ), not open source
| macrolime wrote:
| Source available means you can see the source, but not
| modify it. This is kinda the opposite, you can modify the
| model, but you don't see all the details of its creation.
| yjftsjthsd-h wrote:
| > Source available means you can see the source, but not
| modify it.
|
| No, it doesn't mean that. To quote the page I linked,
| emphasis mine,
|
| > Source-available software is software released through
| a source code distribution model that includes
| arrangements where the source can be viewed, _and in some
| cases modified_ , but without necessarily meeting the
| criteria to be called open-source. The licenses
| associated with the offerings range from allowing code to
| be viewed for reference to allowing code to be modified
| and redistributed for both commercial and non-commercial
| purposes.
|
| > This is kinda the opposite, you can modify the model,
| but you don't see all the details of its creation.
|
| Per https://github.com/meta-
| llama/llama3/blob/main/LICENSE there's also a laundry
| list of ways you're not allowed to use it, including
| restrictions on commercial use. So not Open Source.
| Flimm wrote:
| No, it's not. The Llama 3 Community License Agreement is not
| an open source license. Open source licenses need to meet the
| criteria of the only widely accepted definition of "open
| source", and that's the one formulated by the OSI [0]. This
| license has multiple restrictions on use and distribution
| which make it not open source. I know Facebook keeps calling
| this stuff open source, maybe in order to get all the good
| will that open source branding gets you, but that doesn't
| make it true. It's like a company calling their candy vegan
| while listing one of its ingredients as pork-based gelatin. No
| matter how many times the company advertises that their
| product is vegan, it's not, because it doesn't meet the
| definition of vegan.
|
| [0] - https://opensource.org/osd
| CamperBob2 wrote:
| _Open source licenses need to meet the criteria of the only
| widely accepted definition of "open source", and that's the
| one formulated by the OSI [0]_
|
| Who died and made OSI God?
| vbarrielle wrote:
| The OSI was created in 1998 and defined and
| popularized the term open source. Their definition has
| been widely accepted over that period.
|
| Recently, companies are trying to market things as open
| source when in reality, they fail to adhere to the
| definition.
|
| I think we should not let these companies change the
| meaning of the term, which means it's important to
| explain every time they try to seem more open than they
| are.
|
| I'm afraid the battle is being lost though.
| Suppafly wrote:
| >The OSI was created about 20 years ago and defined and
| popularized the term open source. Their definition has
| been widely accepted over that period.
|
| It was defined and accepted by the community well before
| OSI came around though.
| MaxBarraclough wrote:
| This isn't helpful. The community defers to the OSI's
| definition because it captures what they care about.
|
| We've seen people try to deceptively describe non-OSS
| projects as open source, and no doubt we will continue to
| see it. Thankfully the community (including Hacker News)
| is quick to call it out, and to insist on not cheapening
| the term.
|
| This is one of the topics that just keeps turning up:
|
| * https://news.ycombinator.com/item?id=24483168
|
| * https://news.ycombinator.com/item?id=31203209
|
| * https://news.ycombinator.com/item?id=36591820
| CamperBob2 wrote:
| _This isn 't helpful. The community..._
|
| Speak for yourself, please. The term is much older than
| 1998, with one easily-Googled example being
| https://www.cia.gov/readingroom/docs/DOC_0000639879.pdf ,
| and an explicit case of IT-related usage being
| https://i.imgur.com/Nw4is6s.png from https://www.google.c
| om/books/edition/InfoWarCon/09X3Ove9uKgC... .
|
| Unless a registered trademark is involved (spoiler: it's
| not) no one, whether part of a so-called "community" or
| not, has any authority to gatekeep or dictate the terms
| under which a generic phrase like "open source" can be
| used.
| Flimm wrote:
| Neither of those usages relate to IT, they both are about
| sources of intelligence (espionage). Even if they were,
| the OSI definition won, nobody is using the definitions
| from 1995 CIA or the 1996 InfoWarCon book in the realm of
| IT, not even Facebook.
|
| The community has the authority to complain about
| companies mis-labelling their pork products as vegan,
| even if nobody has a registered trademark on the term
| vegan. Would you tell people to shut up about that case
| because they don't have a registered trademark? Likewise,
| the community has authority to complain about
| Meta/Facebook mis-labelling code as open source even when
| they put restrictions on usage. It's not gate-keeping or
| dictatorship to complain about being misled or being lied
| to.
| CamperBob2 wrote:
| _Would you tell people to shut up about that case because
| they don 't have a registered trademark?_
|
| I especially like how _I 'm_ the one telling people to
| "shut up" all of a sudden.
|
| As for the rest, see my other reply.
| Flimm wrote:
| You're right, I and those who agree with me were the
| first to ask people to "shut up", in this case, to ask
| Meta to stop misusing the term open source. And I was the
| first to say "shut up", and I know that can be
| inflammatory and disrespectful, so I shouldn't have used
| it. I'm sorry. We're here in a discussion forum, I want
| you to express your opinion even if it is to complain about
| my complaints. For what it's worth, your counter-
| arguments have been stronger and better referenced than
| any other I have read (for the case of accepting a looser
| definition of the term open source in the realm of IT).
| CamperBob2 wrote:
| All good, and I also apologize if my objection came
| across as disrespectful.
|
| This whole 'Open Source' thing is a bigger pet peeve than
| it should be, because I've received criticism for using
| the term on a page where I literally just posted a .zip
| file full of source code. The smart thing to do would
| have been to ignore and forget the criticism, which I
| will now work harder at doing.
|
| In the case of a pork producer who labels their products
| as 'vegan', that's different because there _is_ some
| authority behind the usage of 'vegan'. It's a standard
| English-language word that according to Merriam-Webster
| goes back to 1944. So that would amount to an open-and-
| shut case of false advertising, which I don't think
| applies here at all.
| MaxBarraclough wrote:
| > In the case of a pork producer who labels their
| products as 'vegan', that's different because there is
| some authority behind the usage of 'vegan'.
|
| I don't see the difference. _Open source software_ is a
| term of art with a specific meaning accepted by its
| community. When people misuse the term, invariably in
| such a way as to broaden it to include whatever it is
| they 're pushing, it's right that the community responds
| harshly.
| CamperBob2 wrote:
| Terms of art do not require licenses. A given term is
| either an ordinary dictionary word that everyone
| including the courts will readily recognize ("Vegan"), a
| trademark ("Microsoft(r) Office 365(tm)"), or a fragment
| of language that everyone can feel free to use for their
| own purposes without asking permission. "Open Source"
| falls into the latter category.
|
| This kind of argument is literally why trademark law
| exists. OSI did not elect to go down that path. Maybe
| they should have, but I respect their decision not to,
| and perhaps you should, too.
| 8note wrote:
| Isn't the MIT license the generally accepted "open source"
| license? It's a community owned term, not OSI owned
| henryfjordan wrote:
| There are more licenses than just MIT that are "open
| source". GPL, BSD, MIT, Apache, some of the Creative
| Commons licenses, etc. MIT has become the de facto default
| though
|
| https://opensource.org/license (linking to OSI for the
| list because it's convenient, not because they get to
| decide)
| yjftsjthsd-h wrote:
| MIT is _a_ permissive open source license, not _the_ open
| source license.
| NiloCK wrote:
| These discussions (ie, everything that follows here) would
| be much easier if the crowd insisting on the OSI definition
| of open source would capitalize Open Source.
|
| In English, proper nouns are capitalized.
|
| "Open" and "source" are both very normal English words.
| English speakers have "the right" to use them according to
| their own perspective and with personal context. It's the
| difference between referring to a blue tooth, and
| Bluetooth, or to an apple store or an Apple store.
| stale2002 wrote:
| Ok call it Open Weights then if the dictionary definitions
| matter so much to you.
|
| The actual point that matters is that these models are
| available for most people to use for a lot of stuff, and this
| is way way better than what competitors like OpenAI offer.
| the8thbit wrote:
| They don't "[allow] developers to modify its code however
| they want", which is a critical component of "open source",
| and one that Meta is clearly trying to leverage in branding
| around its products. I would like _them_ to start calling
| these "public weight models", because what they're doing now
| is muddying the waters so much that "open source" now just
| means providing an enormous binary and an open source harness
| to run it in, rather than serving access to the same binary
| via an API.
| Voloskaya wrote:
| Feels a bit like you are splitting hairs for the pleasure of
| semantic arguments, to be honest. Yes, there is no source in
| ML, so if we want to be pedantic it shouldn't be called
| open source. But what really matters in the open source
| movement is that we are able to take a program built by
| someone and modify it to do whatever we want with it,
| without having to ask someone for permission or get
| scrutinized or have to pay someone.
|
| The same applies here, you can take those models and modify
| them to do whatever you want (provided you know how to
| train ML models), without having to ask for permission, get
| scrutinized or pay someone.
|
| I personally think using the term open source is fine, as
| it conveys the intent correctly, even if, yes, weights are
| not sources you can read with your eyes.
| wrs wrote:
| Calling that "open source" renders the word "source"
| meaningless. By your definition, I can release a binary
| executable freely and call it "open source" because you
| can modify it to do whatever you want.
|
| Model weights are like a binary that _nobody_ has the
| source for. We need another term.
| Voloskaya wrote:
| No it's not the same as releasing a binary, feels like we
| can't get out of the pedantics. I can in theory modify a
| binary to do whatever I want. In practice it is
| intractably hard to make any significant modification to
| a binary, and even if you could, you would then not be
| legally allowed to e.g. redistribute.
|
| Here, modifying that model is not harder than doing
| regular ML, and I can redistribute.
|
| Meta doesn't have access to some magic higher-level
| abstraction for that model, withheld from the release,
| that would make working with it easier.
|
| The sources in ML are the architecture, the training and
| inference code, and a paper describing the training
| procedure. It's all there.
| the8thbit wrote:
| "In practice it is intractably hard to make any
| significant modification to a binary, and even if you
| could, you would then not be legally allowed to e.g.
| redistribute."
|
| It depends on the binary and the license the binary is
| released under. If the binary is released to the public
| domain, for example, you are free to make whatever
| modifications you wish. And there are plenty of licenses
| like this, that allow closed source software to be used
| as the user wishes. That doesn't make it open source.
|
| Likewise, there are plenty of closed source projects
| whose binaries we can poke and prod with much higher
| understanding of what our changes are actually doing than
| we're able to get when we poke and prod LLMs. If you want
| to make a Pokemon Red/Blue or Minecraft mod you have a
| lot of tools at your disposal.
|
| A project that only exists as a binary which the
| copyright holder has relinquished rights to, or has
| released under some similar permissive closed source
| license, but people have poked around enough to figure
| out how to modify certain parts of the binary with some
| degree of predictability is a more apt analogy.
| Especially if the original author has lost the source
| code, as there is no source code to speak of when
| discussing these models.
|
| I would not call that binary "open source", because the
| source would, in fact, not be open.
| wrs wrote:
| Can you change the tokenizer? No, because all you have is
| the weights trained with the current tokenizer.
| Therefore, by any normal definition, you don't have the
| source. You have a giant black box of numbers with no
| ability to reproduce it.
| Voloskaya wrote:
| > Can you change the tokenizer?
|
| Yes.
|
| You can change it however you like, then look at the
| paper [1] under section 3.2. to know which
| hyperparameters were used during training and finetune
| the model to work with your new tokenizer using e.g.
| FineWeb [2] dataset.
|
| You'll need to do only a fraction of the training you
| would have needed to do if you were to start a training
| from scratch for your tokenizer of choice. The weights
| released by Meta give you a massive head start and cost
| saving.
|
| The fact that it's not trivial to do and out of reach of
| most consumers is not a matter of openness. That's just
| how ML is today.
|
| [1]: https://scontent-
| sjc3-1.xx.fbcdn.net/v/t39.2365-6/452387774_...
|
| [2]:
| https://huggingface.co/datasets/HuggingFaceFW/fineweb
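|
| A rough sketch of the mechanical steps, assuming the Hugging
| Face stack (the new tokenizer name is hypothetical, and the
| continued-pretraining loop itself is omitted):
|
|     from datasets import load_dataset
|     from transformers import AutoModelForCausalLM, AutoTokenizer
|
|     model = AutoModelForCausalLM.from_pretrained(
|         "meta-llama/Meta-Llama-3.1-8B")          # released weights
|     new_tok = AutoTokenizer.from_pretrained("my/new-tokenizer")
|     model.resize_token_embeddings(len(new_tok))  # rows for the new vocab
|     data = load_dataset("HuggingFaceFW/fineweb", streaming=True)
|     # ...then continue pretraining on `data` with the usual
|     # causal-LM objective until the model adapts to `new_tok`.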
| wrs wrote:
| You can change the tokenizer and build _another_ model,
| if you can come up with your own version of the rest of
| the source (e.g., the training set, RLHF, etc.). You
| can't change the tokenizer for _this_ model, because you
| don't have all of its source.
| slavik81 wrote:
| > The same applies here, you can take those models and
| modify them to do whatever you want without having to ask
| for permission, get scrutinized or pay someone.
|
| The "Additional Commercial Terms" section of the license
| includes restrictions that would not meet the OSI
| definition of open source. You must ask for permission if
| you have too many users.
| bornfreddy wrote:
| "Public weight models" sounds about right, thanks for
| coming up with a good term! Hope it catches on.
| stale2002 wrote:
| My central point is this:
|
| "are available for most people to use for a lot of stuff,
| and this is way way better than what competitors like
| OpenAI offer."
|
| I presume you agree with it.
|
| > rather than serving access
|
| It's not the same access though.
|
| I am sure that you are creative enough to think of many
| questions that you could ask llama3, that would instead get
| you kicked off of OpenAI.
|
| > They don't "[allow] developers to modify its code however
| they want"
|
| Actually, the fact that the model weights are available
| means that you can even ignore any limitations that you
| think are on it, and you'll probably just get away with it.
| You are also ignoring the fact that the limitations are
| minimal to most people.
|
| That's a huge deal!
|
| And it is dishonest to compare a situation where
| limitations are both minimal and almost unenforceable
| (Except against maybe Google) to a situation where its
| physically not possible to get access to the model weights
| to do what you want with them.
| the8thbit wrote:
| > Actually, the fact that the model weights are available
| means that you can even ignore any limitations that you
| think are on it, and you'll probably just get away with
| it. You are also ignoring the fact that the limitations
| are minimal to most people.
|
| The limitations here are technical, not legal. (Though I
| am aware of the legal restrictions as well, and I think
| it's worth noting that no _other_ project would get by
| calling themselves open source while imposing a
| restriction which prevents competitors from using the
| system to build their competing systems.) There isn 't
| any source code to read and modify. Yes, you can fine
| tune a model just like you can modify a binary but this
| isn't _source code_. Source code is a human readable
| specification that a computer can use to transform into
| executable code. This allows the human to directly modify
| functionality in the specification. We simply don 't have
| that, and it will not be possible unless we make a lot of
| strides in interpretability research.
|
| > Its not the same access though.
|
| > I am sure that you are creative enough to think of many
| questions that you could ask llama3, that would instead
| get you kicked off of OpenAI.
|
| I'm not saying that systems that are provided as SaaS
| don't tend to be more restrictive in terms of what they
| let you do through the API they expose vs what is
| possible if you run the same system locally. That may not
| always be true, but sure, as a general rule it is. I
| mean, it can't be _less_ restrictive. However, that doesn
| 't mean that being able to run code on your own machine
| makes the code open source. I wouldn't consider Windows
| open source, for example. Why? Because they haven't
| released the source code for Windows. Likewise, I
| wouldn't consider these models open source because their
| creators haven't released source code for them. The fact
| that this is technically infeasible doesn't mean we should
| change the definition so that it's no longer infeasible.
| It is simply infeasible, and if we want to
| change that, we need to do work in interpretability, not
| pretend like the problem is already solved.
| stale2002 wrote:
| So then yes you agree with this:
|
| "are available for most people to use for a lot of stuff,
| and this is way way better than what competitors like
| OpenAI offer." And that this is very significant.
| input_sh wrote:
| Open Source Initiative (kind of a de facto authority on what's
| open source and what's not) is spending a whole lot of time
| figuring out what it means for an AI system to be open source.
| In other words, they're basically trying to come up with a new
| license because the existing ones can't easily apply.
|
| I believe this is the current draft:
| https://opensource.org/deepdive/drafts/the-open-source-ai-de...
| downWidOutaFite wrote:
| OSI made themselves the authority because they hated Richard
| Stallman and his Free Software movement. It's just marketing.
| gowld wrote:
| RMS has no interest in governing Open Source, so your
| comment bears no particular relevance.
|
| RMS is an advocate for Free Software. Free Software
| generally implies Open Source, but not the converse.
|
| RMS considers openness of source to be a separate category
| from the freeness of software. "Free software is a
| political movement; open source is a development model."
|
| https://www.gnu.org/licenses/license-list.en.html
| ab5tract wrote:
| Are you really pretending that OSI and the open source
| label itself wasn't a reactionary movement that vilified
| free software principles in hopes of gaining corporate
| traction?
|
| Most of us who were there remember it differently. True
| open source advocates will find little to refute in what
| I've said.
| cheema33 wrote:
| > True open source advocates will find little to refute
| in what I've said.
|
| No true Scotsman
| https://en.wikipedia.org/wiki/No_true_Scotsman
|
| OSI helped popularize the open source movement. They not
| only make it palatable to businesses, but got them
| excited about it. I think that FSF/Stallman alone would
| not have been very successful on this front with
| GPL/AGPL.
| ab5tract wrote:
| Like I said, honest open source advocates won't take
| issue with how I framed their position.
|
| Here's a more important point: how far would the open
| source people have gotten without GCC and glibc?
|
| Much less far than they will ever admit, in my
| experience.
| miffy900 wrote:
| > Most of us who were there remember it differently. True
| open source advocates will find little to refute in what
| I've said.
|
| > Like I said, honest open source advocates won't take
| issue with how I framed their position.
|
| Yet you've failed to provide even a single point of
| evidence to back up your claim.
|
| > "honest open source advocates"
|
| You've literally just made this term up. It's
| meaningless.
| halostatue wrote:
| For some advocates, sure. I was there, too -- although at
| the beginning of my career and not deeply involved in
| most licensing discussions until the founding of Mozilla
| (where I argued _against_ the GNU GPL and was generally
| pleased with the result of the MPL). However, from ~1990,
| I remember sharing some code where I "more or less" made
| my code public domain but recommended people consider the
| GNU GPL as part of the README (I don't have the source
| code available, so I don't recall).
|
| Your characterization is quite easily refutable, because
| at the time that OSI was founded, there was _already_ an
| explosion of possible licenses and RMS and other
| GNUnatics were making lots of noise about GNU /Linux and
| trying to be as maximalist as possible while presenting
| any choice _other_ than the GNU GPL as "against
| freedom".
|
| This _certainly_ would not have held well with people who
| were using the MIT Licence or BSD licences (created
| around the same time as the GNU GPL v1), who believed
| (and continue to believe) that there were options _other_
| than a restrictive viral licence++. Yes, some of the
| people involved vilified the "free software principles",
| but there were also GNU "advocates" who were making RMS
| look tame with their wording (I recall someone telling me
| to enjoy "software slavery" because I preferred licences
| other than the GNU GPL).
|
| The "Free Software" advocates were pretending that the
| goals of their licence were the only goals that should
| matter for all authors and consumers of software. That is
| not and never has been the case, so it is unsurprising
| that there was a bit of reaction to such extremism.
|
| OSI and the open source label _were_ a move to make
| things easier for corporations to accept and understand
| by providing (a) a clear unifying definition, and (b) a
| set of licences and guidelines for knowing what licenses
| did what and the risks and obligations they presented to
| people who used software under those licences.
|
| ++ Don't @ me on this, because both the virality and
| restrictiveness are features of the GNU GPL. If it
| weren't for the nonsense in the preamble, it would be a
| _good_ licence. As it is, it is an _effective_ if
| rampantly misrepresented licence.
| dogleash wrote:
| Didn't the Open Source Definition start as the DFSG? You
| telling me Debian hates the Free Software movement? Unless
| you define "hating Free Software" as "not banning the BSD
| license", then I'll have to disagree.
| Zambyte wrote:
| > If so, then how can current ML models be open source?
|
| The source of a language model is the text it was trained on.
| Llama models are not open source (contrary to their claims),
| they are open weight.
| thayne wrote:
| I think it would also include the code used to train it
| pphysch wrote:
| That would be more analogous to the build toolchain than
| the source code, but yes
| tshaddox wrote:
| Surely traditional "open source" also needs some notion
| of a reproducible build toolchain, otherwise the source
| code itself is approximately useless.
|
| Imagine if the source code was in a programming language
| of which the basic syntax and semantics were known to no
| one but the original developers.
|
| Or more realistically, I think it's a major problem if an
| open source project can only be built by an esoteric
| process that only the original developers have access to.
| pphysch wrote:
| Source code in a vacuum is still valuable as a way to
| deal with missing/inaccurate documentation and diagnose
| faults and their causes.
|
| A raw training dataset similarly has some value, as you can
| analyze it for different characteristics to understand
| why the trained model is under/over-representing
| different concepts.
|
| But yes real FOSS should be "open-build" and allow anyone
| to build a test-passing artifact from raw source
| material.
| moffkalast wrote:
| You can find the entire Llama 3.0 pretraining set here:
| https://huggingface.co/datasets/HuggingFaceFW/fineweb
|
| 15T tokens, 45 terabytes. Seems fairly open source to me.
| Zambyte wrote:
| Where has Facebook linked that? I can't find anywhere that
| they actually published that.
| moffkalast wrote:
| Yeah I don't think I've seen it linked officially, but
| Meta does this sort of semi-official stuff all the time,
| leaking models ahead of time for PR, they even have a
| dedicated Reddit account for releasing unofficial info.
|
| Regardless, it fits the compute used and the claim that
| they trained from public web data, and was suspiciously
| published by HF staff shortly after L3 released. It's
| about as official as the Mistral 7B v0.2 base model. I.e.
| mostly, but not entirely, probably for some weird legal
| reasons.
| nickpsecurity wrote:
| Many companies stopped publishing their data sets after
| people published evidence that they were mass copyright
| infringement. They dropped the specifics of pretraining
| data from the model cards.
|
| Aside from licensing content, the fact that content
| creators don't like redistribution means a lawful model
| would probably only use Gutenberg's collection and
| permissive code.
| Anything else, including Wikipedia, usually has licensing
| requirements they might violate.
| verdverm wrote:
| Says it is ~94TB, with >130k downloads, implying more than
| 12 exabytes of copying, seems a bit off, wonder how they
| are calculating downloads
| root_axis wrote:
| No. The text is an asset used by the source to train the
| model. The source can process arbitrary text. Text is just
| text, it was written for communication purposes, software
| (defined by source code) processes that text in a particular
| way to train a model.
| Zambyte wrote:
| In programming, "source" and "asset" have specific meanings
| that conflict with how you used them.
|
| Source is the input to some built artifact. It is the
| _source_ of that artifact. As in: where the artifact comes
| from. Textual input is absolutely the source of the ML
| model. What you are using "source" as is analogous to the
| source of the compiler in traditional programming.
|
| Asset is an artifact used as input, that is revered
| verbatim by the output. For example, a logo baked into an
| application to be rendered in the UI. The compilation of
| the program doesn't make a new logo, it just moves the
| asset into the built artifact.
| Zambyte wrote:
| I hadn't had my morning coffee yet when I wrote this and
| I have no idea what I meant instead of "revered", but you
| get the idea :D
| gorgoiler wrote:
| One counterpoint is that major publications (eg New York Times)
| would have you believe that AI is a mildly lossy compression
| algorithm capable of reconstructing the original source
| material.
| actinium226 wrote:
| It's not?
| _flux wrote:
| I believe it is able to reconstruct parts of the original
| source material--if the interrogator already knows the
| original source material to prompt the model appropriately.
| halflings wrote:
| Training code is only useful to people in academia, and the
| closest thing to "code you can modify" is open weights.
|
| People are framing this as if it was an open-source hierarchy,
| with "actual" open-source requiring all training code to be
| shared. This is not obvious to me, as I'm not asking people
| that share open-source libraries to also share the tools they
| used to develop them. I'm also not asking them to share all the
| design documents/architecture discussion behind this software.
| It's sufficient that I can take the end result and reshape it
| in any way I desire.
|
| This is coming from an LLM practitioner that finetunes models
| for a living; and this constant debate about open-source vs
| open-weights seems like a huge distraction vs the impact open-
| sourcing something like Llama has... this is truly a Linux-like
| moment. (at a much smaller scale of course, for now at least)
| kemiller wrote:
| I dunno -- if an open source project required, say, a
| proprietary compiler, that would diminish its open source-
| ness. But I agree it's not totally comparable, since the
| weights are not particularly analogous to machine code. We
| probably need a new term. Open Weights.
| 0-_-0 wrote:
| There are many "compilers", you can download The Pile
| yourself.
| nothrowaways wrote:
| Weight is the new code.
| nomel wrote:
| I think saying it's the new binary is closer to the truth.
| You can't reproduce it, but you can use it. In this new
| version, you can even nudge it a bit to do something a
| _little_ different.
|
| New stuff, so probably not good to force old words, with
| known meanings, onto new stuff.
| GreenWatermelon wrote:
| The model is more akin to a python script than a compiled C
| binary. This is how I see it:
|
| Training Code and dataset are analogous to the developer
| who wrote the script
|
| Model and weights are end product that is then released
|
| Inference Code is the runtime that could execute the code.
| That would be e.g. PyTorch, which can import the weights
| and run inference.
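|
| A minimal sketch of that split, with a toy stand-in (the
| weights file is hypothetical, not the real Llama artifacts):
|
|     import torch
|     import torch.nn as nn
|
|     # "runtime" = architecture definition executed by PyTorch
|     runtime = nn.Sequential(nn.Embedding(32_000, 64),
|                             nn.Linear(64, 32_000))
|     torch.save(runtime.state_dict(), "weights.pt")     # the "end product"
|     runtime.load_state_dict(torch.load("weights.pt"))  # import the weights
|     with torch.no_grad():
|         print(runtime(torch.tensor([[1, 2, 3]])).shape)  # run inference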
| nomel wrote:
| > The model is more akin to a python script than a
| compiled C binary.
|
| No, I completely disagree. Python is near pseudo-text
| source. Source exists for the specific purpose of being
| easily and _completely_ understood, by humans, because it
| 's for and from humans. You can turn a python calculator
| into a web server, because it can be split and separated
| at any point, because it can be _completely understood_
| at any point, and it 's _deterministic at every point_.
|
| A model _cannot be understood_ by a human. It isn 't
| meant to be. It's meant to be used, very close to as is.
| You can't fundamentally change the model, or dissect it,
| you can only nudge it in a direction, with the force of
| that nudge being proportional to the money you can burn,
| along with hope that it turns out how you want.
|
| That's why I say it's closer to a binary: more of a black
| box you can use. You can't easily make a binary do
| something fundamentally different without changing the
| source. You can't easily see into that black box, or even
| know what it will do without trying. You can only nudge
| it to act a little differently, or use it as part of a
| workflow. (decompilation tools aside ;))
| GuB-42 wrote:
| I like the term "open weights". Open source would be the
| dataset and code that generates these weights.
|
| There is still a lot you can do with weights, like fine tuning,
| and it is arguably more useful as retraining the entire model
| would cost millions in compute.
| szundi wrote:
| Open source = reproducible binaries (weights) by you on your
| computer, IMO.
|
| FB's strategy is that they are happy to be just a user, and
| fine with ruining competitors' businesses with good-enough
| free alternatives while collecting awards as saviors of
| whatever.
| ric2b wrote:
| If that were the definition then any software you can install
| on your computer would be open source. It makes open source
| lose nearly all meaning.
|
| Just say "open weights", not "open source".
| rmbyrro wrote:
| If you think about LLMs as a new kind of programming runtime,
| the matrices are the source.
| beloch wrote:
| It's no secret that implementing AI usually involves _far_ more
| investment into training and teaching than actual code. You can
| know how a neural net or other ML model works. You can have all
| the code before you. It 's still a _huge_ job (and investment)
| to do anything practical with that. If Meta shares the code
| their AI runs on with you, you 're not going to be able to do
| much with it unless you make the same investment in gathering
| data and teaching to train that AI. That would probably require
| data Meta _won 't_ share. You'd effectively need your own
| Facebook.
|
| If everyone open sources their AI code, Meta can snatch the
| bits that help them without much fear of helping their direct
| competitors.
| the8thbit wrote:
| I think you're misunderstanding what I'm saying. I don't
| think it's technically feasible for current models to be open
| source, because there is no source code to open. Yes, there
| is a harness that runs the model, but the vast, vast amount
| of instructions are contained in the model weights, which are
| akin to a compiled binary.
|
| If we make large strides in interpretability we may have
| something resembling source code, but we're certainly not
| there yet. I don't think the solution to that problem should
| be to change the definition of open source and pretend the
| problem has been solved.
| seoulmetro wrote:
| Unfortunately open source really just means an open API these
| days. The API is heavily intertwined with closed source.
| langcss wrote:
| Coming up with the words and concepts to describe the models is
| a challenge.
|
| Does the training data require permission from the copyright
| holder to use? Are the weights really open source or more like
| compiled assembly?
| shdjkKA wrote:
| Of course you are right, I'd put it less carefully: The quoted
| Linux line is deceptive marketing.
|
| - If we start with the closed training set, that is closed and
| stolen, so call it Stolen Source.
|
| - What is distributed is a bunch of float arrays. The Llama
| architecture is published, but not the training or inference
| code. Without code there is no open source. You might as well
| call a compiler book open source, because it tells you how to
| build a compiler.
|
| Pure marketing, but predictably many people follow their
| corporate overlords and eagerly adopt the co-opted terms.
|
| Reminder again that FB is not releasing this out of altruism,
| but because they have an existing profitable business model
| that does not depend on generated chats. They probably do use
| it internally for tracking and building profiles, but that is
| the same as using Linux internally, so they release the weights
| to destroy the competition.
|
| Isn't price dumping an antitrust issue?
| bjornsing wrote:
| The term "source code" can mean many things. In a legal context
| it's often just defined as the preferred format for
| modification. It can be argued that for artificial neural
| networks that's the weights (along with code and preferably
| training data).
| kashyapc wrote:
| I agree; there's a lot of muddiness in the term "open source
| AI". Earlier this year there was a talk[1] at FOSDEM, titled _"
| Moving a step closer to defining Open Source AI"_. It is from
| someone at the Open Source Initiative. The video and slides are
| available in the link below[1]. From the abstract:
|
| _" Finding an agreement on what constitutes Open Source AI is
| the most important challenge facing the free software (also
| known as open source) movement. European regulation already
| started referring to "free and open source AI", large economic
| actors like Meta are calling their systems "open source"
| despite the fact that their license contain restrictions on
| fields-of-use (among other things) and the landscape is
| evolving so quickly that if we don't keep up, we'll be
| irrelevant."_
|
| [1]
| https://fosdem.org/2024/schedule/event/fosdem-2024-2805-movi...
| defining-open-source-ai/
| rbits wrote:
| You release all the technology and the training data.
| Everything that was used to create the model, including
| instructions.
|
| I'm not sure if facebook has done that
| Oras wrote:
| This is obviously good news, but __personally__ I feel the open-
| source models are just trying to catch up with whoever the market
| leader is, based on some benchmarks.
|
| The actual problem is running these models. Very few companies
| can afford the hardware to run these models privately. If you run
| them in the cloud, then I don't see any potential financial gain
| for any company to fine-tune these huge models just to catch up
| with OpenAI or Anthropic, when you can probably get a much better
| deal by fine-tuning the closed-source models.
|
| Also this point:
|
| > We need to protect our data. Many organizations handle
| sensitive data that they need to secure and can't send to closed
| models over cloud APIs.
|
| First, it's ironic that Meta is talking about privacy. Second,
| most companies will run these models in the cloud anyway. You can
| run OpenAI via Azure Enterprise and Anthropic on AWS Bedrock.
| simonw wrote:
| "Very few companies can afford the hardware to run these models
| privately."
|
| I can run Llama 3 70B on my (64GB RAM M2) laptop. I haven't
| tried 3.1 yet but I expect to be able to run that 70B model
| too.
|
| As for the 405B model, the Llama 3.1 announcement says:
|
| > To support large-scale production inference for a model at
| the scale of the 405B, we quantized our models from 16-bit
| (BF16) to 8-bit (FP8) numerics, effectively lowering the
| compute requirements needed and allowing the model to run
| within a single server node.
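|
| The weight-memory arithmetic behind both points, as a rough
| estimate that ignores KV cache and activation overhead (the
| 8x80GB node size and the 4-bit laptop quant are assumptions):
|
|     params_405b = 405e9
|     bf16_gb = params_405b * 2 / 1e9   # ~810 GB, too big for one node
|     fp8_gb  = params_405b * 1 / 1e9   # ~405 GB
|     node_gb = 8 * 80                  # e.g. one 8x80GB GPU server = 640 GB
|     print(fp8_gb < node_gb)           # True: fits on a single node
|
|     laptop_gb = 70e9 * 0.5 / 1e9      # 70B at ~4-bit: ~35 GB, fits in 64 GB RAM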
| InDubioProRubio wrote:
| CrowdStrike just added "Centralized Company Controlled Software
| Ecosystem" to every risk data sheet on the planet. Everything
| futureproof is self-hosted and open source.
| mesebrec wrote:
| Note that Meta's models are not open source in any interpretation
| of the term.
|
| * You can't use them for any purpose. For example, the license
| prohibits using these models to train other models.
|
| * You can't meaningfully modify them given there is almost no
| information available about the training data, how they were
| trained, or how the training data was processed.
|
| As such, the model itself is not available under an open source
| license and the AI does not comply with the "open source AI"
| definition by OSI.
|
| It's an utter disgrace for Meta to write such a blogpost patting
| themselves on the back while lying about how open these models
| are.
| ChadNauseam wrote:
| > you can't meaningfully modify them given there is almost no
| information available about the training data, how they were
| trained, or how the training data was processed.
|
| I was under the impression that you could still fine-tune the
| models or apply your own RLHF on top of them. My understanding
| is that the training data would mostly be useful for training
| the model yourself from scratch (possibly after modifying the
| training data), which would be extremely expensive and out of
| reach for most people
| mesebrec wrote:
| Indeed, fine-tuning is still possible, but you can only go so
| far with fine-tuning before you need to completely retrain
| the model.
|
| This is why Silo AI, for example, had to start from scratch
| to get better support for small European languages.
| chasd00 wrote:
| From what i understand the training data and careful curation
| of it is the hard part. Everyone wants training data sets to
| train their own models instead of producing their own.
| causal wrote:
| You are definitely allowed to train other models with these
| models, you just have to give credit in the name, per the
| license:
|
| > If you use the Llama Materials or any outputs or results of
| the Llama Materials to create, train, fine tune, or otherwise
| improve an AI model, which is distributed or made available,
| you shall also include "Llama" at the beginning of any such AI
| model name.
| mesebrec wrote:
| Indeed, this is something they changed in the 3.1 version of
| the license.
|
| Regardless, the license [1] still has many restrictions, such
| as the acceptable use policy [2].
|
| [1] https://huggingface.co/meta-llama/Meta-
| Llama-3.1-8B/blob/mai...
|
| [2] https://llama.meta.com/llama3_1/use-policy
| tw04 wrote:
| >In the early days of high-performance computing, the major tech
| companies of the day each invested heavily in developing their
| own closed source versions of Unix.
|
| Because they sold the resultant code and systems built on it for
| money... this is the gold miner saying that all shovels and jeans
| should be free.
|
| Am I happy Facebook open sources some of their code? Sure, I
| think it's good for everyone. Do I think they're talking out of
| both sides of their mouth? Absolutely.
|
| Let me know when Facebook opens up the entirety of their Ad and
| Tracking platforms and we can start talking about how it's silly
| for companies to keep software closed.
|
| I can say with 100% confidence if Facebook were selling their AI
| advances instead of selling the output it produces, they wouldn't
| be advocating for everyone else to open source their stacks.
| JumpCrisscross wrote:
| > _if Facebook were selling their AI advances instead of
| selling the output it produces, they wouldn 't be advocating
| for everyone else to open source their stack_
|
| You're acting as if commoditizing one's complements is either
| new or reprehensible [1].
|
| [1] https://gwern.net/complement
| tw04 wrote:
| >You're acting as if commoditizing one's complements is
| either new or reprehensible [1].
|
| I'm acting as if calling on other companies to open source
| their core product, just because it's a complement for you,
| and acting as if it's for the benefit of mankind is
| disingenuous, which it is.
| stale2002 wrote:
| > as if it's for the benefit of mankind
|
| But it does benefit mankind.
|
| More free tech products is good for the world.
|
| This is a good thing. When people or companies do good
| things, they should get the credit for doing good things.
| JumpCrisscross wrote:
| > _acting as if it 's for the benefit of mankind is
| disingenuous, which it is_
|
| Is it bad for mankind that Meta publishes its weights?
| Mutually beneficial is a valid game state--there is no
| moral law that requires anything good be made as a
| sacrifice.
| rvnx wrote:
| The source code to the ad-tracking platform is useless to
| users.
|
| In the end, it's actually Facebook doing the right thing
| (though they are known for being evil).
|
| It's a bit of an irony.
|
| The supposedly "good" and "open" people like Google or OpenAI,
| haven't given their model weights.
|
| A bit like Microsoft became the company that actually supports
| the whole open-source ecosystem with GitHub.
| tw04 wrote:
| >The source-code to Ad tracking platform is useless to users.
|
| It's absolutely not useless for developers looking to build a
| competing project.
|
| >The supposedly "good" and "open" people like Google or
| OpenAI, haven't given their model weights.
|
| Because they're monetizing it... the only reason Facebook is
| giving it away is because it's a complement to their core
| product of selling ads. If they were monetizing it, it would
| be closed source. Just like their Ads platform...
| abetusk wrote:
| Another case of "open-washing". Llama is not available open
| source, under the common definition of open source, as the
| license doesn't allow for commercial re-use by default [0].
|
| They provide their model, with weights and code, as "source
| available" and it looks like they allow for commercial use until
| a 700M monthly active user cap is surpassed. They also don't allow
| you to train other AI models with their model:
|
| """ ... v. You will not use the Llama Materials or any output or
| results of the Llama Materials to improve any other large
| language model (excluding Meta Llama 3 or derivative works
| thereof). ... """
|
| [0] https://github.com/meta-llama/llama3/blob/main/LICENSE
| sillysaurusx wrote:
| They cannot legally enforce this, because they don't have the
| rights to the content they trained it on. Whoever's willing to
| fund that court battle would likely win.
|
| There's a legal precedent that says hard work alone isn't
| enough to guarantee copyright, i.e. it doesn't matter that it
| took millions of dollars to train.
| whimsicalism wrote:
| i think these clauses are unenforceable. it's telling that OAI
| hasn't tried a similar suit despite multiple extremely well-
| known cases of competitors training on OAI outputs
| nuz wrote:
| Everyone complaining about not having data access: Remember that
| without meta you would have openai and anthropic and that's it.
| I'm really thankful they're releasing this, and the reason they
| can't release the data is obvious.
| mesebrec wrote:
| Without Meta, you would still have Mistral, Silo AI, and the
| many other companies and labs producing much more open models
| with similar performance.
| Invictus0 wrote:
| The irony of this letter being written by Mark Zuckerberg at
| Meta, while OpenAI continues to be anything but open, is richer
| than anyone could have imagined.
| 1024core wrote:
| "open source AI" ... "open" ... "open" ....
|
| And you can't even try it without an FB/IG account.
|
| Zuck will never change.
| causal wrote:
| I think you can use an HF account as well
| https://huggingface.co/meta-llama
| Gracana wrote:
| You can also wait a bit for someone to upload quantized
| variants, finetunes, etc, and download those. FWIW I'm not
| making a claim about the legality of that, just saying it's
| an easy way around needing to sign the agreement.
| CamperBob2 wrote:
| It doesn't require an account. You do have to fill in your name
| and email (and birthdate, although it seems to accept whatever
| you feed it.)
| mvkel wrote:
| It's a real shame that we're still calling Llama "open source"
| when at best it's "open weights."
|
| Not that anyone would go buy 100,000 H100s to train their own
| Llama, but words matter. Definitions matter.
| sidcool wrote:
| Honest question. As far as LLMs are concerned, isn't open
| weights same as open source?
| mesebrec wrote:
| Open source requires, at the very least, that you can use it
| for any purpose. This is not the case with Llama.
|
| The Llama license has a lot of restrictions, based on user
| base size, type of use, etc.
|
| For example you're not allowed to use Llama to train or
| improve other models.
|
| But it goes much further than that. The government of India
| can't use Llama because they're too large. Sex workers are
| not allowed to use Llama due to the acceptable use policy of
| the license. Then there is also the vague language
| probibiting discrimination, racism etc.. good luck getting
| prohibiting discrimination, racism, etc. Good luck getting
| aloe_falsa wrote:
| GPL defines the "source code" of a work as the preferred form
| of the work for making modifications to it. If Meta released
| a petabyte of raw training data, would that really be easier
| to extend and adapt (as opposed to fine-tuning the weights)?
| paulhilbert wrote:
| No, I would argue that from the three main ingredients -
| training data, model source code and weights - weights are
| the furthest away from something akin to source code.
|
| They're more like obfuscated binaries. When it comes to fine-
| tuning only however things shift a little bit, yes.
| sidcool wrote:
| I don't expect them to release the data used to train the
| models. But I agree that the code is an important
| ingredient of 'open'.
| frabcus wrote:
| Must include the code that curates the data
| blackeyeblitzar wrote:
| No. Open weights are the output of a proprietary and secretive
| process of training. It's like sharing a precompiled
| application instead of what you need to reproduce the
| compiled application.
|
| AI2's OLMo is an example of what open source actually looks
| like for LLMs:
|
| https://blog.allenai.org/hello-olmo-a-truly-open-
| llm-43f7e73...
| lolinder wrote:
| Source versus weights seems like a really pedantic distinction
| to make. As you say, the training code and training data would
| be worthless to anyone who doesn't have compute on the level
| that Meta does. Arguably, the weights are source code
| interpreted by an inference engine, and realistically it's the
| weights that someone is going to want to modify through fine-
| tuning, not the original training code and data.
|
| The far more important distinction is "open" versus "not open",
| and I disagree that we should cede that distinction while
| trying to fight for "source". The Llama license is restrictive
| in a number of ways (it incorporates an entire acceptable use
| policy) that make it most definitely not "open" in the
| customary sense.
| mvkel wrote:
| > training code and training data would be worthless to
| anyone who doesn't have compute on the level that Meta does
|
| I don't fully agree.
|
| Isn't that like saying *nix being open source is worthless
| unless you're planning to ship your own Linux distro?
|
| Knowing how the sausage is made is important if you're an
| animal rights activist.
| JamesBarney wrote:
| https://llama.meta.com/llama3_1/use-policy/
|
| The acceptable use policy seems fine. Don't use it to
| break the law, solicit sex, kill people, or lie.
| lolinder wrote:
| It's fine in that I'm happy to use it and don't think I'll
| be breaking the terms anytime soon. It's not fine in that
| one of the primary things that makes open source open is
| that an open source license doesn't restrict groups of
| people or whole fields from usage of the software. The
| policy has a number of such blanket bans on industries,
| which, while reasonable, make the license not truly open.
| mvkel wrote:
| This is like saying "You have the right to privacy. The
| police can tap your phone, but you have nothing to worry
| about as long as you're not breaking the law."
|
| "we're open source, you can use it for anything you can
| imagine. But you can't use it for these specific things."
|
| Then there's the added rub of the source not really being
| source code, but a CSV file.
|
| That's fine. If you want to set that expectation, great!
| But don't call it open source.
| frabcus wrote:
| Meta could change the license of future releases of Llama and
| kill your business built on it.
|
| If the training data was openly available, even if you can't
| afford to retrain a new version, a competitor like Amazon
| could do it for you
| lolinder wrote:
| > Meta could change the license of future releases of Llama
| and kill your business built on it.
|
| If you built a business on Llama 3.1, you're not going to
| suddenly go down in flames because you can't upgrade to
| Llama 4.
|
| Even saying you really needed to upgrade, Llama 4 would be
| a new model that you'd have to adapt your prompts for
| anyway, you can't just version bump and call it good. If
| you're going to update prompts anyway, at that point you
| can just switch to any other competitor model. Updating
| models isn't urgent, you have time to do it slowly and
| right.
|
| > If the training data was openly available, even if you
| can't afford to res train a new version, a competitor like
| Amazon could do it for you
|
| If Llama 4 changed the license then presumably you wouldn't
| have access to its training data even if you did have
| access to Llama 3.1's. So now you have access to Llama
| 3.1's training data... now what? You want to recreate the
| Llama 3.1 weights in response to the Llama 4 release?
| rybosworld wrote:
| Huge companies like facebook will often argue for solutions that
| on the surface, seem to be in the public interest.
|
| But I have strong doubts they (or any other company) actually
| believe what they are saying.
|
| Here is the reality:
|
| - Facebook is spending untold billions on GPU hardware.
|
| - Facebook is arguing in favor of open sourcing the models, that
| they spent billions of dollars to generate, for free...?
|
| It follows that companies with much smaller resources (money)
| will not be able to match what Facebook is doing. Seems like an
| attempt to kill off the competition (specifically, smaller
| organizations) before they can take root.
| Salgat wrote:
| The reason for Meta making their model open source is rather
| simple: They receive an unimaginable amount of free labor, and
| their license only excludes their major competitors to ensure
| mass adoption without benefiting their competition (Microsoft,
| Google, Alibaba, etc). Public interest, philanthropy, etc are
| just nice little marketing bonuses as far as they're concerned
| (otherwise they wouldn't be including this licensing
| restriction).
| noiseinvacuum wrote:
| All correct, Meta does obviously benefit.
|
| It's helpful to also look at what do the developers and
| companies (everyone outside of top 5/10 big tech companies)
| get out of this. They get open access to weights of SOTA LLM
| models that take billions of dollars to train and 10s of
| billions a year to run the AI labs that make these. They get
| the freedom to fine tune them, to distill them, and to host
| them on their own hardware in whatever way works best for
| their products and services.
| frabcus wrote:
| Meta haven't made an open source model. They have released a
| binary with a proprietary but relatively liberal license.
| Binaries are not source and their license isn't free.
| mattnewton wrote:
| I actually think this is one of the rare times where the small
| guys interests are aligned with Meta. Meta is scared of a world
| where they are locked out of LLM platforms, one where OpenAI
| gets to dictate rules around their use of the platform much
| like Apple and Google dictates rules around advertiser data and
| monetization on their mobile platforms. Small developers should
| be scared of a world where the only competitive LLMs are owned
| by those players too.
|
| Through this lens, Meta's actions make more sense to me. Why
| invest billions in VR/AR? The answer is simple, don't get
| locked out of the next platform, maybe you can own the next
| one. Why invest in LLMs? Again, don't get locked out. Google
| and OpenAi/Microsoft are far larger and ahead of Meta right now
| and Meta genuinely believes the best way to make sure they have
| an LLM they control is to make everyone else have an LLM they
| can control. That way community efforts are unified around
| their standard.
| mupuff1234 wrote:
| Sure, but don't you think the "not getting locked out" is
| just the pre-requisite for their eventual goal of locking
| everyone else out?
| yesco wrote:
| Does it really matter? Attributing goodwill to a company is
| like attributing goodwill to a spider that happens to clean
| up the bugs in your basement. Sure if they had the ability
| to, I'm confident Meta would try something like that, but
| they obviously don't, and will not for the foreseeable
| future.
|
| I have faith they will continue to do what's in their best
| interests and if their best interests happen to align with
| mine, then I will support that. Just like how I don't
| bother killing the spider in my basement because it helps
| clean up the other bugs.
| mupuff1234 wrote:
| But you also know that the spider has been laying eggs so
| you better have an extermination plan ready.
| whitepaint wrote:
| Everyone is aware of that. No one thinks Facebook or Mark
| are some saint entities. But while the spider is doing
| some good deeds why not just go "yeah! go spider!". Once
| it becomes an asshole, we will kill it. People are not
| dumb.
| mupuff1234 wrote:
| It's not even truly open source, they set a user limit.
| xvector wrote:
| I'm not particularly concerned about the user limit. The
| companies for which those limits will matter are so large
| that they should consider contributing back to humanity
| by developing their own SOTA foundation models.
| noiseinvacuum wrote:
| If by "everyone else" here you mean 3 or 4 large players
| trying to create a regulatory moat around themselves then I
| am fine with them getting locked out and not being able to
| create a moat for next 3 decades.
| myaccountonhn wrote:
| > I actually think this is one of the rare times where the
| small guys interests are aligned with Meta
|
| Small guys are the ones being screwed over by AI companies
| and having their text/art/code stolen without any attribution
| or adherence to license. I don't think Meta is on their side
| at all
| MisterPea wrote:
| That's a separate problem which affects small to large
| players alike (e.g. ScarJo).
|
| Small companies interests are aligned with Meta as they are
| now on an equal footing with large incumbent players. They
| can now compete with a similarly sized team at a big tech
| company instead of that team + dozens of AI scientists
| ketzo wrote:
| Meta is, fundamentally, a user-generated-content distribution
| company.
|
| Meta wants to make sure they commoditize their complements:
| they don't want a world where OpenAI captures all the value of
| content generation, they want the cost of producing the best
| content to be as close to free as possible.
| chasd00 wrote:
| I was thinking along the same lines. A lot of content generated by
| LLMs is going to end up on Facebook or Instagram. The easier
| it is to create AI generated content the more content ends up
| on those applications.
| Nesco wrote:
| Especially because genAI is a copyright laundering system.
| You can train it on copyrighted material and none of the
| content generated with it is copyrightable, which is
| perfect for social apps
| KaiserPro wrote:
| The model itself isn't actually that valuable to Facebook.
| The thing that's important is the dataset, the infrastructure
| and the people to make the models.
|
| There is still, just about, a strong ethos (especially in the
| research teams) to chuck loads of stuff over the wall into
| open source (pytorch, detectron, SAM, aria, etc).
|
| But it's seen internally as a two-part strategy:
|
| 1) strong recruitment tool (come work with us, we've done cool
| things, and you'll be able to write papers)
|
| 2) seeding the research community with a common toolset.
| jorblumesea wrote:
| Cynically I think this position is largely due to how they can
| undercut OpenAI's moat.
| wayeq wrote:
| It's not cynical, it's just an awareness that public companies
| have a fiduciary duty to their shareholders.
| cs702 wrote:
| _> We're releasing Llama 3.1 405B, the first frontier-level open
| source AI model, as well as new and improved Llama 3.1 70B and 8B
| models._
|
| _Bravo!_ While I don't agree with Zuck's views and actions on
| many fronts, on this occasion I think he and the AI folks at Meta
| deserve our praise and gratitude. With this release, they have
| brought the cost of pretraining a frontier 400B+ parameter model
| to ZERO for pretty much everyone -- well, everyone _except_
| Meta's key competitors.[a] THANK YOU ZUCK.
|
| Meanwhile, the business-minded people at Meta surely won't mind
| if the release of these frontier models to the public happens to
| completely mess up the AI plans of competitors like
| OpenAI/Microsoft, Google, Anthropic, etc. Come to think of it,
| the negative impact on such competitors was likely a key
| motivation for releasing the new models.
|
| ---
|
| [a] The license is not open to the handful of companies worldwide
| which have more than 700M users.
| swyx wrote:
| > the AI folks at Meta deserve our praise and gratitude
|
| We interviewed Thomas who led Llama 2 and 3 post training here
| in case you want to hear from someone closer to the ground on
| the models https://www.latent.space/p/llama-3
| throwaway_2494 wrote:
| > We're releasing Llama 3.1 405B
|
| Is it possible to run this with ollama?
| jessechin wrote:
| Sure, if you have an H100 cluster. If you quant it to int4 you
| might get away with using only 4 H100 GPUs!
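|
| Back-of-the-envelope math (a rough sketch; real deployments also
| need headroom for the KV cache and activations):
|     params = 405e9          # Llama 3.1 405B parameter count
|     bytes_per_param = 0.5   # int4 quantization = 4 bits per weight
|     weights_gb = params * bytes_per_param / 1e9
|     print(weights_gb)       # ~202 GB of quantized weights
|     print(4 * 80)           # 320 GB across four 80 GB H100s
| So the int4 weights alone just about fit on 4 cards, which is
| where that figure comes from.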
| sheepscreek wrote:
| Assuming $25k a pop, that's at least $100k in just the GPUs
| alone. Throw in their linking technology (NVLink) and cost
| for the remaining parts, won't be surprised if you're
| looking at $150k for such a cluster. Which is not bad to be
| honest, for something at this scale.
|
| Can anyone share the cost of their pre-built clusters,
| they've recently started selling? (sorry feeling lazy to
| research atm, I might do that later when I have more time).
| rty32 wrote:
| You can rent H100 GPUs.
| tomp wrote:
| you're about right.
|
| https://smicro.eu/nvidia-hgx-h100-640gb-935-24287-0001-000-1
|
| 8x H100 HGX cluster for EUR250k + VAT
| vorticalbox wrote:
| If you have the ram for it.
|
| Ollama will offload as many layers as it can to the GPU, then
| the rest will run on the CPU/RAM.
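|
| A minimal sketch of what that looks like from the Python client
| (assuming the `ollama` package and a local Ollama server; the
| `num_gpu` option sets how many layers get offloaded to the GPU,
| with the remainder running on CPU/RAM):
|     import ollama
|     resp = ollama.chat(
|         model="llama3.1:70b",
|         messages=[{"role": "user", "content": "Hello"}],
|         options={"num_gpu": 30},  # offload only 30 layers to the GPU
|     )
|     print(resp["message"]["content"])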
| tambourine_man wrote:
| Praising is good. Gratitude is a bit much. They got this big by
| selling user generated content and private info to the highest
| bidder. Often through questionable means.
|
| Also, the underdog always touts Open Source and standards, so
| it's good to remain skeptical when/if tables turn.
| sheepscreek wrote:
| All said and done, it is a very _expensive_ and ballsy way to
| undercut competitors. They've spent > $5B on hardware alone,
| much of which will depreciate in value quickly.
|
| Pretty sure the only reason Meta's managed to do this is
| because of Zuck's iron grip on the board (majority voting
| rights). This is great for Open Source and regular people
| though!
| wrsh07 wrote:
| Zuck made a bet when they provisioned for reels to buy
| enough GPUs to be able to spin up another reels-sized
| service.
|
| Llama is probably just running on spare capacity (I mean,
| sure, they've kept increasing capex, but if they're worried
| about an llm-based fb competitor they sort of have to in
| order to enact their copycat strategy)
| fractalf wrote:
| Well, he didn't do it to be "nice", you can be sure about
| that. Obviously they see a financial gain
| somewhere/sometime
| tambourine_man wrote:
| At Meta level, spending $5B to stay competitive is not
| ballsy. It's a bargain.
| ricardo81 wrote:
| >selling user generated content and private info to the
| highest bidder
|
| Was always their modus operandi, surely. How else would they
| have survived?
|
| Thanks for returning everyone else's content, and never mind
| all the content stealing your platform did.
| jart wrote:
| I'm perfectly happy with them draining the life essence out
| of the people crazy enough to still use Facebook, if they're
| funneling the profits into advancing human progress with AI.
| It's an Alfred Nobel kind of thing to do.
| kataklasm wrote:
| It's not often you see a take this bad on HN. Wow!
|
| You are aware Facebook tracks everyone, not just people
| with Facebook accounts, right? They have a history of being
| anti-consumer in every sense of the word. So while I can
| understand where you're coming from, it's just not anywhere
| close to being reality.
|
| Whether you want it or not, whether you consent or not, Facebook
| is tracking and selling you.
| germinalphrase wrote:
| "Come to think of it, the negative impact on such competitors
| was likely a key motivation for releasing the new models."
|
| "Commoditize Your Complement" is often cited here:
| https://gwern.net/complement
| tintor wrote:
| > they have brought the cost of pretraining a frontier 400B+
| parameter model to ZERO
|
| It is still far from zero.
| cs702 wrote:
| If the model is already pretrained, there's no need to
| pretrain it, so the cost of pretraining is zero.
| moffkalast wrote:
| Yeah but you only have the one model, and so far it seems
| to be only good on paper.
| pwdisswordfishd wrote:
| Makes me wonder why he's really doing this. Zuckerberg being
| Zuckerberg, it can't be out of any genuine sense of altruism.
| Probably just wants to crush all competitors before he
| monetizes the next generation of Meta AI.
| spiralk wrote:
| It's certainly not altruism. Given that Facebook/Meta owns the
| largest user data collection systems, any advancement in AI
| ultimately strengthens their business model (which is still
| mostly collecting private user data, amassing large user
| datasets, and selling targeted ads).
|
| There is a demo video that shows a user wearing a Quest VR
| headset and asks the AI "what do you see" and it interprets
| everything around it. Then, "what goes well with these
| shorts"... You can see where this is going. Wearing headsets
| with AIs monitoring everything the users see and collecting
| even more data is becoming normalized. Imagine the private
| data harvesting capabilities of the internet but anywhere in
| the physical world. People need not even choose to wear a
| Meta headset, simply passing a user with a Meta headset in
| public will be enough to have private data collected. This
| will be the inevitable result of vision models improvements
| integrated into mobile VR/AR headsets.
| goatlover wrote:
| That's very dystopian. It's bad enough having cameras
| everywhere now. I never opted in to being recorded.
| warkdarrior wrote:
| That sounds fantastic. If they make the Meta headset easy
| to wear and somewhat fashionable (closer to eyeglass than
| to a motorcycle helmet), I'd take it everywhere and record
| everything. Give me a retrospective search and
| conferences/meetings will be so much easier (I am terrible
| with names).
| meroes wrote:
| I wouldn't even say hi, let alone my name, to someone wearing
| a Meta headset out in public. And if facial recognition
| becomes that common for wearers, most of the population
| is going to adorn something to prevent that. And if it's
| at work, I'm not working there and I have to think many
| would agree. Coworkers don't and wouldn't tolerate
| coworkers taking videos or pictures of them.
| sebastiennight wrote:
| This is not how the overwhelming majority of the world
| works though.
|
| > if facial recognition becomes that common for wearers,
| most of the population is going to adorn something to
| prevent that
|
| "Most of the population" is going to be "the wearers".
|
| > Coworkers don't and wouldn't tolerate coworkers taking
| videos or pictures of them.
|
| Here is a fun experience you can try: just hit "record"
| on every single Teams or Meet meeting you're ever on (or
| just set recording as the default setting in the app).
|
| See how many coworkers comment on it, let alone protest.
|
| I can tell you from experience (of having been in
| thousands of hours of recorded meetings in the last 3
| years) that the answer is zero.
| spiralk wrote:
| You are probably right, but that is truly a cyberpunk
| dystopian situation. A few megacorps will catalog every
| human interaction and there will be no way to opt out.
| xvector wrote:
| I'd gladly wear a headset like that! I think you
| dramatically overestimate the number of people that would
| actually care about any theoretical privacy infringement
| here.
|
| > And if facial recognition becomes that common for
| wearers, most of the population is going to adorn
| something to prevent that.
|
| In my opinion, you do not have an accurate view of how
| much the average person cares about this. London is the
| most surveilled city on the planet with widespread
| CCTV/facial recognition, as is Washington D.C. and China.
| But literally no one bothers with anti-surveillance
| measures.
|
| > I'm not working there and I have to think many would
| agree. Coworkers don't and wouldn't tolerate coworkers
| taking videos or pictures of them.
|
| This is a very antiquated view IMO. You are already being
| filmed and monitored at work. I see no issue with a local
| LLM interpreting my environment, or even a privacy-aware
| secure LLM deployment like Apple's Private Cloud Compute:
| https://security.apple.com/blog/private-cloud-compute/
| troupo wrote:
| > I think you dramatically overestimate the number of
| people that would actually care about any theoretical
| privacy infringement
|
| Not really surprised that you don't see it as a problem
|
| > This is a very antiquated view IMO. You are already
| being filmed and monitored at work.
|
| Not really surprised that you don't see it as a problem
| talldayo wrote:
| > a privacy-aware secure LLM
|
| Funniest thing I've heard all month.
| talldayo wrote:
| Of course, no Hacker News thread is complete without the
| "I would _never_ shake hands with an Android user " guy
| who just _has_ to virtue signal.
|
| > And if facial recognition becomes that common for
| wearers, most of the population is going to adorn
| something to prevent that
|
| My brother in Christ, you sincerely underestimate how
| much "most of the population" gives a shit. Most people
| are being tracked by Google Maps or FindMy, are
| triangulated with cell towers that know their exact
| coordinates, and willingly use social media that profiles
| them individually. The population doesn't even try in the
| slightest to resist any of it.
| phyrex wrote:
| You can always listen to the investor calls for the
| capitalist point of view. In short, attracting talent,
| building the ecosystem, and making it really easy for users
| to make stuff they want to share on Meta's social networks
| bun_at_work wrote:
| I really think the value of this for Meta is content
| generation. More open models (especially state of the art)
| means more content is being generated, and more content is
| being shared on Meta platforms, so there is more advertising
| revenue for Meta.
| chasd00 wrote:
| All the content generated by llms (good or bad) is going to
| end up back in Facebook/Instagram and other social media
| sites. This enables Meta to show growth and therefore demand
| a higher stock price. So it makes sense to get content
| generation tools out there as widely as possible.
| zmmmmm wrote:
| He's not even pretending it's altruism. Literally about 1/3
| of the entire post is the section titled "Why Open Source AI
| Is Good for Meta". I find it really weird that there are
| whole debates in threads here about whether it's altruistic
| when Zuckerberg isn't making that claim in the first place.
| cageface wrote:
| He addresses this pretty clearly in the post. They don't want
| to be beholden to other companies to build the products they
| want to build. Their experience being under Apple's thumb on
| mobile strongly shaped this point of view.
| GreenWatermelon wrote:
| Zuckerberg didn't really say anything about altruism. The
| point he was making is an explicit "I believe open models are
| best for our business"
|
| He was clear in that one of their motivations is avoiding
| vendor lockin. He doesn't want Meta to be under the control
| of their competitors or other AI providers.
|
| He also recognizes the value brought to his company by open
| sourcing products. Just look at React, PyTorch, and GraphQL.
| All industry standards, and all brought tremendous value to
| Facebook.
| troupo wrote:
| There's nothing open source about it.
|
| It's a proprietary dump of data you can't replicate or verify.
|
| What were the sources? What datasets was it trained on? What
| are the training parameters? And so on and so on.
| advael wrote:
| Look, absolutely zero people in the world should trust any tech
| company when they say they care about or will keep commitments
| to the open-source ecosystem in any capacity. Nevertheless, it
| is occasionally strategic for them to do so, and there can be
| ancillary benefits for said ecosystem in those moments where
| this is the best play for them to harm their competitors
|
| For now, Meta seems to release Llama models in ways that don't
| significantly lock people into their infrastructure. If that
| ever stops being the case, you should fork rather than trust
| their judgment. I say this knowing full well that most of the
| internet is on AWS or GCP, most brick and mortar businesses use
| Windows, and carrying a proprietary smartphone is essentially
| required to participate in many aspects of the modern economy.
| All of this is a mistake. You can't resist all lock-in. The
| players involved effectively run the world. You should still
| try where you can, and we should still be happy when tech
| companies either slip up or make the momentary strategic
| decision to make this easier
| ori_b wrote:
| > _If that ever stops being the case, you should fork rather
| than trust their judgment._
|
| Fork what? The secret sauce is in the training data and
| infrastructure. I don't think either of those is currently
| open.
| quasse wrote:
| I'm just a lowly outsider to the AI space, but calling
| these open source models seems kind of like calling a
| compiled binary open source.
|
| If you don't have a way to replicate what they did to
| create the model, it seems more like freeware than open
| source.
| advael wrote:
| As an ML researcher, I agree. Meta doesn't include
| adequate information to replicate the models, and from
| the perspective of fundamental research, the interest
| that big tech companies have taken in this field has been
| a significant impediment to independent researchers,
| despite the fact that they are undeniably producing
| groundbreaking results in many respects, due to this
| fundamental lack of openness
|
| This should also make everyone very skeptical of any
| claim they are making, from benchmark results to the
| legalities involved in their training process to the
| prospect of future progress on these models. Without
| being able to vet their results against the same datasets
| they're using, there is no way to verify what they're
| saying, and the credulity that otherwise smart people
| have been exhibiting in this space has been baffling to
| me
|
| As a developer, if you have a working Llama model,
| including the source code and weights, and it's crucial
| for something you're building or have already built, it's
| still fundamentally a good thing that Meta isn't gating
| it behind an API and if they went away tomorrow, you
| could still use, self-host, retrain, and study the models
| warkdarrior wrote:
| The model is public, so you can at least verify their
| benchmark claims.
| advael wrote:
| Generally speaking, no. An important part of a lot of
| benchmarks in ML research is generalization. What this
| means is that it's often a lot easier to get a machine
| learning model to memorize the test cases in a benchmark
| than it is to train it to perform a general capability
| the benchmark is trying to test for. For that reason, the
| dataset is important, as if it includes the benchmark
| test cases in some way, it invalidates the test
|
| When AI research was still mostly academic, I'm sure a
| lot of people still cheated, but there was somewhat less
| incentive to, and norms like publishing datasets made it
| easier to verify claims made in research papers. In a
| world where people don't, and there's significant
| financial incentive to lie, I just kind of assume they're
| lying
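|
| A toy illustration of why the dataset matters (a sketch, not any
| lab's actual decontamination pipeline): a crude check is to look
| for long n-gram overlaps between training documents and benchmark
| test items, which you can only do if the training data is open.
|     def ngrams(text, n=13):
|         toks = text.lower().split()
|         return {" ".join(toks[i:i + n]) for i in range(len(toks) - n + 1)}
|
|     def looks_contaminated(train_doc, test_item, n=13):
|         # Flag a test item if any 13-gram from it also appears
|         # verbatim in a training document (similar in spirit to
|         # the overlap checks described in LLM papers).
|         return bool(ngrams(train_doc, n) & ngrams(test_item, n))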
| Nuzzerino wrote:
| Which option would be better?
|
| A) Release the data, and if it ends up causing a privacy
| scandal, at least you can actually call it open this
| time.
|
| B) Neuter the dataset, and the model
|
| All I ever see in these threads is a lot of whining and
| no viable alternative solutions (I'm fine with the idea
| of it being a hard problem, but when I see this attitude
| from "researchers" it makes me less optimistic about the
| future)
|
| > and the credulity that otherwise smart people have been
| exhibiting in this space has been baffling to me
|
| Remove the "otherwise" and you're halfway to
| understanding your error.
| wanderingbort wrote:
| > Release the data, and if it ends up causing a privacy
| scandal...
|
| We can't prove that a model like llama will never produce
| a segment of its training data set verbatim.
|
| Any potential privacy scandal is already in motion.
|
| My cynical assumption is that Meta knows that competitors
| like OpenAI have PR-bombs in their trained model and
| therefore would never opensource the weights.
| advael wrote:
| This isn't a dilemma at all. If Facebook can't release
| data it trains on because it would compromise user
| privacy, it is already a significant privacy violation
| that should be a scandal, and if it would prompt some
| regulatory or legislative remedies against Facebook for
| them to release the data, it should do the same for
| releasing the trained model, even through an API. The
| only reason people don't think about it this way is that
| public awareness of how these technologies work isn't
| pervasive enough for the general public to think it
| through, and it's hard to prove definitively. Basically,
| if this is Facebook's position, it's saying that the
| release of the model already constitutes a violation of
| user privacy, but they're betting no one will catch them
|
| If the company wants to help research, it should full-
| throatedly endorse the position that it doesn't consider
| it a violation of privacy to train on the data it does,
| and release it so that it can be useful for research. If
| the company thinks it's safeguarding user privacy, it
| shouldn't be training models on data it considers private
| and then using them in public-facing ways at all
|
| As it stands, Facebook seems to take the position that it
| wants to help the development of software built on models
| like Llama, but not really the fundamental research that
| goes into building those models in the same way
| xvector wrote:
| > If Facebook can't release data it trains on because it
| would compromise user privacy, it is already a
| significant privacy violation that should be a scandal
|
| Thousands of entities would scramble to sue Facebook over
| any released dataset _no matter what the privacy
| implications of the dataset are._
|
| It's just not worth it in _any_ world. I believe you are
| not thinking of this problem from the view of the PM or
| VPs that would actually have to approve this: if I were a
| VP and I was 99% confident that the dataset had no
| privacy implications, I still wouldn 't release it. Just
| not worth the inevitable long, drawn out lawsuits from
| people and regulators trying to get their pound of flesh.
|
| I feel the world is too hostile to big tech and AI to
| enable something like this. So, unless we want to kill
| AGI development in the cradle, this is what we get - and
| we can thank modern populist techno-pessimism for
| cultivating this environment.
| troupo wrote:
| Translation: "we train our data on private user data and
| copyrighted material so of course we cannot disclose any
| of our datasets or we'll be sued into oblivion"
|
| There's no AGI development in the cradle. And the world
| isn't "hostile". The world is increasingly tired of
| predatory behavior by supranational corporations
| advael wrote:
| This post demonstrates a willful ignorance of the factors
| driving so-called "populist techno-pessimism" and I'm
| sure every time a member of the public is exposed to
| someone talking like this, their "techno-pessimism" is
| galvanized
|
| The ire people have toward tech companies right now is,
| like most ire, perhaps in places overreaching. But it is
| mostly justified by the real actions of tech companies,
| and facebook has done more to deserve it than most. The
| thought process you just described sounds like an
| accurate prediction of the mindset and culture of a VP
| within Facebook, and I'd like you to reflect on it for a
| sec. Basically, you rightly point out that the org
| releasing what data they have would likely invite
| lawsuits, and then you proceeded to do some kind of
| insane offscreen mental gymnastics that allow this
| reality to mean nothing to you but that the unwashed
| masses irrationally hate the company for some unknowable
| reason
|
| Like you're talking about a company that has spent the
| last decade buying competitors to maintain an insane
| amount of control over billions of users' access to their
| friends, feeding them an increasingly degraded and
| invasive channel of information that also from time to
| time runs nonconsensual social experiments on them, and
| following even people who didn't opt in around the
| internet through shady analytics plugins in order to sell
| dossiers of information on them to whoever will pay. What
| do you think it is? Are people just jealous of their
| success, or might they have some legit grievances that
| may cause them to distrust and maybe even loathe such an
| entity? It is hard for me to believe Facebook has a
| dataset large enough to train a current-gen LLM that
| wouldn't also feel, viscerally, to many, like a privacy
| violation. Whether any party that felt this way could
| actually win a lawsuit is questionable though, as the US
| doesn't really have signficant privacy laws, and this is
| partially due to extensive collaboration with, and
| lobbying by, Facebook and other tech companies who do
| mass-surveillance of this kind
|
| I remember a movie called Das Leben der Anderen (2006)
| (Officially translated as "the lives of others") which
| got accolades for how it could make people who hadn't
| experienced it feel how unsettling the surveillance state
| of East Germany was, and now your average American is
| more comprehensively surveilled than the Stasi could have
| imagined, and this is in large part due to companies like
| facebook
|
| Frankly, I'm not an AGI doomer, but if the capabilities
| of near-future AI systems are even in the vague ballpark
| of the (fairly unfounded) claims the American tech
| monopolies make about them, it would be an unprecedented
| disaster on a global scale if those companies got there
| first, so inasmuch as we view "AGI research" as something
| that's inevitably going to hit milestones in corporate
| labs with secretive datasets, I think we should
| absolutely kill it to whatever degree is possible, and
| that's as someone who truly, deeply believes that AI
| research has been beneficial to humanity and could
| continue to become moreso
| sensanaty wrote:
| > I feel the world is too hostile to big tech
|
| Lmao what? If the world were sane and hostile to big
| tech, we would've nuked them all years ago for all the
| bullshit they pulled and continue to pull. Big tech has
| politicians in their pockets, but thankfully the
| "populist techno-pessimist" (read: normal people who are
| sick of billionaires exploiting the entire planet) are
| finally starting to turn their opinions, albeit slowly.
|
| If we lived in a sane world Cambridge Analytica would've
| been the death knell of Facebook and all of the people
| involved with it. But we instead live in a world where
| psychopathic pieces of shit like Zucc get away with it,
| because they can just buy off any politician who knocks
| on their doors.
| Nuzzerino wrote:
| > it seems more like freeware than open source.
|
| What would you have them do instead? Specifically?
| wongarsu wrote:
| > If you don't have a way to replicate what they did to
| create the model, it seems more like freeware
|
| Isn't that a bit like arguing that a linux kernel driver
| isn't open source if I just give you a bunch of GPL-
| licensed source code that speaks to my device, but no
| documentation how my device works? If you take away the
| source code you have no way to recreate it. But so far
| that never caused anyone to call the code not open-
| source. The closest is the whole GPL3 Tivoization debate
| and that was very divisive.
|
| The heart of the issue is that open source is kind of
| hard to define for anything that isn't software. As a
| proxy we could look at Stallman's free software
| definition. Free software shares a common history with
| open source, and in most cases open source software is also
| free/libre (and the other way around), so this might be a
| useful proxy.
|
| So checking the four software freedoms:
|
| - The freedom to run the program as you wish, for any
| purpose: For most purposes. There's that 700M user
| restriction, also Meta forbids breaking the law and
| requires you to follow their acceptable use policy.
|
| - The freedom to study how the program works, and change
| it so it does your computing as you wish: yes. You can
| change it by fine tuning it, and the weights allow you to
| figure out how it works. At least as well as anyone knows
| how any large neural network works, but it's not like
| Meta is keeping something from you here
|
| - The freedom to redistribute copies so you can help your
| neighbor: Allowed, no real asterisks
|
| - The freedom to distribute copies of your modified
| versions to others: Yes
|
| So is it Free Software(tm)? Not really, but it is pretty
| close.
| advael wrote:
| The model is "open-source" for the purpose of software
| engineering, and it's "closed data" for the purpose of AI
| research. These are separate issues and it's not
| necessary to conflate them under one term
| JKCalhoun wrote:
| A good point.
|
| Forgive me, I am AI naive, is there some way to harness
| Llama to train one's own actually-open AI?
| advael wrote:
| Kinda. Since you can self-host the model on a linux
| machine, there's no meaningful way for them to prevent
| you from having the trained weights. You can use this to
| bootstrap other models, or retrain on your own datasets,
| or fine-tune from the starting point of the currently-
| working model. What you can't do is be sure what they
| trained it on
| QuercusMax wrote:
| How open is it _really_ though? If you're starting from
| their weights, do you actually have legal permission to
| use derived models for commercial purposes? If it turns
| out that Meta used datasets they didn't have licenses to
| use in order to generate the model, then you might be in
| a big heap of mess.
| ein0p wrote:
| I could be wrong but most "model" licenses prohibit the
| use of the models to improve other models
| logicchains wrote:
| They actually did open source the infrastructure library
| they developed. They don't open source the data but they
| describe how they gathered/filtered it.
| ladzoppelin wrote:
| Is forking really possible with an LLM, or one the size of
| future Llama versions? Have they even released the weights and
| everything? Maybe I am just negative about it because I feel
| Meta is the worst company ever invented and that this will
| hurt society in the long run, just like Facebook.
| lawlessone wrote:
| > have they even released the weights?
|
| Isn't that what the model is? Just a collection of weights?
| pmarreck wrote:
| When you run `ollama pull llama3.1:70b`, which you can
| literally do right now (assuming ollama from ollama.com is
| installed and you're not afraid of the terminal), and it
| downloads a 40 gigabyte model, _that is the weights_!
|
| I'd consider the ability to admit when even your most hated
| adversary is doing something right, a hallmark of acting
| smarter.
|
| Now, they haven't released the training data with the model
| weights. THAT plus the training tooling would be "end to
| end open source". Apple actually did _that very thing_
| recently, and it flew under almost everyone's radar for
| some reason:
|
| https://x.com/vaishaal/status/1813956553042711006?s=46&t=qW
| a...
| mym1990 wrote:
| Doing something right vs doing something that seems right
| but has a hidden self interest that is harmful in the
| long run can be vastly different things. Often this kind
| of strategy will allow people to let their guard down,
| and those same people will get steamrolled down the road,
| left wondering where it all went wrong. Get smarter.
| pmarreck wrote:
| How in the heck is an open source model that is free and
| open today going to lock me down, down the line? This is
| nonsense. You can literally run this model forever if you
| use NixOS (or never touch your windows, macos or linux
| install again). Zuck can't come back and molest it. Ever.
|
| The best I can tell is that their self-interest here is
| more about gathering mindshare. That's not a terrible
| motive; in fact, that's a pretty decent one. It's not the
| bully pressing you into their ecosystem with a tit-for-
| tat; it's the nerd showing off his latest and going
| "Here. Try it. Join me. Join us."
| mym1990 wrote:
| Yeah because history isn't absolutely littered with
| examples of shiny things being dangled in front of people
| with the intent to entrap them /s.
|
| Can you really say this model will still be useful in 2
| years, 5 years for _you_? And that FB's stance on these
| models will still be open source at that time once they
| incrementally make improvements? Maybe, maybe not. But FB
| doesn't give anything away for free, and the fact that
| you think so is your blindness, not mine. In case you
| haven't figured it out, this isn't a technology problem,
| this is a "FB needs marketshare and it needs it fast"
| problem.
| pmarreck wrote:
| > But FB doesn't give anything away for free, and the
| fact that you think so is your blindness, not mine
|
| Is it, though? They are literally giving this away "for
| free": https://dev.to/llm_explorer/llama3-license-explained-2915
| Unless you build a service with it that
| has over 700 million monthly users (read: "problem anyone
| would love to have"), you do not have to re-negotiate a
| license agreement with them. Beyond that, it can't "phone
| home" or do any other sorts of nefarious shite. The other
| limitations there, which you can plainly read, seem not
| very restrictive.
|
| Is there a magic secret clause conspiracy buried within
| the license agreement that you believe will be magically
| pulled out at the worst possible moment? >..<
|
| Sometimes, good things happen. Sorry you're "too blinded"
| by past hurt experience to see that, I guess
| troupo wrote:
| > How in the heck is an open source model that is free
| and open today
|
| It's free, but it's not open source
| holoduke wrote:
| In tech you can trust the underdogs. Once they turn into
| dominant players they turn evil. 99% of the cases.
| sandworm101 wrote:
| >> Bravo! While I don't agree with Zuck's views and actions on
| many fronts, on this occasion I think he and the AI folks at
| Meta deserve our praise and gratitude.
|
| Nope. Not one bit. Supporting F/OSS when it suits you in one
| area and then being totally dismissive of it in _every other
| area_ should not be lauded. How about open sourcing some of
| FB's VR efforts?
| y04nn wrote:
| Don't be fooled, it is an "embrace, extend, extinguish" strategy.
| Once they have enough usage and become the default standard, they
| will start to find any possible way to make you pay.
| war321 wrote:
| Hasn't really happened with PyTorch or any of their other
| open sourced releases tbh.
| GreenWatermelon wrote:
| Credit where due: Facebook didn't do that with React or
| PyTorch. Meta will reap benefit for sure, but they don't seem
| to be betting on selling the model itself, rather they will
| benefit from being at the forefront of a new ecosystem.
| tyler-jn wrote:
| So far, it seems like this release has done ~nothing to the
| stock price for GOOGL/MSFT, which we all know has been propped
| up largely on the basis of their AI plans. So it's probably
| premature to say that this has messed it up for them.
| userabchn wrote:
| Interview with Mark Zuckerberg released today:
| https://www.bloomberg.com/news/videos/2024-07-23/mark-zucker...
| starship006 wrote:
| > Our adversaries are great at espionage, stealing models that
| fit on a thumb drive is relatively easy, and most tech companies
| are far from operating in a way that would make this more
| difficult.
|
| Mostly unrelated to the correctness of the article, but this
| feels like a bad argument. AFAIK, Anthropic/OpenAI/Google are not
| having issues with their weights being leaked (are they?). Why is
| it that Meta's model weights are?
| meowface wrote:
| >AFAIK, Anthropic/OpenAI/Google are not having issues with
| their weights being leaked. Why is it that Meta's model weights
| are?
|
| The main threat actors there would be powerful nation-states,
| in which case they'd be unlikely to leak what they've taken.
|
| It is a bad argument though, because one day possession of AI
| models (and associated resources) might confer great and
| dangerous power, and we can't just throw up our hands and say
| "welp, no point trying to protect this, might as well let
| everyone have it". I don't think that'll happen anytime soon,
| but I am personally somewhat in the AI doomer camp.
| whimsicalism wrote:
| We have no way of knowing whether nation-state level actors
| have access to those weights.
| skybrian wrote:
| I think it's hard to say. We simply don't know much from the
| outside. Microsoft has had some pretty bad security lapses, for
| example around guarding access to Windows source code. I don't
| think we've seen a bad security break-in at Google in quite a
| few years? It would surprise me if Anthropic and OpenAI had
| good security since they're pretty new, and fast-growing
| startups have a lot of organizational challenges.
|
| It seems safe to assume that not all the companies doing
| leading-edge LLM's have good security and that the industry as
| a whole isn't set up to keep secrets for long. Things aren't
| locked down to the level of classified research. And it sounds
| like Zuckerberg doesn't want to play the game that way.
|
| At the state level, China has independent AI research efforts
| and they're going to figure it out. It's largely a matter of
| timing, which could matter a lot.
|
| There's still an argument to be made against making
| proliferation too easy. Just because states have powerful
| weapons doesn't mean you want them in the hands of people on
| the street.
| dfadsadsf wrote:
| We have nationals/citizens of every major US adversary working
| in those companies, with looser security practices than a local
| warehouse. Security checks before hiring are a joke (mostly
| verifying that the resume checks out), laptops can be taken
| home, and internal communications are not segmented on a
| need-to-know basis. Essentially, if China wants weights or
| source code, it will have hundreds of people to choose from who
| can provide it.
| probablybetter wrote:
| I would avoid Facebook and Meta products in general. I do NOT
| trust them. We have approx. 20 years of their record to go upon.
| diggan wrote:
| > Today we're taking the next steps towards open source AI
| becoming the industry standard. We're releasing Llama 3.1 405B,
| the first frontier-level open source AI model,
|
| Why do people keep mislabeling this as Open Source? The whole
| point of calling something Open Source is that the "magic sauce"
| of how to build something is publicly available, so I could build
| it myself if I have the means. But without the training data
| publicly available, could I train Llama 3.1 if I had the means?
| No wonder Zuckerberg doesn't start with defining what Open Source
| actually means, as then the blogpost would have lost all meaning
| from the get go.
|
| Just call it "Open Model" or something. As it stands right now,
| the meaning of Open Source is being diluted by all these
| companies pretending to do one thing while actually doing
| something else.
|
| I initially got very excited seeing the title and the domain,
| but hopelessly sad after reading through the article and
| realizing they're still trying to pass their artifacts off as
| Open Source projects.
| valine wrote:
| The codebase to do the training is way less valuable than the
| weights for the vast majority of people. Releasing the training
| code would be nice, but it doesn't really help anyone but
| Meta's direct competitors.
|
| If you want to train on top of Llama there's absolutely nothing
| stopping you. Plenty of open source tools to do parameter
| optimization.
| diggan wrote:
| Not just the training code but the training data as well
| should be under a permissive license; otherwise you cannot
| call the project itself Open Source, which Facebook does
| here.
|
| > is way less valuable than the weights for the vast majority
| of people
|
| The same is true for most Open Source projects, most people
| use the distributed binaries or other artifacts from the
| projects, and couldn't care less about the code itself. But
| that doesn't warrant us changing the meaning of Open Source
| just because companies feel like it's free PR.
|
| > If you want to train on top of Llama there's absolutely
| nothing stopping you.
|
| Sure, but in order for the intent of Open Source to be true
| for Llama, I should be able to build this project from
| scratch. Say I have a farm of 100 A100's, could I reproduce
| the Llama model from scratch today?
| unshavedyak wrote:
| > Not just the training code but the training data as well,
| should be under a permissive license, otherwise you cannot
| call the project itself Open Source, which Facebook does
| here.
|
| Does FB even have the capability to do that? I'd assume
| there's a bunch of data that's not theirs and they can't
| even release it. Let alone some data that they might not
| want to admit is in the source.
| bornfreddy wrote:
| If not, it is questionable if they should train on such
| data anyway.
|
| Also, that doesn't matter in this discussion - if you are
| unable to release the source under appropriate licence
| (for whatever reason), you should not call it Open
| Source.
| talldayo wrote:
| I will steelman the idea that a tokenizer and weights are
| all you need for the "source" of an LLM. They are
| components that can be modified, redistributed and when put
| together, reproduce the full experience intended.
|
| If we _insist_ upon the release of training data with Open
| models, you might as well kiss the idea of usable Open LLMs
| out the door. Most of the content in training datasets like
| The Pile are not licensed for redistribution in any way
| shape or form. It would jeopardize projects that _do_ use
| transparent training data while not offering anything of
| value to the community compared to the training code.
| Republishing all training data is an absolute trap.
| enriquto wrote:
| > Most of the content in training datasets like The Pile
| are not licensed for redistribution in any way shape or
| form.
|
| But distributing the weights is a "form" of distribution.
| You can recover many items of the dataset (most easily,
| the outliers) by using the weights.
|
| Just because they are codified in a non-readily
| accessible way, does not mean that you are not
| distributing them.
|
| It's scary to think that "training" is becoming a thinly
| veiled way to strip copyright of works.
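|
| A crude way to probe that claim (a sketch; it assumes a locally
| served model via the `ollama` Python client and only catches
| verbatim regurgitation, which is the easy case):
|     import ollama
|
|     def looks_memorized(doc, prefix_words=50, check_words=30):
|         words = doc.split()
|         prefix = " ".join(words[:prefix_words])
|         target = " ".join(words[prefix_words:prefix_words + check_words])
|         out = ollama.generate(model="llama3.1:8b", prompt=prefix,
|                               options={"temperature": 0})["response"]
|         # If the model continues a held-out document verbatim,
|         # that document was very likely in the training set.
|         return target in out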
| talldayo wrote:
| The weights are a transformed, lossy and non-complete
| permutation of the training material. You _cannot_
| recover most of the dataset reliably, which is what stops
| it from being an outright replacement for the work it 's
| trained on.
|
| > does not mean that you are not distributing them.
|
| Except you literally aren't distributing them. It's like
| accusing me of pirating a movie because I sent a
| screenshot or a scene description to my friend.
|
| > It's scary to think that "training" is becoming a
| thinly veiled way to strip copyright of works.
|
| This is the way it's been for years. Google is given Fair
| Use for redistributing incomplete parts of copyrighted
| text materials verbatim, since their application is
| transformative:
| https://en.wikipedia.org/wiki/Authors_Guild,_Inc._v._Google,....
|
| Or Corellium, who won their case to use copyrighted Apple
| code in novel and transformative ways:
| https://www.forbes.com/sites/thomasbrewster/2023/12/14/apple...
|
| Copyright has always been a limited power.
| jncfhnb wrote:
| People don't typically modify distributed binaries.
|
| People do typically modify model weights. They are the
| preferred form in which to modify the model.
|
| Saying "build" llama is just a nonsense comparison to
| traditional compiled software. "Building llama" is more
| akin to taking the raw weights as text and putting them
| into a nice pickle file. Or loading it into an inference
| engine.
|
| Demanding that you have everything needed to recreate the
| weights from scratch is like arguing an application cannot
| be open source unless it also includes the user testing
| history and design documents.
|
| And of course some idiots don't understand what a pickled
| weights file is and claim it's as useless as a distributed
| binary if you want to modify the program just because it is
| technically compiled; not understanding that the point of
| the pickled file is "convenience" and that it unpacks back
| to the original form. Like arguing open source software
| can't be distributed in zip files.
|
| > Say I have a farm of 100 A100's, could I reproduce the
| Llama model from scratch today?
|
| Say you have a piece of paper. Can you reproduce
| `print("hello world")` from scratch?
| vngzs wrote:
| Agreed. The Linux kernel source contains everything you need to
| produce Linux kernel binaries. The llama source does not
| contain what you need to produce llama models. Facebook is
| using sleight of hand to garner favor with open model weights.
|
| Open model weights are still commendable, but it's a far cry
| from open-source (or even _libre_) software!
| elromulous wrote:
| 100%. With this licensing model, meta gets to reap the benefits
| of open source (people contributing, social cachet), without
| any of the real detriment (exposing secret sauce).
| hbn wrote:
| Is that even something they keep on hand? Or would WANT to keep
| on hand? I figured they're basically sending a crawler to go
| nuts reading things and discard the data once they've trained
| on it.
|
| If that included, e.g. reading all of Github for code, I
| wouldn't expect them to host an entire separate read-only copy
| of Github because they trained on it and say "this is part of
| our open source model"
| jdminhbg wrote:
| > Why do people keep mislabeling this as Open Source? The whole
| point of calling something Open Source is that the "magic
| sauce" of how to build something is publicly available, so I
| could built it myself if I have the means. But without the
| training data publicly available, could I train Llama 3.1 if I
| had the means?
|
| I don't think not releasing the commit history of a project
| makes it not Open Source, this seems like that to me. What's
| important is you can download it, run it, modify it, and re-
| release it. Being able to see how the sausage was made would be
| interesting, but I don't think Meta have to show their training
| data any more than they are obligated to release their planning
| meeting notes for React development.
|
| Edit: I think the restrictions in the license itself are good
| cause for saying it shouldn't be called Open Source, fwiw.
| thenoblesunfish wrote:
| You don't need to have the commit history to see "how it
| works". ML that works well does so in huge part due to the
| training data used. The leading models today aren't
| distinguished by the way they're trained, but what they're
| trained on.
| jdminhbg wrote:
| I agree that you need training data to build AI from
| scratch, much like you need lots of really smart developers
| and a mailing list and servers and stuff to build the Linux
| kernel from scratch. But it's not like having the training
| data and training code will get you the same result, in the
| way something like open data in science is about
| replicating results.
| frabcus wrote:
| Reproducible builds of software binaries are a thing, but
| they aren't routinely done. Likewise training an AI is
| deterministic if you do it exactly the same way each time. And
| slight variances lead to models of similar capability.
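|
| A sketch of what "doing it the same way" typically means in
| PyTorch (in practice bitwise reproducibility also needs identical
| hardware, library versions, and data order, and some CUDA ops
| require an extra environment flag):
|     import random
|     import numpy as np
|     import torch
|
|     def make_deterministic(seed=0):
|         random.seed(seed)
|         np.random.seed(seed)
|         torch.manual_seed(seed)
|         # Prefer deterministic CUDA kernels; ops without a
|         # deterministic variant will raise an error.
|         torch.use_deterministic_algorithms(True)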
| tempfile wrote:
| For the freedom to change to be effective, a user must be
| given the software in a form they can modify. Can you tweak
| an LLM once it's built? (I genuinely don't know the answer)
| jdminhbg wrote:
| Yes, you can finetune Llama:
| https://llama.meta.com/docs/how-to-guides/fine-tuning/
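|
| For a concrete picture, a minimal LoRA fine-tuning sketch with
| the Hugging Face transformers + peft libraries (the model id and
| my_corpus.txt are placeholders, it assumes you've been granted
| access to the weights on the Hub, and for anything bigger than 8B
| you'd want quantization and multi-GPU tooling on top of this):
|     from datasets import load_dataset
|     from peft import LoraConfig, get_peft_model
|     from transformers import (AutoModelForCausalLM, AutoTokenizer,
|                               DataCollatorForLanguageModeling,
|                               Trainer, TrainingArguments)
|
|     model_id = "meta-llama/Meta-Llama-3.1-8B"  # placeholder repo name
|     tok = AutoTokenizer.from_pretrained(model_id)
|     tok.pad_token = tok.eos_token
|     model = AutoModelForCausalLM.from_pretrained(model_id)
|
|     # Attach small LoRA adapters instead of updating all weights.
|     model = get_peft_model(model, LoraConfig(
|         r=16, lora_alpha=32, task_type="CAUSAL_LM",
|         target_modules=["q_proj", "v_proj"]))
|
|     ds = load_dataset("text", data_files={"train": "my_corpus.txt"})["train"]
|     ds = ds.map(lambda x: tok(x["text"], truncation=True, max_length=512),
|                 remove_columns=["text"])
|
|     Trainer(model=model,
|             args=TrainingArguments("llama31-lora",
|                                    per_device_train_batch_size=1,
|                                    num_train_epochs=1),
|             train_dataset=ds,
|             data_collator=DataCollatorForLanguageModeling(tok, mlm=False),
|             ).train()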
| diggan wrote:
| > I don't think not releasing the commit history of a project
| makes it not Open Source,
|
| Right, I'm not talking about the commit history, but rather
| that anyone (with means) should be able to produce the final
| artifact themselves, if they want. For weights like this,
| that requires at least the training script + the training
| data. Without that, it's very misleading to call the project
| Open Source, when only the result of the training is
| released.
|
| > What's important is you can download it, run it, modify it,
| and re-release it
|
| But I literally cannot download the project, build it and run
| it myself? I can only use the binaries (weights) provided by
| Meta. No one can modify how the artifact is produced, only
| modify the already produced artifact.
|
| That's like saying that Slack is Open Source because if I
| want to, I could patch the binary with a hex editor and
| add/remove things as I see fit? No one believes Slack should
| be called Open Source for that.
| jdminhbg wrote:
| > Right, I'm not talking about the commit history, but
| rather that anyone (with means) should be able to produce
| the final artifact themselves, if they want. For weights
| like this, that requires at least the training script + the
| training data.
|
| You cannot produce the final artifact with the training
| script + data. Meta also cannot reproduce the current
| weights with the training script + data. You could produce
| some other set of weights that are just about as good, but
| it's not a deterministic process like compiling source
| code.
|
| > That's like saying that Slack is Open Source because if I
| want to, I could patch the binary with a hex editor and
| add/remove things as I see fit? No one believes Slack
| should be called Open Source for that.
|
| This analogy doesn't work because it's not like Meta can
| "patch" Llama any more than you can. They can only finetune
| it like everyone else, or produce an entirely different LLM
| by training from scratch like everyone else.
|
| The right to release your changes is another difference; if
| you patch Slack with a hex editor to do some useful thing,
| you're not allowed to release that changed Slack to others.
|
| If Slack lost their source code, went out of business, and
| released a decompiled version of the built product into the
| public domain, that would in some sense be "open source,"
| even if not as good as something like Linux. LLMs though do
| not have a source code-like representation that is easily
| and deterministically modifiable like that, no matter who
| the owner is or what the license is.
| unraveller wrote:
| Open-weights is not open-source, for sure, but I don't mind it
| being stated as an aspiration goal, the moment it is legally
| possible to publish a source without shooting themselves in the
| foot they should do it.
|
| They could release 50% of their best data but that would only
| stop them from attracting the best talent.
| JeremyNT wrote:
| > _Why do people keep mislabeling this as Open Source?_
|
| I guess this is a rhetorical question, but this is a press
| release from Meta itself. It's just a marketing ploy, of
| course.
| blcknight wrote:
| InstructLab and the Granite Models from IBM seem the closest to
| being open source. Certainly more than whatever FB is doing
| here.
|
| (Disclaimer: I work for an IBM subsidiary but not on any of
| these products)
| hubraumhugo wrote:
| The big winners of this: devs and AI startups
|
| - No more vendor lock-in
|
| - Instead of just wrapping proprietary API endpoints, developers
| can now integrate AI deeply into their products in a very cost-
| effective and performant way
|
| - Price race to the bottom with near-instant LLM responses at
| very low prices are on the horizon
|
| As a founder, it feels like a very exciting time to build a
| startup as your product automatically becomes better, cheaper,
| and more scalable with every major AI advancement. This leads to
| a powerful flywheel effect: https://www.kadoa.com/blog/ai-flywheel
| danielmarkbruce wrote:
| It creates the opposite of a flywheel effect for you. It
| creates a leapfrog effect.
| boringg wrote:
| AI might cannibalize a lot of first gen AI businesses.
| jstummbillig wrote:
| What Meta is doing is borderline market distortion. It's
| not that they have figured out some magic sauce they are
| happy to share. They are just deciding to burn brute force
| money that they made elsewhere and give their stuff away
| below cost, first of all because they can.
| anon373839 wrote:
| I know, and it's beautiful to see. Bad actors like
| "Open"AI tried to get in first and monopolize this tech
| with lawfare. But that game plan has been mooted by
| Meta's scorched-earth generosity.
| jstummbillig wrote:
| Meta has actually figured out where the moat is:
| ecosystem, tooling. As soon as "we" build it, they can
| still do whatever they want with the core/LLM, starting
| with Llama 4 or at any other point in the future.
|
| The best kind of open source: All the important
| ingredients to make it work (more and more data and
| money) are either not open source or in the hands of
| Meta. It's prohibitive by design.
|
| People seem happy to help build Metas empire once again
| in return for scraps.
| danielmarkbruce wrote:
| It's strange you are downvoted for this. It is a
| legitimate take on things (even if it is likely not
| accurate as far as intent is concerned).
| boringg wrote:
| To be fair, MSFT's investment of credits into OpenAI is
| also almost market distortion. All the investments done
| with credits posing as dollars have made the VC investment
| world very chaotic in the AI space. No real money is
| changing hands, and the revenue on the books of MSFT and
| AMAZON is low quality revenue. Those companies' AI moves
| are overvalued.
| boringg wrote:
| - Price race to the bottom with near-instant LLM responses at
| very low prices are on the horizon
|
| Maybe a big price war while the market majors fight it out for
| positioning, but they still need to make money off their
| investments, so someone is going to have to raise prices at some
| point and you'll be locked into their system if you build on it.
| Havoc wrote:
| >locked into their system
|
| There are going to be loads of providers for these open
| models. Openrouter already has 3 providers for the new 405B
| model within hours.
| boringg wrote:
| Maybe for the time being. I don't see how else they
| monetize the incredible amount they spent on the models
| without forcing people to lock into models or benefits or
| something else.
|
| It's not going to stay like this I can assure you that :).
| Havoc wrote:
| Not sure whether by that post you mean Openrouter
| serving the 405B or Meta producing more.
|
| Openrouter is a paid API so that can absolutely be
| sustainable.
|
| And Meta has multiple reasons for going the open route - some
| explained in their post, some less so (it harms their
| competitors).
|
| I reckon there will be a llama 4 and beyond
| tim333 wrote:
| Meta will make money like it has in the past by having
| data about users and advertising to them. Commoditizing
| AI helps them keep at that.
|
| See Joel on Software "Smart companies try to commoditize
| their products' complements"
| https://www.joelonsoftware.com/2002/06/12/strategy-letter-v/
| wavemode wrote:
| > they still need to make money off their investments
|
| Depends on how you define this. Most of the top companies
| don't care as much about making a profit off of AI inference
| itself, if the existence of the -feature- of AI inference
| drives more usage and/or sales of their other products
| (phones, computers, operating systems, etc.)
|
| That's why, for example, Google and Bing searches
| automatically perform LLM inference at no cost to the user.
| choppaface wrote:
| Also the opportunity to run on user compute and on private
| data. That supports a slate of business models that are
| incompatible with the mainframe approach.
|
| Including adtech models, which are predominantly cloud-based.
| drcode wrote:
| and Xi Jingping
| mav3ri3k wrote:
| I am not deep into LLMs, so I'll ask this. From my understanding,
| their last model was open source, but in a way that you could use
| it while the inner workings were "hidden"/not transparent.
|
| With the new model, I am seeing a lot about how open source they
| are and how it can be built upon. Is it now completely open
| source or similar to their last models?
| whimsicalism wrote:
| It's intrinsic to transformers that the inner workings are
| largely inscrutable. This is no different, but it does not mean
| they cannot be built upon.
|
| Gradient descent works on these models just like the prior
| ones.
| carimura wrote:
| Looks like you can already try out Llama-3.1-405b on Groq,
| although it's timing out. So. Hugged I guess.
| TechDebtDevin wrote:
| All the big providers should have it up by end of day. They
| just change their API configs (they're just reselling you AWS
| Bedrock).
| jamiedg wrote:
| 405B and the other Llama 3.1 models are working and available
| on Together AI. https://api.together.ai
| Havoc wrote:
| >they're just reselling you AWS Bedrock
|
| Meta announced they have 25 providers ready on day 1, so no
| it's not all AWS.
| mensetmanusman wrote:
| It's easy to support open source AI when the code is 1,000 lines
| and the execution costs $100,000,000 of electricity.
|
| Only the big players can afford to push go, and FB would love to
| see OpenAI's code so they can point it to their proprietary user
| data.
| bun_at_work wrote:
| Meta makes their money off advertising, which means they profit
| from attention.
|
| This means they need content that will grab attention, and
| creating open source models that allow anyone to create any
| content on their own becomes good for Meta. The users of the
| models can post it to their Instagram/FB/Threads account.
|
| Releasing an open model also releases Meta from the burden of
| having to police the content the model generates, once the open
| source community fine-tunes the models.
|
| Overall, this is a sound business move for Meta - the post
| doesn't really talk about the true benefit, instead moralizing
| about open source.
| jklinger410 wrote:
| This is a great point. Eventually, META will only allow LLAMA
| generated visual AI content on its platforms. They'll put a
| little key in the image that clears it with the platform.
|
| Then all other visual AI content will be banned. If that is
| where legislation is heading.
| natural219 wrote:
| AI moderators too would be an enormous boon if they could get
| that right.
| KaiserPro wrote:
| It would be good, but the cost per moderation is still really
| high for it to be practical.
| noiseinvacuum wrote:
| Creating content with AI will surely be helpful for social
| media to some extent, but I think it's not that important in the
| larger scheme of things; there's already a vast sea of content
| being created by humans, and differentiation is already in
| recommending the right content to the right people at the right
| time.
|
| More important are the products that Meta will be able to make
| if the industry standardizes on Llama. They would have a front
| seat, not just with access to the latest unreleased models but
| also in setting the direction of progress and what next-gen LLMs
| optimize for. If you're Twitter or Snap or TikTok or compete
| with Meta on the product, then good luck trying to keep up.
| apwell23 wrote:
| I am not sure I follow this.
|
| 1. Is there such a thing as 'attention grabbing AI content' ?
| Most AI content I see is the opposite of 'attention grabbing'.
| Kindle store is flooded with this garbage and none of it is
| particularly 'attention grabbing'.
|
| 2. Why would creation of such content, even if it was truly
| attention grabbing, benefit meta in particular ?
|
| 3. How would proliferation of AI content lead to more ad spend
| in the economy? Ad budgets won't increase because of AI
| content.
|
| To me this is a typical Zuckerberg play. Attach Meta's name to
| whatever is trendy at the moment, like the (now forgotten)
| metaverse, cryptocoins, and a bunch of other failed stuff that
| was trendy for a second. Meta is NOT a Gen AI company (or a
| metaverse company, or a crypto company), as he is scamming (more
| like colluding with) the market to believe. A mere distraction
| from slowing user growth on ALL of Meta's apps.
|
| ppl seem to have just forgotten this
| https://en.wikipedia.org/wiki/Diem_(digital_currency)
| bun_at_work wrote:
| Sure - there is plenty of attention grabbing AI content - it
| doesn't have to grab _your_ attention, and it won't work for
| everyone. I have seen people engaging with apps that redo a
| selfie to look like a famous character or put the person in a
| movie scene, for example.
|
| Every piece of content in any feed (good, bad, or otherwise)
| benefits the aggregator (Meta, YouTube, whatever), because
| someone will look at it. Not everything will go viral, but it
| doesn't matter. Scroll whatever on Twitter, YouTube Shorts,
| Reddit, etc. Meta has a massive presence in social media, so
| content being generated is shared there.
|
| More content of any type leads to more engagement on the
| platforms where it's being shared. Every Meta feed serves the
| viewer an ad (for which Meta is paid) every 3 or so posts
| (pieces of content). It doesn't matter if the user doesn't
| like 1/5 posts or whatever, the number of ads still goes up.
| apwell23 wrote:
| > it doesn't have to grab _your_ attention
|
| I am talking in general, not about me personally. No
| popular content on any website/platform is AI generated.
| Maybe you have examples that lead you to believe that it's
| possible on a mass scale.
|
| > look like a famous character or put the person in a movie
| scene
|
| What attention-grabbing movie used gen AI persons?
| visarga wrote:
| > Meta makes their money off advertising, which means they
| profit from attention. This means they need content that will
| grab attention
|
| That is why they hopped on the Attention is All You Need train
| resters wrote:
| This is really good news. Zuck sees the inevitability of it and
| the dystopian regulatory landscape and decided to go all in.
|
| This also has the important effect of neutralizing the critique
| of US Government AI regulation because it will democratize
| "frontier" models and make enforcement nearly impossible. Thank
| you, Zuck, this is an important and historic move.
|
| It also opens up the market to a lot more entry in the area of
| "ancillary services to support the effective use of frontier
| models" (including safety-oriented concerns), which should really
| be the larger market segment.
| passion__desire wrote:
| Probably, Yann LeCun is the Lord Varys here. He has Mark's ear
| and Mark believes in Yann's vision.
| war321 wrote:
| Unfortunately, there are a number of AI safety people that are
| still crowing about how AI models need to be locked down, with
| some of them loudly pivoting to talking about how open source
| models aid China.
|
| Plus there's still the spectre of SB-1047 hanging around.
| amelius wrote:
| > One of my [Mark Zuckerberg, ed.] formative experiences has been
| building our services constrained by what Apple will let us build
| on their platforms. Between the way they tax developers, the
| arbitrary rules they apply, and all the product innovations they
| block from shipping, it's clear that Meta and many other
| companies would be freed up to build much better services for
| people if we could build the best versions of our products and
| competitors were not able to constrain what we could build.
|
| This is hard to disagree with.
| glhaynes wrote:
| I think it's very easy to disagree with!
|
| If Zuckerberg had his way, mobile device OSes would let Meta
| ingest microphone and GPS data 24/7 (just like much of the
| general public already _thinks_ they do because of the
| effectiveness of the other sorts of tracking they are able to
| do).
|
| There are certainly legit innovations that haven't shipped
| because gatekeepers don't allow them. But there've been lots of
| harmful "innovations" blocked, too.
| throwaway1194 wrote:
| I strongly suspect that what AI will end up doing is push
| companies and organizations towards open source, they will
| eventually realize that code is already being shared via AI
| channels, so why not do it legally with open source?
| talldayo wrote:
| > they will eventually realize that code is already being
| shared via AI channels
|
| Private repos are not being reproduced by any modern AI. Their
| source code is safe, although AI arguably lowers the bar to
| compete with them.
| whimsicalism wrote:
| OpenAI needs to release a new model setting a new capabilities
| highpoint. This is existential for them now.
| ChrisArchitect wrote:
| Related:
|
| _Llama 3.1 Official Launch_
|
| https://news.ycombinator.com/item?id=41046540
| baceto123 wrote:
| The value of AI is in the information used to train the models,
| not the hardware.
| m3kw9 wrote:
| The truth is we need both closed and open source, they both have
| their discovery path and advantages and disadvantages, there
| shouldn't be a system where one is eliminated over the other.
| They also seem to be driving each other forward via competition.
| typpo wrote:
| Thanks to Meta for their work on safety, particularly Llama
| Guard. Llama Guard 3 adds defamation, elections, and code
| interpreter abuse as detection categories.
|
| Having run many red teams recently as I build out promptfoo's red
| teaming featureset [0], I've noticed the Llama models punch above
| their weight in terms of accuracy when it comes to safety. People
| hate excessive guardrails and Llama seems to thread the needle.
|
| Very bullish on open source.
|
| [0] https://www.promptfoo.dev/docs/red-team/
| swyx wrote:
| is there a #2 to llamaguard? Meta seems curiously alone in
| doing this kind of, let's call it, "practical safety" work
| enriquto wrote:
| It's alarming that he refers to llama as if it was open source.
|
| The definition of free software (and open source, for that
| matter) is well-established. The same definition applies to all
| programs, whether they are "AI" or not. In any case, if a program
| was built by training against a dataset, the whole dataset is
| part of the source code.
|
| Llama is distributed in binary form, and it was built based on a
| secret dataset. Referring to it as "open source" is not
| ignorance, it's malice.
| Nesco wrote:
| The training data most likely contains insane amounts of
| copyrighted material. That's why virtually none of the "open
| models" come with their training data.
| enriquto wrote:
| > The training data contains most likely insane amounts of
| copyrighted material.
|
| If that is the case then the weights must inherit all these
| copyrights. It has been shown (at least in image processing)
| that you can extract many training images from the weights,
| almost verbatim. Hiding the training data does not solve this
| issue.
|
| But regardless of copyright issues, people here are
| complaining about the malicious use of the term "open
| source", to signify a completely different thing (more like
| "open api").
| tempfile wrote:
| > If that is the case then the weights must inherit all
| these copyrights.
|
| Not if it's a fair use (which is obviously the defence
| they're hoping for)
| anon373839 wrote:
| Also, fair use is just one defense to a copyright
| infringement claim. The plaintiff first has to prove the
| elements of infringement; if they can't do this, no
| defense is needed.
| jdminhbg wrote:
| > In any case, if a program was built by training against a
| dataset, the whole dataset is part of the source code.
|
| I'm not sure why I keep seeing this. What is the equivalent of
| the training data for something like the Linux kernel?
| enriquto wrote:
| > What is the equivalent of the training data for something
| like the Linux kernel?
|
| It's the source code.
|
| For the linux kernel:
| compile(sourcecode) = binary
|
| For llama: train(data) = weights
| jdminhbg wrote:
| That analogy doesn't work. `train` is not a deterministic
| process. Meta has all of the training data and all of the
| supporting source code and they still won't get the same
| `weights` if they re-run the process.
|
| The weights are the result of the development process, like
| the source code of a program is the result of a development
| process.
| indus wrote:
| Is there an argument against Open Source AI?
|
| Not the usual nation-state rhetoric, but something that justifies
| that closed source leads to better user-experience and fewer
| security and privacy issues.
|
| An ecosystem that benefits vendors, customers, and the makers of
| close source?
|
| Are there historical analogies other than Microsoft Windows or
| Apple iPhone / iOS?
| kjkjadksj wrote:
| Let's take the iPhone. Secured by the industry's best security
| teams, I am sure. Closed source, yet teenagers in Eastern Europe
| have cracked into it dozens of times making jailbreaks. Every
| law enforcement agency can crack into it. Closed source is not
| a security moat, but a trade protection moat.
| finolex1 wrote:
| Replace "Open Source AI" in "is there an argument against xxx"
| with bioweapons or nuclear missiles. We are obviously not at
| that stage yet, but it could be a real, non-trivial concern in
| the near future.
| GaggiX wrote:
| Llama 3.1 405B is on par with GPT-4o and Claude 3.5 Sonnet, the
| 70B model is better than GPT 3.5 turbo, incredible.
| itissid wrote:
| How are smaller models distilled from large models, I know of
| LoRA, quantization like technique; but does distilling also mean
| generating new datasets for conversing with smaller models
| entirely from the big models for many simpler tasks?
| tintor wrote:
| Smaller models can be trained to match log probs of the larger
| model. The larger model can be used to generate synthetic data
| for
| the smaller model.
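| For concreteness, a minimal sketch of the logit-matching idea
| (assuming PyTorch; the function and its arguments are
| illustrative, not any particular library's API):
|
|     import torch.nn.functional as F
|
|     def distillation_loss(student_logits, teacher_logits, T=2.0):
|         # Soften both distributions with temperature T, then
|         # push the student's log-probs toward the teacher's.
|         s = F.log_softmax(student_logits / T, dim=-1)
|         t = F.softmax(teacher_logits / T, dim=-1)
|         return F.kl_div(s, t, reduction="batchmean") * T * T
|
| In practice this term is usually mixed with the ordinary
| next-token cross-entropy loss on real or teacher-generated data.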
| popcorncowboy wrote:
| > Developers can run inference on Llama 3.1 405B on their own
| infra at roughly 50% the cost of using closed models like GPT-4o
|
| Does anyone have details on exactly what this means or where/how
| this metric gets derived?
| rohansood15 wrote:
| I am guessing these are prices on services like AWS Bedrock
| (their post is down right now).
| PlattypusRex wrote:
| a big chunk of that is probably the fact that you don't need to
| pay someone who is trying to make a profit by running inference
| off-premises.
| wesleyyue wrote:
| Just added Llama 3.1 405B/70B/8B to https://double.bot (VSCode
| coding assistant) if anyone would like to try it.
|
| ---
|
| Some observations:
|
| * The model is much better at trajectory correcting and putting
| out a chain of tangential thoughts than other frontier models
| like Sonnet or GPT-4o. Usually, these models are limited to
| outputting "one thought", no matter how verbose that thought
| might be.
|
| * I remember in Dec of 2022 telling famous "tier 1" VCs that
| frontier models would eventually be like databases: extremely
| hard to build, but the best ones will eventually be open and win
| as it's too important to too many large players. I remember the
| confidence in their ridicule at the time but it seems
| increasingly more likely that this will be true.
| didip wrote:
| Is it really open source though? You can't run these models for
| your company. The license is extremely restrictive and there's NO
| SOURCE CODE.
| jamiedg wrote:
| Looks like it's easy to test out these models now on Together AI
| - https://api.together.ai
| KingOfCoders wrote:
| Open Source AI needs to include training data.
| fsndz wrote:
| Small language models is the path forward
| https://medium.com/thoughts-on-machine-learning/small-langua...
| pja wrote:
| "Commoditise your complement" in action!
| manishrana wrote:
| really useful insights
| bufferoverflow wrote:
| Hard disagree. So far every big important model is closed-source.
| Grok is sort-of the only exception, and it's not even that big
| compared to the (already old) GPT-4.
|
| I don't see open source being able to compete with the cutting-
| edge proprietary models. There's just not enough money. GPT-5
| will take an estimated $1.2 billion to train. MS and OpenAI are
| already talking about building a $100 billion training data
| center.
|
| How can you compete with that if your plan is to give away the
| training result for free?
| sohamgovande wrote:
| Where is the $1.2b number from?
| bufferoverflow wrote:
| There are a few numbers floating around, $1.2B being the
| lowest estimate.
|
| HSBC estimates the training cost for GPT-5 between $1.7B and
| $2.5B.
|
| Vlad Bastion Research estimates $1.25B - 2.25B.
|
| Some people on HN estimate $10B:
|
| https://news.ycombinator.com/item?id=39860293
| smusamashah wrote:
| Meta's article with more details on the new LLAMA 3.1
| https://ai.meta.com/blog/meta-llama-3-1/
| 6gvONxR4sf7o wrote:
| > Third, a key difference between Meta and closed model providers
| is that selling access to AI models isn't our business model.
| That means openly releasing Llama doesn't undercut our revenue,
| sustainability, or ability to invest in research like it does for
| closed providers. (This is one reason several closed providers
| consistently lobby governments against open source.)
|
| The whole thing is interesting, but this part strikes me as
| potentially anticompetitive reasoning. I wonder what the lines
| are that they have to avoid crossing here?
| phkahler wrote:
| >> ...but this part strikes me as potentially anticompetitive
| reasoning.
|
| "Commoditize your complements" is an accepted strategy. And
| while pricing below cost to harm competitors is often illegal,
| the reality is that the marginal cost of software is zero.
| Palomides wrote:
| spending a very quantifiable large amount of money to release
| something your nominal competitors charge for without having
| your own direct business case for it seems a little much
| phkahler wrote:
| Companies spend very large amounts of money on all sorts of
| things that never even get released. Nothing wrong with
| releasing something for free that no longer costs you
| anything. Who knows why they developed it in the first
| place, it makes no difference.
| frabjoused wrote:
| Who knew FB would hold OpenAI's original ideals, and OpenAI now
| holds early FB ideals/integrity.
| boringg wrote:
| FB needed to differentiate drastically. FB is at its best
| creating large data infra.
| krmboya wrote:
| Mark Zuckerberg was attacked by the media when it suited their
| tech billionaire villain narrative. Now there's Elon Musk so
| Zuckerberg gets to be on the good side again
| jmward01 wrote:
| I never thought I would say this but thanks Meta.
|
| *I reserve the right to remove this praise if they abuse this
| open source model position in the future.
| frabcus wrote:
| If it was actually open source with data and the data curation
| code released, they wouldn't be able to abuse it the same way.
| It is open weights, closed training data.
| gooob wrote:
| why do they keep training on publicly available online data, god
| dammit? what the fuck. don't they want to make a good LLM? train
| on the classics, on the essentials reference manuals for
| different technologies, on history books, medical encyclopedias,
| journal notes from the top surgeons and engineers, scientific
| papers of the experiments that back up our fundamental theories.
| we want quality information, not recent information. we already
| have plenty of recent information.
| mmmore wrote:
| I appreciate that Mark Zuckerberg soberly and neutrally talked
| about some of the risks from advances in AI technology. I agree
| with others in this thread that this is more accurately called
| "public weights" instead of open source, and in that vein I
| noticed some issues in the article.
|
| > This is one reason several closed providers consistently lobby
| governments against open source.
|
| Is this substantially true? I've noticed a tendency of those who
| support the general arguments in this post to conflate the
| beliefs of people concerned about AI existential risk, some of
| whom work at the leading AI labs, with the position of the labs
| themselves. In most cases I've seen, the AI labs (especially
| OpenAI) have lobbied against any additional regulation on AI,
| including with SB1047[1] and the EU AI Act[2]. Can anyone provide
| an example of this in the context of actual legislation?
|
| > On this front, open source should be significantly safer since
| the systems are more transparent and can be widely scrutinized.
| Historically, open source software has been more secure for this
| reason.
|
| This may be true if we could actually understand what was
| happening in neural networks, or train them to consistently avoid
| unwanted behaviors. As things are, the public weights are simply
| inscrutable black boxes, and the existence of jailbreaks and
| other strange LLM behaviors show that we don't understand how our
| training processes create models' emergent behaviors. The
| capabilities of these models and their influence are growing
| faster than our understanding of them and our ability to steer them
| to behave precisely how we want, and that will only get harder as
| the models get more powerful.
|
| > At this point, the balance of power will be critical to AI
| safety. I think it will be better to live in a world where AI is
| widely deployed so that larger actors can check the power of
| smaller bad actors.
|
| This paragraph ignores the concept of offense/defense balance.
| It's much easier to cause a pandemic than to stop one, and
| cyberattacks, while not as bad as pandemics, seem to also favor
| the attacker (this one is contingent on how much AI tools can
| improve our ability to write secure code). At the extreme, it
| would clearly be bad if everyone had access to an anti-matter
| weapon large enough to destroy the Earth; at some level of
| capability, we have to limit the commands an advanced AI will
| follow from an arbitrary person.
|
| That said, I'm unsure if limiting public weights at this time
| would be good regulation. They do seem to have some benefits in
| increasing research around alignment/interpretability, and I
| don't know if I buy the argument that public weights are
| significantly more dangerous from a "misaligned ASI" perspective
| than many competing closed companies. I also don't buy the view
| of some in the leading labs that we'll likely have "human level"
| systems by the end of the decade; it seems possible but unlikely.
| But I worry that Zuckerberg's vision of the future does not
| adequately guard against downside risks, and is not compatible
| with the way the technology will actually develop.
|
| [1] https://thebulletin.org/2024/06/california-ai-bill-
| becomes-a...
|
| [2] https://time.com/6288245/openai-eu-lobbying-ai-act/
| btbuildem wrote:
| The "open source" part sounds nice, though we all know there's
| nothing particularly open about the models (or their weights).
| The barriers to entry remain the same - huge upfront investments
| to train your own, and steep ongoing costs for "inference".
|
| Is the vision here to treat LLM-based AI as a "public good", akin
| to a utility provider in a civilized country (taxpayer funded,
| govt maintained, non-for-profit)?
|
| I think we could arguably call this "open source" when all the
| infra blueprints, scripts and configs are freely available for
| anyone to try and duplicate the state-of-the-art (resource and
| grokking requirements notwithstanding)
| brrrrrm wrote:
| check out the paper. it's pretty comprehensive
| https://ai.meta.com/research/publications/the-llama-3-herd-o...
| openrisk wrote:
| Open source "AI" is a proxy for democratising and making (much)
| more widely useful the goodies of high performance computing
| (HPC).
|
| The HPC domain (data and compute intensive applications that
| typically need vector, parallel or other such architectures) has
| been around for the longest time, but has been confined to
| academic / government tasks.
|
| LLMs, with their famous "matrix multiply" at their very core, are
| basically demolishing an ossified frontier where a few commercial
| entities (Intel, Microsoft, Apple, Google, Samsung etc) have
| defined for decades what computing looks like _for most people_.
|
| Assuming that the genie is out of the bottle, the question is:
| what is the shape of end-user devices that are optimally designed
| to use compute intensive open source algorithms? The "AI PC" is
| already a marketing gimmick, but could it be that Linux desktops
| and smartphones will suddenly be "AI natives"?
|
| For sure it's a transformational period and the landscape T+10 yrs
| could be drastically different...
| frabcus wrote:
| Unfortunately it is barely more open source than Windows. Llama
| 3 weights are binary code and while the license is pretty good
| it isn't open source.
| LarsDu88 wrote:
| Obligatory reminder of why tech companies subsidize open source
| projects: https://www.joelonsoftware.com/2002/06/12/strategy-
| letter-v/
| avivo wrote:
| The FTC also recently put out a statement that is fairly pro-open
| source: https://www.ftc.gov/policy/advocacy-research/tech-at-
| ftc/202...
|
| I think it's interesting to think about this question of open
| source, benefits, risk, and even competition, without all of the
| baggage that Meta brings.
|
| I agree with the FTC, that the benefits of open-weight models are
| significant for competition. _The challenge is in distinguishing
| between good competition and bad competition._
|
| Some kind of competition can harm consumers and critical public
| goods, including democracy itself. For example, competing for
| people's scarce attention or for their food buying, with
| increasingly optimized and addictive innovations. Or competition
| to build the most powerful biological weapons.
|
| Other kinds of competition can massively accelerate valuable
| innovation. The FTC must navigate a tricky balance here --
| leaning into competition that serves consumers and the broader
| public, while being careful about what kind of competition it is
| accelerating that could cause significant risk and harm.
|
| It's also obviously not just "big tech" that cares about the
| risks behind open-weight foundation models. Many people have
| written about these risks even before it became a subject of
| major tech investment. (In other words, A16Z's framing is often
| rather misleading.) There are many non-big tech actors who are
| very concerned about current and potential negative impacts of
| open-weight foundation models.
|
| One approach which can provide the best of both worlds, is for
| cases where there are significant potential risks, to ensure that
| there is at least some period of time where weights are not
| provided openly, in order to learn a bit about the potential
| implications of new models.
|
| Longer-term, there may be a line where models are too risky to
| share openly, and it may be unclear what that line is. In that
| case, it's important that we have governance systems for such
| decisions that are not just profit-driven, and which can help us
| continue to get the best of all worlds. (Plug: my organization,
| the AI & Democracy Foundation; https://ai-dem.org/; is working to
| develop such systems and hiring.)
| whimsicalism wrote:
| making food that people want to buy is good actually
|
| i am not down with this concept of the chattering class
| deciding what are good markets and what are bad, unless it is
| due to broad-based and obvious moral judgements.
| endorphine wrote:
| Except for 90% of the food on the supermarket shelves out
| there, which is packed with sugar and preservatives.
| tpurves wrote:
| 405 sounds like a lot of B's! What do you need to practically run
| or host that yourself?
| sumedh wrote:
| You cannot run it locally
| tpurves wrote:
| 405 is a lot of B's. What does it take to run or host that?
| danielmarkbruce wrote:
| quantize to 0 bit. Run on a potato.
|
| Jokes aside: ~405B params x 2 bytes each (FP16), so say 810 GB,
| maybe 1000 GB or so required in reality; need maybe 2 AWS p5
| instances?
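| As a rough back-of-the-envelope check (weights only, ignoring KV
| cache and activations; a sketch, not a sizing guide):
|
|     def weight_gb(params_billions, bytes_per_param):
|         # params_billions * 1e9 params * bytes / 1e9 bytes-per-GB
|         return params_billions * bytes_per_param
|
|     for fmt, bpp in [("FP16", 2), ("INT8", 1), ("4-bit", 0.5)]:
|         print(fmt, "~", weight_gb(405, bpp), "GB")
|     # FP16 ~ 810 GB, INT8 ~ 405 GB, 4-bit ~ 202.5 GB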
| dang wrote:
| Related ongoing thread:
|
| _Llama 3.1_ - https://news.ycombinator.com/item?id=41046540 -
| July 2024 (114 comments)
| littlestymaar wrote:
| I love how Zuck decided to play a new game called "commoditize
| some other billionaire's business to piss him off', I can't wait
| until this becomes a trend and we get plenty of open source cool
| stuff.
|
| If he really wants to replicate Linux's success against
| proprietary Unices, he needs to release Llama with some kind of
| GPL equivalent, that forces everyone to play the open source
| game.
| Dwedit wrote:
| Without the raw data that trained the model, how is it open
| source?
| suyash wrote:
| Open source is a welcome step but what we really need is complete
| decentralisation so people can run their own private AI Models
| that keep all the data private to them. We need this to happen
| locally on laptops, mobile phones, smart devices etc. Waiting for
| when that will become ubiquitous.
| frabcus wrote:
| It is open weights not open source. If you can't train it and
| don't know the training data and can't use it to train your own
| models, it is a closed model aa a whole. Even if you have the
| binary weights.
| zoogeny wrote:
| Totally tangential thought, probably doomed to be lost in the
| flood of comments on this very interesting announcement.
|
| I was thinking today about Musk, Zuckerberg and Altman. Each
| claims that the next version of their big LLMs will be the best.
|
| For some reason it reminded me of one apocryphal cause of WW1,
| which was that the kings of Europe were locked in a kind of ego
| driven contest. It made me think about the Nation State as a
| technology. In some sense, the kings were employing the new
| technology which was clearly going to be the basis for the future
| political order. And they were pitting their own implementation
| of this new technology against the other kings.
|
| I feel we are seeing a similar clash of kings playing out. The
| claims that this is all just business or some larger claim about
| the good of humanity seem secondary to the ego stakes of the
| major players. And when it was about who built the biggest
| rocket, it felt less dangerous.
|
| It breaks my heart just a little bit. I feel sympathy in some
| sense for the AIs we will create, especially if they do reach the
| level of AGI. As another tortured analogy, it is like a bunch of
| competitive parents forcing their children into adversarial
| relationships to satisfy the parent's ego.
| light_triad wrote:
| They are positioning themselves as champions of AI open source
| mostly because they were blindsided by OpenAI, are not in the
| infra game, and want to commoditize their complements as much as
| possible.
|
| This is not altruism although it's still great for devs and
| startups. All of FB's GPU investment is primarily for new AI
| products: "friends", recommendations, and selling ads.
|
| https://www.joelonsoftware.com/2002/06/12/strategy-letter-v/
| baby wrote:
| Meta does a good thing
|
| HN spends a day figuring out how it's actually bad
| shnock wrote:
| It's not actually bad, OP's point is that it is not motivated
| by altruism. An action can be beneficial to the people
| without that effect being the incentive
| j_maffe wrote:
| Of course, it's not altruism; it's a publicly traded
| corporation. No one should ever believe in any such claims
| by these organizations. Non-altruistic organizations can
| still make positive-impact actions when they align with
| their goals.
| satvikpendem wrote:
| No one said it was bad. It's just self interested (as
| companies generally are) and are using that to have a PR spin
| on the topic. But again, this is what all companies do and
| nothing about it is bad per se.
| MrScruff wrote:
| Nothing they're doing is bad, and sometimes we benefit when
| large companies interests align with our own. All the spiel
| about believing in open systems because of being prevented
| from making their best products by Apple is a bit much
| considering we're talking about Facebook which is hardly an
| 'open platform', and the main thing Apple blocked them on
| was profiling their users to target ads.
| m3kw9 wrote:
| So you think FB did this with zero benefit to themselves?
| They did open source so people could improve their models and
| eventually have a paid tier later either from hosting
| services or other strategies
| WithinReason wrote:
| The linked article already spelled out the benefits
| sensanaty wrote:
| By virtue of it being Meta, it's automatically bad.
|
| If we lived in a sensible world we'd have nuked Meta into a
| trillion tiny little pieces some time around the Cambridge
| Analytica bullshit.
| war321 wrote:
| They've been working on AI for a good bit now. Open source
| especially is something they've championed since the mid 2010s
| at least with things like PyTorch, GraphQL, and React. It's not
| something they've suddenly pivoted to since ChatGPT came in
| 2022.
| kertoip_1 wrote:
| They are giving it "for free" because:
|
| * they need LLMs that they can control for features on their
| platforms (Fb/Instagram, but I can see many use cases on VR
| too)
|
| * they cannot sell it. They have no cloud services to offer.
|
| So they would spend this money anyway, but to compensate for
| some of the losses they just decided to use it to fix their PR
| by keeping developers content.
| sterlind wrote:
| They also reap the benefits of AI researchers across the
| world using Llama as a base. All their research is
| immediately applicable to their models. It's also likely a
| strategic decision to reduce the moat OpenAI is building
| around itself.
|
| I also think LeCun opposes OpenAI's gatekeeping at a
| philosophical/political level. He's using his position to
| strengthen open-source AI. Sure, there's strategic business
| considerations, but I wouldn't rule out principled
| motivations too.
| TaylorAlexander wrote:
| Yes LeCun has said he thinks AI should be open like
| journalism should be - that openness is inherently valuable
| in such things.
|
| Add to the list of benefits to Meta that it keeps LeCun
| happy.
| sebastiennight wrote:
| I think people massively underestimate how much time/attention
| span (and ad revenue) will be up for grabs once a platform
| really nails the "AI friend" concept. And it makes sense for
| Meta to position themselves for it.
| zmmmmm wrote:
| yes ... I remember when online dating was absolutely cringe /
| weird thing to do. Ten years later and it's the primary way a
| whole generation seeks a partner.
|
| It seems incredibly weird today to have an imaginary
| friend that you treat as a genuine relationship, but I
| genuinely expect this will happen and become a commonplace
| thing within the next two decades.
| Havoc wrote:
| > they were blindsided by OpenAI
|
| Given the mountain of GPUs they bought at precisely the right
| moment I don't think that's entirely accurate
| sumedh wrote:
| > Given the mountain of GPUs they bought at precisely the
| right moment I don't think that's entirely accurate
|
| If I remember correctly, FB didnt buy those GPUs because of
| Open AI, they were going to buy it anyway but Mark said
| whatever we are buying let's double it.
| Havoc wrote:
| Yeah, still not entirely clear what exactly they're doing
| with all of it...but they certainly saw the GPU supply
| crunch earlier than the rest
| brigadier132 wrote:
| Intentions are overrated. Given how many people with good
| intentions fuck up everything, I'd rather have actual results,
| even if the intention is self-serving.
| istjohn wrote:
| AI is not a "complement" of a social network in the way Spolsky
| defines the term.
|
| > A complement is a product that you usually buy together with
| another product. Gas and cars are complements. Computer
| hardware is a classic complement of computer operating systems.
| And babysitters are a complement of dinner at fine restaurants.
| In a small town, when the local five star restaurant has a two-
| for-one Valentine's day special, the local babysitters double
| their rates. (Actually, the nine-year-olds get roped into early
| service.)
|
| > All else being equal, demand for a product increases when the
| prices of its complements decrease.
|
| Smartphones are a complement of Instagram. VR headsets are a
| complement of the metaverse. AI could be a component of a
| social network, but it's not a complement.
| anthomtb wrote:
| > My framework for understanding safety is that we need to
| protect against two categories of harm: unintentional and
| intentional. Unintentional harm is when an AI system may cause
| harm even when it was not the intent of those running it to do
| so. For example, modern AI models may inadvertently give bad
| health advice. Or, in more futuristic scenarios, some worry that
| models may unintentionally self-replicate or hyper-optimize goals
| to the detriment of humanity. Intentional harm is when a bad
| actor uses an AI model with the goal of causing harm.
|
| Okay then Mark. Replace "modern AI models" with "social media"
| and repeat this statement with a straight face.
| j_m_b wrote:
| > We need to protect our data.
|
| This is a very important concern in Health Care because of HIPAA
| compliance. You can't just send your data over the wire to
| someone's proprietary API. You would at least need to de-identify
| your data. This can be a tricky task, especially with
| unstructured text.
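| As a toy illustration of the kind of pre-processing step this
| implies (a naive regex pass; real de-identification of clinical
| text requires far more than this, and the patterns below are
| only examples):
|
|     import re
|
|     PATTERNS = {
|         # illustrative identifier patterns, not a complete set
|         "SSN": r"\b\d{3}-\d{2}-\d{4}\b",
|         "PHONE": r"\b\d{3}[-.]\d{3}[-.]\d{4}\b",
|         "MRN": r"\bMRN[:#]?\s*\d+\b",
|     }
|
|     def redact(text):
|         # Strip obvious identifiers before the text ever leaves
|         # your own infrastructure.
|         for label, pattern in PATTERNS.items():
|             text = re.sub(pattern, f"[{label}]", text,
|                           flags=re.IGNORECASE)
|         return text
|
|     print(redact("Pt MRN: 445212, call 555-867-5309 re: labs"))
|     # -> "Pt [MRN], call [PHONE] re: labs"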
| xpe wrote:
| Zuck needs to get real. They are Open Weights not Open Source.
| Sparkyte wrote:
| The real path forward is recognizing what AI is good at and what
| it is bad at. Focus on making what it is good at even better and
| faster. Open AI will definitely give us that option but it isn't
| a miracle worker.
|
| My impression is that AI if done correctly will be the new way to
| build APIs with large data sets and information. It can't write
| code unless you want to dump billions of dollars into a solution
| with millions of dollars of operational costs. As it stands it
| loses context too quickly to do advanced human tasks. BUT this is
| where it is great at assembling data and information. You know
| what is great at assembling data and information? APIs.
|
| Think of it this way: if we can make it faster and it trains on
| a data lake for a company, it could be used to return
| information faster than a nested micro-service architecture that
| is just a spiderweb of dependencies.
|
| Because AI loses context simple API requests could actually be
| more efficient.
| Bluescreenbuddy wrote:
| >This is how we've managed security on our social networks - our
| more robust AI systems identify and stop threats from less
| sophisticated actors who often use smaller scale AI systems.
|
| So about all the bots and sock puppets on social media..
| pjkundert wrote:
| Deployment of PKI-signed distributed software systems to use
| community-provisioned compute, bandwidth and storage at scale is,
| now quite literally, the future.
|
| We mostly don't all want or need the hardware to run these AIs
| ourselves, all the time. But, when we do, we need lots of it for
| a little while.
|
| This is what Holochain was born to do. We can rent massive
| capacity when we need it, or earn money renting ours when we
| don't.
|
| All running cryptographically trusted software at Internet scale,
| without the knowledge or authorization of commercial or
| government "do-gooders".
|
| Exciting times!
| ayakang31415 wrote:
| Massive props to AI teams at Meta that released this model open
| source
| ceva wrote:
| They have earned so much money on all of their users; this is
| the least they can do to give back to the community, if this can
| be considered that ;)
| animanoir wrote:
| "Says the Meta Inc".
| seydor wrote:
| That assumes LLMs are the path to AI, which is increasingly
| becoming an unpopular opinion
| tmsh wrote:
| Software 2.0 is about open licensing.
|
| I.e., the more important thing - the more "free" thing - is the
| licensing now.
|
| E.g., I play around with different image diffusion models like
| Stable Diffusion and specific fine-tuned variations for
| ControlNet or LoRA that I plug into ComfyUI.
|
| But I can't use it at work because of the licensing. I have to
| use InvokeAI instead of ComfyUI if I want to be careful, and use
| only very specific image diffusion models without the latest and
| greatest fine-tuning. As others have said - the weights
| themselves are rather inscrutable. So we're building on more
| abstract shapes now.
|
| But the key open thing is making sure (1) the tools to modify the
| weights are open and permissive (ComfyUI, related scripts or
| parts of both the training and deployment) and (2) the underlying
| weights of the base models and the tools to recreate them have
| MIT or other generous licensing. As well as the fine-tuned
| variants for specific tasks.
|
| It's not going to be the naive construction in the future where
| you take a base model and as company A you produce company A's
| fine tuned model and you're done.
|
| It's going to be a tree of fine-tuned models as a node-based
| editor like ComfyUI already shows and that whole tree has to be
| open if we're to keep the same hacker spirit where anyone can
| tinker with it and also at some point make money off of it. Or go
| free software the whole way (i.e., LGPL or equivalent the whole
| tree of tools).
|
| In that sense unfortunately Llama has a ways to go to be truly
| open: https://news.ycombinator.com/item?id=36816395
| Palmik wrote:
| In the LLM world there are many open source solutions for fine-
| tuning, maybe the best one being from Meta:
| https://github.com/pytorch/torchtune
|
| In terms of inference and interface (since you mentioned comfy)
| there are many truly open source options such as vLLM (though
| there isn't a single really performant open source solution for
| inference yet).
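| For example, a minimal vLLM sketch for running one of the open-
| weights models locally (the model id and sampling settings are
| placeholders):
|
|     from vllm import LLM, SamplingParams
|
|     # placeholder model id; swap in whichever checkpoint you use
|     llm = LLM(model="meta-llama/Meta-Llama-3.1-8B-Instruct")
|     params = SamplingParams(temperature=0.7, max_tokens=256)
|     out = llm.generate(["Summarize open weights vs open source"],
|                        params)
|     print(out[0].outputs[0].text)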
| jameson wrote:
| It's hard to say Llama is an "open source" when their license
| states Meta has full control under certain circumstances
|
| https://raw.githubusercontent.com/meta-llama/llama-models/ma...
|
| > 2. Additional Commercial Terms. If, on the Llama 3.1 version
| release date, the monthly active users of the products or
| services made available by or for Licensee, or Licensee's
| affiliates, is greater than 700 million monthly active users in
| the preceding calendar month, you must request a license from
| Meta, which Meta may grant to you in its sole discretion, and you
| are not authorized to exercise any of the rights under this
| Agreement unless or until Meta otherwise expressly grants you
| such rights.
| __loam wrote:
| It should be transparently clear that this move was taken by
| Meta to drive their competitors out of business in a capital
| intensive space.
| apwell23 wrote:
| Not sure how it drives competitors out of business. OpenAI is
| losing money on queries, not on model creation. This open
| source model has no impact on their business model of
| charging users money to run queries.
|
| on a side note OpenAI is losing users on its own. It doesn't
| need meta to put it out of business.
| systemvoltage wrote:
| Tbh, it's incredibly generous.
| nailer wrote:
| Llama isn't open source. The license is at
| https://llama.meta.com/llama3/license/ and includes various
| restrictions on use, which means it falls outside the rules
| created by the https://opensource.org/osd
| war321 wrote:
| Even if it's just open weights and not "true" open source, I'll
| still give Meta the appreciation of being one of the few big AI
| companies actually committed to open models. In an ecosystem
| where groups like Anthropic and OpenAI keep hemming and hawing
| about safety and the necessity of closed AI systems "for our
| sake", they stand out among the rest.
| meowtimemania wrote:
| Why would openai/anthropic's approach be more safe? Are people
| able to remove all the guard rails on the llama models?
| alfalfasprout wrote:
| They're not safer. The claim is that OpenAI will enforce
| guard rails and take steps to ensure model outputs and
| prompts are responsible... but only a fool would take them at
| their word.
| seoulmetro wrote:
| Yeah.. and Facebook said they would enforce censorship on
| their platforms to ensure content safety.. that didn't turn
| out so well. Now it just censors anything remotely
| controversial, such as World War 2 historical facts or even
| just slightly offensive wording.
| Spivak wrote:
| You're really just arguing about the tuning. I get that
| it's annoying as a user but as a moderator going into it
| with the mentality that any post is expendable and
| bringing down the banhammer on everything near the line
| keeps things civil. HN does that too with the no flame-
| bait rule.
| koolala wrote:
| Censorship isn't moderation.
| seoulmetro wrote:
| HN moderation is quite poor and very subjective. The
| guidelines are not the site rules, the rules are made up
| on the spot.
|
| HN censors too. Facebook just does it automatically on a
| huge scale with no reasoning behind each censor.
|
| Censorship is just tuning people or things you don't want
| out. Censorship of your own content as a user is
| extremely annoying and Facebook's censorship is quite
| unethical. It doesn't help safety of the users, it helps
| safety of the business.
|
| Also Facebook censors things that are not objectively
| offensive in lots of instances. YouTube too. Safety for
| their brand.
| zelphirkalt wrote:
| The banhammer can quickly become a tool of net negative
| though, when actual facts are being repressed/censored.
| Der_Einzige wrote:
| Yes, it's trivial to remove guardrails from any open access
| model:
|
| https://www.lesswrong.com/posts/jGuXSZgv6qfdhMCuJ/refusal-
| in...
|
| https://huggingface.co/failspy/Llama-3-70B-Instruct-
| ablitera...
| Zuiii wrote:
| Humanity is so fortunate this "guardrails" mentality didn't
| catch on when we started publishing books. While too close
| for comfort, we got twice lucky that computing wasn't
| hampered by this mentality either.
|
| This time, humanity narrowly averted complete disaster
| thanks to the huge efforts and resources of a small number
| of people.
|
| I wonder if we are witnessing the end of humanity's open
| knowledge and compute (at least until we pass through a neo
| dark age and reach the next age of enlightenment).
|
| Whether it'll be due to profit or control, it looks like
| humanity is poised to get fucked.
| xvector wrote:
| > Humanity is so fortunate this "guardrails" mentality
| didn't catch on when we started publishing books.
|
| Ah, but it almost did[1]:
|
| > novels [...] were accused of corrupting the youth, of
| planting dangerous ideas into the heads of housewives
|
| The pessimist playbook is familiar when it comes to
| technological/human progress. Today, the EU has made it
| so hard to release AI models in the region that most
| companies simply don't bother. Case in point: Meta and
| others have decided to make their models unavailable in
| the EU for the forseeable future [2]. I can only imagine
| how hard it is for a startup like Mistral.
|
| [1]: https://pessimistsarchive.org/list/novels/clippings
|
| [2]: https://www.theguardian.com/technology/article/2024/
| jul/18/m...
| shkkmo wrote:
| The EU hasn't made it hard to release models (yet). The
| EU has made it hard to train models on EU data. Meta has
| responded by blocking access to the models trained on
| non-EU data as a form of leverage/retribution. This is
| explained by your own reference.
| loceng wrote:
| To me it will be most interesting to see who attempts to
| manipulate the models by stuffing them with content,
| essentially adding "duplicate" content (such as via tautology)
| in order to give it extra, misallocated weight; I don't think
| an AI model will automatically be able to detect that unless it
| were truly intelligent; instead it would need to be trained by
| competent humans.
|
| And so the models that have mechanisms for curating and
| preventing such misapplied weighting, and the organizations and
| individuals who accurately make adjustments to the models, will
| in the end be the winners - the ones where truth has been most
| carefully honed.
| rednafi wrote:
| How is only sharing the binary artifact open source? There's
| the data aspect of things that they can't share because of
| licensing, and the code itself isn't accessible.
| Palmik wrote:
| It's much better than sharing a binary artifact of regular
| software, since the weights can be and are easily and
| frequently modified by fine tuning the model. This means you
| can modify the "binary artifact" to your needs, similar to how
| you might change the code of open source software to add
| features etc.
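| A minimal sketch of what that modification might look like in
| practice, assuming Hugging Face transformers and peft with LoRA
| adapters (the model id and hyperparameters are illustrative):
|
|     from transformers import AutoModelForCausalLM
|     from peft import LoraConfig, get_peft_model
|
|     # placeholder model id
|     base = "meta-llama/Meta-Llama-3.1-8B-Instruct"
|     model = AutoModelForCausalLM.from_pretrained(base)
|
|     # Attach small trainable low-rank adapters; the original
|     # weights stay frozen, so the "binary artifact" is extended
|     # rather than rebuilt from scratch.
|     cfg = LoraConfig(r=16, lora_alpha=32,
|                      target_modules=["q_proj", "v_proj"])
|     model = get_peft_model(model, cfg)
|     model.print_trainable_parameters()  # usually well under 1%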
| AMICABoard wrote:
| Okay if anyone wants to try Llama 3.1 inference on CPU, try this:
| https://github.com/trholding/llama2.c (L2E)
|
| It's a bit buggy but it is fun.
|
| Disclaimer: I am the author of L2E
| sebastiennight wrote:
| I've summarized this entire thread in 4 lines (didn't even use AI
| for it!)
|
| Step 1. Chick-Fil-A releases a grass-fed beef burger to spite
| other fast-food joints, calls it "the vegan burger"
|
| Step 2. A couple of outraged vegans show up in the comments,
| pointing out that beef, even grass-fed beef, isn't vegan
|
| Step 3. Fast food enthusiasts push back: it's unreasonable to
| want companies to abide by this restrictive definition of
| "vegan". Clearly this burger is a gamechanger and the definition
| needs to adapt to the times.
|
| Step 4. Goto Step 2 in an infinite loop
| nathansherburn wrote:
| Open source software is one of our best and most passionately
| loved inventions. It'd be much easier to have a nuanced
| discussion about "open weights" but I don't think that's in
| Facebook's interest.
| llm_trw wrote:
| More like vegetarians show up claiming to be vegans, then
| vegans show up and explain why eating animal products is still
| wrong.
|
| That's the difference between open source and free software.
| dogcomplex wrote:
| Yeah the moral step up from the status quo is still laudable.
| Open weights are still much improved over the closed creepy
| spy agency clusterfucks that OpenAI/Microsoft/Google/Apple
| are bringing to the table.
| dilliwal wrote:
| On point, and pretty good analogy
| dev1ycan wrote:
| Reality: they've realized GPT-4 is a wall; they can't keep
| pouring trillions of dollars into it for little or no
| improvement, so now they want to put it out to open source until
| someone figures out the next step, then they'll take it behind
| closed doors again.
|
| I hate how the moment it's too late will be, by design, closed
| doors.
| Ukv wrote:
| > Reality: they've realize gpt 4 is a wall, they can't keep
| pouring trillions of dollars into it for no improvement or
| little at all, so now they want to put it to the open source
| [...]
|
| This is Meta (LLaMA, which has had available weights for a
| while), not OpenAI (GPT).
| dev1ycan wrote:
| How does that change anything about my comment? What does
| "available weights" change about the system being closed
| source? Additionally, they have the developers; the second
| they figure out a way to achieve AGI or something close to it,
| they'll take it closed source. This is just outsourcing
| maintenance and small tweaks.
| smashah wrote:
| Actually open source Whatsapp is the way forward.
| petetnt wrote:
| Great, now release the datasets used for training your AI so
| everyone can get compensated accordingly and ask that your
| competition follow suit.
| twelve40 wrote:
| It'll be interesting to come back here in a couple of years and
| see what's left. What do they even do anymore? They have
| Facebook, which hasn't visibly changed in a decade. They have
| Instagram, which feels a bit sleeker but also remained more or
| less the same. and Whatsapp. Ad network that runs on top of those
| services and floods them with trash. Bunch of stuff that doesn't
| seem to exist anymore - Libra, the grandiose multibillion dollar
| Legless VR, etc.
|
| But they still have 70 thousand people (a small country) doing
| _something_. What are they doing? Updating Facebook UI? Not
| really, the UI hasn't been updated, and you don't need 70
| thousand people to do that. Stuff like React and Llama? Good, I
| guess, we'll see how they make use of Llama in a couple of years.
| Spellcheck for posts maybe?
| therealdrag0 wrote:
| And still making 135B dollars in revenue, or 2M per employee. I
| don't know what they do either lol, but I don't mind that
| revenue supporting jobs.
| benreesman wrote:
| In general I look back on my time at FB with mixed feelings, I'm
| pretty skeptical that modern social media is a force for good and
| I was there early enough to have moved the needle.
|
| But this is really positive stuff and it's nice to view my time
| there through the lens of such a change for the better.
|
| Keep up the good work on this folks.
|
| Time to start thinking about opening up a little on the training
| data.
| bentice wrote:
| Ironically, this benefits Apple so much.
| netsec_burn wrote:
| How? They are prohibited from using it in the license.
| ysofunny wrote:
| or else is not even scientific
| elecush wrote:
| Ok, one notable difference: did the Linux researchers of yore
| warn about adversarial giants getting this tech? Or is this
| unique to the current moment? That, for me, is the largest
| question when considering the logical progression from "Linux
| open is better" to "AI open is better".
| Spivak wrote:
| We can't open source Linux because bad people might run
| servers?
|
| Can you imagine the disinformation they could spread with
| those? With enough of them you could have a massively global
| site made entirely for spreading it. God what if such a thing
| got into the hands of an egocentric billionaire?
| endorphine wrote:
| An operating system is not "generative", hence it's not a
| force multiplier.
| Spivak wrote:
| Are we on the same forum? Our _entire_ field is building
| force multipliers to extend ourselves well beyond of what
| we 're capable as individuals and the OS is tool that lets
| you get it done. Scale is like.. our entire thing. I feel
| like we're just so used to the world with computers that we
| forget how much power they allow people to wield. Which,
| honestly, is maybe a good sign for this next wave of tools,
| because AI isn't going to be more impactful than computers, and
| we all survived those.
| Zuiii wrote:
| I don't understand. A bad actor can use a linux server to
| automatically run botnets and exploit new devices. How is
| that not a force multiplier?
| turingbook wrote:
| Open-weights models are not really open source.
| scotty79 wrote:
| It's more like freeware than open source. You can launch it on
| your hardware and use it but how it was created is mostly not
| published.
|
| Still huge props to them for doing what they do.
| casebash wrote:
| I expect this to end up having been one of the worst timed blog
| posts in history. Open source AI has mostly been good for the
| world up until now, but we're getting to the point where we're
| about to find out why open-sourcing sufficiently bad models is a
| terrible idea.
| tananaev wrote:
| I don't think the weights are the source. The data is the source.
| But
| still better than nothing.
| cratermoon wrote:
| You first, Zuck.
| ofou wrote:
| Llama 3 Training System (Total: 19.2 exaFLOPS)
|   Cluster 1: 9.6 exaFLOPS (2 x 12K GPUs, 400+ TFLOPS/GPU)
|   Cluster 2: 9.6 exaFLOPS (2 x 12K GPUs, 400+ TFLOPS/GPU)
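| A quick sanity check of those figures (a sketch; the per-GPU
| number is the "400+ TFLOPS" quoted above):
|
|     gpus_per_cluster = 2 * 12_000        # two 12K GPU blocks
|     tflops_per_gpu = 400
|     # 1 exaFLOPS = 1e6 TFLOPS
|     cluster_exaflops = gpus_per_cluster * tflops_per_gpu / 1e6
|     print(cluster_exaflops)              # 9.6 per cluster
|     print(2 * cluster_exaflops)          # 19.2 total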
| tomjen3 wrote:
| Re safety: Just release two models, one that has been tuned and
| one that hasn't.
|
| Claude is supposed to be better, but it is also even more locked
| down than ChatGPT.
|
| Word will let me write a manifesto for a new Nazi party, but
| Claude is so locked down that it won't find a cartoon in a
| picture and Gemini... well.
|
| If AIs are not to harm society, they need to enable us to think
| in new ways.
| slowhadoken wrote:
| First you're going to have to write some laws that prevent
| openwashing and keep legitimate open source projects from
| becoming proprietary.
| thntk wrote:
| When Zuck said spies can easily steal models, I wonder how much
| of that comes from experience. I remember they struggled to
| train OPT not long ago.
|
| On a more serious note, I don't really buy his arguments about
| safety. First, widespread AI does not reduce unintentional harm
| but increases it, because the _rate of accidents_ compounds
| (rough illustration below). Second, the chance of success for
| threat actors will increase, because of the _asymmetric
| advantage_ of gaining access to all open information while
| hiding their own. But there is no reversing it at this point, so
| I'll enjoy it while it lasts; AGI will come sooner or later
| anyway.
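| (A minimal sketch of the compounding claim, assuming each
| deployment independently causes an accident with probability p:
| over n deployments, P(at least one accident) = 1 - (1 - p)^n,
| which climbs toward 1 as n grows, so wider access means more
| total accidents even if each individual system is no less safe.)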
| Simon_ORourke wrote:
| I thoroughly support Meta's open-sourcing of these AI models
| going forward. However, for a company that absolutely closed down
| discussions about providing API access to their platform, I'm
| left wondering what's in it (monetarily) for them by doing this?
| Is it to simply undercut competition in the space, like some
| grocery store selling below cost?
| yard2010 wrote:
| Rest assured, these guys are working day and night to make our
| world a darker place.
| GreenWatermelon wrote:
| Meta open sources its tools: React, GraphQL, PyTorch, and now
| these new models. Meta seems to be about open sourcing tools,
| not providing open access to their platforms.
|
| The AI model complements the platform, and their platform is
| the money maker. They believe that open sourcing their tools
| benefits their platform in the long run, which is why they're
| doing it. And in doing so, they avoid being under the control
| of any competitor.
|
| I would say it's more like a grocery store providing free
| parking, a bus stop, self-checkout, online menu, and free
| delivery.
| msnkarthik wrote:
| Interesting discussion! While I agree with Zuckerberg's vision,
| the comments raise valid concerns. The point about GPU
| accessibility and cost is crucial. Public clusters are great, but
| sustainable funding and equitable access are essential to avoid
| exacerbating existing inequalities. I also resonate with the call
| for CUDA alternatives. Breaking the dependence on proprietary
| technology is key for a truly open AI ecosystem. While existing
| research clusters offer some access, their scope and resources
| often pale in comparison to what companies like Meta are
| proposing. We need a multi-pronged approach: open-sourcing models
| AND investing in accessible infrastructure, diverse hardware
| options, and sustainable funding models for a truly democratic AI
| future.
| fnordpiglet wrote:
| I suspect we are still early in the optimization evolution. The
| weights are what matter. The ability to run them anywhere might
| come.
| troupo wrote:
| The training datasets and methodology are what matters. None
| of that is disclosed by anyone
| narrator wrote:
| 3nm chip fabs take years to build. You don't just go to AWS and
| spin one up. This is the very hard part about AI that breaks a
| lot of the usual tech assumptions. We have entered a world
| where suddenly there isn't enough compute, because it's just
| too damn hard to build capacity and that's different from the
| past 40 years.
| ohthehugemanate wrote:
| Has anyone taken apart the Llama community license and compared
| it to validated open source licenses? Red Hat is making a big
| deal about the Granite LLM being released under Apache. Is
| there a real difference between that and what Llama does?
|
| https://www.redhat.com/en/topics/ai/open-source-llm
| rightbyte wrote:
| What would the speed of a query be when running this model from
| disk on an ordinary PC?
|
| Has anyone tried that?
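| A rough way to try it, assuming the llama-cpp-python bindings and
| a quantized GGUF file (the file name below is hypothetical; actual
| speed depends heavily on the quantization, disk, and RAM, and with
| memory-mapping weights that don't fit in RAM are paged from disk,
| so for the larger models expect well below one token per second):
|
|     # pip install llama-cpp-python  (CPU build)
|     from llama_cpp import Llama
|
|     # hypothetical local file; mmap (the default) pages weights
|     # from disk when they don't fit in RAM, which is very slow
|     llm = Llama(model_path="./Meta-Llama-3.1-70B-Instruct-Q4_K_M.gguf",
|                 n_ctx=512)
|     print(llm("Hello, world", max_tokens=32)["choices"][0]["text"])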
| v3ss0n wrote:
| We welcome Mark Zuckerberg's redemption arc! Open source AI,
| here we go!
| rldjbpin wrote:
| this is very cool indeed that meta has made available more than
| they need to _in terms of model weights_.
|
| however, the "open-source" narrative is being pushed a bit too
| much like descriptive ML models were called "AI", or applied
| statistics "data science". with reinforced examples such as this,
| we start to lose the original meaning of the term.
|
| the current approach of startups or small players "open-sourcing"
| their platforms and tools as a means to promote network effect
| works but is harmful in the long run.
|
| you will find examples of terraform and red hat happening, and a
| very segmented market. if you want the true spirit of open-
| source, there must be a way to replicate the weights through
| access to training data and code. whether one could afford
| millions of GPU hours or not, real innovation would come from
| remixing the internals, and not just fine-tuning existing stuff.
|
| i understand that this is not realistically going to ever happen,
| but don't perform deceptive marketing at the same time.
| avereveard wrote:
| they said, while not releasing the video part of the Chameleon
| model
| chx wrote:
| A total ban on generative AI is the way forward. If the industry
| refuses to make it safe by self regulating then the regulator
| must step in and ban it until better, more fine tuned regulation
| can be made. It is needed to protect our environment, our
| democracy, our very lives.
| yard2010 wrote:
| Isn't it like banning Christianity? I don't think it can be
| done
| yard2010 wrote:
| I don't believe a word coming out of this lizard's mouth. He is
| the most evil villain I know, and I live in the Middle East, can
| you imagine.
| pandaswo wrote:
| Way to go!
| tarruda wrote:
| From the "Why Open Source AI Is Good for Meta" section, none of
| the four given reasons seem to justify spending so much money to
| train these powerful models and give them away for free.
|
| > Third, a key difference between Meta and closed model providers
| is that selling access to AI models isn't our business model.
| That means openly releasing Llama doesn't undercut our revenue,
| sustainability, or ability to invest in research like it does for
| closed providers. (This is one reason several closed providers
| consistently lobby governments against open source.)
|
| Maybe this is a strategic play to hurt other AI companies that
| depend on this business model?
| sensanaty wrote:
| Meanwhile Facebook is flooded with AI-generated slop with
| hundreds of thousands of other bots interacting with it to boost
| it to whoever is insane enough to still use that putrid hellhole
| of a mass-data-harvesting platform.
|
| Dead internet theory is very much happening in real time, and I
| dread what's about to come since the world has collectively
| decided to lose their minds with this AI crap. And people on this
| site are unironically excited about this garbage that is
| indistinguishable from spam getting more and more popular. What a
| fucking joke
| rocgf wrote:
| I agree with the overall sentiment, but it is not necessarily
| the case that "the world has collectively decided to lose their
| minds with this AI crap". You only need a relatively small
| number of bad actors for this to be the case.
| consf wrote:
| I think the situation highlights the need for better
| regulation
| throwaway7ahgb wrote:
| No thank you. Use existing laws that cover wide nets and
| actually enforce them.
| TheAceOfHearts wrote:
| I think there's room for a middle-ground. I agree that there's
| a lot of slop being generated and shared around. Part of it is
| due to the initial excitement over these AI tools. Part of it
| is that most AI tools still kinda suck at the moment. In the
| long term I expect tools to get way better, which I'm hopeful
| could help enable smaller teams of people than ever before to
| execute on their creative vision.
|
| Personally I have tons of creative ideas which I think would be
| interesting and engaging but for which I lack the resources to
| bring into this world, so I'm hoping that in the long term AI
| tools can help bridge this gap. I'm really hopeful for a world
| where people from all over the world can share their creative
| stories, rather than being mostly limited to a few rich people
| in Hollywood.
|
| Unfortunately I do expect this to end up being the minority of
| content, especially as we continue being flooded by increasing
| amounts of trash. But maybe that's just opening up the
| opportunity for someone to develop new content curation tools.
| If anything, even before the rise of AI stuff there were
| mountains of content, and we saw with the rise of TikTok that a
| good recommendation algorithm still leaves room for new
| platforms.
| have_faith wrote:
| The feed certainly is, but I suspect most activity left on
| Facebook is happening in group pages. Groups are the only thing
| I still log in for, as some of them, particularly the local
| ones, have no other way of taking part. They are also commonly
| membership-by-request and actively moderated. If I had the time
| (and energy) I might put some effort into advocating for moving
| to something else, but it would be an uphill battle.
| consf wrote:
| The challenges of moving to alternative solutions
| have_faith wrote:
| The irony of a bot account sliding into a convo about
| internet slop is not lost.
| sgu999 wrote:
| How do you know?
| justneedaname wrote:
| The comment history does read much like you'd expect from
| a bot, lots of short, generic statements that vaguely tie
| to the subject of the post
| jrnx wrote:
| I'd assume that any platform which gets sufficiently popular
| will become a bot and AI content target...
| baq wrote:
| I stopped going on facebook a few years ago and don't miss
| it; I don't even need messenger as everyone migrated to
| whatsapp (yes I know, normal people don't want to move to
| signal, but got quite a few techy friends to migrate). The
| FB-only groups are indeed a problem, I'm delegating them to
| my wife.
|
| _IF_ I ever had to go to FB for anything, I'd probably
| install a wall-removing browser extension. The mobile app is of
| course out of the question.
| makingstuffs wrote:
| > IF I ever had to go to FB for anything, I'd probably
| install a wall-removing browser extension. Mobile app is of
| course out of question.
|
| You'll probably find you can no longer make an account. I'm
| in the same boat as you (haven't used it and haven't missed
| it in over a decade); however, my partner needed an account
| to manage an ad campaign for a client and neither of us was
| able to make one. Both of us tried a load of different
| things and, ultimately, gave up. We had to tell the client
| what they needed over a video call.
| ohlookcake wrote:
| I just tried making one after reading your comment, and
| it was... pretty straightforward? I'm curious what
| blockers you encountered
| throwawayfour wrote:
| For me it's the Marketplace. Left FB many years ago only to
| come back to keep an eye out for used Lego for the kiddos. At
| least in my region, and for my purposes, Marketplace is miles
| better than any other competing sites/apps.
| throwaway7ahgb wrote:
| Same here, Groups + Marketplace is actually a wealth of
| information. There are still a few dark patterns, but mostly
| manageable for a "free" platform.
|
| OP's comments read like we're describing something the SS
| built (Godwin says hi).
| giancarlostoro wrote:
| It is probably a mix of people who have nowhere else to
| interact with people, and people using Groups. Facebook was
| where you'd go to talk to all your friends and family, but
| most of my friends have been getting shadowbanned since 2012
| or so, which made me use it less. I got an automatic strike
| on my account for making a meme joke about burning a house
| down due to a giant spider in a video. I appealed, and it got
| denied. I'm not using a platform that will inadvertently ban
| me by AI. But the people actually posting to kill others, and
| actually burn shit down, and the bots stay just fine?
|
| Plus I didn't want to risk my employer's Facebook app being in
| limbo if I got banned, so I left Facebook alone, never to
| return.
|
| Facebook trying to police the world is the only thing keeping
| me away, if I can use the platform and post meme comments
| again, maybe I might reconsider, but I doubt it. Reddit is in
| a similar boat. You can get banned, but all the creepy
| pedophile comments, old and recent, are still up no
| problem.
| kalsnab wrote:
| > But the people actually posting to kill others, and
| actually burn shit down ...
|
| That kind of burning down is classified as "mostly
| peaceful" by mainstream and AI.
| sgu999 wrote:
| > If I had the time (and energy) I might put some effort into
| advocating to moving to something else, but it will be an
| uphill battle.
|
| What are the alternatives for local groups? I've recently
| seen an increase in the amount of Discourse forums available,
| which is nice, but I don't think it'd be very appealing to
| the average cycling or hiking group.
| consf wrote:
| Indeed, this perspective is understandable, given the rapid and
| often disruptive changes brought by AI, but it is also
| important to consider the potential benefits which are quite
| promising
| kashyapc wrote:
| I get your frustration about a scorched internet. But I don't
| think it's all that gloomy. Whether we like it or not, LLMs and
| some kind of "down-to-earth AI" are here to stay, once the
| dust settles. Right now, it feels like everything is burning
| because we're in the thick of an evolving situation, and the
| Silicon Valley tech-bros are in a hurry to ride the wave and
| make a quick buck with their ChatGPT wrappers. (I can't speak to
| social networks; I haven't had any accounts for 10+ years,
| except for HN.)
|
| * * *
|
| On "collective losing of minds", you might appreciate this
| quote from 1841 (!) by Charles MacKay. I quoted it in the
| past[1] here, but it is worth re-posting:
|
| _" In reading the history of nations, we find that, like
| individuals, they have their whims and their peculiarities;
| their seasons of excitement and recklessness, when they care
| not what they do. We find that whole communities suddenly fix
| their minds upon one object, and go mad in its pursuit; that
| millions of people become simultaneously impressed with one
| delusion, and run after it, till their attention is caught by
| some new folly more captivating than the first [...]_
|
| _" Men, it has been well said, think in herds; it will be seen
| that they go mad in herds, while they only recover their senses
| slowly, and one by one."_
|
| -- from MacKay's book, 'Extraordinary Popular Delusions and the
| Madness of Crowds'
|
| [1] https://news.ycombinator.com/item?id=25767454
| sgu999 wrote:
| What a nice quote. What were "millions of people"
| simultaneously obsessed with around 1841?
| kashyapc wrote:
| I didn't read the book, I'm afraid. I don't know if he
| actually mentions it anywhere.
| LoveMortuus wrote:
| In regards to the dead internet hypothesis, the content that
| you're enjoying today will still be there tomorrow. What I
| mean is that if you, for example, like MrBeast, AI is not going
| to replace him and the content that he produces. Now, he might
| use AI to boost the productivity of his company, but the end
| result is still "making the best video ever", as he's often
| said.
| LauraMedia wrote:
| The big problem with this is that content is harder and
| harder to find. Try to find a non-AI generated reply to a
| viral post on Twitter, you're looking at having to scroll
| down 5-6 1080p screens to finally get to some actual stuff
| people wrote.
|
| The content you're enjoying today still exists, but it's a
| needle in a haystack of AI spam
| Shinchy wrote:
| This is the exact thing I keep telling people. It's all
| well and good saying human made content will still be
| around, but it will be covered in a tidal wave of cheaply
| generated AI hogwash.
| throwawayfour wrote:
| Reminds me of shopping on <enter your favorite large
| ecommerce site>
| shinycode wrote:
| We need a law or something that requires platforms to label
| any text that is purely AI-generated, as well as text reworked
| by AI. And the ability to filter both (we did this with
| industrial products). Then let humanity decide what it wants
| to feed itself with. I would rather give up the internet
| completely than have it filled only with generated content. I
| gladly leave that to people who enjoy it. Maybe a platform
| that labels this and allows strict filtering (if possible)
| would be a success.
| cpursley wrote:
| Can we do that for the mountains of ghost-written content
| and books as well?
| shinycode wrote:
| My take is to explicitly mark the difference between human-
| generated content and AI-generated content. Not to label
| one as superior to the other; it's just to let people choose
| what they prefer. Like the chat bots of some companies that
| let you know you aren't talking to a human. Would you
| blindly accept a medical prescription generated by an AI?
| Some people might even prefer the prescription made by
| the AI. All I'm saying is: inform people, and then they
| can make their choice.
| lukas099 wrote:
| The signal:noise ratio is decreasing because it's easier to
| generate noise. I think paying for content (or content
| curation) is probably the way to curate high-signal
| information feeds.
| elorant wrote:
| All content that exists on the web could now be rewritten and
| repurposed by LLMs. This could lead to an explosion of web
| sites, with the web easily doubling in size every few years.
| Good luck indexing all that crap and deciding what is a
| duplicate and what is not.
| sbeaks wrote:
| Are we going to end up with re-re-recaptcha, where to post you
| need something on your phone/laptop measuring typing speed and
| scanning your face with an IR cam? Or a checkmark showing it was
| typed out by hand? It wouldn't get rid of AI content but might
| slow down bots posting it.
| kristopolous wrote:
| Their trajectory is so close to AOL's it's almost
| implausible. Their cash-cow flagship product is widely panned
| by tech insiders as abusive, manipulative, and toxic, but they
| also put significant financial resources into high quality open
| source projects out of what can only be described as
| benevolence and some commitment to the common good.
| aranke wrote:
| Which open-source projects is AOL known for? A quick Google
| search isn't returning much.
| jonathanwallace wrote:
| https://www.google.com/search?hl=en&q=aol%20tcl
|
| https://wiki.tcl-lang.org/page/AOLserver
| kristopolous wrote:
| Dropping cash for Netscape/Mozilla is the big one.
| csomar wrote:
| That's actually the preferred outcome. The open internet's noise
| ratio will be so high that it turns into pure randomness. The
| traditional old venues (reputed blogs, small community forums,
| paying for valued information, paying for your search, etc.)
| will resurface again. The Popular Web has been in a slow
| decline; time to kill it.
| wildrhythms wrote:
| My concern is that these platforms will soon sell Human
| Created (tm) content back to us.
| thierrydamiba wrote:
| Is it really a decline? If people are looking for and
| consuming the slop, where is the issue?
|
| There is still plenty of high quality stuff too, if that is what
| you're looking for. If you want to roll with the pigs in the
| shit, who am I to tell you no?
| dspillett wrote:
| _> The traditional old venues [...] will resurface again._
|
| ... to be subsequently drowned out by AI "copies" of
| themselves, which in turn are used to train more AIs, until
| we don't have a Dead Internet [1] but a Habsburg Internet.
|
| --
|
| [1] https://en.wikipedia.org/wiki/Dead_Internet_theory
| eightysixfour wrote:
| I am one of those people unironically excited - the social
| parts of the internet have been dead and filled with bots for a
| long time. Now people just see it more.
|
| Maybe they'll go outside.
| zwnow wrote:
| Bots were easy to detect, now they're almost
| indistinguishable from human interaction. The death of the
| internet would be an incredible loss for humanity. You will
| not be able to trust anything you find online. Nothing is
| safe from this.
| eightysixfour wrote:
| I am using the broad definition of bots to include large
| numbers of accounts controlled by small groups of people to
| influence online discourse.
|
| Between those bots (for nefarious, mundane, or marketing
| reasons) and previous attempts at automated bots, "broad"
| internet discourse was already ruined. Now people recognize
| it. This will have the effect of pushing communities back
| to smaller sizes, I think this is a good thing.
|
| People shouldn't have trusted all the things they read
| online from untrusted sources in the first place.
| Kiro wrote:
| And I don't understand why people lump all AI together as if a
| coding assistant is the same thing as AI generated spam and
| other garbage. I'm pretty sure no-one here is excited about
| that.
|
| I'm excited about the former since AI has massively improved my
| productivity as a programmer to a point where I can't imagine
| going back. Everything is not black or white and people can be
| excited about one part of something and hate another at the
| same time.
| chr15m wrote:
| fear is why
| sensanaty wrote:
| Seeing some of the code my colleagues are shitting out with
| the help of coding "assistants", I would definitely
| categorize their contributions as spam, and it has had nothing
| but an awful effect on my own time and energy, having to sift
| through the unfiltered crap. The problem being, of course,
| that the idiotic C-suite in their infinite wisdom decided to
| push "use AI assistants" as a KPI, so people are even
| encouraged to spam PRs with terrible code.
|
| If this is what productivity looks like then I'm proud to be
| unproductive.
| Kiro wrote:
| I'm sorry that you work at a dysfunctional company.
| testfrequency wrote:
| Hear, hear!
| codedokode wrote:
| AI should not be open source because it can be used in military
| applications. It doesn't make sense to give away a technology
| others might use against you.
| ChanderG wrote:
| I think all this discussion around Open-source AI is a total
| distraction from the elephants in the room. Let's list what you
| need to run/play around with something like Llama:
|
| 1. Software: this is all PyTorch/HF, so completely open-source.
| There is total parity between what corporations have and what
| the public has.
|
| 2. Model weights: Meta and a few other orgs release open models -
| as opposed to OpenAI's closed models. So, ok, we have something
| to work with.
|
| 3. Data: to actually do anything useful you need tons of data.
| This is beyond the reach of the ordinary man, setting aside the
| legality issues.
|
| 4. Hardware: GPUs, which are extremely expensive. Not just that,
| even if you have the top dollars, you have to go stand in a queue
| and wait for O(months), since mega-corporates have gotten there
| before you.
|
| For inference, you need 1, 2 and 4 (rough sketch below). For
| training (or fine-tuning), you need all of these. With newer and
| larger models like the latest Llama, 4 is truly beyond the reach
| of ordinary entities.
|
| This is NOTHING like open source, where a random guy can
| edit/recompile/deploy software on a commodity computer. With
| LLMs, once Data and Hardware enter the equation, the playing
| field is completely stacked. This thread has a bunch of people
| discussing nuances of 1 and 2, but this bike-shedding only hides
| the basic point: control of LLMs is for mega-corps, not for
| individuals.
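| To make 1, 2 and 4 concrete, here is a minimal inference sketch
| using the open-source HF stack; the model id is illustrative
| (the weights are gated behind a license acceptance on the Hub),
| and the larger variants need far more GPU memory than a
| commodity machine has:
|
|     # pip install transformers torch accelerate
|     from transformers import AutoModelForCausalLM, AutoTokenizer
|
|     model_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"  # assumed id
|     tok = AutoTokenizer.from_pretrained(model_id)
|     model = AutoModelForCausalLM.from_pretrained(
|         model_id, torch_dtype="auto", device_map="auto")  # item 4
|
|     inputs = tok("Open weights are not open source because",
|                  return_tensors="pt").to(model.device)
|     out = model.generate(**inputs, max_new_tokens=50)
|     print(tok.decode(out[0], skip_special_tokens=True))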
| fishermanbill wrote:
| But there is an insidiousness to Meta calling their software
| 'open source'. It feels as if they are riding on the coattails
| of the term, as if they are being altruistic, when in fact they
| are being no more altruistic than any large corporation that
| wants to capture market share via their financial muscle -
| which I suppose touches on your last point.
| fishermanbill wrote:
| It's not open source.
|
| We don't get the data or training code. The small runtime
| framework is open source, but that's of little use as it's
| largely fixed in implementation due to the weights. Yes, we can
| fine tune, but that is akin to modifying video games - we can do
| it, but there's only so much you can do within reasonable effort,
| and no one would call most video games 'open source'*.
|
| _It's freeware, and Meta's strategy is much more akin to the
| strategy Microsoft used with Internet Explorer to capture the web
| browser market. No one was saying god bless Microsoft for trying
| to capture the browser market with I.E. Nothing wrong with Meta's
| strategy, just don't call it open source._
|
| *Weights are data, and so is the video/audio output of a video
| game. If a developer gave that video game output away for free,
| we wouldn't call the video game open source - which is
| essentially what the myriad freeware games do.
| Palmik wrote:
| I don't think these analogies work.
|
| Meta provides open source code to modify the weights (fine-
| tune the model). In this context, fine-tuning the model is
| better compared to being able to modify the code of the game.
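| To make "modify the weights" concrete, a minimal fine-tuning
| sketch using the open-source PEFT library; the model id is
| illustrative, and the training data and loop are omitted:
|
|     # pip install transformers peft torch
|     from transformers import AutoModelForCausalLM
|     from peft import LoraConfig, get_peft_model
|
|     base = AutoModelForCausalLM.from_pretrained(
|         "meta-llama/Meta-Llama-3.1-8B")  # assumed model id
|
|     # LoRA adds small trainable matrices to the attention layers
|     # while the original weights stay frozen.
|     cfg = LoraConfig(r=8, lora_alpha=16,
|                      target_modules=["q_proj", "v_proj"],
|                      task_type="CAUSAL_LM")
|     model = get_peft_model(base, cfg)
|     model.print_trainable_parameters()  # only a tiny fraction trains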
| fishermanbill wrote:
| So do video game developers (provide source code to modify
| their games); the analogy absolutely works. I can list a huge
| amount of actually open source software whose source code and
| data I can see, which is very different from Llama etc.
| OriginalMrPink wrote:
| Open Source AI is the path forward, but I have a hard time
| believing that Meta should be affiliated with it.
| pera wrote:
| I wish Meta would stop using the "open source" misnomer for
| free-of-charge weights. In the US the FTC already uses the term
| _Open-Weights_ [1], and it seems the industry is also adopting
| this term (e.g. Mistral).
|
| Someone can correct me here, but AFAIK we don't even know which
| datasets were used to train these models, so why should we even
| use "open" to describe Llama? This is more similar to freeware
| than to an open-source project.
|
| [1] https://www.ftc.gov/policy/advocacy-research/tech-at-
| ftc/202...
| benrutter wrote:
| This is such a good point. The industry is really putting the
| term "open source" through the wringer at the moment, but I
| don't see any justification for considering the final weight
| output a "source" any more than releasing a compiled binary
| would be open source.
|
| In fairness to Llama, the source code itself (though not the
| training data) _is_ available to access, although not really
| under a license that many would consider open source.
| zelphirkalt wrote:
| Facebook is one of the greats when it comes to twisting words
| and appropriating terms in ways that benefit Facebook.
| largbae wrote:
| The All-In podcast predicted this exact strategy for keeping
| OpenAI and other upstarts from disrupting the existing big tech
| firms.
|
| By giving away higher and higher quality models, they undermine
| the potential return on investment for startups who seek money to
| train their own. Thus investment in foundation model building
| stops and they control the ecosystem.
| giancarlostoro wrote:
| I enjoy that podcast, only really listened to it a few times,
| but they definitely bring up some interesting topics, the kind
| I come on HN for.
| thierrydamiba wrote:
| I predicted this 7 months ago. Can I get a podcast?
|
| https://news.ycombinator.com/item?id=38556771#38559118
| zelphirkalt wrote:
| Open data for open algo for open AI is the path forward.
| dcist wrote:
| So commoditize the complement.
| ssahoo wrote:
| Additional Commercial Terms. If, on the Llama 3.1 version release
| date, the monthly active users of the products or services made
| available by or for Licensee, or Licensee's affiliates, is
| greater than 700 million monthly active users in the preceding
| calendar month, you must request a license from Meta, which Meta
| may grant to you in its sole discretion, and you are not
| authorized to exercise any of the rights under this Agreement
| unless or until Meta otherwise expressly grants you such rights.
|
| Which open-source license has such restrictions and clauses?
| rmbyrro wrote:
| Which open source project cost the dedicated usage of 16
| thousand H100s over several months?
|
| C'mon folks, they're opening up for free to 99.99% of potential
| users what cost hundreds of millions of dollars, if not in the
| ballpark of a billion.
|
| Let's appreciate that for a while, instead of focusing on
| semantics.
| arrosenberg wrote:
| I don't think the largest tech companies in the world have
| earned that view of benevolence. It's really hard to take
| altruism seriously when it is coming from Zuckerberg.
| birdalbrocum wrote:
| Licensing is not a simple semantic problem. It is a legal
| problem with strong ramifications, especially as things are on
| their way to being standardized. What Facebook is trying to do
| with its "open source" models is to exhaust the possibility of
| fully open source models becoming industry standards, and to
| create an alternative monopoly to Microsoft/OpenAI. Think of it
| this way: if an entity held the rights to ISO standards, it
| would be extremely rich. Eventually researchers will release
| fairly advanced ML models that are fully open source (from
| dataset to training code), and Facebook is trying to block them
| before they even start, to head off the possibility of those
| models becoming the standard. This tactic is complementary to
| the closed-source rivals' approach and should not be understood
| as a challenge to them.
|
| A good wording for this is "open-washing" as described in
| this paper:
| https://dl.acm.org/doi/fullHtml/10.1145/3630106.3659005
| maxdo wrote:
| I'm really unsure if it's a good idea given the current
| geopolitics.
|
| Open source code in the past was fantastic because the West had a
| monopoly on CPUs and computers. Sharing and contributing was
| amazing while ensuring that tyrants couldn't use this tech to
| harm people, simply because they didn't have the hardware to run
| it.
|
| But now, things are different. China is advancing in chip
| technology, and Russia is using open-source AI to harm people at
| scale today, with auto-targeting drones being just the start.
| The Red Sea conflict, etc.
|
| And somehow, Zuckerberg keeps finding ways to mess up people's
| lives, despite having the best intentions.
|
| Right now you can build a semi-autonomous drone with AI to kill
| people for ~$500-700. The western world will still use safe and
| secure commercial models, while the new axis of evil will use
| models based on Meta's or any other open source to do whatever
| harm they can imagine, with not a hint of control.
|
| Take this particular model and fine-tune it, at scale, on all
| the research that a government at that level can obtain, to help
| develop a nuclear bomb, killer drone swarms, etc. Once the
| knowledge is public, these models can serve as a base to give
| expert-level knowledge to anyone who wants it, uncensored.
| Especially if you are a government that wants to destroy a
| peaceful order for whatever reason.
| rmbyrro wrote:
| You think Russia and China wouldn't be able to steal any closed
| model for a couple million dollars?
| that_guy_iain wrote:
| Wouldn't even need to pay to steal it. FAANG have been shown
| to be hacked by state actors. China has some of the best
| hackers in the world.
| maxdo wrote:
| Stealing a model and building an entire AI community around it
| are very, very different things:
|
| Being able to fine-tune, update, and run a model without very
| deep domain knowledge is what we receive as an outcome here.
|
| If you are a software engineer and you steal a model in some
| closed OpenAI format, you will not get much benefit even if you
| understand the format of that model; it's a complex beast by any
| measure.
|
| This release is a playbook for how anyone can run it.
|
| So yeah, big corp is evil on one side, but oh well, think
| of North Korea, Russia etc. levels of evilness and what they
| can do with that.
| talldayo wrote:
| > think of North Korea, Russia etc level of evilness and
| what they can do whit that.
|
| To date, I have not seen any "evil" applications of AI, let
| alone dangerous or even useful ones. If Russia or North
| Korea get their hands on a modern AI model, the CIA will
| get their "Red Mercury" call:
| https://en.wikipedia.org/wiki/Red_mercury
| adhamsalama wrote:
| So, only the west should be able to use AI to kill people
| because they're the good guys?
| AlexandrB wrote:
| This argument reminds me a lot of restrictions on exporting
| encryption in the 90s.
| wavemode wrote:
| You're vastly overestimating the capability of LLMs to create
| new knowledge not already contained in their training material.
| keepswag wrote:
| It was not amazing that the West had monopolies, because they
| are the ones using AI and advancing AI tech to harm people. I'm
| not sure what you're getting at here with that comment.
|
| https://www.vox.com/future-perfect/24151437/ai-israel-gaza-w...
|
| https://www.972mag.com/mass-assassination-factory-israel-cal...
|
| https://www.theguardian.com/world/2024/apr/03/israel-gaza-ai...
| Gravityloss wrote:
| Can't it be divided into multiple parts to have a more meaningful
| discussion? For example, the terminology could identify four key
| areas:
| - Open training data (this is very big)
| - Open training algorithms (does it include infrastructure code?)
| - Open weights (the result of the previous two)
| - Open runtime algorithm
| jll29 wrote:
| The question is what is "open source" in the case of a matrix of
| numbers, as opposed to code.
|
| Also, are there any "IP" rights attached at all to a bunch of
| numbers coming out of a formula that someone else calculated for
| you? (edit: after all, a "model" is just a matrix of numbers
| coming out of running a training algorithm that is not owned by
| Meta over training data that is not owned by Meta.)
|
| Meta imposes a notification duty AND a request for another
| license (no mention of the details of these) for applications of
| their model with a large number of users. This is against the
| spirit of open source. (In practical terms it is not a show
| stopper since you can easily switch models, although they all
| have subtly different behaviours and quality levels.)
| abss wrote:
| Interesting, but we have to consider this information with
| skepticism since it comes from Meta. Additionally, merely open-
| sourcing models is insufficient; the training data must also be
| accessible to verify the outcomes. Furthermore, tools and
| applications must be freely deployable and capable of storing and
| sharing data under our personal control. Self-promotion: We have
| initiated experiments for an AI-based operating system, check
| AssistOS.org. We recently received a European research grant to
| support the improvement of AssistOS components. Contact us if you
| find our work interesting, wish to contribute, conduct research
| with us, or want to build an application for AssistOS.
| Purplehermann wrote:
| Well that's it then, we're gonna die
| bzmrgonz wrote:
| I see it as a new race to build the personal computer (PC) all
| over again. I hope we can apply the lessons learned and can jump
| into open source to speed up development and democratize AI for
| all. We know how Microsoft played dirty in the early days of the
| PC revolution.
| arisAlexis wrote:
| Zuckerberg and LeCun put humans at great risk
___________________________________________________________________
(page generated 2024-07-24 23:14 UTC)