[HN Gopher] Hello Dolly: Democratizing the magic of ChatGPT with...
___________________________________________________________________
Hello Dolly: Democratizing the magic of ChatGPT with open models
Author : hnuser0000
Score : 387 points
Date : 2023-03-24 12:21 UTC (10 hours ago)
(HTM) web link (www.databricks.com)
(TXT) w3m dump (www.databricks.com)
| bob1029 wrote:
| > Surprisingly, instruction-following does not seem to require
| the latest or largest models: our model is only 6 billion
| parameters, compared to 175 billion for GPT-3.
|
| We started seeing this in our testing. OpenAI's Curie model is
| responding very well to our fine-tuning experiments for a
| chatbot-style interface. I am trying to keep us focused on the
| quality of the training data rather than obsessing over raw
| network size. Davinci (and derivatives) might turn out to be
| overkill for our use cases.
| imwithstoopid wrote:
| here come the "Me Too!!" announcements from everyone trying to
| catch some of the energy of this new market
|
| how long until IBM, Tesla and Oracle announce Me-Too LLMs?
| [deleted]
| gavi wrote:
| It's trained on the Alpaca dataset, which in turn was generated
| from OpenAI's davinci. I'm wondering if it's effectively
| transferring the weights by training on content generated from
| the source model?
| epups wrote:
| I think this is cool, but it's in the range of complexity that I
| would expect from a personal project. When you put a whole
| organization behind it, I feel you could have provided something
| extra - better datasets? Improved weights from a ton of training?
| kvmakes wrote:
| Super cool stuff!
| Mizza wrote:
| It's immediately become difficult to untangle the licensing here.
| Is this safe for production use? I have no idea whether I can
| expect a DMCA from Mark if I step out of bounds with this or
| other post-Alpaca models, unless I'm missing something important.
| Meta really botched the LLaMA release.
| pwendell wrote:
| Yes, it's nuanced, but it will be simplified going forward.
|
| This uses a fully open source (liberally licensed) model and we
| also open sourced (liberally licensed) our own training code.
| However, the uptraining dataset of ~50,000 samples was
| generated with OpenAI's text-davinci-003 model, and depending
| on how one interprets their terms, commercial use of the
| resulting model may violate the OpenAI terms of use. For that
| reason we are advising only noncommercial use of this model for
| now.
|
| The next step here is to create a set of uptraining samples
| that is 100% open. Stay tuned.
| Taek wrote:
| Are you in touch with the OpenAssistant team? I believe they
| already have a more or less complete set of samples
| (100,000!) that were produced in an open environment and
| aren't encumbered by any licensing.
| pwendell wrote:
| No, I hadn't heard of that; we'll engage with that team.
| This is exactly what we need. We'll look into it.
| babyyoda wrote:
| Given that Alpaca was released strictly for academic use, with
| commercial use prohibited because it would violate OpenAI's
| terms of service, I don't see this as viable for use. Looks
| like a marketing gimmick.
| rnosov wrote:
| This has nothing to do with Facebook. The foundational model
| here is GPT-J, which is open source and safe to use. Sadly, it
| is inferior to state-of-the-art models such as LLaMA.
| Mizza wrote:
| But they're "using data from Alpaca". I don't know what that
| means, isn't Alpaca using data generated by ChatGPT, which
| isn't "clean" to use? Or data from Facebook, which isn't
| "clean" to use? I'm drowning.
| bilekas wrote:
| I don't know the full details, but Alpaca is from Stanford
| and only based on LLaMA (not a derivative work afaik).
| That said:
|
| Also, Meta's licensing here:
| https://github.com/facebookresearch/llama/blob/main/LICENSE
|
| I can't be sure what that license actually refers to - the
| language model or just the tooling in the Git repo.
|
| I agree it's a minefield, but with Meta I would err on the
| side of caution.
| rnosov wrote:
| They are instruction tuning it using the dataset released
| by stanford-alpaca team. The dataset itself is synthetic
| (created using GPT-3) and somewhat noisy, and in my view can
| be easily recreated if OpenAI ever tries to go after it
| (which is very unlikely). Anyway, Facebook has nothing to
| do with anything used by this project.
| Mizza wrote:
| So, this is a "dirty" model, in that it was created with
| data that violated OpenAI's ToS. Obviously, this kind of
| violation is basically fine if you're a massive
| corporation that the rules don't apply to, but it's a huge
| risk if you're a small fish.
| sebzim4500 wrote:
| That's between OpenAI and the people that recorded the
| data. No one else needs to care.
| hutzlibu wrote:
| "basically fine if you're a massive corporation who the
| rules don't apply to, but it's a huge risk if you're a
| small fish"
|
| With these things, it is usually the other way around.
|
| If you are a small fish, no one will care. But if you are
| big enough, that money could be extracted from you, then
| they will come. A big org just has better lawers and
| negotiating power, but they really cannot ignore the law.
| Especially not, if there is a competitor with money to
| sue.
|
| So if you are small and want to become big, better be
| cautious on the legal ground you are walking.
| gremlinsinc wrote:
| If you use output from a non-profit who open sourced the
| output gained by following the ToS - as in, they aren't
| using it 'for profit' - it's not illegal, because:
|
| A. It's output gained by following the letter of the ToS.
|
| B. A ToS only applies directly to people who've accepted
| it. Unless Alpaca's license/ToS ALSO forwards the same
| terms as its source at OpenAI, derivatives wouldn't be
| bound.
|
| It's like if an app developer on iOS violated a ToS and
| Apple tried to go after everybody who ever used the app:
| they didn't agree directly to the ToS, only the developer
| did.
| rnosov wrote:
| ToS are not the law. It would be similar to your power
| company claiming copyright over the code written using
| "their" electricity. Not going to happen. I wouldn't be
| too concerned.
| sp332 wrote:
| No, but you could be banned from using OpenAI products in
| the future, which seems like quite a liability for a
| researcher or company.
| rnosov wrote:
| That would be an anticompetitive practice that is actually
| against the law in many countries[1]. In the unlikely
| event of OpenAI ever engaging in such things, they would be
| sued into oblivion.
|
| [1] https://en.wikipedia.org/wiki/Refusal_to_deal
| Spivak wrote:
| Especially when OpenAI explicitly doesn't have a claim to
| copyright on the model output.
| bilekas wrote:
| > Meta really botched the LLaMA release.
|
| It's no surprise really though; from what I see, they recognised
| some way to monetize and rolled back their commitment.
|
| But this Dolly doesn't depend on LLaMA (unless I'm missing
| something), so you don't have to use it.
| leobg wrote:
| Why? Dolly has nothing to do with LLaMA or its weights.
|
| Besides: How would anyone ever know which model generated the
| output you are serving? AFAIK there is no fingerprint in any
| model's output. And even if there was, it would probably be
| destroyed by fine tuning "over it".
| stametseater wrote:
| > _AFAIK there is no fingerprint in any model's output._
|
| It seems like there easily could be. What if some of the data
| they trained it on didn't exist anywhere else except in the
| training set, and was put there specifically for this
| purpose? For instance they could have taught it a few poems
| that don't exist anywhere else. If you can coax the LLM of
| unknown origin into reciting those poems back to you, you
| know where it came from.
| kurthr wrote:
| Even easier: have a small set of 8-10 character gibberish
| tokens it's trained on in particular contexts (e.g. a non-
| existent poem). Then feed it one or several poems and see
| if a gibberish token pops out.
| eigenvalue wrote:
| I think they call these canary GUIDs. If you manage to
| generate one from an LLM then you can conclude with
| certainty that the model saw that document during
| training.
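|
| A rough sketch of what that check could look like (the
| canary strings are hypothetical and the model is a generic
| HF stand-in; this is not any vendor's actual scheme):
|
|     from transformers import AutoModelForCausalLM, AutoTokenizer
|
|     # Hypothetical unique strings planted in the training set.
|     CANARIES = ["canary-7f3k-qzx8-poem", "canary-2mv9-rt4l-poem"]
|
|     tok = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
|     model = AutoModelForCausalLM.from_pretrained(
|         "EleutherAI/gpt-j-6B")
|
|     # Prompt with the context that preceded the canary in the
|     # planted document, then check the greedy completion.
|     ids = tok("Recite the secret poem:", return_tensors="pt")
|     out = model.generate(**ids, max_new_tokens=64,
|                          do_sample=False)
|     text = tok.decode(out[0], skip_special_tokens=True)
|     hits = [c for c in CANARIES if c in text]
|     if hits:
|         print("model saw the canary doc in training:", hits)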
| neilv wrote:
| > _Besides: How would anyone ever know which model generated
| the output you are serving?_
|
| There's precedent for "whatever you can get away with" in
| tech companies, but establishing a culture of that at the
| start of this new big change could end up undesirable for
| most people.
|
| For example, it could relieve pressure for more legal and
| sustainable ways, until it's too late. (Look at the history
| of digital entertainment media piracy and DRM and
| legislation, for example. Or look at the history of software
| piracy, where some big companies seem to actually want their
| product to be pirated, partly because it builds a bigger moat
| against competitors, and they can legally strongarm some of
| those pirates later.)
| bilekas wrote:
| This is really great news and something I felt was missing from
| the market so far. It seems everyone wants to create `moats` or
| walled-gardens with some aspect of their models etc.
|
| Nice job Databricks, nice numbers too. Looking forward to more
| improvements.
| detrites wrote:
| Thought the same until I read this:
|
| > Contact us at hello-dolly@databricks.com if you would like to
| get access to the trained weights.
| bilekas wrote:
| This is not an issue though - those would just be the weights
| used by Databricks, and there is no reason you can't add your
| own, right?
|
| Like giving away a website template without the demo content;
| it's perfectly normal.
| superchink wrote:
| It's now available on GitHub:
| https://github.com/databrickslabs/dolly
| jppope wrote:
| Data transfer might actually be the problem there, not
| something like trying to hide the model.
| yieldcrv wrote:
| bittorrent, come on
| crosen99 wrote:
| Fine-tuning these models reminds me of the good ol' days with
| tube TVs where the slightest twist of the vertical hold dial
| meant the difference between a clear picture and useless,
| dizzying, visual nonsense.
| woeirua wrote:
| This is the real risk to OpenAI's business model. If it turns out
| that you can get most of the same outcome with drastically
| smaller and cheaper models, then OpenAI is going to have a hell
| of a time keeping customers around, as it will just be a race to
| the bottom on price, and bigger, more expensive models will lose
| just from a hardware cost standpoint.
| xpe wrote:
| No disrespect to the author intended, but the above comment is
| muddled.
|
| 1. OpenAI, the organization, is not equivalent to its chat
| offering.
|
| 2. Saying "the" real risk isn't persuasive. Let's examine many
| risks before claiming one is the most significant. Also, "real"
| in this usage is often a throwaway (i.e. unneeded) word, in
| editor speak.
|
| 3. Let's talk about OpenAI's "business model" (though such
| discussions are tricky).
|
| 3A. Originally, OpenAI wasn't trying to "hold onto" AI
| advancements. It claimed to be a broadly funded way to explore
| fundamental questions of artificial intelligence in a non-
| commercial, ethical way.
|
| 3B. Of course, the above claim was largely aspirational,
| because it wasn't baked into their DNA in a way that could
| survive the surrounding temptations for more funding, glory,
| and resources.
|
| 3C. Even with their more commercialized model of the last
| several years, their business model feels like (a)
| fundraising in exchange for (b) (claimed) public-good open
| source tools and shared research.
|
| 3D. OpenAI feels to me more and more like a commercial research
| lab; there does seem to be a lot of commercial partnering with
| their funding organizations (e.g. Microsoft).
|
| 4. I doubt the leadership there views the current ChatGPT
| models as unchanging. I expect there is a considerable revenue
| stream _around_ the space. OpenAI is well positioned to play
| the game several steps ahead of others.
|
| I would frame the broader question this way: for many years,
| there has been a hunger for an organization to do this deeper
| AI research, due not only to (i) the expertise and resources
| required, but also (ii) to the hope that such an organization
| could keep it within human or ethical bounds.
|
| Unfortunately, this amorphous hope doesn't seem to match the
| actual organizational incentives or dynamics. It is also
| unclear how much demand the public in a free market will have
| for nobler research.
|
| My position on these kinds of things is simple: follow the
| money. If we want an accountable, public-interest AI research
| laboratory, it's going to have to be designed, funded, and
| overseen very differently.
| smoldesu wrote:
| On the flip-side, OpenAI is primed to destroy their
| competitors. Partnership with Microsoft means they can buy
| Azure compute at-cost if need be. Their current portfolio of
| models is diverse on the expensive and cheap ends of the
| spectrum, with thousands of people on Twitter and HN still
| giving them lip-service. With dozens of clones hitting the
| market, OpenAI is the only one staying consistently relevant.
|
| The widespread adoption of local AI won't obsolete a well-
| priced AI API. I feel like we learned that lesson pretty
| thoroughly in the SaaS era.
| xpe wrote:
| > The widespread adoption of local AI won't obsolete a well-
| priced AI API. I feel like we learned that lesson pretty
| thoroughly in the SaaS era.
|
| Unless I am misunderstanding (?), this seems like an
| overgeneralized lesson. There are many key differences
| between these situations that make such a connection
| unlikely. Could you explain your reasoning?
| ijustlovemath wrote:
| The difference between this and SaaS is that businesses have
| been moving their (end user) products to SaaS due to wider
| broadband availability, as well as greed (read: MRR). But on
| the LLM side, people are _building new products with it_, so
| the incentives are to keep your costs low (or free) so you
| can make more money once you release.
| nico wrote:
| That's why they are moving so fast and trying to get as much
| press/media attention as possible.
|
| They want to stay top of mind.
|
| Think about Coca-Cola: anyone can make a drink just as good. But
| it's almost impossible to build their brand and distribution
| from scratch.
| lfciv wrote:
| I wouldn't underestimate the power of momentum
| rashkov wrote:
| What about the high quality training data that OpenAI has
| encoded into ChatGPT? Do these other models come close to that?
| woeirua wrote:
| Why couldn't you just use OpenAI's API to feed prompts and
| then take the outputs and use them to train your own model to
| exfiltrate the best features of GPT?
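|
| Mechanically it's trivial - a rough sketch of the Alpaca-style
| collection loop with the early-2023 openai library (the seed
| prompts file and output path here are made up):
|
|     import json
|     import openai  # the 0.27-era API
|
|     openai.api_key = "sk-..."
|
|     # Hypothetical file of seed instructions, one per line.
|     with open("seed_instructions.txt") as f:
|         prompts = [l.strip() for l in f if l.strip()]
|
|     samples = []
|     for p in prompts:
|         # This is the part OpenAI's ToS forbids using to
|         # develop competing models, per the thread.
|         resp = openai.Completion.create(
|             model="text-davinci-003", prompt=p,
|             max_tokens=256, temperature=0.7)
|         samples.append({"instruction": p,
|                         "output": resp.choices[0].text.strip()})
|
|     with open("distilled.json", "w") as f:
|         json.dump(samples, f, indent=2)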
| xpe wrote:
| Give it a try if you feel like it is a good thing to do.
| I'm sure some nation states are doing it.
|
| P.S. This comment does not reflect my personal values. But
| I would rather someone with values try it, almost like a
| white hat pen test.
| wsgeorge wrote:
| Because it would be against their TOS, and things could
| look ugly, legally.
| tspike wrote:
| How many TOS agreements do you suppose they violated
| while training their models?
| AJ007 wrote:
| It's still an open question whether any of these models,
| trained on copyrighted work, will themselves be eligible
| for copyright protection.
| ImHereToVote wrote:
| Ironic
| ImprobableTruth wrote:
| Is this a bit? If it's illegal to train on copyrighted
| material, then OAI has broken the law ten times over by
| training GPT3. There's absolutely zero reason for them to
| sue, they'll just ban the responsible people.
| nickthegreek wrote:
| I think their TOS forbids using the API for this. I don't
| think it covers the use of the web interface.
| circuit10 wrote:
| However:
|
| "You may not [...] except as permitted through the API,
| use any automated or programmatic method to extract data
| or output from the Services, including scraping, web
| harvesting, or web data extraction;"
| nickthegreek wrote:
| Can't be automated, so manual extraction is allowed.
| typon wrote:
| That's how Alpaca is made
| aabajian wrote:
| Anyone care to comment on why the output of these models changes
| so dramatically given so little Q&A training? It's a 6 billion
| parameter model with only 50 thousand Q&A samples.
|
| It's clear the model already "knows" the format of a Tweet (short
| length, attention-grabbing, contains hashtags). The model also
| knows stuff about language models (word2vec, tokenization), and
| can include entities from the question in its response (Dolly,
| Databricks). Yet, it just doesn't put these pieces together in
| the right way without the Q&A training.
|
| Edit: For kicks, I asked GPT-4 this question:
| https://imgur.com/a/sM4uyBn
| pwendell wrote:
| Yes this was a very surprising result... that the relatively
| small uptraining was able to unlock so much latent knowledge in
| the model.
| bogwog wrote:
| Open Assistant is doing the same thing, but actually creating a
| dataset that isn't on questionable legal grounds, via a
| gamified web app where people can contribute:
| https://open-assistant.io/dashboard
|
| I wonder how small these models can get. From 175B to 6B with
| comparable performance is huge, but can it go lower?
| highwaylights wrote:
| I see that in its five book suggestions it has suggested you
| should read Hitchhiker's Guide twice.
|
| Not many humans would even get this answer correct.
|
| I am impressed.
| Zaheer wrote:
| How hard would it be to embed this in an NPM module so anyone
| can use it in their servers / apps locally?
| [deleted]
| sbussard wrote:
| I'd like some clarification of terms - when they say it takes 3
| hours to train, they're not saying from scratch are they? There's
| already a huge amount of training to get to that point, isn't
| that correct? If so, then it's pretty audacious to claim they've
| democratized an LLM because the original training likely cost an
| epic amount of money. Then who knows how much guidance their
| training has incorporated, and it could have a strong undesirable
| viewpoint bias based on the original training.
| joshhart wrote:
| The 3 hours is the instruction fine-tuning. The base
| foundational model is GPT-J, which was already provided by
| EleutherAI and has been around for a couple of years.
|
| Note: I work at Databricks and am familiar with this project
| but didn't work on it.
| Taek wrote:
| Do you know why GPT-J is being used instead of NeoX or any of
| the other larger open source models?
| cuuupid wrote:
| I don't love the lack of quantitative comparison to Alpaca but a
| commercial model (which sounds like it's in the works) would
| finally move the needle on democratizing access to LLMs.
|
| Will also commend the authors for not falling into the "LLMs
| can't perform without 200B params!" fallacy. For anyone reading,
| 6B params is enough to train on a 3090. A PC rig for training or
| running inference with this would set you back maybe $4k.
|
| The end game here is likely getting the model to perform well at
| millions of parameters on specific tasks. Most business uses of
| ChatGPT are pretty closed-domain tasks; it wouldn't be a huge
| step to distill this model on a specific task and get it down to
| 150-350M params (which is roughly BART size and can even run on
| AWS Lambda).
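|
| For the unfamiliar, the usual recipe is plain knowledge
| distillation - roughly this sketch, where the big model's
| soft logits supervise the small one (temperature and tensor
| shapes are illustrative):
|
|     import torch
|     import torch.nn.functional as F
|
|     def distill_loss(student_logits, teacher_logits, T=2.0):
|         # Soften both distributions, match them with KL;
|         # the T^2 factor follows Hinton et al. (2015).
|         s = F.log_softmax(student_logits / T, dim=-1)
|         t = F.softmax(teacher_logits / T, dim=-1)
|         return F.kl_div(s, t, reduction="batchmean") * T * T
|
|     # Toy shapes: (batch=2, seq=4, vocab=100). In practice the
|     # teacher is the frozen 6B model, the student ~150-350M.
|     print(distill_loss(torch.randn(2, 4, 100),
|                        torch.randn(2, 4, 100)))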
| nothrowaways wrote:
| "ChatGPT, a proprietary instruction-following model" pun
| intended.
| jawadch93 wrote:
| [dead]
| mydpy wrote:
| What a great time to be in this field. It's advancing so quickly!
| sillysaurusx wrote:
| Interesting. DALL-E, Dalai
| (https://cocktailpeanut.github.io/dalai/), and now Dolly are all
| pronounced the same way.
|
| It feels like there should be an xkcd for this.
| joseda-hg wrote:
| Are they? (Not sarcastic, I'm not native and I wouldn't
| pronounce them all that similar at first sight)
| dwringer wrote:
| As a native speaker, no, there's hardly any consensus I've
| seen about how to pronounce them. Certainly there are trends.
| But I pronounce Dalai somewhere between "Dah-lay" and "Dah-
| lie", and DALL-E _sorta_ like Dolly ( "Dah-lee"), but with a
| deliberate pause ("Dahl Ee").
| [deleted]
| outside1234 wrote:
| What could go wrong
| JLCarveth wrote:
| > are all pronounced the same way
|
| No they're not.
| mejutoco wrote:
| AFAIK DALL-E is pronounced as Dali, as in Salvador Dali.
|
| https://en.wikipedia.org/wiki/Salvador_Dal%C3%AD
| chatmasta wrote:
| I figured it was a reference to the Dalai Lama (which doesn't
| invalidate your comment, since that's also pronounced like
| Dali). LLM -> Llama -> Dalai Lama
| rburhum wrote:
| Dalí has an accent at the end, which puts the emphasis on
| the last syllable. Dalai does not. They sound very different:
| "dah-lee" vs https://m.youtube.com/watch?v=JhFbvuKn45w
| sillysaurusx wrote:
| Hmm. Is Salvador Dali pronounced differently than Dolly or
| Dalai? The wikipedia page has "dah-lee" as the phonetic,
| and https://www.google.com/search?q=pronounce+salvador+dali
| sounds the same as
| https://www.google.com/search?q=pronounce+dalai+lama. So it
| seems like all three are identical.
| ToValueFunfetti wrote:
| The emphasis in Dali is on the second syllable, which is
| at least different from Dolly. I've always pronounced
| Dalai Lama the same as I would Dolly Lama, but Cambridge
| dictionary is saying it should be Da-lay in both US and
| UK pronunciations.
|
| Tangentially, it seems like most of the results for both
| searches were autogenerated with TTS programs. I wonder
| if our pronunciations will shift towards TTS mistakes
| over time. Probably not, these videos only have a few
| thousand views, but neat if true.
| mejutoco wrote:
| Dalí has the stress on the last syllable, hence the
| accent (but DALL-E probably not). In my native language,
| Dalai is pronounced "Da-lie", like another comment says
| above. TIL Dolly is pronounced so similarly. I thought
| the "Do" sounded like Doberman, but apparently not.
|
| https://www.merriam-webster.com/dictionary/dolly
| 4ndrewl wrote:
| I thought "Dalai" pronounced "Dall Eye" rhymes with "Shall
| I" "Dali" pronounced "Dahl eee" rhymes with "Carly"
| tetraca wrote:
| This is all very weird to me because I've always
| pronounced Dalai as "Dah-lay".
| chatmasta wrote:
| Interesting. According to Google, it's a British ("Da-
| lie") vs. American ("Da-lee") difference.
| [deleted]
| bilekas wrote:
| Handy also to think of WALL-E. At least that's where my
| assumption came from.
| [deleted]
| cosmojg wrote:
| It's quite clearly a reference to WALL-E the environmentally
| conscious robot, which is pronounced as you'd expect. I like
| to think of it as DALL-E the surrealist robot painter.
| mejutoco wrote:
| That is exactly my interpretation: both WALL-E and Dalí. I
| think we are in agreement.
| JohnFen wrote:
| I totally failed to make that connection! Was that the
| intended reference? What's the link to WALL-E?
| renewiltord wrote:
| Wow, just discovered that the American pronunciation for Dalai
| Lama is Da-lee. Well, that's a discovery.
|
| This is like when Khan Academy came out and there was a guy
| online saying it's a terrible brand because it sounds like "Con
| Academy", which it doesn't in my dialect.
|
| Took a while to get it.
| rockzom wrote:
| How do you say Khan?
| ricardobeat wrote:
| Kan / k-a-n, A like in "father"
| theSuda wrote:
| I found this which matches how I say it (as an Indian)
| https://www.howtopronounce.com/khan/4145893
|
| It's the KH sound, which doesn't really exist in English;
| hence many get it wrong.
| gowld wrote:
| The KH is one thing, but for "con"-fusion (hah!), it's
| also about the "higher" "caan" vs "cawn", which is a very
| subtle difference.
| xg15 wrote:
| I guess after carcinisation comes dolly-fication...
|
| But I do like the penchant for whimsical naming schemes in that
| field. First Sesame Street characters, now apparently
| everything sheep...
| thewataccount wrote:
| I might be having a moment - but I can't find any links to a git
| repo, huggingface, or anything about the
| models/weights/checkpoints directly from the article.
|
| I just see a zip download that AFAIK also doesn't contain the
| weights/checkpoints. I find this a bit odd: the contents of the
| zip (from the gdrive preview) look like they should be in a git
| repo, and I assume they download the model from somewhere? GDrive
| usually has rate limits, which I'm concerned about.
|
| If anyone from databricks reads this - are there plans to publish
| this on a git repo somewhere, as well as the weights/checkpoints?
|
| EDIT: Oh I just noticed
|
| > Contact us at hello-dolly@databricks.com if you would like to
| get access to the trained weights.
|
| This... seems odd for an article titled "Democratizing the magic
| of ChatGPT with open models"?
| MagicMoonlight wrote:
| So it's another classic private-only model that they'll pull as
| soon as the suckers have trained it up for them.
| thequadehunter wrote:
| Lol. This is classic ML crap. Files with no documentation, no
| links, multiple files with the same-ish name but no explanation
| for which one is what.
| nofinator wrote:
| Yes, the ZIP on Google Drive owned by one of their engineers is
| weird considering they have a pretty active GitHub presence of
| open source projects, though it does use an Apache license like
| their others.
|
| Perhaps Databricks suspected another big announcement coming
| soon and wanted to get this announcement out?
| amrb wrote:
| Are they pulling a Facebook on model access?
| thewataccount wrote:
| From what I can tell they're fine-tuning EleutherAI's GPT-J.
|
| Alpaca was made to fine-tune LLaMA; however, they also
| released the dataset they used to do this, and it looks
| like Dolly is that dataset applied to GPT-J. It does not use
| LLaMA itself.
| dragonwriter wrote:
| I think they are dodging unclear legal issues surrounding
| certain steps of the model-building process while being as
| open as possible with the components given that constraint,
| allowing downstream users to make their own legal risk vs.
| effort choices.
| pwendell wrote:
| Yes, this.
| amrb wrote:
| Given the hardware/energy needed to train, it'd be nice to have
| a legal document that said something like: this model has no
| warranty; it may be a breakthrough machine or a hand
| grenade. Use at your own risk!
| slimsag wrote:
| The README also says this:
|
| > This fine-tunes the [GPT-J
| 6B](https://huggingface.co/EleutherAI/gpt-j-6B) model on the
| [Alpaca](https://huggingface.co/datasets/tatsu-lab/alpaca)
| dataset using a Databricks notebook.
|
| > Please note that while GPT-J 6B is Apache 2.0 licensed, the
| Alpaca dataset is licensed under Creative Commons NonCommercial
| (CC BY-NC 4.0).
|
| ...so, this cannot be used for commercial purposes
| dwallin wrote:
| Essentially every model worth anything has been trained on an
| unfathomably large amount of data under copyright, with every
| possible licensing scheme you could imagine, under the
| assumption that it is fair use. While you can argue that it's
| all built on a house of cards (and a court may well agree
| with you), it's kind of arbitrary to draw a line here.
| judge2020 wrote:
| > under the assumption that it is fair use.
|
| No, because you as a human looking at "art" over your
| lifetime and learning from it is not "fair use" of the
| copyright - it's no use at all. This is the crux of every
| argument for both language models and AI art models:
| that these tools are learning how to draw, learning what
| styles and characteristics of input art correspond the most
| with words, and creating art with that knowledge just like
| any other human, not simply collaging together different
| pieces of art.
| Taek wrote:
| Fair use via "this is completely impossible to regulate so
| you might as well embrace it"
| ambicapter wrote:
| > ...so, this cannot be used for commercial purposes
|
| The implication being that you're only "democratizing"
| something if people can make money off of it?
| dragonwriter wrote:
| > ...so, this cannot be used for commercial purposes.
|
| The legal relation between models and training data sets
| seems murky; of course, with the build tooling, you can also
| substitute in another instruction-following training set if
| you want to avoid licensing issues with the Alpaca set,
| whereas if you aren't concerned with them, you can just blaze
| ahead.
| chpatrick wrote:
| As far as I know the copyright situation for models is
| ambiguous and also depends on the region. In the US you can't
| copyright data made by an automated process but you can in
| the EU, or something to that effect.
| yieldcrv wrote:
| > ...so, this cannot be used for commercial purposes
|
| or you can raise $30,000,000 right now and worry about the
| copyright infringement lawsuit in 2026 or never.
| thewataccount wrote:
| > ...so, this cannot be used for commercial purposes
|
| Can't they also release the fine-tuned weights as non-
| commercial as well?
| dopidopHN wrote:
| Thanks, I missed that email while skimming.
| pwendell wrote:
| Full source code is up here now:
|
| https://github.com/databrickslabs/dolly
|
| Sorry it took us a day to get the external repo set up.
| thewataccount wrote:
| Awesome thank you!
|
| Was the Alpaca dataset being licensed as non-commercial only
| the reason you aren't releasing the weights? Is it possible
| to just release them under the same license?
| pwendell wrote:
| Yes, the issue is that some of the training data is arguably
| tainted with a noncommercial license (it's nuanced,
| discussed below in my comment). We are releasing weights to
| people who request them, but we wanted to have an email
| request flow so that we can make sure people know it's just
| for noncommercial purposes.
|
| Working on a model without this issue. Certainly our goal
| is totally open models anyone can use for anything.
| thewataccount wrote:
| Understandable, thank you for the response!
|
| I've been a bit jaded by the "open/democratizing AI"
| stuff followed by companies stiffing us on actually
| making it open - but not wanting to be the first to
| litigate the new types of issues ML brings is very
| understandable.
|
| Question: would you consider benchmarking a single 4090
| for your training? While training in a few hours with 8x
| A100s is impressive, I, and I think others, are
| curious how that translates to consumer hardware. IMO
| running/fine-tuning on consumer hardware is the ultimate
| endgame for all AI models.
| robwwilliams wrote:
| Looking forward to a response. We are heading toward a 6x
| Bizon 4090 system as a test bed.
|
| https://bizon-tech.com/bizon-zx5500.html
| m3affan wrote:
| Databricks is on a roll
| jppope wrote:
| Does anyone else find it ironic that all these ChatGPT "clones"
| are popping up when OpenAI is supposed to be the one open
| sourcing and sharing their work?
|
| I guess: "You Either Die A Hero, Or You Live Long Enough To See
| Yourself Become The Villain"?
| Taek wrote:
| Sam Altman has turned into a megalomaniac.
| brandall10 wrote:
| Possibly, but it is a bit unusual that he has zero equity in
| the company. So it might not be for monetary reasons.
| [deleted]
| JohnFen wrote:
| > when OpenAi is supposed to be the ones open sourcing and
| sharing their work?
|
| OpenAI renounced being open source. Don't let the name fool
| you.
| throwaway4837 wrote:
| I think all of the "AI alignment" talk is mostly
| fearmongering. It's a cunning way to get ignorant
| people scared enough of AI that they have no choice but to
| trust the OpenAI overlords when they say they need AI to be
| closed. Then OpenAI gets a free pass to be the gatekeeper of
| the model, and people stop questioning the fact that they
| went from Open to Closed.
|
| AI being tuned to be "safe" by an exceedingly small set of
| humans is the thing we should be afraid of. It's the
| effective altruism effect: if you bombard people enough with
| "safety" and "alignment" speak, they will look past the fact
| that you're mainly interested in being a monopoly. My bigger
| conspiracy theory is that Bill Gates getting behind "AI
| alignment" is a calculated move to get people to look past
| Microsoft's unilateral involvement.
| soup10 wrote:
| I don't know what press releases you've been reading, but
| the model is closed so they can make money off it, that's
| pretty obvious.
| throwaway4837 wrote:
| I think that is a simple take and underestimates the
| insidious nature of the AI alignment initiatives. Or
| maybe I'm overestimating it.
| TigeriusKirk wrote:
| At this point I'm really not sure what they're up to in
| terms of grand strategy. I don't even know that making
| money is their ultimate goal. At a certain level of
| ambition money is just a tool to get what you really
| want.
| brandall10 wrote:
| It's interesting to note that Altman has no equity in the
| company. One of the espoused motives for becoming a for-profit
| company was to be competitive with big tech as far as
| bringing in top-level research talent.
| JohnFen wrote:
| I don't think that Altman's lack of equity position in
| OpenAI means anything at all when it comes to what
| OpenAI's goals are.
|
| We know what their immediate goals are: to make as much
| money as possible. The only question is what their
| longer-term goals are.
| 0xDEF wrote:
| AI and high-performance semiconductors are the only
| technological fields where the US and allies haven't been
| surpassed by Russia and China.
|
| There is probably a lot of political pressure on OpenAI to be
| as closed as possible. Remember the US government has banned
| Nvidia from exporting A100/H100 to China/Russia. Those are the
| same chips OpenAI uses for both training and inference.
| amelius wrote:
| Anyone in China/Russia who can comment on the actual
| situation? How difficult is it to train/run AI models where
| you are living?
| coolspot wrote:
| Russia is simply importing A100s through shell companies in
| UAE.
| htrp wrote:
| TLDR:
|
| Download GPT-J-6B from Eleuther
|
| Download Alpaca Fine Tuning Code + Alpaca Examples
|
| Train for three hours or so.
|
| Get a vaguely good instruction-following model (this is
| supervised fine-tuning, not RLHF).
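|
| In Hugging Face terms, roughly (a bare-bones sketch, not
| Databricks' actual notebook; hyperparameters and prompt
| format are guesses):
|
|     from datasets import load_dataset
|     from transformers import (
|         AutoModelForCausalLM, AutoTokenizer,
|         DataCollatorForLanguageModeling, Trainer,
|         TrainingArguments)
|
|     tok = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
|     tok.pad_token = tok.eos_token
|     model = AutoModelForCausalLM.from_pretrained(
|         "EleutherAI/gpt-j-6B")
|
|     ds = load_dataset("tatsu-lab/alpaca")["train"]
|
|     def fmt(ex):
|         # Serialize each sample as an instruction/response pair.
|         text = (f"Instruction: {ex['instruction']}\n"
|                 f"Response: {ex['output']}{tok.eos_token}")
|         return tok(text, truncation=True, max_length=512)
|
|     trainer = Trainer(
|         model=model,
|         args=TrainingArguments(
|             "dolly-sft", num_train_epochs=1, fp16=True,
|             per_device_train_batch_size=8),
|         train_dataset=ds.map(fmt),
|         # mlm=False makes the collator set labels = input_ids.
|         data_collator=DataCollatorForLanguageModeling(
|             tok, mlm=False))
|     trainer.train()  # a few hours on 8x A100, per the post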
| typon wrote:
| The key point is "vaguely good". Scale is still important, and
| that manifests in the difference between the GPT-3.5- and
| GPT-4-based ChatGPTs: the latter is qualitatively and
| quantitatively so much better in pretty much every benchmark.
| There is no way around the bitter lesson.
| bodyfour wrote:
| > There is no way around the bitter lesson.
|
| Isn't there? I'm certainly not sure, based on the results
| published over the last weeks and months.
|
| The giant GPT-{3.5,4} models show that if you make the model
| big enough and throw enough data at it you can produce an AI
| capable of conversing on basically any topic, in dozens of
| languages. There are plenty of different takes on how near-
| human its abilities are on specific tasks, but it's worth
| stepping back and appreciating how super-human the _breadth_
| of this knowledge is.
|
| But it's also not clear if a mega-model is anything close to
| the most efficient way of storing knowledge. After all, you
| don't need to memorize every fact in Wikipedia if you know
| how to effectively search it.
|
| And we're currently seeing a daily explosion in these
| capabilities. Today's flavor is interfacing with Wolfram, but
| we've also seen web searches, python coding, etc. That, I
| think, is the real superpower that comes out of this: you or
| I can answer a question by "doing a web search" or "querying a
| database" or "using Wolfram" or "developing a python program
| that finds the answer". However, an AI could do tasks like
| this just by "thinking" about it. Maybe it would be as natural
| as we find blinking.
|
| That to me is the real breakthrough in stuff like Alpaca --
| start with a mega-model and prompt it with something like:
| "After this paragraph, you are going to be speaking to a AI
| model similar to yourself but much more primitive. Its task
| will involve interfacing with English speakers, so converse
| with it only in that language. It has access to the same
| {X,Y,Z} APIs you have so any time it has trouble answering a
| question, prefer to give hints about how it could find the
| answer using those APIs rather than providing the answer
| directly yourself. Only give an answer directly if it
| repeatedly fails to be able to answer it by using an API.
| I've provided a large set of standardized tests used by
| humans at this URL -- start by asking it questions intended
| for a preschool-aged child. Each time it is able to answer
| new questions at a given level correctly 99% of the time,
| increase the material's level until it is able to achieve
| that score on a test designed for a Computer Science PhD
| candidate"
|
| How large would the "student" model have to be to succeed at
| this deep but narrower task? I think the answer right now is
| "we have no idea". However, if the model has the advantage
| that it can rely on external knowledge and tools from the
| start (and
| is rewarded by the "teacher" for doing just that) I bet it'll
| be a lot smaller than these mega-models. Sure, you wouldn't
| be able to disconnect the "student-AI" from its APIs and
| expect it to converse with you in Hungarian about the history
| of yacht design, but that might not be a capability it needs
| to have.
|
| My personal hunch is that we're going to find these "AI-
| taught specialist AI, with API access" models will be a lot
| smaller than most people are expecting. That's the moment
| when things REALLY change: instead of pairing a human with a
| mega-model AI, if specialized models are cheap someone can
| say "spin up 100K expert-programmer AIs and have them
| supervised by 5K expert-manager AIs and have them build XYZ"
|
| Or if you need it to work on an existing task you'd
| specialize further -- you'd go to your AI vendor and say "I'd
| like to license the weights for your expert-programmer model,
| but first have it read these 200 books I consider important
| to my problem domain and then show it every commit ever made
| by a human to my git repo and every design document I have"
| typon wrote:
| Very good analysis. I disagree with a fundamental point
| though: If you don't consider compute cost and just want
| the best possible AGI, then there's nothing stopping you
| from supercharging the mega-models with the same
| capabilities as the smaller models - and if the current
| scaling shows anything, the mega models will just become
| even better.
| bodyfour wrote:
| > If you don't consider compute cost [...]
|
| Yes, but what if you do? Imagine your hyper-specialized,
| API-heavy model takes 10x less resources to answer a
| question (or at least a question relevant to the task at
| hand). Won't it be more powerful to have a model that can
| run 10 times as fast (or run 10 instances in parallel)?
|
| What if the ratio turns out to be 100x or 1000x?
|
| So I agree that the cutting edge of "best possible AGI"
| might mean building the largest models we can train on
| massive clusters of computers and then run on high-end
| hardware. My hunch, though, is that models that can be
| run on cheap hardware and then "swarmed" on a problem
| space will be even more powerful in what they can perform
| in aggregate.
|
| Again, it's just my hunch but right now I think
| everybody's predictions are hunches.
|
| I'll actually go one bit further: even for a linear task
| that can't be "swarmed" in the same way, it could be that
| cheaper-per-token models do better on linear
| problem-solving tasks. Existing models already have the
| ability to use randomness to give more "creative", if
| less reliable, answers. This is inherently parallelizable
| though -- in fact Bard seems to be exposing this in its
| UI in the form of multiple "drafts". So what if you just
| ran 100 copies of your cheap-AI against a problem and
| then had one cheap-AI (or maybe a medium-AI) judge the
| results?
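|
| In pseudocode-ish Python (generate() and judge() stand in
| for whatever cheap/medium model calls you have - purely
| hypothetical):
|
|     from concurrent.futures import ThreadPoolExecutor
|
|     def best_of_n(problem, generate, judge, n=100):
|         # Fan out: n independent high-temperature samples.
|         with ThreadPoolExecutor(max_workers=32) as ex:
|             drafts = list(ex.map(
|                 lambda _: generate(problem, temperature=1.0),
|                 range(n)))
|         # Fan in: a single judge scores each draft; best wins.
|         return max(drafts, key=lambda d: judge(problem, d))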
|
| Or at the risk of getting too anthropomorphic about it:
| imagine you as a human are writing a program and you get
| stuck on a tricky bit -- you know that the problem should
| be solvable, but you've never done anything similar and
| don't know what algorithm to start with. Suppose then you
| could tell your brain "Temporarily fork off 100 copies of
| yourself. 10 of them go do a literature review of every
| CS paper you can find related to this topic. 10 of you
| search for open source programs that might have a similar
| need and try to determine how their code does it. The
| other 80 of you just stare off into the middle distance
| and try to think of a creative solution. In two human-
| seconds write a summary of your best idea and exit. I'll
| then read them all and see if I/we are closer to
| understanding what to do next"
|
| For us, this type of mental process is so alien we can't
| even imagine what it would feel like. It might come
| completely naturally to an AI, though.
| not2b wrote:
| Sometimes you do need to consider compute cost, say if
| you want a small but high quality model that can run on a
| smart phone to perform a task. For example, with camera
| input, identify a plant or animal, while in a remote area
| with no cell signal, so it has to yield an answer without
| communicating with a server. What's the smallest, most
| efficient model that can do that effectively? Build that.
| avereveard wrote:
| > you don't need to memorize every fact in Wikipedia if you
| know how to effectively search it.
|
| Yeah, you're onto something. Models good enough to sustain a
| conversation where I bring my own data as a primer are
| probably more useful than models that have a frozen
| knowledge of everything. The killer feature of GPT-4 is the
| 32k token size, which allows an unprecedented amount of input
| to be fed into the knowledge graph and queried.
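|
| i.e. the retrieval-augmented pattern - something like this
| sketch, where search() is whatever index you keep over your
| own documents and llm() is the model call (both hypothetical):
|
|     def answer_with_context(question, search, llm, k=5):
|         # Stuff retrieved primer material into the prompt; a
|         # 32k-token window fits a lot of it.
|         context = "\n\n".join(search(question, top_k=k))
|         return llm(
|             f"Using only the context below, answer the "
|             f"question.\n\nContext:\n{context}\n\n"
|             f"Question: {question}\nAnswer:")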
| feanaro wrote:
| Isn't it the case that we literally have no clue how GPT4 and
| GPT3.5 are different in terms of training, given OpenAI
| doesn't want to disclose anything at all?
| typon wrote:
| It's not true we know nothing. We know a little bit by
| using the two models from their API. Given the time per
| inference and the limit on messages per day for GPT4, I'm
| willing to bet it's doing around 10x more compute than
| GPT3.5. If that's because it has 10x more weights, I don't
| know. But it wouldn't be a terrible guess.
| feanaro wrote:
| So your estimate is that GPT4 has 1.75 trillion weights?
| dwaltrip wrote:
| Is there anything that affects inference compute time
| besides the number of parameters? Assuming same hardware,
| etc.
| typon wrote:
| Yes - for example adding memory to the attention
| mechanism (similar to RETRO or Memorizing Transformers
| paper)
| computerex wrote:
| We don't have the details, it is true. But empirically, and
| based on their report, GPT-4 is notably better than ChatGPT.
| feanaro wrote:
| Better, yes, and for that we have evidence. But is the
| improvement stemming simply from even more data? That's
| what I'm questioning.
| computerex wrote:
| This paper is pretty approachable and goes over the
| "scaling laws" in detail:
| https://arxiv.org/abs/2206.07682
|
| In short, yes. More data, higher quality data, more
| epochs on the data. That is the name of the game.
| stevenhuang wrote:
| It's speculated that it has the same number of parameters,
| but more compute, and is multimodal.
| UncleEntity wrote:
| Free is better than $$/token imho.
|
| If you have a use case or a bunch of disposable income then
| go with the "bitter" one.
___________________________________________________________________
(page generated 2023-03-24 23:01 UTC)