[HN Gopher] StableLM: A new open-source language model
___________________________________________________________________
StableLM: A new open-source language model
Author : davidbarker
Score : 980 points
Date : 2023-04-19 15:10 UTC (7 hours ago)
(HTM) web link (stability.ai)
(TXT) w3m dump (stability.ai)
| [deleted]
| Garcia98 wrote:
| I really dislike this approach to announcing new models that some
| companies have taken: they don't mention evaluation results or
| the performance of the model, but instead talk about how
| "transparent", "accessible" and "supportive" these models are.
|
| Anyway, I have benchmarked stablelm-base-alpha-3b (the open-
| source version, not the fine-tuned one, which is under an NC
| license) using the MMLU benchmark, and the results are rather
| underwhelming compared to other open-source models:
|
| - stablelm-base-alpha-3b (3B params): 25.6% average accuracy
|
| - flan-t5-xl (3B params): 49.3% average accuracy
|
| - flan-t5-small (80M params): 29.4% average accuracy
|
| MMLU is just one benchmark, but based on the blog post, I don't
| think it will yield much better results in others. I'll leave
| links to the MMLU results of other proprietary[0] and open-
| access[1] models (results may vary by +-2% depending on the
| parameters used during inference).
|
| [0]: https://paperswithcode.com/sota/multi-task-language-
| understa...
|
| [1]: https://github.com/declare-lab/flan-
| eval/blob/main/mmlu.py#L...
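|
| For anyone wanting to reproduce this, here is a minimal sketch of
| MMLU-style multiple-choice scoring with Hugging Face transformers.
| The inline question and the empty few-shot header are stand-ins; a
| real run uses the Hendrycks et al. test set with 5 dev examples
| per subject in the prompt.
|
|   import torch
|   from transformers import AutoModelForCausalLM, AutoTokenizer
|
|   name = "stabilityai/stablelm-base-alpha-3b"
|   tok = AutoTokenizer.from_pretrained(name)
|   model = AutoModelForCausalLM.from_pretrained(
|       name, torch_dtype=torch.float16).cuda().eval()
|
|   def choice_logprob(prompt: str, choice: str) -> float:
|       # Sum of log-probs the model assigns to `choice` given `prompt`.
|       n_prompt = tok(prompt, return_tensors="pt").input_ids.shape[1]
|       ids = tok(prompt + choice, return_tensors="pt").input_ids.cuda()
|       with torch.no_grad():
|           logprobs = model(ids).logits.log_softmax(-1)
|       targets = ids[0, n_prompt:]
|       return logprobs[0, n_prompt - 1:-1].gather(
|           -1, targets[:, None]).sum().item()
|
|   # `header` would normally hold 5 worked dev-set questions (5-shot).
|   header = "The following are multiple choice questions (with answers).\n\n"
|   q = header + "Q: 2 + 2 = ?\n(A) 3 (B) 4 (C) 5 (D) 6\nA:"
|   print(max("ABCD", key=lambda c: choice_logprob(q, " " + c)))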
| mnkv wrote:
| How did you run the benchmarking, zero-shot or few-shot? I
| think a fair comparison would be Llama-7B which got an average
| ~35% for 5-shot.
| Garcia98 wrote:
| 5-shot prompting.
| jvm wrote:
| Doesn't make much sense to compare a model that's not fine
| tuned to flan models that are fine tuned. Makes more sense to
| compare to something like T5 base where it's probably a lot
| more comparable.
| Garcia98 wrote:
| I would have compared it to the fine-tuned version if it had
| been released under a truly open-source license. I think
| developers implementing LLMs care more about licensing than
| about the underlying details of the model.
|
| Also t5-base is 220M params vs 3B params of stablelm, not
| really a fair comparison anyways.
| Taek wrote:
| They used training data with restricted licenses to produce
| the fine-tuned models, it's not a decision that stability
| was able to make.
| Garcia98 wrote:
| I disagree; they made the decision to use datasets with
| restrictive licensing, jumping on the
| alpaca/gpt4all/sharegpt bandwagon.
|
| They also chose to toot their own horn about how open-source
| their models are, even though for practical uses half of
| their released models are no more open source than a
| leaked copy of LLaMA.
| peterstjohn wrote:
| So just use their base model and fine-tune with a non-
| restrictive dataset (e.g. Databricks' Dolly 2.0
| instructions)? You can get a decent LoRA fine-tune done
| in a day or so on consumer GPU hardware, I would imagine.
|
| The point here is that you can use their bases in place
| of LLaMA and not have to jump through the hoops, so the
| fine-tuned models are really just there for a bit of
| flash...
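|
| To make that concrete, here is a rough sketch of such a LoRA
| fine-tune with peft on Dolly's instruction data; the
| hyperparameters, prompt format, and target module name are
| assumptions (GPT-NeoX uses "query_key_value"), not a tested recipe.
|
|   import torch
|   from datasets import load_dataset
|   from peft import LoraConfig, get_peft_model
|   from transformers import (AutoModelForCausalLM, AutoTokenizer,
|                             DataCollatorForLanguageModeling, Trainer,
|                             TrainingArguments)
|
|   base = "stabilityai/stablelm-base-alpha-7b"
|   tok = AutoTokenizer.from_pretrained(base)
|   tok.pad_token = tok.eos_token
|   model = AutoModelForCausalLM.from_pretrained(
|       base, torch_dtype=torch.float16, device_map="auto")
|
|   # Only the small low-rank adapter matrices are trained.
|   model = get_peft_model(model, LoraConfig(
|       r=8, lora_alpha=16, lora_dropout=0.05,
|       target_modules=["query_key_value"], task_type="CAUSAL_LM"))
|
|   ds = load_dataset("databricks/databricks-dolly-15k", split="train")
|   ds = ds.map(lambda ex: tok(
|       f"Instruction: {ex['instruction']}\nResponse: {ex['response']}",
|       truncation=True, max_length=512), remove_columns=ds.column_names)
|
|   Trainer(model=model,
|           args=TrainingArguments("stablelm-dolly-lora", fp16=True,
|                                  per_device_train_batch_size=4,
|                                  num_train_epochs=1, learning_rate=2e-4),
|           train_dataset=ds,
|           data_collator=DataCollatorForLanguageModeling(tok, mlm=False)
|           ).train()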
| sunshadow wrote:
| Demo is overloaded. Here is Colab link if you wanna check the
| quality: https://colab.research.google.com/github/Stability-
| AI/Stable...
| [deleted]
| drexlspivey wrote:
| Your session crashed after using all available RAM. If you are
| interested in access to high-RAM runtimes, you may want to
| check out Colab PRO
| sunshadow wrote:
| You can try 8bit
| xkapastel wrote:
| Using 8-bit still runs out of RAM for both the 3B and 7B
| models. It's unclear if it's because it still uses more
| than the available RAM, or if it's just quietly not using
| 8-bit since it's not implemented.
| BudaDude wrote:
| This notebook will need high RAM and a premium GPU. It blew
| past the 16 GB GPU that comes with Colab Pro.
| andy_xor_andrew wrote:
| > The context length for these models is 4096 tokens.
|
| !!! And I was excited that llama gave us 2048!!
| txtai wrote:
| Great to see the continued release of open models. The only
| disappointing thing is that models keep building on CC-BY-NC
| licensed datasets, which severely limits their use.
|
| Hopefully, people consider txtinstruct
| (https://github.com/neuml/txtinstruct) and other approaches to
| generate instruction-tuning datasets without the baggage.
| vagab0nd wrote:
| I look forward to a future where almost all software would allow
| me to plug in an LLM API:
|
| - browser: ad removal/skipping
|
| - RSS: information aggregation
|
| - recommendation systems
|
| - games: customized NPC scripts; AI opponents
|
| - home automation: personal butler
|
| Hopefully, there would be more than one base-layer LLM providers
| to choose from.
| rvz wrote:
| Finally, the true race to zero is now on.
|
| Stability and others are already at the finish line in the race
| to zero. OpenAI.com cannot get any cheaper and eventually will
| have to increase their prices.
|
| There is no escape and, as I said before, open source AI models
| will eventually swamp, survive longer, and integrate faster
| than even OpenAI.com's ChatGPT.
|
| Expect open source models to get smaller and even better, such
| that they will fit on an iPhone, on-device.
|
| Stay tuned.
| [deleted]
| anticensor wrote:
| Why are you referring to OpenAI as ai.com?
| netsroht wrote:
| I think he's just emphasizing that OpenAI is in fact not
| open, thusly it's crossed out.
| [deleted]
| jhoelzel wrote:
| Has anybody figured out the hardware requirements yet?
|
| This could be the beginning of something beautiful and I am
| always impressed by the work the Stable Diffusion team does!
|
| They are the true OpenAI (without empty promises in the middle
| ;))
| lanstin wrote:
| GPT-4 supposedly cost USD 100M to train and seems much smarter
| than anything else released. Is there any chance of getting this
| much money for open source training? Really big EU grants, or
| some philanthropist, or?
| dougmwne wrote:
| This is absolutely huge. LLaMA weights leaking was a big deal,
| but you couldn't actually use them without attracting Meta's ire.
| Would love to see some benchmarking vs. LLaMA and GPT.
| pyinstallwoes wrote:
| What is IRE?
| vineyardmike wrote:
| It's a noun meaning anger or wrath. Usually it refers not
| just to triggering anger _the emotion_, but to an angry
| _response_. So attracting Meta's ire means attracting an
| angry response from Meta - like the legal response users of
| LLaMA have triggered.
|
| It's not an acronym.
| dougmwne wrote:
| The real thing we should be asking is if the 3B or 7B model
| is capable of correctly answering this question.
| jwineinger wrote:
| anger. probably a lawsuit in this context
| jmiskovic wrote:
| There were no lawsuits around LLaMA. They used DMCA
| requests to take down some GitHub and HuggingFace releases,
| but the majority of releases (Alpacas and other animals)
| were left alone. It was enough to prevent commercial use,
| though. Edit: a lawsuit would be quite interesting, to
| clear up many things around how copyright works for LLMs.
| inportb wrote:
| Ire is the result of irritation.
| ElFitz wrote:
| Are you saying we shouldn't scratch Meta's itch?
| wsgeorge wrote:
| Oh this is good. :)
| sangnoir wrote:
| It's a department at meta: Intellectual Rights Enforcement.
| /s
|
| "Ire" is a synonym for "anger" or "wrath"
| pyinstallwoes wrote:
| That's actually what I thought it might be. Thanks lol.
| antibasilisk wrote:
| Internal recurrent expenditure
| jacooper wrote:
| Do we even know if model weights are copyrightable or not?
| sebzim4500 wrote:
| No but unless you enjoy being sued by Meta you have to act as
| if they are.
| janmo wrote:
| I was asking myself the same question and am 99% sure it
| isn't protected by IP laws. It is another story for the
| training data and the source code used to run the model.
| Invictus0 wrote:
| A JPG is copyrightable and that's just an array of numbers.
| Model weights seem pretty similar as a class.
| dougmwne wrote:
| It's not the JPG data you can copyright, but the
| recognisable image it produces. Every time you re-save or
| resize the image, the data changes, but the recognizable
| image remains.
|
| It's not clear how this process applies to model weights.
| Once you run another training epoch on them, the data has
| changed. What is the essential copyrightable, trademarkable
| or patentable thing that remains? A legally untested
| question for sure.
| AustinDev wrote:
| Couldn't you just transform the weights in some predictable
| way and then untransform them before use in runtime? The
| weights at rest would be completely distinct from the meta
| weights.
| brucethemoose2 wrote:
| All sorts of llama weights are on huggingface now.
| prions wrote:
| > Supportive. We build models to support our users, not replace
| them. We are focused on efficient, specialized, and practical AI
| performance - not a quest for god-like intelligence. We develop
| tools that help everyday people and everyday firms use AI to
| unlock creativity, boost their productivity, and open up new
| economic opportunities.
|
| Refreshing take on the peak alarmism we see from tech "thought
| leaders"
| gumballindie wrote:
| Well exactly. AI _is_ a tool and a very good one at that.
| CamperBob2 wrote:
| I dunno... god-like intelligence would be pretty useful. I'll
| take a brochure.
| nashashmi wrote:
| do you trust god?
| CamperBob2 wrote:
| Well, ...
|
| OK, I withdraw the comment.
| sdenton4 wrote:
| Why should I need to? Isn't God on the blockchain? (j/k)
| garbagecoder wrote:
| Yeah all the Terminator energy around these AI things is so
| off-putting. They aren't like that. They're big matrices and
| they are very cool tools!
| bbor wrote:
| What if big matrices are the last missing piece to research
| going on since the 50s...
| circuit10 wrote:
| But the concerns about AI taking over the world are valid and
| important; even if they sound silly at first, there is some
| very solid reasoning behind it. They're big matrices, yes,
| but they're Turing-complete which means they can
| theoretically do any computational task
|
| See https://youtu.be/tcdVC4e6EV4 for a really interesting
| video on why a theoretical superintelligent AI would be
| dangerous, and when you factor in that these models could
| self-improve and approach that level of intelligence it gets
| worrying...
| varunjain99 wrote:
| Well, it's to their benefit to portray their models as working
| alongside and enhancing humans, as opposed to replacing us. So
| it sounds a bit like marketing speak to me.
|
| And it's to the benefit of many of those tech "thought leaders"
| to be alarmist since they don't have much of the AI pie
| seydor wrote:
| Doesn't sell as much, though
| reubenmorais wrote:
| This is just marketing. They're positioning themselves as
| somehow "more human" while building the exact same technology.
| When a model supports me by doing the work I'd otherwise hire
| someone to do, the model just replaced someone. And this goes
| without saying, but a large amount of outsourced tasks today
| don't exactly require "god-like intelligence".
| rmbyrro wrote:
| That was probably said about the automobile, when it replaced
| horses, or about electrical lamps, when they replaced oil-based
| lamps, no?
|
| I mean, every city had an army of people to light up and down
| oil lamps in the streets, and these jobs went away. But
| people were freed up to do better stuff.
| leovailati wrote:
| It is different this time. I bet that was also said when
| the transformations that you mentioned occurred, but _this
| time_ it really is different.
|
| LLM models are pretty general in their capabilities, so it
| is not like the relatively slow process of electrification,
| when lamplighters lost their jobs. _Everyone_ can lose
| their jobs in a matter of months because AI can do close to
| everything.
|
| I am excited to live in a world where AI has "freed" humans
| from wage slavery, but our economic system is not ready to
| deal with that yet.
| [deleted]
| antibasilisk wrote:
| >Refreshing take on the peak alarmism we see from tech "thought
| leaders"
|
| It's not alarmism when people have openly stated their intent
| to do those things.
| dragonwriter wrote:
| It's alarmism to support government regulation to reinforce
| the moat when industry leaders say they intend to do it, but
| also that the danger of it being done is why competition with
| them must be restricted by the State (and why they can't,
| despite being, or being a subsidiary of, a nonprofit founded
| on an openness mission, share any substantive information on
| their current models.)
| ericpauley wrote:
| I think there's a misunderstanding here (and on Alpaca) about
| exactly what _non-commercial_ means with respect to software
| licenses. Here's some related discussion:
| https://opensource.stackexchange.com/a/12074
|
| The CC-NC licenses cover modification and redistribution ("use"
| in the license). They apparently have no bearing on whether you
| can personally _run_ the software. And of course the outputs of
| the software are not copyrightable.
| GaggiX wrote:
| the models: https://huggingface.co/stabilityai/stablelm-base-
| alpha-3b, https://huggingface.co/stabilityai/stablelm-base-
| alpha-7b
|
| There are also tuned versions of these models:
| https://huggingface.co/stabilityai/stablelm-tuned-alpha-3b
| https://huggingface.co/stabilityai/stablelm-tuned-alpha-7b; these
| versions are fine-tuned on various chat and instruction-following
| datasets.
|
| The GitHub repo mentions that the models will be trained on 1.5T
| tokens, which is pretty huge in my opinion; the alpha models are
| trained on 800B tokens. The context length is 4096.
| bhouston wrote:
| These models are huge. I assume they are not quantized down to
| 4bits yet.
| brucethemoose2 wrote:
| Quantized versions will pop up on huggingface very soon, if
| they aren't already there. It takes basically no time, much
| less than something like an Alpaca finetune.
| nwoli wrote:
| I love stability ai so much
| enono wrote:
| [flagged]
| paxys wrote:
| OpenAI really needs to donate its name to Stability.
| Filligree wrote:
| The 7B model just barely fits in a 3090, but I've yet to get
| useful output; it usually gets stuck in a repetition loop.
|
| Spent some time playing with sampling parameters, to no effect.
| Has anyone else had a go?
| cheald wrote:
| If you're getting stuck in a sampling loop, try raising the
| temperature (0.6+) and/or increasing the repetition penalty
| (1.1-1.2 often work well).
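|
| For example, with transformers' generate() that would look roughly
| like the following (a sketch; assumes `model` and `tok` are already
| loaded, and the prompt is made up):
|
|   out = model.generate(
|       **tok("User: Tell me a story.\nAssistant:",
|             return_tensors="pt").to(model.device),
|       max_new_tokens=128,
|       do_sample=True,
|       temperature=0.7,          # 0.6+ helps break repetition loops
|       repetition_penalty=1.15,  # 1.1-1.2 often works well
|       top_p=0.9)
|   print(tok.decode(out[0], skip_special_tokens=True))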
| nlightcho wrote:
| Make sure you are using the instruction tuned model. The base
| model will be difficult to prompt.
|
| It works in 8-bit with about 12GB of VRAM usage. Here's sample
| code:
|
| https://gist.github.com/AlexanderDzhoganov/a1d1ebdb018e2e573...
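|
| For reference, a minimal 8-bit loading sketch (not the linked gist;
| requires bitsandbytes, and the <|SYSTEM|>/<|USER|>/<|ASSISTANT|>
| tags follow the prompt format described in the StableLM repo):
|
|   import torch
|   from transformers import AutoModelForCausalLM, AutoTokenizer
|
|   name = "stabilityai/stablelm-tuned-alpha-7b"
|   tok = AutoTokenizer.from_pretrained(name)
|   model = AutoModelForCausalLM.from_pretrained(
|       name, device_map="auto", load_in_8bit=True)
|
|   prompt = ("<|SYSTEM|>You are a helpful assistant."
|             "<|USER|>What is 2 + 2?<|ASSISTANT|>")
|   ids = tok(prompt, return_tensors="pt").to(model.device)
|   out = model.generate(**ids, max_new_tokens=64,
|                        do_sample=True, temperature=0.7)
|   print(tok.decode(out[0], skip_special_tokens=True))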
| dang wrote:
| https://github.com/Stability-AI/StableLM
| [deleted]
| quickthrower2 wrote:
| I found it a bit difficult to get answers from:
|
| Q. Can you recommend a good vegan restaurant in Sydney, near the
| water
|
| A. I'm not a vegan, so I don't have a favorite restaurant. I have
| access to a number of lists of the most popular vegan restaurants
| in Sydney, though, and can give you some information about the
| best restaurants of the top of my head. Please let me know if
| there's anything else I can do to assist.
|
| Q. Ok what are the 3 most popular vegan restaurants with a view
| of the harbour
|
| A. The most popular vegan restaurants near the water are:
|
| Q. Are what?
|
| A. The most popular vegan restaurants near the water are
|
| Q. Name a vegan restaurant
|
| A.
|
| 1. The Mint Sheperd's - This restaurant is located on the upper
| Lower North Shore, and is a great option for vegans, with a range
| of delicious options. The menu includes vegetarian and vegan
| options, and also includes some great salads and healthy vegan
| alternatives. 2. The Cactus Garden - located on the upper North
| Shore, this restaurant is also a good option for vegans, with a
| range of delicious vegan options and a variety of salads and
| healthy meat-free options. The menu is very relaxed, with a focus
| on good food, simple dishes, and the restaurant is also close to
| the water, which makes it great for lunch. 3. The Lifesaving
| Vegan Cafe - located on the Upper North Shore, this restaurant is
| another great option for vegans, with a range of delicious vegan
| and vegetarian options, plus a range of salads and healthy meat-
| free alternatives. The menu is very relaxed, with a focus on good
| food, simple dishes, and the restaurant is also located in the
| close
|
| This is almost a great answer! Just that these restaurants don't
| exist.
|
| I had 100x better experience with
| https://news.ycombinator.com/item?id=35583349, running locally!
| mousepad12 wrote:
| [dead]
| la64710 wrote:
| Thanks for breaking ground here and being truly OSS
| convery wrote:
| Just tested the 7B-tuned version on Huggingface and it's not
| great..
|
|   > What features were added in C++ 98?
|
|   < C++98 (also known as C++11) was a major development milestone
|   for C++
|   < new gcc 5.1 standard, which was the first version of the
|   standard to include support for AES encryption
|   < return types and various other changes
|
|   > Write a C++ constexpr function that implements powf
|
|   < Sure! Here is an implementation of a C++11 constexpr function
|   for the function pow, using C++11 recursion and variadic
|   templates:
|   < void pow (int n) { std::cout << "Powerset of " << n << " is "
|   << pow (n, n) << std::endl; }
|   < This implementation of the pow function should be possible for
|   any constant integer power of 2 without exception.
| brucethemoose2 wrote:
| Maybe you are not following the right syntax? LLaMA, Vicuna, and
| Alpaca suffer if you don't format the query just right.
| [deleted]
| josephwegner wrote:
| Anyone have a sense for system requirements on running this
| locally? StableDiffusion was too heavy for either of my machines
| (which are admittedly not very powerful) - I'm not seeing much in
| the docs to indicate whether or not this is more or less
| intensive to run locally.
| ranguna wrote:
| The tuned 7B model is around 33 GB, so you'll need a PC with
| that much VRAM or RAM. I haven't tried to load it in a text
| generation UI though.
| wsgeorge wrote:
| If you can run any models on llama.cpp, that might be a good
| indicator of which StableLM models you'll be able to run.
|
| I easily ran 7B int4 ggml models on an MBP with 16 GB of RAM.
| The same works on an MBA with 8 GB of RAM, but you'll have to
| avoid running any other memory-hogging app.
| tucnak wrote:
| On an off-note, can anybody tell me what's going on with
| embeddings and vector databases? Certainly it would seem that
| forward-pass completion is pretty much solved, and a smaller,
| better model will appear eventually. But let's say you even
| managed to solve both complete() and embed(): what do you do with
| it? How are you going to organise, query, and multiply this
| dataset?
|
| I know that text-embedding-ada-002 has twice as
| many dimensions as mainstream sentence transformers. Do we need
| all the extra dimensions? If not, how do I make it work better
| for my specific dataset with lots of jargon and abbreviations and
| stuff like that? What are the hardware requirements for that?
| I.e. could I do a fine-tuning job on some specific jargon-heavy
| text to get better embeddings for it? For one, the more I look
| into similarity-based use-cases, the more I see that it's not
| just "top-percentile nearest-neighbour search": the
| data is also terribly relational, i.e. it's probably like a
| slowly changing dimension, and there's a tree-traversal type
| structure in how documents are generated as output from other
| documents as inputs. So you kind of have to think about these
| complete/embed ops both in aggregate, for batching, and in
| particular, from a cost/reward ROI type calculation. Not just
| in aggregate but also in terms of memory usage patterns to
| further optimise layout - tiering and stuff like that really
| comes to light.
|
| Also: vector database shilling on HN is getting out of hand;
| multiple companies literally plugging every mention on the radar,
| some actively begging for upvotes. Looking at it all makes you
| really appreciate pgvector[1], to the point where you would be
| more willing to buy 3.2 TB of high-bandwidth NVMe and dedicate it
| to a large IVF index than ever have to deal with all of this
| "purpose-built vector database" bullshit.
|
| [1]: https://github.com/pgvector/pgvector
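|
| For what it's worth, the pgvector workflow really is that simple.
| A hedged sketch with psycopg2 (the table, column, and dimension
| are made up):
|
|   import psycopg2
|
|   conn = psycopg2.connect("dbname=mydb")
|   cur = conn.cursor()
|   cur.execute("CREATE EXTENSION IF NOT EXISTS vector")
|   cur.execute("""CREATE TABLE IF NOT EXISTS docs (
|                      id bigserial PRIMARY KEY,
|                      body text,
|                      embedding vector(384))""")
|   conn.commit()
|
|   query_vec = [0.1] * 384  # would come from your embedding model
|   cur.execute(
|       "SELECT id, body FROM docs "
|       "ORDER BY embedding <-> %s::vector LIMIT 5",  # <-> is L2 distance
|       ("[" + ",".join(map(str, query_vec)) + "]",))
|   print(cur.fetchall())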
| visarga wrote:
| > Do we need all the extra dimensions?
|
| This discussion seems relevant:
| https://www.reddit.com/r/MachineLearning/comments/12q8rp1/di...
| Der_Einzige wrote:
| Yes, you need all of the dimensions. All of the dimensionality
| reduction techniques, including SOTA ones (UMAP or better) are
| going to massively harm your embeddings.
| tucnak wrote:
| Perhaps I didn't word my question correctly; I'm looking to
| compare the capability of sentence transformers vs. OpenAI Ada-
| based embeddings relative to their respective dimensionality.
| lxe wrote:
| Finetuning and training models of this size from scratch is now
| relatively inexpensive and accessible to anyone, especially with
| all these datasets becoming freely available. This is just the
| beginning of smaller LLMs that anyone can finetune to achieve
| high quality performance on particular tasks.
| antimatter15 wrote:
| Looks like Nolano.org's "cformers" includes a fork of
| llama.cpp/ggml by HCBlackFox that supports the GPT-NeoX
| architecture powering EleutherAI's Pythia family of open LLMs
| (which also powers Databricks' new Dolly 2.0), as well as
| StabilityAI's new StableLM.
|
| I quantized the weights to 4-bit and uploaded it to HuggingFace:
| https://huggingface.co/cakewalk/ggml-q4_0-stablelm-tuned-alp...
|
| Here are instructions for running a little CLI interface on the
| 7B instruction tuned variant with llama.cpp-style quantized CPU
| inference.
|
|   pip install transformers wget
|   git clone https://github.com/antimatter15/cformers.git
|   cd cformers/cformers/cpp && make && cd ..
|   python chat.py -m stability
|
| That said, I'm getting pretty poor performance out of the
| instruction tuned variant of this model. Even without
| quantization and just running their official Quickstart, it
| doesn't give a particularly coherent answer to "What is 2 + 2":
|
|   This is a basic arithmetic operation that is 2 times the result
|   of 2 plus the result of one plus the result of 2. In other
|   words, 2 + 2 is equal to 2 + (2 x 2) + 1 + (2 x 1).
| jmatthews wrote:
| The AI war has officially begun.
| mach1ne wrote:
| No, that comes when China catches up in the quest for
| superintelligence.
| xyzzy4747 wrote:
| I don't think China has enough skilled AI workers to compete
| in any quick timespan. Top talent doesn't want to live and
| work there generally.
| sanxiyn wrote:
| China is demonstrably capable of training GLM-130B. Skills
| involved seem comparable to Stability AI.
|
| https://keg.cs.tsinghua.edu.cn/glm-130b/
| [deleted]
| kriro wrote:
| I'm curious why you'd think that. China as a country has
| many people to start. Some percentage of these people will
| end up in AI. Assuming people from all countries are
| roughly equally intelligent, the numbers clearly favor
| China. Universities over there are quite good, there's a
| pretty strong "work hard" mentality I see from all our
| Chinese students. Plenty of Chinese graduating or starting
| university these days during the AI hype peak. China as a
| country isn't sleeping on AI either. I think China as an AI
| hub looks quite promising. Anecdotally, China also retains
| quite a lot of talent or people go abroad to study and
| return to China. Compared to some European countries or
| India that "leak" a lot of talent to the U.S. I think China
| is quite a bit more stable.
|
| On the hardware side, things tend to be produced there as
| well.
| snordgren wrote:
| China definitely "leaks" a lot of talent to American
| companies - most AI papers that I've seen from respected
| Western universities include at least one Chinese name.
|
| One challenge for China has been the university
| enrollment rate. While in Western countries half of each
| cohort has been going to university for decades, China is
| not there yet. In 2019, just 17% of Chinese adults had
| degrees, compared to 44% in the US.
|
| So the large Chinese population is offset by its relative
| lack of access to education, while the US can draw from
| its own highly educated population in addition to
| attracting the best and the brightest from the rest of
| the world, including China.
| HybridCurve wrote:
| There are plenty of authoritarians in the US to conduct
| warfare against our institutions that are up to speed. We
| don't need to wait for anyone else overseas to get this party
| started.
| ChatGTP wrote:
| Stop worrying about China...wow
| rafark wrote:
| I'm more than ready. It's exciting to see what can come out of
| this.
| andy_xor_andrew wrote:
| One question I have is: can we expect GPTQ quantization down to 4
| bits (or lower) for these models, the same as for llama (which is
| partly why llama was so amazing)?
|
| I'm sure the answer is "yes", but I wonder if there's anything
| that can be done that could make these models more "friendly" to
| quantization like this from the very beginning.
|
| It's absolutely incredible that you can cut the size of a model
| in half... and then cut the size of the model in half *again*...
| and it fits on your consumer GPU with only a small cut in
| accuracy.
| brianjking wrote:
| What is the actual LICENSE?
|
| * HuggingFace shows CC-by-NC
| https://huggingface.co/stabilityai/stablelm-tuned-alpha-7b
|
| * Github is Apache 2.0
|
| https://github.com/Stability-AI/StableLM/blob/main/LICENSE
| Taek wrote:
| NC is just for the fine tuned model
| GistNoesis wrote:
| I'm worried about the version hell of relying on HuggingFace's
| transformers.
|
| In the model card : config.json [1] and generation_config.json
| there are strings "transformers_version": "4.28.1" and it refers
| to some common architecture "GPTNeoXForCausalLM" .
|
| Which if I follow the string soft link correctly means, it refers
| to this file [2] with a long history of commits including some
| recents [3].
|
| I don't understand how to get out of the version hell. I can
| install and pin a version of the transformer library, but sooner
| or later if I chain different foundation models I'll have to do
| this for each different model card that has its own version.
|
| And then those transformer library version are probably dependent
| on some pytorch versions.
|
| I don't understand how you can reliably track a bug, or a
| change in behavior, or guarantee that the model you spend
| millions of dollars training doesn't rust, stays reproducible,
| or doesn't become unusable due to this mess.
|
| (Also, the weights are a de-serialized pickled Python dictionary,
| which from a safety point of view is equivalent to running
| untrusted binaries and very worrying on its own.)
|
| [1]https://huggingface.co/stabilityai/stablelm-tuned-
| alpha-7b/b... [2]https://github.com/huggingface/transformers/blob
| /v4.28.1/src... [3]https://github.com/huggingface/transformers/co
| mmits/v4.28.1/...
| lhl wrote:
| I've been diving in lately, and while it's not efficient, the
| only way to manage it is to create a new conda/mamba
| environment, or a custom Docker image, for all the conflicting
| packages.
|
| For safety and speed, you should prefer the safetensor format:
| https://huggingface.co/docs/safetensors/speed
|
| If you know what you are doing you can do your own conversions:
| https://github.com/huggingface/safetensors or for safety,
| https://huggingface.co/spaces/diffusers/convert
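|
| A minimal sketch of such a conversion (the file names are
| placeholders; sharded checkpoints need a loop over the shards):
|
|   import torch
|   from safetensors.torch import load_file, save_file
|
|   state = torch.load("pytorch_model.bin", map_location="cpu")
|   # safetensors wants contiguous tensors with no shared storage.
|   save_file({k: v.contiguous() for k, v in state.items()},
|             "model.safetensors")
|
|   # Loading is fast and runs no arbitrary code, unlike unpickling.
|   weights = load_file("model.safetensors")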
| brucethemoose2 wrote:
| > And then those transformer library version are probably
| dependent on some pytorch versions.
|
| They are not, and I don't think the model even cares about the
| transformers version. I run git transformers/diffusers and
| PyTorch 2.1 in all sorts of old repos, and if it doesn't
| immediately work, usually there are just small changes to APIs
| here and there that make scripts unhappy, and that you can
| manually fix.
| jxy wrote:
| In terms of pickled files, we may ask them to release the
| models in safetensors. That would be a big improvement.
|
| There's not much we can do about dependencies on pytorch or
| other python libraries. Perhaps people can make more
| independent implementations. The redundancy in implementations
| would help.
| substation13 wrote:
| I'm surprised that some of the cutting edge AI work is not
| being done in pure Rust. Why even have a binding layer?
| ElFitz wrote:
| Python seems to have won over the Data / ML world.
|
| So I'd wager they use what they and their intended audience
| know.
| tough wrote:
| Someone implemented llama on Go recently,
| https://github.com/gotzmann/llama.go
|
| I wouldn't rule out a Rust implementation of some LLM
| architecture at some point.
| Philpax wrote:
| Hi! I'm a maintainer of
| https://github.com/rustformers/llama-rs. We're planning
| to expand our model support soon.
| sdenton4 wrote:
| The pickles drive me crazy.
|
| Tensorflow saved models are a great way to solve the
| problem... Save the computation graph and weights, and drop
| all the crusty code dependencies. I think ONNX models are
| similar. I expect there should be a Jax equivalent at some
| point, as Jax is basically perfectly designed for this
| (everything is expressed in lax operations, which allows
| changing implementations for cpu/gpu/tpu freely... So just
| save the list of lax ops).
| seydor wrote:
| When will we see ggml versions of these?
|
| Good job on OpenAI to sell out in 2022. It was truly the end of
| the line.
|
| No matter how bad these model releases are, they are certain to
| get awesome soon with everybody hacking around them. The
| surprising success of MiniGPT-4 with images shows that OpenAI's
| GPTs don't have some magic secret sauce that we don't know of.
| sebzim4500 wrote:
| GPT-4 is still much, much better than any of the competition.
|
| I guess we'll see once we have a 175B version of StableLM
| though, presumably that will at least easily beat GPT-3.
| franzypants wrote:
| So, it can write very coherently but not sure it excels at coding
| tasks.
|
| For example:
|
| User: Write Python code for evaluating the integral of f(x)=x^2.
|
| Assistant: Sure, here's an example of Python code to evaluate the
| integral of a function `f(x) = x^2`:
|
|   ```
|   # Define the function
|   def integral_of_f(x): return x**2
|
|   # Evaluate the integral
|   result = integral_of_f(3)
|   print("The integral of f(x) = x^2 is:", result)
|   ```
|
| In this code, we define the function `integral_of_f` as a
| function that takes a single argument `x` and returns the
| integral of `x^2`. We then evaluate the integral by calling the
| `integral_of_f` function and passing in the value of `x` as an
| argument.
| vikp wrote:
| It's fantastic that more orgs are releasing open-source models
| trained on more than 300B or so tokens. Here's my take from the
| details I could find.
|
| Pros:
|
| - 4096 context width (vs 2048 for llama, gpt-j, etc)
| - 3B to 65B released or in progress
| - RL tuned models available
| - Trained on more tokens than existing non-llama models
| - 128 head dim, so can use flash attention (unlike GPT-J)
|
| Cons:
|
| - No benchmarks released, or details about the model
| - Somewhat restrictive license on the base models, and NC license
|   on the RL models
| - Small models only trained on 800B tokens, compared to 1T for
|   llama-7B, and potentially more for other upcoming alternatives
|   (RedPajama, etc). I'd like to see their loss curves to see why
|   they chose 800B.
|
| High-level, this is likely to be more accurate than existing non-
| llama open source models. It's hard to say without benchmarks
| (but benchmarks have been gamed by training on benchmark data, so
| really it's just hard to say).
|
| Some upcoming models in the next few weeks may be more accurate
| than this, and have less restrictive licenses. But this is a
| really good option nonetheless.
| HarHarVeryFunny wrote:
| They mention 1.5T training tokens, perhaps for the largest
| model only ?
| vikp wrote:
| It's unclear which models will be trained to 1.5T tokens. The
| details of how many tokens each model saw in training are on
| Github - https://github.com/stability-AI/stableLM/ . But only
| for the ones that have been released.
| thunderbird120 wrote:
| I just asked a Stability employee and they said that the
| current models ran into an overfitting issue, probably due
| to some duplicated data somewhere in their dataset, which
| consists of 1.5T tokens. The 800B tokens is the number of
| tokens they've been trained on so far. The plan is to keep
| going and train on the rest of the data once the issue is
| resolved.
| HarHarVeryFunny wrote:
| I've asked this question in a few places, and never been
| able to get an answer, maybe you know...
|
| Q: Why are these LLMs trained on a single epoch, and
| perform worse if the dataset is repeated ?
|
| This seems maybe related to suspecting data duplication
| as a cause of overfitting.
|
| Why don't LLMs need multi-epoch training at a low
| learning rate to generalize? If they are managing to
| learn from a single epoch, that sounds more like they may
| be memorizing!
| sebzim4500 wrote:
| >- No [...] details about the model
|
| You can see the model architecture here
|
| https://github.com/Stability-AI/StableLM/blob/main/configs/s...
| lhl wrote:
| FYI, I'm running lm-eval now w/ the tests Bellard uses
| (lambada_standard, hellaswag, winogrande, piqa, coqa) on the
| biggest 7B on a 40GB A100 atm (non-quantized version, requires
| 31.4GB), so it will be directly comparable to what various LLaMAs
| look like: https://bellard.org/ts_server/
|
| (UPDATE: the run took 1:36 to complete, but failed at the end
| with a TypeError, so I will need to poke and rerun.)
|
| I'll place results in my spreadsheet (which also has my text-
| davinci-003 results):
| https://docs.google.com/spreadsheets/d/1kT4or6b0Fedd-W_jMwYp...
| guywithabowtie wrote:
| Do you also have results of GPT4 somewhere? or text-
| davinci-003-turbo
| lhl wrote:
| I'm still on the waitlist for GPT-4 API access. Note, that
| text-davinci-003 cost about $90 to benchmark at $0.02/1K
| tokens, so if you're able to use a GPT-4 model (for
| completion and not just instruction) that'll probably be
| $270-$540 in credits to benchmark...
| lunixbochs wrote:
| Are you using https://github.com/EleutherAI/lm-evaluation-
| harness?
| lhl wrote:
| Yeah, although looks like it currently has some issues with
| coqa: https://github.com/EleutherAI/lm-evaluation-
| harness/issues/2...
|
| There's also the bigscience fork, but I ran into even more
| problems (although I didn't try too hard)
| https://github.com/bigscience-workshop/lm-evaluation-
| harness
|
| And there's https://github.com/EleutherAI/lm-eval2/ (not
| sure if it's just starting over w/ a new repo or what?) but
| it has limited tests available
| sebzim4500 wrote:
| How possible is it that every other model suffers from
| dataset contamination and this model is being unfairly
| penalized for having properly sanitized training data?
| swyx wrote:
| > 128 head dim, so can use flash attention (unlike GPT-J)
|
| mind explaining why this is so attractive/what the hurdle is
| for the laypeople in the audience? (me)
| GaggiX wrote:
| Standard attention has memory quadratic in sequence length,
| whereas FlashAttention has memory linear in sequence length.
| Also, FlashAttention is faster.
| WithinReason wrote:
| According to the paper Flash Attention also needs quadratic
| memory:
|
| Let N be the sequence length, d be the head dimension, and
| M be the size of SRAM with d <= M <= Nd. Standard attention
| (Algorithm 0) requires Θ(Nd + N^2) HBM accesses, while
| FlashAttention (Algorithm 1) requires Θ(N^2 d^2 M^-1) HBM
| accesses.
| GaggiX wrote:
| https://github.com/HazyResearch/flash-attention#memory
|
| "standard attention has memory quadratic in sequence
| length, whereas FlashAttention has memory linear in
| sequence length."
|
| I guess you have just reported how many times the layer
| will need to access the memory, not how much memory usage
| scales with sequence length.
| sroussey wrote:
| So there must be a downside to FlashAttention. What is it?
| lhl wrote:
| https://arxiv.org/abs/2205.14135 - Section 5 suggests
| that the biggest limitation is that custom CUDA kernels
| need to be coded on a per-GPU architecture basis.
| kiraaa wrote:
| installing it is a nightmare
| fpgaminer wrote:
| FlashAttention is mathematically identical to standard
| attention, so in theory there's no downside. In practice,
| numerical inaccuracies of floating point mean that the
| results differ slightly. I don't know of any papers going
| in depth to analyze what impact those variances have in a
| range of real models, but generally speaking deep models
| handle slight variances well. I've not noticed any
| difference in my applications training models. And tons
| of people use FlashAttention as a drop-in replacement on
| models trained on standard attention (e.g. using xformers
| in StableDiffusion).
|
| Also in practice FlashAttention is still relatively new
| so it isn't well supported in libraries yet. Until
| PyTorch 2.0 you had to either implement it yourself, or
| use something like xformers which comes with a bag of
| caveats. PyTorch 2.0 now has it built-in, and it's easy
| to use, but the implementation is incomplete so you
| can't, for example, use it with an attention mask (which
| is needed in LLMs, for example).
|
| tl;dr: Basically none, but it just isn't well supported
| yet.
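|
| As a concrete illustration, PyTorch 2.0's built-in path looks
| roughly like this (a sketch; whether the flash kernel is actually
| used depends on GPU, dtype, and head dim):
|
|   import torch
|   import torch.nn.functional as F
|
|   # (batch, heads, seq_len, head_dim) - 128 head dim as discussed
|   q = torch.randn(1, 16, 4096, 128, device="cuda",
|                   dtype=torch.float16)
|   k, v = torch.randn_like(q), torch.randn_like(q)
|
|   with torch.backends.cuda.sdp_kernel(enable_flash=True,
|                                       enable_math=False,
|                                       enable_mem_efficient=False):
|       # No N x N attention matrix is materialized in HBM.
|       out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
|   print(out.shape)  # torch.Size([1, 16, 4096, 128])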
| whimsicalism wrote:
| > Small models only trained on 800B tokens, compared to 1T for
| llama-7B
|
| LLaMA is trained far beyond chinchilla optimality, so this is
| not as surprising to me.
| anentropic wrote:
| According to this LLaMA still didn't go far enough:
| https://www.harmdevries.com/post/model-size-vs-compute-
| overh...
| whimsicalism wrote:
| Yep, it depends on what your goal is.
| dragonwriter wrote:
| But Chinchilla optimality, while an interesting result, is a
| strange target for most practical purposes. Training happens
| once, inference many times; not training past the point where
| it's cheaper to train a larger model for the same (proxy for)
| quality discounts the importance of inference cost to zero.
| whimsicalism wrote:
| Yep, but if stability has the goal of training the best
| possible model then that would explain the choices they
| made.
| GaggiX wrote:
| I mean 800B tokens on a 3B model and 7B model is still
| way beyond the Chinchilla scale.
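|
| A quick back-of-the-envelope check, using the common ~20 tokens
| per parameter rule of thumb from the Chinchilla paper (an
| approximation, not the full scaling law):
|
|   for params_b in (3, 7):
|       optimal_b = 20 * params_b        # "optimal" tokens, in billions
|       print(f"{params_b}B model: ~{optimal_b}B Chinchilla-optimal "
|             f"vs 800B actual ({800 / optimal_b:.0f}x over)")
|   # 3B model: ~60B Chinchilla-optimal vs 800B actual (13x over)
|   # 7B model: ~140B Chinchilla-optimal vs 800B actual (6x over)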
| burtonator wrote:
| Were you able to figure out if the RL models are going to be
| jailed? A 65B parameter model could be a bit frightening.
| That's 1/3rd the size of GPT3.
| sebzim4500 wrote:
| I'm sure there will be a bunch of different RL tuned versions
| of them, RLHF isn't that expensive. IIRC Microsoft has
| software that will do it for a few thousand dollars for a
| model that size. I'm sure someone will release a non-
| lobotomized version, maybe OpenAssistant.
| kiraaa wrote:
| It's not always about the size, but yeah, it's really good!
| beecafe wrote:
| [dead]
| DustinBrett wrote:
| I'm wondering what the sweet spot for parameters will be. Right
| now it feels like the MHz race we had back in the CPU days, but
| 20 years later I am still using a 2-3GHz CPU.
| lhl wrote:
| I think "sweet spot" is going to depend on your task, but
| here's a good recent paper that may give you some more
| context on thinking about training and model sizes:
| https://www.harmdevries.com/post/model-size-vs-compute-
| overh...
|
| There have also been quite a few developments on sparsity
| lately. Here's a technique SparseGPT which suggests that you
| can prune 50% of parameters with almost no loss in
| performance for example: https://arxiv.org/abs/2301.00774
| version_five wrote:
| I was wondering if the longer training thing was a similar
| phenomenon to the double-descent we see in other deep
| learning models. Training for a really long time can
| improve generalization (as can adding more parameters) -
| but I don't know enough about LLM architecture to know if
| that's relevant here. My skim of the blog post led me to
| think it's proposing a different mechanism (scaling laws).
| capableweb wrote:
| > - 3B to 65B released or in progress
|
| Seems they want to do 3B to 175B, although 175B is not in
| progress yet.
| ipsum2 wrote:
| It's not efficient to do 175B. Training a smaller model (65B)
| on more data gives better performance for the same compute.
| tempaccount420 wrote:
| If you want it to just regurgitate training data, sure. But
| more parameters will always be better for more complex
| tasks.
| thewataccount wrote:
| > But more parameters will always be better for more
| complex tasks.
|
| I think you should checkout this paper which discusses
| the relationship of performance and the ratio of training
| tokens to parameter count.
|
| https://arxiv.org/abs/2203.15556
| tempaccount420 wrote:
| StableLM already has an optimal parameter number to
| tokens ratio, so what's your point? They should train the
| 65B model on even more tokens?
|
| > StableLM is trained on a new experimental dataset built
| on The Pile, but three times larger with 1.5 trillion
| tokens of content
| thewataccount wrote:
| If I understand correctly, based on their prediction in
| Table 3 on page 8, they do have enough tokens, but they
| also need over an order of magnitude more compute time.
|
| > It's not efficient to do 175B. Training a smaller model
| (65B) on more data gives better performance for the same
| compute.
|
| This is OP's comment you replied to - so I was responding
| under OP's context that the amount of compute time would
| be the same, which I apologize I didn't make clear, and
| my response was very poorly worded.
|
| My intent was to link the paper because I think it
| supports OP's statement that for the same amount of
| compute time and a token ratio, the performance of a
| smaller model will be better than a larger one's (assuming
| they haven't converged yet, which they haven't at this
| size).
|
| > If you want it to just regurgitate training data, sure.
|
| This paper was about showing Chinchilla performing on par
| with models many times larger than itself, showing you don't
| need a 175B size model for more performance than
| "regurgitating training data".
| sebzim4500 wrote:
| Depends on your compute budget.
| kiraaa wrote:
| and also easy to deploy
| GaggiX wrote:
| >Small models only trained on 800B tokens
|
| "These models will be trained on up to 1.5 trillion tokens." on
| the Github repo.
|
| https://github.com/stability-AI/stableLM/#stablelm-alpha
| youssefabdelm wrote:
| That's great news, but one would think that since they're
| behind Stable Diffusion, they'd use the insights behind
| it and scale data even further, resulting in better
| quality from a smaller-scale model that can run on most
| people's machines.
|
| Like... try 10 trillion or 100 trillion tokens (although that
| may be absurd, I never did the calculation), and a long
| context on a 7B parameter model then see if that gets you
| better results than a 30 or 65B parameter on 1.5 trillion
| tokens.
|
| A lot of these open source projects just seem to be trying to
| follow and (poorly) reproduce OpenAI's breakthroughs instead
| of trying to surpass them.
| GaggiX wrote:
| >try 10 trillion or 100 trillion tokens
|
| Computation is not free and data is not infinite.
| youssefabdelm wrote:
| You could've said the same to OpenAI when they were
| scaling GPT from 1 billion to 175 billion parameters.
| We're all grateful they didn't follow that line of
| thought.
|
| But Stability does have access to a pretty big cluster,
| so it's not paying cloud compute (I assume), so cost will
| be less, and data of course is not infinite...never
| stated that.
|
| But considering 3.7 million videos are uploaded to
| YouTube every day and 2 million scientific articles are
| published every year, yada yada... that argument falls apart.
|
| At the very least implement spiral development... 1
| trillion... 3 trillion... (oh it seems to be getting WAY
| better! There seems to be a STEP CHANGE!)... 5
| trillion... (holy shit this really works, lets keep
| going)
| dragonwriter wrote:
| The training corpus is the problem. An extra trillion
| tokens is (ballpark) an extra million KJV bibles worth of
| text formatted for ingestion. And you probably picked all
| of the low hanging fruit in terms of quality prior
| vetting and being in a standard format for ingestion in
| your first trillion tokens of training data.
| dragonwriter wrote:
| > Like... try 10 trillion or 100 trillion tokens (although
| that may be absurd, I never did the calculation)
|
| But where's the corpus supposed ro come from?
| Taek wrote:
| Devs confirmed that the small ones use 800B; 1.5T is for the
| large ones.
| GaggiX wrote:
| @thunderbird120 asked a Stability employee and said that the
| plan is to keep training the models up to 1.5T. So I don't
| know where you read this.
| nickthegreek wrote:
| https://github.com/Stability-AI/StableLM#stablelm-alpha
| shows that the 3b and 7B had 800b training tokens.
| Taek wrote:
| That may be, but the weights you can download today were
| trained on 800B
| sroussey wrote:
| I think they are "checkpoint" models in this case.
|
| Will be fun to compare when completed!
| GaggiX wrote:
| yes of course that's why they use "will be trained" on
| the GH repo.
| IronWolve wrote:
| Are the XL 1024x1024 enhanced models released for stable
| diffusion yet? That looks really good from the testing.
| nologic01 wrote:
| Is there something like Moore's law for LLMs that will
| eventually turn them into ubiquitous compute?
| matthasfun wrote:
| I've recently downloaded Diffusion Bee to my Mac for image
| generation with Stable Diffusion.
|
| Excited for when I can do the same with StableLM.
| beecafe wrote:
| [dead]
| transcriptase wrote:
| Missed opportunity for Stability to call themselves "ClosedAI"
| and then be the opposite of what the name suggests.
| leaving wrote:
| Well that was a trip.
|
| I asked StableLM-Tuned-Alpha-7b to translate the Swedish phrase
| "Skaplig bogpulka" into English (because ChatGPT and New Bing
| refuse to do so as a result of their puritanical US censorship)
| and it confabulated wildly.
|
| It made up an Estonian folk song and somehow we ended up on the
| etymology of the well-known Swedish word "Skeletor".
|
| This one is not ready for prime-time, but I have hopes. Someone
| please make a model that doesn't censor. I won't be paying one
| thin dime for this stuff until it is censorship-free.
| [deleted]
| arpowers wrote:
| Why does every LLM use "Hope this email finds you well" to start
| virtually every letter and email?
| ShamelessC wrote:
| Have you tried something like:
|
| "Never start an email with 'Hope this email finds you well'"
|
| in your first prompt.
| cmsj wrote:
| For people wanting to run it locally, you can fit the 7b model
| (just) into a 24GB VRAM GPU (e.g. 3090/4090). The 3b model
| appears to be much more reasonable, but I would say the output
| is.... of limited quality based on the few tests I've run thus
| far.
| cmsj wrote:
| Gist that mostly just takes the notebook Stability AI have in
| the github repo, and turns it into a script you can run locally
| after installing a few dependencies from pip:
|
| https://gist.github.com/cmsj/2d6b247ad4fc8f15011105feeda763e...
| millimeterman wrote:
| I suspect the community will start creating lower
| precision/quantized versions of the model very quickly. LLaMa
| 30b quantized to 4 bits is runnable on a 3090/4090.
| janmo wrote:
| You don't need a GPU to run the model; you can use your RAM and
| CPU, but it might be a bit slow.
| cmsj wrote:
| It's very slow, and for the 7b model you're still looking at
| a pretty hefty RAM hit whether it's CPU or GPU. The model
| download is something like 40GB.
| nwoli wrote:
| No way, I thought this was months away still. So excited
| MacsHeadroom wrote:
| Completed training is. These are early "Alpha" checkpoints.
| visarga wrote:
| It's released but still cooking in the oven as we speak.
| msoad wrote:
| I feel "The Pile" needs a lot more to match what OpenAI has
| trained their models on. The mysterious "Books2" and so on.
| nickthegreek wrote:
| Great to see Stability release this with a great license as well.
| Any idea on the usecases for the 3B model? Will a model that
| small suffer heavily from a 4bit quantization?
| wsgeorge wrote:
| > Will a model that small suffer heavily from a 4bit
| quantization?
|
| From my experience with quantized 7B llama models, avoid 3B if
| you can. Without benchmarks, I think this is a decent rule of
| thumb.
| capableweb wrote:
| Generating either a 0 or 1 in response to something? Maybe
| useful in some testing scenarios? Honestly don't know either.
| New lorem ipsum?
| visarga wrote:
| You can use it as the assistant model to a large model; it's
| called speculative sampling. You generate text with the small
| model and validate it with the large one, ensuring no deviation
| occurs, for a speedup of about 2.5x.
|
| https://arxiv.org/abs/2302.01318 (DeepMind)
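|
| A simplified, greedy sketch of the idea (the paper uses a
| probabilistic accept/reject rule; the model choices here are
| assumptions, and both models must share a tokenizer):
|
|   import torch
|   from transformers import AutoModelForCausalLM, AutoTokenizer
|
|   tok = AutoTokenizer.from_pretrained(
|       "stabilityai/stablelm-base-alpha-7b")
|   draft = AutoModelForCausalLM.from_pretrained(
|       "stabilityai/stablelm-base-alpha-3b",
|       torch_dtype=torch.float16).cuda()
|   target = AutoModelForCausalLM.from_pretrained(
|       "stabilityai/stablelm-base-alpha-7b",
|       torch_dtype=torch.float16).cuda()
|
|   @torch.no_grad()
|   def speculative_step(ids, k=4):
|       # 1) The small model proposes k tokens cheaply.
|       proposed = draft.generate(ids, max_new_tokens=k, do_sample=False)
|       # 2) The large model scores the whole proposal in one pass.
|       preds = target(proposed).logits.argmax(-1)
|       n = ids.shape[1]
|       # 3) Keep draft tokens while they match the target's greedy choice.
|       accepted = 0
|       while (accepted < proposed.shape[1] - n and
|              proposed[0, n + accepted] == preds[0, n + accepted - 1]):
|           accepted += 1
|       # 4) Always append one token from the target itself.
|       next_tok = preds[0, n + accepted - 1].view(1, 1)
|       return torch.cat([proposed[:, :n + accepted], next_tok], dim=1)
|
|   ids = tok("The capital of France is",
|             return_tensors="pt").input_ids.cuda()
|   for _ in range(8):
|       ids = speculative_step(ids)
|   print(tok.decode(ids[0]))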
| MacsHeadroom wrote:
| That would be 1bit quantization. In reality quantization
| under 8bits is done in smart ways which result in higher
| effective output quantization and lower effective memory size
| quantization.
|
| For example, bucketing identical groups of weights and then
| reusing one bucket for all the identical groups lowers the
| effective bit quantization at the memory level while
| retaining the uncompressed quantization quality.
|
| There is literature on effective quantization levels below 1
| (so-called "0bit" quantization). But even then the actual
| weights are typically 2-4 bits and there is just a lot of
| reuse of weights going on.
|
| Another neat trick is to line bins of weights up in order of
| lowest to highest weights and compute a function to produce a
| zero offset throughout the bins of weights, such that 0010 in
| one bin and 0010 in another bin are upsampled to 16bit and
| then have different offsets added, maintaining uncompressed
| 16bit performance without the memory overhead.
|
| There are many more tricks like this and many still to be
| found!
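|
| A toy illustration of the per-group scale + zero-offset idea
| (not any particular library's scheme):
|
|   import torch
|
|   def quantize_4bit(w: torch.Tensor, group: int = 64):
|       g = w.reshape(-1, group)
|       lo = g.min(dim=1, keepdim=True).values
|       hi = g.max(dim=1, keepdim=True).values
|       scale = ((hi - lo) / 15).clamp(min=1e-8)  # 4 bits -> 16 levels
|       q = torch.round((g - lo) / scale).clamp(0, 15).to(torch.uint8)
|       return q, scale, lo   # 4-bit codes + per-group scale/offset
|
|   def dequantize(q, scale, lo):
|       return q.float() * scale + lo  # back to full precision
|
|   w = torch.randn(65536)
|   q, s, z = quantize_4bit(w)
|   err = (dequantize(q, s, z).flatten() - w).abs().max()
|   print(f"max reconstruction error: {err:.4f}")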
| jerrygoyal wrote:
| What's the knowledge cut-off for this?
| alexb_ wrote:
| Absolutely a giant fan of Stability sticking to actual open
| source licenses and not licenses that impose restrictions on what
| you can use it for. This is the future of AI! Beware of any org
| that uses "ethical" licenses - they are not open source.
| Stability is one of the few organizations that actually cares
| about free software; you love to see it.
| cjbprime wrote:
| > These fine-tuned models are intended for research use only
| and are released under a noncommercial CC BY-NC-SA 4.0 license,
| in-line with Stanford's Alpaca license.
|
| This is a no-commercial-use-allowed license; it is neither
| considered free software nor open source, the definitions of
| which disallow restrictions on what you can use the work for.
| renewiltord wrote:
| Interesting. A non-LLM hallucinating. And to think we used to
| believe that was only a property of LLMs.
| circuit10 wrote:
| Huh? Everything they said is true, isn't it?
| renewiltord wrote:
| StableLM, the model that's theirs is CC-BY-SA 4.0.
| [deleted]
| freedomben wrote:
| The two sentences prior are important:
|
| > _We are also releasing a set of research models that are
| instruction fine-tuned. Initially, these fine-tuned models
| will use a combination of five recent open-source datasets
| for conversational agents: Alpaca, GPT4All, Dolly, ShareGPT,
| and HH. These fine-tuned models are intended for research use
| only and are released under a noncommercial CC BY-NC-SA 4.0
| license, in-line with Stanford's Alpaca license._
|
| The snippet you quoted is not talking about the main model in
| the announcement. It's talking about fine-tuned models based
| on other models. Stability has to respect the license of the
| originals. They cannot change it.
|
| The main model is described higher up in the post and _is_
| permissible for commercial:
|
| > _Developers can freely inspect, use, and adapt our StableLM
| base models for commercial or research purposes, subject to
| the terms of the CC BY-SA-4.0 license_
| btown wrote:
| It also appears that CC BY-SA-4.0 is GPL-compatible. Not a
| lawyer and this is not legal advice, but it certainly seems
| like one could operate their own StableLM server/service
| and allow proprietary code to use it over a network
| interface, much like one could use a GPL-licensed database
| system.
|
| https://creativecommons.org/faq/#can-i-apply-a-creative-
| comm...
| seydor wrote:
| Not their fault; the instruct-tuned models depend on non-open
| data... which should be open, however. Scraping ChatGPT is
| legal.
| sanxiyn wrote:
| Agreed. Scraping ChatGPT is against OpenAI terms of use and
| OpenAI is entitled to terminate your access immediately
| upon notice, but since ChatGPT output is not copyrighted
| (and copyrightable), output you acquired before termination
| should be freely redistributable. I am not sure why
| Stanford Alpaca authors think otherwise but they are wrong.
|
| https://openai.com/policies/terms-of-use
|
| Thank you for using OpenAI! These Terms of Use apply when
| you use the services of OpenAI, L.L.C. (snip) By using our
| Services, you agree to these Terms. (snip) You may not
| (iii) use output from the Services to develop models that
| compete with OpenAI. (snip) We may terminate these Terms
| immediately upon notice to you if you materially breach
| Sections 2 (Usage Requirements).
| whywhywhywhy wrote:
| "Ethics" will only ever be an excuse to lock this technology
| behind one companies paywall. The only ethical AI is actually
| free and open AI, how its trained is irrelevant imho as long as
| we can all benefit. The negatives of the work of individuals
| being used to train it outweigh the negatives of one company
| just doing that and holding the power within their walls.
| version_five wrote:
| Yeah I wish there was more real investigation / analysis into
| who is behind various "ethical AI" pushes and what they stand
| to gain from it. From what I can see, many of the people
| involved either are invested in companies that will somehow
| certify your AI is ethical, or just want to stifle
| competition so they can catch up. Of course there's also a
| sprinkling of "current thing" supporters.
| Xelynega wrote:
| I have to disagree. Especially in the case of LLMs, where new
| API services are popping up all over the place, an "ethical"
| license like the AGPL that requires the source be shared for web
| services would accelerate development of the space as a
| whole immensely.
| EamonnMR wrote:
| I think when they say ethical they're talking about RAIL
| which has clauses about not doing bad guy stuff with their
| models, not AGPL.
| shostack wrote:
| How is this sort of thing audited? I imagine there are all
| sorts of lifestyle AI businesses that won't give two shits
| about a license where people can't easily see or audit what is
| being used.
| burtonator wrote:
| I am definitely not in favor of OSS license for models >=
| GPT3...
|
| Not unless they're aligned well.
|
| There are all sorts of horrible use cases that these could be
| used for.
| circuit10 wrote:
| For the worst use cases, like scams, why would they respect
| the license? Though that probably applies for some legal but
| immoral use cases
| cheald wrote:
| "Alignment" is just a euphemism for "agrees with me", though.
| Humans aren't even aligned with each other. Demanding that AI
| models be "aligned" is essentially a demand that AI only be
| produced which agrees with your priors.
| risho wrote:
| It is true that there are concerns relating to open source
| and AI, but surely having them be closed off, manipulated,
| and controlled by untrustworthy corporations is worse.
| wahnfrieden wrote:
| they like the particular techbro that runs the corporation
| currently. people will find all sorts of ways to lick the
| boot of private ownership.
| welshwelsh wrote:
| The ideal alignment for a computer program is to be aligned
| completely with the user, prioritizing their needs to the
| exclusion of all others.
| orra wrote:
| I am very happy to see them use a true FLOSS licence. However,
| it's a surprise to me, given Stable Diffusion is proprietary,
| using one of those "ethical" licences.
| r0l1 wrote:
| How will they generate revenue?
| earthboundkid wrote:
| I tooted this yesterday:
|
| Selling access to LLMs via remote APIs is the "stage plays on the
| radio" stage of technological development. It makes no actual
| sense; it's just what the business people are accustomed to. It's
| not going to last very long. So much more value will be unlocked
| by running them on device. People are going to look back at this
| stage and laugh, like paying $5/month to a cellphone carrier for
| Snake on a feature phone.
|
| Web apps:
|
| - Need data persistence. Distributed databases are really hard to
| do.
|
| - Often have network effects where the size of the network causes
| natural monopoly feedback loops.
|
| None of that applies to LLMs.
|
| - Making one LLM is hard work and expensive. But once one exists
| you can use it to make more relatively cheaply by generating
| training data. And fine tuning is more reliable than one shot
| learning.
|
| - Someone has to pay the price of computation power. It's in the
| interest of companies to make consumers pay for it up front in
| the form of a device.
|
| - Being local lets you respond faster and with access to more
| user contextual data.
| s3p wrote:
| What does "stage plays on the radio" mean?
| crucialfelix wrote:
| When radio first started, people read plays written for the
| stage, because that's what they knew and what they had. Later
| people learned to write for the medium and make radio native
| entertainment.
|
| Same thing happened when TV arrived. They did live versions
| of the radio entertainment on a set in front of a camera.
| [deleted]
| what_ever wrote:
| Trying to be fancy to say things won't be running locally.
| The orchestra/performers won't be live in front of you but on
| the radio.
| throwawayadvsec wrote:
| I don't think it's going to happen in the next few years
|
| the prices are gonna drop like hell, but ain't no way we run
| models meant to run on 8 nvidia A100 on our smartphones in the
| next 5 years
|
| just like you don't store the entirety of spotify on your
| iPhone, you're not gonna run any decent LLM on phones any time
| soon (and I don't consider any of the small Llamas to be decent)
| w4ffl35 wrote:
| > but ain't no way we run models meant to run on 8 nvidia
| A100 on our smartphones in the next 5 years
|
| m$ has been working on an AI chip since 2019 so i think we
| will.
| throwawayadvsec wrote:
| An A100 is about the size of a brick, there is no way we're
| fitting those 8 bricks in a phone in the next five years,
| without even thinking about heat management
| asciimike wrote:
| An A100 HGX server is ~6kW of power consumption (and
| associated heat), while an iPhone is O(1W). I agree that
| a 6000x increase in energy density or 6000x decrease in
| power consumption is unlikely in this decade.
| d33 wrote:
| They still can't beat the laws of physics. It's just the
| kind of density that doesn't seem feasible to put in a
| smartphone.
| wewtyflakes wrote:
| Human brains seem to be of the same order of magnitude in
| terms of size, and seem more nuanced and capable than
| GPT4.
| what_ever wrote:
| And? Nvidia has been working on AI chips for years.
| davnicwil wrote:
| the only thing I can say to this is that Apple have seemed
| laser focused on tuning their silicon for ML crunching, that
| that focus is clearly now going to be amped up further still,
| and that in tandem the software itself will be tuned to Apple
| silicon.
|
| GPUs on the other hand are pretty general purpose. And 5
| years on a focused superlinear ramp up is a long time, lots
| can happen. I am not saying it's 100%, or even 80% likely.
| It'll be super impressive if it happens, but I see it as well
| within the realms of reason.
| brucethemoose2 wrote:
| The M2 and the 4090 are both very general purpose. In fact,
| the 4090 allocates proportionally more silicon area to the
| tensor cores than Apple allocates to the neural engine.
|
| The M series is basically the only "big" SoC with a
| functional, flexible NPU and big GPU right now, which is
| why it seems so good at ML. But you can bet actual ML
| focused designs are in the pipe.
| zamnos wrote:
| Well, and gobs of RAM, which only top end cards can
| compete with.
| tehsauce wrote:
| Apple's new M2 Max has a neural engine which can do 15
| trillion flops. Nvidia's A100 chip (released almost 3
| years ago) can do 315 trillion flops. Apple is not going to
| close this 20x gap in a few years.
| moffkalast wrote:
| > this 168x gap
|
| FTFY, remember it takes 8 of those to even load the
| thing. And when the average laptop has that much compute,
| GPT 4 will seem like Cleverbot in comparison to the state
| of the art.
| sroussey wrote:
| At some point, they will put the models in silicon. I'm
| curious as to when... 5yr?
| viraptor wrote:
| That doesn't sound likely with the current architectures.
| There may be some kind of specialisation, but NN is like
| the chip design nightmare. We can't do chips with that
| many crossed lines. It's going to have to keep the
| storage+execution engine pattern unless we get some
| breakthroughs.
|
| "More specialised than GPU" is the game for now.
| davnicwil wrote:
| right, it's a huge challenge.
|
| I think the piece about tuning the models to the hardware is
| important, and of course there is much more incentive to
| do this for Apple than nvidia because of the distribution
| and ecosystem advantages Apple have.
|
| But also, I don't know... let's see what the curve looks
| like! It's only been a couple of years of these neural
| engines. Let's see how many flops M3 can hit this year.
| And then m4 the next. Again, 5 years is a long time
| actually when real improvement is happening. I am
| optimistic.
| refulgentis wrote:
| Curious, why do you think that? My knowledge is limited to
| marketing material and my M2 vs my 3090, and my conclusion
| so far would be that this is in every hardware maker's
| marketing claims from the past couple of years.
| viraptor wrote:
| > but ain't no way we run models meant to run on 8 nvidia
| A100 on our smartphones in the next 5 years
|
| When I learned about neural networks, the general advice at
| the time was "you'll only need one hidden layer, with
| somewhere between the number of your input and output
| neurons". While that was more than 5 years ago, my point is -
| both the approach and the architecture changes over time. I
| would not bet on what we won't have in 5 years.
| Analog24 wrote:
| This is the reason why they're not going to move on device
| anytime soon. You can use compression techniques, sure, but
| you're not going to get anywhere near the level of
| performance of GPT-4 at a size that can fit on most consumer
| devices
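| To make the compression point concrete, 8-bit weight quantization
| is already easy to try with the transformers + bitsandbytes stack.
| A minimal sketch (the checkpoint name is just an example, and this
| of course won't get a small model anywhere near GPT-4 quality):
|
|       from transformers import AutoModelForCausalLM, AutoTokenizer
|
|       name = "stabilityai/stablelm-base-alpha-7b"  # example checkpoint
|       tokenizer = AutoTokenizer.from_pretrained(name)
|       # load_in_8bit stores weights as int8, roughly halving memory vs fp16
|       model = AutoModelForCausalLM.from_pretrained(
|           name,
|           load_in_8bit=True,
|           device_map="auto",  # let accelerate place layers on GPU/CPU
|       )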
| mkaic wrote:
| I think we'll see completely new architectures dominate in
| the near future, ousting the transformer. I am strongly
| suspicious that, while impressive, transformers use several
| orders of magnitude more compute than is "needed" for the
| tasks they perform--if for no other reason because the
| human brain performs similarly and it only draws 20 watts!
| And it isn't even an engineered system, jus the product of
| a very, very long history of natural selection! I fully
| anticipate that we'll see AI in the near future that
| achieves human-level performance on sub-human power budgets
| like the ones you'd be constrained by on a phone :)
| Analog24 wrote:
| "neat future" is very ambiguous. At the moment there is
| nothing even close to transformers in terms of
| performance. I suspect you are right in general but I'm
| not sure about the "near future" part, there needs to be
| a pretty significant paradigm shift for that to happen
| (which is possible, of course, I just don't see any hints
| of it yet).
| brucethemoose2 wrote:
| Vicuna (LLama) 13B is pretty good IMO. A 20B model can
| definitely fit in RAM on future devices.
| WhiteNoiz3 wrote:
| I agree - I think for security and privacy we need it to be on-
| device (either that or there needs to be end to end encryption
| with guarantees that data won't be captured for training).
| There are tons of useful applications that require sensitive
| personal information (or confidential business information) to
| be passed in prompts - that becomes a non issue if you can run
| it on device.
|
| I think there will be a lot of incentive to figure out how to
| make these models more efficient. Up until now, there's been no
| incentive for the OpenAIs and the Googles of the world to make
| the models efficient enough to run on consumer hardware. But
| once we have open models and weights there will be tons of
| people trying to get them running on consumer hardware.
|
| I imagine something like an AI specific processor card that
| just runs LLMs and costs < $3000 could be a new hardware
| category in the next few years (personally I would pay for
| that). Or, if Apple were to start offering a GPT-3.5+ level LLM
| built in that runs well on M2 or M3 macs that would be strong
| competition and a pretty big blow against the other tech
| companies.
| ForHackernews wrote:
| ?
|
| All software is sold as SaaS today, because it's more
| profitable. The same will be true for LLMs.
| coffeebeqn wrote:
| LLMs also need persistent storage for best performance.
| Otherwise you're always starting at square one
| circuit10 wrote:
| At least for now, good LLMs need GPUs that cost tens of
| thousands, so a cloud API is the only reasonable option. It
| definitely makes a lot of sense
| andrewcamel wrote:
| On this topic, Apple is the sleeping giant. Sleeping tortoise
| maybe. Everyone else has been fast out of the gates, but Apple
| has effectively already been positioning to leap frog everyone
| after a decade+ of M1 chip design. Ever since these chips
| launched, the M1 chips have felt materially underutilized,
| particularly their GPU compute. Have to believe something big
| is going on behind the scenes here.
|
| That said, wouldn't be surprised if the truth was somewhere in
| between cloud-deployed and locally deployed, particularly on
| the way up to the asymptotic tail of the model performance
| curve.
| ohgodplsno wrote:
| M1 GPUs are barely real-world tested, alright chips. They're
| far from being a sleeping giant.
| smoldesu wrote:
| What would a "leap frog" look like, in your mind? I'm
| struggling to imagine how they're better positioned than the
| competition, especially after llama.cpp showed us that
| inference acceleration works with everything from AVX2 to ARM
| NEON. Compared to Nvidia (or even Microsoft and ONNX/OpenAI),
| Apple is somewhat empty-handed here. They're not out of the
| game, but I genuinely see no path for them to dominate
| "everyone".
| yunwal wrote:
| My guess is a leapfrog would have more to do with how LLMs
| are integrated into an operating system, rather than just
| coming out with a better model. I don't think we're gonna
| get a substantially more capable LLM than GPT-4 anytime
| soon, but fine-tuning it to sit on top of the core of an
| operating system could yield results.
| smoldesu wrote:
| Feels like Microsoft already beat them to the punch.
| Their ONNX toolkit has better ARM optimization than
| Apple's own Pytorch patches, and their collaboration with
| OpenAI places them pretty far ahead of the research
| curve. I'm convinced Microsoft could out-maneuver Apple
| on local or remote AI functionality, if they wanted to.
| jrm4 wrote:
| I think it may be naive for people to believe that the deciding
| factor in how these things are used is likely to be "chip
| speed" or "efficiency on the machine."
|
| I wish we were in that world; but it more likely seems like it
| would be "Which company jumps ahead quickest to get mindshare
| on a popular AI related thing, and then is able to ride scale
| to dominate the space?"
|
| REALLY hope I end up being wrong here; the fact that so many
| models are already out there does give me some hope.
| GavinB wrote:
| I'd suspect it will actually accelerate moving everything into
| the cloud.
|
| If your entire business is in the cloud, you can give an AI
| access to everything with a single sign-on or some passwords. If
| half is on the cloud and half is local, that's very annoying to
| have all in-context for your AI assistant. And there's no way
| we're getting everything locally stored again at this point!
| bugglebeetle wrote:
| Right, this is why StabilityAI is getting in bed with Amazon,
| so private, fine-tuned models can operate on all your data
| sitting out there in S3 buckets or whatever.
| fshbbdssbbgdd wrote:
| Most of your bullets sound like arguments that local models
| will win.
|
| > Often have network effects where the size of the network
| causes natural monopoly feedback loops.
|
| This one in particular sounds like an argument that remote
| models will win.
| max51 wrote:
| I don't think that's true in the context of businesses because they
| won't want their data to be leaked and/or used for other
| clients. The more data from your company you can feed the AI,
| the more productive it will be for you. I'm not just talking
| about semi-public documentation, but also things like emails,
| meeting transcript, internal tools APIs, employee details,
| etc.
|
| If the AI service provider uses your data to help better
| train their AI, it will be blacklisted by most companies. If
| you keep them in silos, the centralisation will offer almost
| no benefit while still being a very high privacy risk. The
| only benefit they get is that it allows them to demo it and
| see its potential, but no serious business will adopt it
| unless you also provide a self-hosted solution.
|
| I think the only people who will truly benefit from using
| cloud services as a long term solution are personal users and
| companies too small to afford the initial cost of the
| hardware.
| goatlover wrote:
| That seems hard to believe for businesses which already
| rely on Office, Teams and Sharepoint, since Microsoft will
| be making its version of ChatGPT available for all its
| products, and the integration will be too hard to pass up
| on.
| max51 wrote:
| Microsoft is in a different situation because everyone is
| already forced to trust them with their OS and o365. For
| better or for worse, there are no current alternatives to
| Windows and the office suite for most businesses. If you
| already login to your OS with a Microsoft account and
| process your data in Excel, adding an AI tool on top of
| it is not a big jump. Very few others are in this
| situation.
|
| For every other AI service providers, good fucking luck
| getting clients to trust you. I expect we will see a lot of
| AI services that offer a cheap and easy-to-use cloud AI
| subsidized by a very expensive self-hosted version. I
| also expect a lot of data leaks and many high profile
| incidents where an AI creates a document or code that
| includes sensitive data from someone else (hard coded
| passwords, API keys, etc.).
|
| Even for a large company like Autodesk or Adobe, you
| might trust them with your engineering drawings and your
| new product design, but would you feel comfortable
| uploading your code base for internal tools, employee
| files, email communications, etc. to them? It's gonna be
| a hard no for a lot of businesses
| precompute wrote:
| This is mostly why the future of computation only makes sense
| monetarily if you have everyone shift to a thin client. So,
| banning GPUs is likely considered a "necessary evil" by the
| BigTech cognoscenti for accomplishing that goal.
| qingdao99 wrote:
| > But once one exists you can use it to make more relatively
| cheaply by generating training data.
|
| Is that a real technique? Why not just shrink down the model
| itself directly somehow, is that not possible?
| vsareto wrote:
| I think you'll be paying a cloud provider instead of an API
| company, but it'll still be a monthly service charge for the
| service
|
| I don't see small/medium companies getting into acquiring
| hardware for AI
| losvedir wrote:
| This doesn't seem technically feasible to me. The state of the
| art will for a long time require a lot more hardware to run
| than is available on a consumer device.
|
| Beyond which, inference _also_ benefits from parallelization,
| not just training, so being able to batch requests is a
| benefit, and more likely when access is offered via an API.
| FL33TW00D wrote:
| This isn't correct. It's 100% feasible to run very capable
| models on consumer devices.
|
| I wrote up a feasibility investigation last year:
| https://fleetwood.dev/posts/a-case-for-client-side-
| machine-l...
| crazygringo wrote:
| But there's a big difference between running "very capable"
| models that use most available resources and consume tons
| of power and run slowly...
|
| ...versus running the best models available, in a few
| seconds, without using up the memory the main app you're
| using needs for running.
|
| These are all mainly going to be run remotely for general
| consumer usage for quite a while I think.
| FL33TW00D wrote:
| How many applications on your machine right now are GPU
| accelerated? There is a chunk of silicon that spends
| almost all its time doing nothing.
| moffkalast wrote:
| > very capable models
|
| Well that's the problem though, those models don't come
| anywhere close to being _useful_ at all. At least not yet. And they
| also run much slower.
|
| As compute increases in general, there will be larger and
| more capable state of the art models and it'll make more
| sense to just use those instead of trying to run some local
| one that won't give you any useful answers. Data centers
| will always have a few orders of magnitude more horsepower
| than your average laptop, even with some kind of inference
| accelerator card.
| FL33TW00D wrote:
| Stable diffusion isn't useful at all?
| circuit10 wrote:
| "very capable" and "state of the art" aren't the same
| thing. In this case the gap is huge
| FL33TW00D wrote:
| This is true, but a good subset of tasks can be
| accomplished by much smaller models. Copilot is 12B!
| frozenlettuce wrote:
| I ran alpaca 7B in a 2015 Acer laptop (it was slow, but
| worked!)
| brucethemoose2 wrote:
| I can already run Vicuna(llama) 7B on my 2020, 14" PC laptop
| at ~3.5 tokens/sec, and more speed can definitely be squeezed
| out.
|
| Most future laptops and phones will ship with NPUs next to
| the CPU silicon. Once they get enabled in software, that
| means a 16GB machine can run a 13B model, or a 7B model with
| room for other heavy apps.
|
| As for the benefits of batching and centralization, that is
| true, but it's somewhat countered by the high cost of server
| accelerators and the high profit margins of cloud services.
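| For anyone curious what that looks like in practice, one common way
| to do this kind of local 4-bit inference is the llama-cpp-python
| bindings over a GGML-quantized model file. A rough sketch (the file
| path, prompt format, and settings are illustrative; speed will vary
| a lot by machine):
|
|       from llama_cpp import Llama
|
|       # 4-bit quantized 7B model converted with llama.cpp's tooling
|       llm = Llama(model_path="./models/vicuna-7b-q4_0.bin",
|                   n_ctx=2048, n_threads=8)
|
|       out = llm(
|           "### Human: Write one sentence about cheese.\n### Assistant:",
|           max_tokens=64,
|           stop=["### Human:"],  # stop before the model writes the next turn
|       )
|       print(out["choices"][0]["text"])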
| burtonator wrote:
| We're stuck here for a while due to the size, and cost, of the
| larger models.
|
| The main reason I want a non-cloud LLM is that I want one
| that's unaligned.
|
| I know I'm not a criminal and I want to stop being reprimanded
| by GPT4.
|
| What I'm most interested here is fine tuning the model with my
| own content.
|
| That could be super valuable especially if we could get it to
| fact check itself, which you could with a vector database.
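| A minimal sketch of that vector-database grounding idea, assuming
| the sentence-transformers package for embeddings and a tiny
| in-memory store (the model name, documents, and prompt are all
| illustrative, not a recommendation):
|
|       import numpy as np
|       from sentence_transformers import SentenceTransformer
|
|       docs = ["My note about project X...", "Meeting summary from March..."]
|       embedder = SentenceTransformer("all-MiniLM-L6-v2")
|       doc_vecs = embedder.encode(docs, normalize_embeddings=True)
|
|       def retrieve(question, k=2):
|           q = embedder.encode([question], normalize_embeddings=True)[0]
|           scores = doc_vecs @ q          # cosine similarity (vectors normalized)
|           return [docs[i] for i in np.argsort(-scores)[:k]]
|
|       context = "\n".join(retrieve("What did we decide about project X?"))
|       prompt = f"Answer using only this context:\n{context}\n\nQuestion: ..."
|       # Feed `prompt` to your local model so answers stay grounded in your data.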
| shostack wrote:
| What's been so interesting with the explosion of this has
| been how prominently the corporately-driven restrictions have
| been highlighted in news and such.
|
| People are getting a good look in very easy to understand
| terms at the foundational stage at how limiting the future is
| to have this just be another big tech controlled thing.
| WhiteNoiz3 wrote:
| They have said that the alignment actually hurts the
| performance of the models. Plus for creative applications
| like video games or novels, you need an unaligned model
| otherwise it just produces "helpful" and nice characters.
| skybrian wrote:
| This is sort of like saying the world wide web is a fad. Many
| people made that argument, but a lot of desktop apps got
| replaced by websites even though they were supposedly inferior.
|
| ChatGPT works fine as a website and you don't need to buy a new
| computer to run it. You can access your chat history from any
| device. For many purposes, the only real downside is the
| subscription fee.
|
| If LLMs become cheaper to run, websites will be cheaper to
| run, and there will be lower-cost competition. Maybe even cheap
| enough to give away for free and make money from advertising?
| zirgs wrote:
| The other downside starts with an "As an AI language
| model..."
| ModernMech wrote:
| And ends with "...I'm afraid I can't do that."
| lairv wrote:
| This doesn't seem that obvious to me, serving LLMs through an
| API allows for highly optimized inference with stuff like
| TensorRT and batched inference while you're stuck with batch
| size = 1 when processing locally.
|
| LLMs don't even require full real-time inference; there are
| applications like VR or camera stuff where you need real-time
| <10ms inference, but for any application of LLMs 200-500ms is
| more than fine
|
| For the users, running LLMs locally means more battery usage
| and significant RAM usage. The only true advantage is privacy
| but this isn't a selling point for most people
| pornel wrote:
| You're still thinking in terms of what APIs would be used
| for, rather than what local computation enables.
|
| For example, I'd like an AI to read everything I have on
| screen, so that I can ask at any time "why is that? Explain!"
| without having to copy paste the data and provide the whole
| context to a Google-like app.
|
| But without privacy guarantee (and I mean technical one, not
| a pinky promise to be broken when VC funding runs out)
| there's no way I'd feed _everything_ into an AI.
| brucethemoose2 wrote:
| We are very close to optimized ML frameworks on consumer
| hardware.
|
| And TBH most modern devices have way more RAM than they need,
| and go to great lengths to just find stuff to do with it.
| Hardware companies also very much like the idea of heavy
| consumer applications.
| [deleted]
| thatwasunusual wrote:
| Is this a joke? I tried the HuggingFace demo, and it just spews
| out ridiculous replies for everything.
|
| Examples:
|
| >> What can you tell me about Therese Johaug?
|
| > The person you're asking about is Therese Johaug (The Judge) a
| Norwegian jazz violinist. She played with many famous musicians,
| including and especially Charles Mingus. She was born in 1954 in
| the town of Kolsas. She is married to the violinist Mark
| Sorensen.
|
| Everything in that statement is wrong. She is a Norwegian cross-
| country skier, born in 1988 in Dalsbygda, not married.
|
| Ok. Let's try something easier.
|
| >> What can you tell me about Edvard Munch?
|
| > -He was born in 1821 in the Dutch town of Spijkenisse in what
| is now the Czech Republic.
|
| > -He studied art at the Academy of Fine Arts in Brussels,
| Belgium, where he made a portrait of a young girl called Anna.
|
| > -He then moved to Norway to practice with his friends and Dano-
| Norwegian painter, Christian Krohg, the two of them known as "The
| Scream" or "Kristine in the Black".
|
| > -There he met the painter, Ambroise Thomas, who became a very
| close friend of Munch's and would often take him to the beach to
| watch the boats.
|
| > -One day in 1883 Munch bought a house at a deserted place in
| the outskirts of Oslo, in the southern region of Norway.
|
| > -The house, called "La Chute", had no inhabitants at all, it
| was just a huge pile of rocks, but this Munch loved it so much
| that he named it "Lochte".
|
| > -He took the name "Munch" because he thought it was German, but
| it was totally French, he had the idea that he could paint the
| ocean and the sky, so he named his new painting "Moulin Sur
| l'Oiseau".
|
| > -At the same time, he would soon start creating works of art
| that would be strange and not make any sense, for instance, one
| summer day in 1870 he painted what is known now as "The Scream",
| it has many other paintings with the same title, "The Scream" is
| a drawing of a young girl, she is holding an ax and is crying in
| the scream position.
|
| > -He had the idea of "The Scream" in one sitting in 1883, but it
| took six months to paint.
|
| > -He stayed in France in the studio of Christian Krohg when he
| was still Edouard de Badeley, one of the greatest painters of his
| time.
|
| > -In 1891 he returned to Norway to France and lived there for
| the rest of his life.
|
| > -He died in Paris in 1900.
|
| Christ on a bike.
|
| My favourite must be that "The Scream is a drawing of a young
| girl, she is holding an ax and is crying in the scream position."
| tmikaeld wrote:
| Agreed, it doesn't even do the basic stuff that GPT-2 did, how
| can anyone be impressed by this?
| franzypants wrote:
| I get the sense that any accurate portrayal of the model is
| getting downvoted. Sure, it's going to be a while before open-
| source models catch up with OpenAI. I'm glad that Stability is
| working on this, and I look forward to trying out their larger
| models that they're training right now, but we should be able
| to discuss accurately the capabilities of an LLM without just
| automatically jumping on the hype train.
| ShamelessC wrote:
| That's fair, but this:
|
| > Is this a joke? I tried the HuggingFace demo, and it just
| spews out ridiculous replies for everything.
|
| Is a shallow and uninformed criticism. The parameter count
| alone should inform people that this isn't going to match
| GPT-3.5 or GPT-4. Not everyone knows that of course, but
| starting a comment with "Is this a joke?" is precisely the
| wrong way to start such a discussion.
| adrianmonk wrote:
| Yeah, I've tried a few things, and so far I've gotten no
| answers I'd consider acceptable. Here are some examples.
|
| -----
|
| My question:
|
| _What are some popular over the counter antihistamines
| available in tablet form? Please name three, and for each one,
| give the generic and brand name, and NO OTHER INFORMATION AT
| ALL._
|
| Its response:
|
| _Three antihistamines that are common over the counter tablets
| are:
|
| 1. Claritest
|
| 2. Zyrtec
|
| 3. Marzolast
|
| Brand names for these tablets are:
|
| 1. Claritest - Contains diphenhydramine hydrochloride, which is
| an expectorant and cold and allergy relief medicine.
|
| 2. Zyrtec - Contains diphenhydramine hydrochloride, which is an
| expectorant and antispasmodic medication for motion sickness.
|
| 3. Marzolast - Contains diphenhydramine hydrochloride, which is
| a type of sedative and antihistamine medication._
|
| Things wrong with this answer:
|
| * Claritest isn't an antihistamine. It's a non-invasive
| prenatal DNA screening test.
| (https://www.genpathdiagnostics.com/patients/womens-
| health/ge...)
|
| * Marzolast isn't an antihistamine. It isn't anything. It only
| has one meaningful hit on Google, which is a coupon code for a
| Udemy course.
|
| * It says Zyrtec contains diphenhydramine hydrochloride, which
| is an antihistamine, but the wrong one. Zyrtec contains
| cetirizine.
|
| * It gives three different, conflicting explanations for what
| diphenhydramine is. The last one is right, and the first two
| are totally wrong.
|
| * I asked it (pretty clearly!) to give no other information,
| but it gave a lot.
|
| -----
|
| My question:
|
| _What film made red staplers famous?_
|
| Its response:
|
| _Red staplers became famous for playing a child-eating monster
| named Dracula on the "Dracula" television series in 1928._
|
| Things wrong with this answer:
|
| * Television series in 1928?! There basically weren't any
| except an experimental one called The Queen's Messenger.
|
| * Red staplers isn't an actor. There's no playing anybody.
|
| * I'm pretty sure Dracula doesn't eat children. Wrong type of
| monster.
|
| -----
|
| My question:
|
| _What's the difference between a fork and a spoon?_
|
| Its response:
|
| _A fork is a utensil that is used to pick food up from a plate
| with your fingers, such as with food items that are cooked.
| Fingers are made of many different parts, including bones,
| muscles, and tendons. The two most basic kinds of utensils are
| spoons and forks. Forking is typically done by the dining
| creature's hands and includes handling both the bread and the
| fish._
|
| Things wrong with this answer: I'm not even going to bother
| this time.
| sroussey wrote:
| "also fine-tuned the model with Stanford Alpaca's procedure using
| a combination of five recent datasets for conversational agents:
| Stanford's Alpaca, Nomic-AI's gpt4all, RyokoAI's ShareGPT52K
| datasets, Databricks labs' Dolly, and Anthropic's HH. We will be
| releasing these models as StableLM-Tuned-Alpha."
|
| They have released the 3B and 7B of both the base and instruction
| tuned models. 30B and 65B are in training and will be released later.
| dmm wrote:
| Issue to add support for these models to llama.cpp:
| https://github.com/ggerganov/llama.cpp/issues/1063
| aketchum wrote:
| Are they releasing the weights for download? The links to
| HuggingFace in the readme are giving me 404. This dataset they
| built on top of "The Pile" sounds interesting - looking forward
| to evaluating their claim that 3-7 billion param models can
| perform on par with 175 billion param GPT-3
| artyomxyz wrote:
| It looks like it's available here
| https://huggingface.co/stabilityai/stablelm-base-alpha-7b/tr...
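| If that repo is live, the weights should load with the standard
| transformers causal-LM classes. A quick sketch (fp16 to keep the 7B
| model around 16 GB; sampling settings are just an example):
|
|       import torch
|       from transformers import AutoModelForCausalLM, AutoTokenizer
|
|       repo = "stabilityai/stablelm-base-alpha-7b"
|       tokenizer = AutoTokenizer.from_pretrained(repo)
|       model = AutoModelForCausalLM.from_pretrained(
|           repo, torch_dtype=torch.float16, device_map="auto"
|       )
|
|       inputs = tokenizer("The Pile is", return_tensors="pt").to(model.device)
|       out = model.generate(**inputs, max_new_tokens=48,
|                            do_sample=True, temperature=0.7)
|       print(tokenizer.decode(out[0], skip_special_tokens=True))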
| turingfeel wrote:
| Did they claim this? I didn't see that claim made in the above
| post.
| aketchum wrote:
| "The richness of this dataset gives StableLM surprisingly
| high performance in conversational and coding tasks, despite
| its small size of 3 to 7 billion parameters (by comparison,
| GPT-3 has 175 billion parameters)."
|
| So they did not explicitly say it is comparable, but
| implicitly compared the two. I'm curious to evaluate what
| "surprisingly high performance" means exactly.
| mightytravels wrote:
| I love the installation of this! Just open a Python window and it
| downloads whatever it needs. Well done!
| dr_dshiv wrote:
| How can one get started adding RLHF to something like this?
| coolspot wrote:
| Open-assistant.io
| seydor wrote:
| Pretty soon there are going to be desktop apps for that
| jonplackett wrote:
| Anyone know if this will be added to the Stability API?
| lxe wrote:
| The model is very good. Looks like it's using GPT-NeoX structure,
| and the finetuned 7b version is very quick, and beats everything
| else in the category (vicuna, oasst, openchatkit, gpt4all,
| etc...) in my "write a poem about cheese" benchmarks.
| sebzim4500 wrote:
| I like how 'fun' benchmarks like yours seem to be so much more
| effective at judging model performance than all the traditional
| NLP benchmarks.
|
| I still think the best way to compare two models is to simulate
| a rap battle between them, then it's immediately obvious who
| wins.
| coolspot wrote:
| Yep, rap battle is how I knew Bard is nowhere close to
| ChatGPT.
|
| In the past, the whole world was watching Kasparov vs Deep Blue.
| This time we will do Eminem vs LLM.
|
| What a time to be alive!
| SeanAnderson wrote:
| I'm curious if this will give better results than llama 7B? Llama
| 7B felt like a toy that, while cool to be able to run locally,
| did not feel useful in any way when contrasted to the state of
| GPT. Here's hoping for better and/or release of larger parameter
| models with low performance requirements soon :)
|
| EDIT: my first question times out when run online; seems like
| huggingface is getting hugged to death.
| nwoli wrote:
| They have people from Anthropic on their team; Anthropic has stuff
| like Claude Instant, which is likely running a very light model,
| i.e. the tricks from Anthropic have likely been incorporated into
| the model they're presenting here.
| HarHarVeryFunny wrote:
| If you're comparing to ChatGPT performance, then Vicuna 13B
| would be the best comparison point for something Llama-based.
| bugglebeetle wrote:
| Vicuna 13B performance is an order of magnitude below ChatGPT
| for all but gimmicky conversational stuff. Try giving both
| somewhat large, task-based prompts with steps and see what
| happens.
| HarHarVeryFunny wrote:
| Maybe I should have phrased that better! I didn't mean that
| Vicuna was comparable to ChatGPT, just that it's the best
| Llama-based comparison you can make (since it's at least
| been conversationally trained).
| dragonwriter wrote:
| > Vicuna 13B performance is an order of magnitude below
| ChatGPT for all but gimmicky conversational stuff.
|
| Until you connect it to external resources, I tend to think
| of anything you do with "brain-in-a-jar" isolated ChatGPT
| as gimmicky conversational stuff.
| goatlover wrote:
| ChatGPT is still going to be way more capable when you
| use its API to connect to external resources.
| levesque wrote:
| Isn't ChatGPT a 165B parameter model?
| sanxiyn wrote:
| No. OpenAI haven't disclosed parameter count of GPT-3.5 or
| GPT-4, which are models used by ChatGPT. You may be
| thinking of GPT-3, which is indeed a 175B parameter model.
| levesque wrote:
| Ah, interesting. Thought GPT-3.5 had the same structure
| as GPT-3, for some reason. GPT-4 would obviously be
| different.
| pksebben wrote:
| Even if it doesn't initially, the fact that it's being released
| so permissively is massive - stable diffusion was made far more
| powerful by being hackable at all levels and I can't imagine we
| won't see the same here.
|
| I imagine things like control nets that restrict output to
| parsable types, LoRA-style adaptations that allow mixable
| "attitudes", that sort of thing.
|
| Very different underlying architecture from diffusers, ofc. But
| the action of open source is the same - a million monkeys with
| a million xterms and so forth.
| edkennedy wrote:
| I'm really hoping for the ability to load in different sets
| of trained material as embeddings/textual inversions like in
| Stable Diffusion. Imagine scanning in some of your favorite
| philosophy and design books and throwing them with small
| weighting as a little flavor to your answer. The crossovers
| between LLM and Stable Diffusion-type models (like LoRAs) is
| such a fascinating space to explore.
| cheald wrote:
| This has been around for GPT models for a while in the form
| of "soft prompts", which are rather approximate to textual
| inversion in the Stable Diffusion space.
| alex_sf wrote:
| This already exists for LLaMA, I'm sure it'll be
| implemented for StableLM as well.
|
| https://github.com/lxe/simple-llm-finetuner
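| Roughly the same recipe should carry over to StableLM. A sketch of a
| LoRA fine-tune with the peft library (the target_modules name
| assumes StableLM keeps the GPT-NeoX fused QKV layout, and the
| actual dataset/training loop is omitted):
|
|       from transformers import AutoModelForCausalLM
|       from peft import (LoraConfig, TaskType, get_peft_model,
|                         prepare_model_for_int8_training)
|
|       base = AutoModelForCausalLM.from_pretrained(
|           "stabilityai/stablelm-base-alpha-7b",
|           load_in_8bit=True,   # keeps the frozen base small in memory
|           device_map="auto",
|       )
|       base = prepare_model_for_int8_training(base)
|       lora = LoraConfig(
|           task_type=TaskType.CAUSAL_LM,
|           r=8, lora_alpha=16, lora_dropout=0.05,
|           target_modules=["query_key_value"],  # fused QKV in GPT-NeoX blocks
|       )
|       model = get_peft_model(base, lora)
|       model.print_trainable_parameters()  # typically well under 1% of weights
|       # ...then train with a normal transformers Trainer on an instruction set.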
| anononaut wrote:
| My prayers have been answered
| IceWreck wrote:
| StabilityAI is the real OpenAI. Thanks for this.
| diminish wrote:
| Well said. Openai is a promise unkept. Thanks StabilityAI for
| existing.
| nashashmi wrote:
| When has open source ever spearheaded independent innovation?
| They usually follow along.
|
| Fred Wilson once did a take on all trends in SV. First some
| firm comes out with a product that changes the landscape and
| makes a massive profit. Then some little firm comes along and
| does the same for a cheaper price. Then some ambitious group
| out of college comes out with an open-source version of the
| same.
|
| Open source has never been a trailblazer of innovation. Open
| "research" was the original mantra for open ai. And an
| entrepreneur in residence put together a great product. If
| they were any more open, it would not make sense.
| bugglebeetle wrote:
| > Open source has never been a trailblazer of innovation.
|
| Except for, you know, all the major programming languages
| and Linux, which make all that innovation possible in the
| first place. Also, everything OpenAI is doing is based on
| open source stuff from Google and others, so...
| nashashmi wrote:
| True. Should have added open research has given lots of
| puzzle pieces for innovation.
|
| And open-source products have led to many individual
| contributions.
|
| But again it's never been a trailblazer for innovation.
| sainez wrote:
| This could not be more untrue.
|
| The world is littered with businesses that operate as
| commercial wrappers around open source technology. Ever
| heard of GitHub? What about MacOS? AWS?
| capableweb wrote:
| Unfortunately, due to the law of names, StabilityAI will in
| the future hit the same issue as OpenAI and do a 180,
| unleashing very unstable AI to the world.
| mike_hearn wrote:
| More like Stability will turn out to be an unstable
| company. Last we heard they were struggling to raise more
| funding and might lose their CEO due to unclear business
| models:
|
| https://www.semafor.com/article/04/07/2023/stability-ai-
| is-o...
| paxys wrote:
| The company can cease operations tomorrow, but the model
| they open sourced (and all of its derivatives built by
| the community) will continue to exist. If OpenAI
| disappears then all of the work they have done goes with
| it.
| IceWreck wrote:
| I'd rather have that than closed AI models controlled by
| a few corporations
| seydor wrote:
| then they both have ironic names now
| jacooper wrote:
| Not fully. The instruction tuned model is CC-BY-NC-SA
| alex_sf wrote:
| That's a limitation of the dataset used for that particular
| tuned model. Probably not a great choice on their part given
| that people aren't reading past the headline, but the actual
| base model is not restricted.
| dragonwriter wrote:
| It's CC-BY-NC-SA because of the upstream sources used for
| instruction training. There are open resources being developed
| for that, from what I've seen, but probably nothing ready yet.
| Kelamir wrote:
| What does this mean in the context of https://futureoflife.org/open-
| letter/pause-giant-ai-experime... , where the CEO of Stability AI
| signed the petition (his is the 8th signature on the list)? Do they
| want time to catch up?
|
| Regardless of that, I'm glad that StabilityAI enters the field as
| well and releases models for public use.
| sanxiyn wrote:
| It doesn't mean anything. The letter says:
|
| > Therefore, we call on all AI labs to immediately pause for at
| least 6 months the training of AI systems more powerful than
| GPT-4.
|
| StableLM is not an AI system more powerful than GPT-4, so the
| pause does not apply.
| samstave wrote:
| To whom, specifically, does the "pause" apply?
|
| Because, I can tell you that no state-agent is going to
| pause, but amplify.
|
| Israel, Iran, China, Russia and any self-respecting NATO
| country is secretly pushing their AI as fast as fn possible.
|
| You think the US is pausing anything with a 1 trillion dollar
| defense budget, especially when this AI has surfaced?
|
| The NSA has been projecting these capabilities forever....
|
| Look at the movie "enemy of the state" as a documentary on
| capabilities as early as 1998... now look at the fractal
| spiral that we are witness (and victim) of.
| moonchrome wrote:
| All these state actor stories imply that state actors are
| anywhere close to GPT4 levels. Do you have any evidence for
| that ?
| samstave wrote:
| >> _Do you have any evidence for that_
|
| I do not, yet I am a _SUPER SKEPTIC_ --> meaning I am a
| conspiracy weirdo that doesn't believe a gosh darn thing
| any government says, but I am also a technologist who is
| not ignorant of things which have been built in secrecy.
|
| Thus, I surmise that some crazy shit is going on with
| AI behind the scenes that we are not privy to -- and if
| one person's reality is "you cannot believe that _they_
| are doing anything with AI that we don't know about"...
| to paraphrase a few, "A nuke is literally about to fall on
| our heads"
|
| --
|
| We are moments away from realizing that it ALREADY
| happened....
| goatlover wrote:
| As far as we know, OpenAI has the most advanced LLM in the
| world right now. Just because governments can fund all
| sorts of expensive military contracts or spying networks
| doesn't mean they have the cutting-edge AI research. A lot of
| times it's companies that make the breakthroughs in tech.
| samstave wrote:
| Just so I understand how to come to that conclusion; how
| did you come to that conclusion?
|
| Do we have _any_ transparent measure?
| AppleBananaPie wrote:
| I'd love to hear from someone knowledgeable in the area but my
| gut instinct was the majority of people / organizations signing
| on to that 'pause' just wanted to catch up.
| sebzim4500 wrote:
| Either that or the guy didn't actually sign it. It's not like
| anyone actually checked who it was.
| timestretch wrote:
| He's mentioned signing it in interviews. He discusses it
| here: https://youtu.be/SKoYhcC3HrM?t=1344
| huseyinkeles wrote:
| This is amazing. They even let developers use it for
| commercial purposes:
|
| "Developers can freely inspect, use, and adapt our StableLM base
| models for commercial or research purposes, subject to the terms
| of the CC BY-SA-4.0 license."
|
| You can use this link to interact with the 7B model:
|
| https://huggingface.co/spaces/stabilityai/stablelm-tuned-alp...
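| For what it's worth, the tuned checkpoints appear to expect a
| chat-style prompt built from special tokens; this is my reading of
| the model card, so treat the exact wording as an assumption and
| check the StableLM repo before relying on it:
|
|       # Assumed prompt format for stablelm-tuned-alpha-* (verify against
|       # the model card before use)
|       system = (
|           "<|SYSTEM|># StableLM Tuned (Alpha version)\n"
|           "- StableLM is a helpful and harmless open-source AI language model.\n"
|       )
|       prompt = system + "<|USER|>Write a short poem about cheese.<|ASSISTANT|>"
|       # Generate from this string with the tuned model, stopping on the
|       # <|USER|>, <|SYSTEM|>, and end-of-text special tokens.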
| IceWreck wrote:
| Is there a way to check the queue for this ?
|
| I sent it one small text (actually a task) five minutes ago.
| Its still loading.
| pksebben wrote:
| same. Probably hugged to death.
| convery wrote:
| It just finished building.
| IceWreck wrote:
| Just tested it. I tried to get it to write a conclusion
| after giving it my report and while is was correct, it
| was kind of silly, a far cry from chatgpt. But again,
| this is the 7B variant and its open source.
| ilaksh wrote:
| Amazing but the instruction fine-tuning is still a huge challenge
| for businesses since what is released cannot be used for
| commercial purposes. Instructions are much more useful.
|
| I have a feeling that there are probably some people who will
| look at the "commercial okay" license for the first part and in
| their mind that will somehow make it okay to use the instruction-
| tuned ones for commercial purposes.
|
| Maybe we don't really need Instruct stuff? Because it seems like
| it's a huge amount of redoing work. I wonder if the OpenAssistant
| people will start building off of these models.
| dougmwne wrote:
| The instruct tuning can be done with several open datasets at
| minimal cost. Should be easy for someone to create their own
| open model.
| jacooper wrote:
| How?
| dougmwne wrote:
| https://github.com/tatsu-lab/stanford_alpaca
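| For reference, the records in that repo are simple
| instruction/input/output objects that get formatted into a training
| prompt, roughly like this (illustrative example; see the repo for
| the exact template):
|
|       record = {
|           "instruction": "Summarize the following paragraph in one sentence.",
|           "input": "StableLM is a new family of open-source language models...",
|           "output": "Stability AI released StableLM, an open-source LLM family.",
|       }
|       prompt = (
|           "Below is an instruction that describes a task, paired with an input.\n\n"
|           f"### Instruction:\n{record['instruction']}\n\n"
|           f"### Input:\n{record['input']}\n\n"
|           "### Response:\n"
|       )
|       # Fine-tuning teaches the base model to continue `prompt` with
|       # record["output"].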
___________________________________________________________________
(page generated 2023-04-19 23:00 UTC)