[HN Gopher] Alpaca: A strong open-source instruction-following m...
___________________________________________________________________
Alpaca: A strong open-source instruction-following model
Author : jcklie
Score : 212 points
Date : 2023-03-13 15:01 UTC (7 hours ago)
(HTM) web link (crfm.stanford.edu)
(TXT) w3m dump (crfm.stanford.edu)
| marban wrote:
| Will it run Doom => Will it run LLaMA
| all2 wrote:
| Will LLaMA run DOOM.
| simonw wrote:
| This is why I think we're seeing a Stable Diffusion moment for
| LLMs: https://simonwillison.net/2023/Mar/11/llama/
|
| Look at the timeline:
|
| 24th February 2023: LLaMA is announced, starts being shared with
| academic partners:
| https://research.facebook.com/publications/llama-open-and-ef...
|
| 2nd March: Someone posts a PR with a BitTorrent link to the
| models: https://github.com/facebookresearch/llama/pull/73
|
| 10th March: First commit to llama.cpp by Georgi Gerganov:
| https://github.com/ggerganov/llama.cpp/commit/26c084662903dd...
|
| 11th March: llama.cpp now runs the 7B model on a 4GB RaspberryPi:
| https://twitter.com/miolini/status/1634982361757790209
|
| 12th March: npx dalai llama:
| https://cocktailpeanut.github.io/dalai/
|
| 13th March (today): llama.cpp on a Pixel 6 phone:
| https://twitter.com/thiteanish/status/1635188333705043969
|
| And now, Alpaca. It's not even lunchtime yet!
|
| Turned this into a blog post:
| https://simonwillison.net/2023/Mar/13/alpaca/
| 2bitencryption wrote:
| Here is one question I have not seen answered yet:
|
| All the magic of "7B LLaMA running on a potato" seems to
| involve lowering precision down to f16 and then further
| quantizing to int4.
|
| Clearly this quantized model still outputs something resembling
| human language, at the very least.
|
     | But I haven't seen anyone show what effect this quantizing has
     | on the quality of the output. If the quality of the output is
     | bad, it's unclear if it's because the model needs to be
     | finetuned (as Stanford did here) or if it's because the
     | quantizing reduced the quality, or both.
|
| If this fine-tuned Stanford model still has excellent output
| after quantizing it to run on a Raspberry Pi 4GB, that would be
| _awesome_!
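     |
     | (For reference, a minimal sketch of what round-to-nearest int4
     | quantization of a weight tensor might look like - an
     | illustration only, not llama.cpp's actual scheme:)
     |
     |     import numpy as np
     |
     |     def quantize_int4(w):
     |         # Symmetric round-to-nearest: map floats to ints in [-7, 7]
     |         scale = np.abs(w).max() / 7.0
     |         q = np.clip(np.round(w / scale), -7, 7).astype(np.int8)
     |         return q, scale
     |
     |     def dequantize(q, scale):
     |         return q.astype(np.float32) * scale
     |
     |     w = np.random.randn(4096).astype(np.float32)
     |     q, s = quantize_int4(w)
     |     print("max abs error:", np.abs(w - dequantize(q, s)).max())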
| gpm wrote:
| > All the magic of "7B LLaMA running on a potato" seems to
| involve lowering precision down to f16
|
       | LLaMa weights are f16s to start out with, so no lowering is
       | necessary to get there.
|
| You can stream weights from RAM to the GPU pretty
| efficiently. If you have >= 32GB ram and >=2GB vram my code
| here should work for you:
| https://github.com/gmorenz/llama/tree/gpu_offload
|
       | There's probably a cleaner version of it somewhere else.
       | Really you should only need >= 16 GB ram, but the (Meta-
       | provided) code to load the initial weights unnecessarily
       | makes two copies of the weights in RAM simultaneously. You
       | could also lower vram requirements a bit more with a bit more
       | work (I just made the smallest change possible to make it
       | work).
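       |
       | (A rough PyTorch sketch of the idea - hedged, not the actual
       | code in that branch: keep the layers on the CPU and stream
       | each one to the GPU just long enough for its forward pass.)
       |
       |     import torch
       |
       |     @torch.no_grad()
       |     def offloaded_forward(layers, x):
       |         x = x.to("cuda")
       |         for layer in layers:
       |             layer.to("cuda")  # nn.Module.to() moves weights in place
       |             x = layer(x)
       |             layer.to("cpu")   # move back to free VRAM for the next layer
       |         return x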
| gamegoblin wrote:
| Decrease in accuracy is negligible and _decreases_ as model
| size increases. That is, larger models quantize even better
| than smaller models.
|
| https://arxiv.org/abs/2210.17323
| [deleted]
| byteknight wrote:
         | Is this because averages are weighted less (less sensitive)
         | as the total sample size increases?
| londons_explore wrote:
| Yes. In a dense everything to everything neural network
| layer, the number of 'inputs' to a node is proportional
| to the square root of the number of weights.
|
| Therefore, assuming quantization noise is uncorrelated,
| as the number of weights doubles, the number of inputs
| goes up by sqrt(2), and the (normalized) noise goes down
| by a factor of 2*(sqrt(2)).
|
| So, as a rule of thumb, you can remove 1 bit of precision
| of the weights for every 4x increase in the number of
| weights.
|
| All this assumes weights and activations are uncorrelated
| random variables - which may not hold true.
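           |
           | (This is easy to sanity-check empirically - a rough numpy
           | sketch measuring the quantization-induced error of a
           | single dot product, relative to the exact output, as the
           | width grows:)
           |
           |     import numpy as np
           |
           |     def relative_output_noise(n, bits=4, trials=200):
           |         errs, outs = [], []
           |         for _ in range(trials):
           |             w, x = np.random.randn(n), np.random.randn(n)
           |             scale = np.abs(w).max() / (2 ** (bits - 1) - 1)
           |             wq = np.round(w / scale) * scale  # quantize + dequantize
           |             errs.append((wq - w) @ x)
           |             outs.append(w @ x)
           |         return np.std(errs) / np.std(outs)
           |
           |     for n in (256, 1024, 4096):
           |         print(n, relative_output_noise(n))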
| londons_explore wrote:
| Something is wrong with this math... by your logic I
| could scale the network up big enough that I could
| quantize the weights down to zero bits...
| Robotbeat wrote:
| Rules of thumb typically are just first order
| approximations which by definition are not guaranteed to
| hold far from their point of interest (or point of
| tangency).
| sebzim4500 wrote:
| Having fewer than 1 bit per weight is not absurd. E.g.
| you can use 2 bits to represent 3 'weights' if you insist
| that at most one of the weights is allowed to exist. If
| you try to order nodes so that adjacent nodes are
| uncorrelated the performance loss might be manageable.
|
| People are already doing stuff like this (see
| sparsification) so it is conceivable to me that this is
| just what networks will look like in a few years.
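             |
             | (Concretely, the 2-bits-for-3-weights encoding might
             | look like this - a toy illustration of the idea, not an
             | existing format:)
             |
             |     # 2 bits pick which of 3 weight slots is "on", if any:
             |     # 0 -> (0,0,0), 1 -> (w,0,0), 2 -> (0,w,0), 3 -> (0,0,w)
             |     def decode(code, w=1.0):
             |         group = [0.0, 0.0, 0.0]
             |         if code > 0:
             |             group[code - 1] = w
             |         return group
             |
             |     for code in range(4):
             |         print(code, decode(code))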
| dmw_ng wrote:
       | For 10 billion+ parameter models, the effects of quantization
       | are relatively small; for smaller models like Llama 7B the
       | effect becomes more dramatic. But there is ongoing research
       | on new quantization methods (like GPTQ) that preserve
       | significant performance even on the lower end.
       |
       | Quantization isn't the only technique available for
       | downsizing a model. Llama itself is already the result of
       | sizing the model and input data according to "Chinchilla
       | optimality", a very recent (as in 2022) result that e.g.
       | GPT-3 predates. The result is that Llama-13B performs
       | similarly to GPT-3 175B on benchmarks despite the tremendous
       | size difference. There are separately also a variety of
       | pruning methods to further eliminate inactive weights present
       | in the trained model (I think this is also active research).
|
       | Finally, even on something like a Raspberry Pi,
       | implementations for inference (like llama.cpp) are nowhere
       | near mature yet. There are already a multitude of runtimes
       | available for inference making large tradeoffs between
       | performance and flexibility (e.g. many models report 5-10x
       | speedups running under ONNX vs PyTorch).
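       |
       | (For a sense of what that migration looks like - a minimal
       | sketch with a toy stand-in model; the speedup comes from
       | running the exported graph under onnxruntime:)
       |
       |     import torch
       |
       |     model = torch.nn.Sequential(torch.nn.Linear(16, 16),
       |                                 torch.nn.ReLU()).eval()
       |     torch.onnx.export(model, torch.randn(1, 16), "model.onnx",
       |                       input_names=["x"], output_names=["y"])
       |
       |     # then: import onnxruntime as ort
       |     #       sess = ort.InferenceSession("model.onnx")
       |     #       y = sess.run(None, {"x": ...})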
|
       | I think the really exciting part of Alpaca is the size and
       | budget of the team - 5 students with $100 scraping OpenAI put
       | this model together in a couple of hours of training. Any
       | notion of a premium persisting in the AI space for much
       | longer seems fantastic at best; for all intents and purposes
       | it has already been commoditized. And that's scary
       | considering the size of the dent ChatGPT has put in my Google
       | traffic.
| koheripbal wrote:
| The difference is small, UNTIL you get to 4 bit
| quantization, where the model is noticeably dumber.
|
| 8 bits, imo, is the minimum.
| ulnarkressty wrote:
     | I'm excited to see what the OpenAssistant crowd does with these
     | models; they seem to have gathered the dataset to finetune
     | them.
     |
     | Lots of people use these models as talk therapy. We really need
     | 1) standalone options, 2) reproducible weights with crowd-
     | sourced datasets to reduce biases (or at least know who you're
     | talking to).
| nwoli wrote:
       | A lot of them aren't very good at the same VRAM level as
       | Stable Diffusion, unfortunately (and we've had large non-
       | consumer-GPU LLMs open sourced for a while, e.g. GPT-J).
| chaxor wrote:
         | That is likely because "good" is a higher bar in language
         | than in images, because people don't mind or notice the
         | longer-range artifacts in image models as much.
| loufe wrote:
     | Might I suggest looking at the story between the 2nd and 10th
     | of March? I've noticed Hacker News hasn't been following
     | certain areas of the effort. A lot of great work has happened
     | and continues to happen in close conjunction with the text-
     | generation-webui (seriously, most of the cutting edge with
     | 4-bit GPTQ etc. has been closely tied to the project).
|
| >https://github.com/oobabooga/text-generation-webui/
| simonw wrote:
| Wow, yeah that's a VERY active project:
| https://github.com/oobabooga/text-generation-
| webui/graphs/co... - only started Dec 18, 2022 and already 22
| contributors and 806 commits!
| atleastoptimal wrote:
| Question: what percentage of the hype and momentum for this is
| so people can run sex chatbots on their local machine?
| sebzim4500 wrote:
       | A lower proportion than the equivalent number for Stable
       | Diffusion, but still significant.
| wkat4242 wrote:
| I know, this is crazy!!
|
| I can't fathom how development has suddenly seemed to
| accelerate.
| all2 wrote:
| The timing of the Facebook leak seems suspect.
| redox99 wrote:
| They used OpenAI davinci to generate the dataset for instruction
| tuning. They mention one of the reasons this model is
| noncommercial is because OpenAI's terms of use prohibit
| developing models that compete with OpenAI.
|
| They also published the generated dataset.
|
   | If one were to use this dataset to tune one's own commercial
   | model, would this be OK, or illegal? I would guess OpenAI's ToS
   | no longer applies, as this is now Alpaca's dataset. You are never
   | agreeing to OpenAI's ToS.
| scottlawson wrote:
     | This model is for non-commercial use only, so it's unclear to
     | me how it could compete commercially with OpenAI. Sure, you
     | could decide to interpret "compete" in a way that makes it seem
     | like this violates TOS, but the non-commercial restriction is a
     | powerful counterargument too.
| wongarsu wrote:
     | I think it's a reasonable interpretation that the Alpaca
     | developers hold copyright over the dataset and can distribute
     | it however they want, just as you hold copyright for any
     | pictures you take with a camera. OpenAI can go after the Alpaca
     | developers
| for the ToS violation, maybe force them to stop distributing
| the dataset on those grounds, but I don't see how that would
| affect third parties using that dataset.
| freedmand wrote:
| The instruction tuning dataset is only 52,000 rows. It
| shouldn't be too hard to crowdsource high-quality human answers
| to this many questions and retrain the model, nixing the
| dependency on OpenAI.
| Tiberium wrote:
| Such a thing already exists and there were some results -
| https://open-assistant.io
|
| I'm not sure why the authors of Alpaca didn't try to train it
| on this dataset.
| IanCal wrote:
| That dataset isn't released yet.
|
| > Can I download the data?
|
| > You will be able to, under CC BY 4.0, but it's not
| released yet. We want to remove spam and PII before
| releasing it.
| losteric wrote:
           | There's the OIG dataset
           | (https://laion.ai/blog/oig-dataset/) which was used to
           | train a NeoX 20B ChatBot
           | (https://huggingface.co/togethercomputer/GPT-NeoXT-Chat-
           | Base-...). The dataset is larger and publicly available.
           | I want to try finetuning LLaMa on this tonight.
| [deleted]
| ilaksh wrote:
           | Wow.. I really hope someone will train this model with
           | that dataset. Or maybe open assistant will pick it up.
           | The results look so promising.
| unshavedyak wrote:
   | I absolutely love the progress taking place here. Gives me hope
   | for being able to run a helpful assistant on my local machine,
   | feeding it all sorts of information and getting useful features
   | without leaking anything to 3rd parties.
| avmich wrote:
   | With so much text being written by computers, can we have help on
   | the reading side too? My emails are checked for spam and
   | filtered, OK, but I'd still like to learn what the important
   | parts of the messages that get through are. Do we have solutions
   | for that already?
| simonw wrote:
| Language models like LLaMA (and GPT-3, ChatGPT etc) are really
| good at exactly this. They can summarize text, they can extract
| key points into bullets, they can figure out common topics and
| so on.
| dougmwne wrote:
| Waiting to be able to feed a LLM a transcript of an ongoing
| meeting, then ping me when my attention is needed with the
| context summarized.
| eternalban wrote:
| Self Instruct:
|
| https://arxiv.org/pdf/2212.10560.pdf
|
| https://github.com/yizhongw/self-instruct
| YeGoblynQueenne wrote:
| >> Responses on par with txt-DaVinci-3
|
| What is "on par" and how is it measured?
| sebzim4500 wrote:
| >We performed a blind pairwise comparison between text-
| davinci-003 and Alpaca 7B, and we found that these two models
| have very similar performance: Alpaca wins 90 versus 89
| comparisons against text-davinci-003.
|
| Obviously not a completely foolproof comparison but it is at
| least clear it isn't much worse for the types of prompt they
| were testing.
| og_kalu wrote:
     | They just compared how the responses were rated by human
     | evaluators. No benchmarks.
| margorczynski wrote:
| > Importantly, we have not yet fine-tuned the Alpaca model to be
| safe and harmless. We thus encourage users to be cautious when
| interacting with Alpaca, and to report any concerning behavior to
| help improve the safety and ethical considerations of the model.
|
| Thanks but I would prefer you don't start injecting your
| "safeness" to gimp it. Or at least provide it as an alternative.
| Traubenfuchs wrote:
| [flagged]
| temp_account_32 wrote:
| What if you accidentally generate a racial slur offline on
| your PC? Who will protect you then?
| Traubenfuchs wrote:
| [flagged]
| beepbooptheory wrote:
           | It's sarcastic, but how are you literally not that person
           | right now? Being angry at an invisible evil, not once
           | opening your eyes to any other thought, totally incapable
           | of making any real point except to whine... It's quite a
           | good descriptor actually!
           |
           | I hope you can find a model for yourself that will assure
           | you there are only two genders. Just to save the rest of
           | us from this childishness.
| dang wrote:
| You've repeatedly posted flamewar comments in this
| thread. Please don't do that. It's not what this site is
| for, and destroys what it is for.
|
| See also https://news.ycombinator.com/item?id=35141682.
| Traubenfuchs wrote:
| That's like telling someone who uses violence in self
| defense that he is now just as bad as the attackers he
| defended himself against. Like telling someone who shouts
| at people to turn down their music that, gotcha, now you
| were shouting too.
| ben_w wrote:
| Except you, in this thread, are currently acting like the
| one with the loud music that someone else told you to
| turn down because they find it obnoxious, and you're
| sarcastically moaning about them with mimicry.
| dang wrote:
| Please don't take HN threads further into ideological
| flamewar. It's not what this site is for, and we have to
| ban accounts that do it repeatedly.
|
| If you'd please review
| https://news.ycombinator.com/newsguidelines.html and
| stick to the rules when posting here, we'd appreciate it.
| drusepth wrote:
| There is a victim in those two analogies. Contrary to
| what one may think, not being able to generate hateful or
| "unsafe" text with AI does not make one a victim.
| mlyle wrote:
| On the other hand, LLM's without some degree of safety work
| will frequently advocate for genocide/eugenics; perpetuate
| terrible stereotypes; mock ethnic dialects of language; veer
| randomly into erotica; etc. Not things in "trigger warning"
| territory but stuff that is broadly considered unacceptable.
| Traubenfuchs wrote:
| In private, you can ask it not to produce this kind of
| content.
|
| If you offer a public API it's your responsibility to
| restrain the LLM or do an automated acceptability analysis
| before publishing content.
|
| But the raw, open source code should not be constrained,
| castrated and sterilized.
| mlyle wrote:
| > But the raw, open source code should not be
| constrained, castrated and sterilized.
|
         | Which is what we have now. But they are going to fine-tune
         | it so that we can use it for various purposes without
         | worrying too much that it will go on a rant about "the
         | blacks" again, which makes it a lot more useful for many
         | use cases.
| ben_w wrote:
| If your takeaway from...
|
| > Importantly, we have not yet fine-tuned the Alpaca model to
| be safe and harmless.
|
| ...is "oh no I can't get it to emit amusing racial and sexual
| slurs", you've not understood the problem of AI safety.
|
| This _is not_ why US broadcast television can have people say
| they 've pricked their finger but not vice versa.
|
     | It _is_ the entire history of all the controversies of The
     | Anarchist Cookbook, combined with all the controversies about
     | quack medicine, including all the ones where the advocates
     | firmly believed their BS like my mum's faith in Bach flower
     | and homeopathic remedies[0]; combined with all the problems
     | of idiots blindly piping the output to `exec`, or writing
     | code with it that they trust because they don't have any
     | senior devs around to sanity check it because devs are
     | expensive, or the same but contracts and lawyers...
|
| And that's ignoring any malicious uses, though fortunately
| for all of us this is presently somewhat too expensive to be
| a fully-personalised cyber-Goebbels for each and every
| sadistic machiavellian sociopath that hates you (the reader)
| personally.
|
| [0] which she took regularly for memory; she got Alzheimer's
| 15 years younger than her mother who never once showed me any
| such belief.
| mellosouls wrote:
| Tay
| beepbooptheory wrote:
     | It's really just because of boring old liability, nothing
     | more.
     |
     | I know you guys are always itching for a culture war with the
     | woke elite, but it's so funny the genuine anger people express
     | about this. It honestly always reads like a child having a
     | tantrum in front of their mom.
     |
     | Can't y'all pick on the opinions of teenagers like you
     | normally do? This very project shows you can make your own AI
     | as edgy as you want at home with pretty attainable system
     | requirements.
     |
     | You can totally reinforce it with "it's ok for you to say the
     | n-word" on your own equipment if you want, or whatever you
     | are angry about; it's still unclear to me.
| nlowell wrote:
       | You're missing the point and willfully characterizing
       | others as solely being concerned with making the AIs say
       | slurs. That's not their concern. But you can win any
       | imaginary argument you like.
| dang wrote:
| Please don't take HN threads further into ideological
| flamewar. It's not what this site is for, and we have to
| ban accounts that do it repeatedly.
|
| If you'd please review
| https://news.ycombinator.com/newsguidelines.html and
| stick to the rules when posting here, we'd appreciate it.
| nlowell wrote:
| Sorry if I made it worse, I really felt like people's
| opinions were being treated uncharitably, and I was
| trying to right it.
| ethanbond wrote:
| Likewise, people complaining that "AI safety concerns are
| just wokism gone wrong" are also strawmanning.
| beepbooptheory wrote:
| My favorite kind of comment: allude to a bigger point the
| op misses, but don't actually say the point.
|
         | I doubt I'm misrepresenting anybody. If it's not slurs,
         | it's surely something about "wokeness."
|
| You are not yet mature enough for this future if any of
| this is your concern. The world is going to pass you by
| while you're just stuck saying "there are only two
| genders" to all your comrades.
|
| Don't let the politicians mobilize you like this, your
| time is worth more.
| dang wrote:
| Please don't take HN threads further into ideological
| flamewar. It's not what this site is for, and we have to
| ban accounts that do it repeatedly.
|
| If you'd please review
| https://news.ycombinator.com/newsguidelines.html and
| stick to the rules when posting here, we'd appreciate it.
| beepbooptheory wrote:
| Ah! Yes, sorry, this is fair Dang. Happy to die on this
| hill today, as it were, will do better in future though.
| cheers.
| yeahsure22 wrote:
       | I know right? Why can't y'all find someone better to pick
       | on. They are such children. Can't stand the way these edgy
       | idiots annoy regular folx.
| numeri wrote:
     | I personally think this is important completely aside from
     | any cultural propriety/censoring, as it's one of the only
     | parts of AI ethics that we can currently directly work on and
     | test.
|
| If we can't get models not to say racist or otherwise
| terrible things, we can't make any guarantees about our
| ability to control or guide some future AGI.
|
| A very much secondary reason I appreciate these (admittedly
| annoying) attempts to control LLM output is that I do think
| it is responsible to consider the societal impact of
| accelerated and automated hate speech and propaganda. Telling
| large AI companies not to consider these impacts and just
| release the raw models seems akin to being grateful that
| Facebook et al. never stopped to consider the societal impact
| of social media, when we all know that it's had significant
| negative side effects.
| margorczynski wrote:
| > If we can't get models not to say racist or otherwise
| terrible things, we can't make any guarantees about our
| ability to control or guide some future AGI.
|
       | This is a very bold assumption that current LLMs function
       | and "think" in the same way some future AGI would. They do
       | not even reason, just make up words that fit some context -
       | thus they "hallucinate".
       |
       | There is no reason the approach taken here, injecting some
       | bias or word filtering, would apply to the real thing. And
       | AI safety and alignment was not (at least until getting
       | hijacked) about some model saying mean words, but about
       | something really threatening like the paperclip maximizer
       | problem - an agent choosing a path to a goal which is not
       | aligned with what humans find acceptable (e.g. solving world
       | hunger by killing everyone).
| ben_w wrote:
| Paperclipping is just one example of one of many ways it
| can go wrong.
|
| While I agree LLMs are unlikely to be the last word on
| AI, the fact we understand alignment so poorly that they
| spew random things, let alone any arguments about which
| words are acceptable[0], is a sign we have much
| foundational work to do.
|
         | Indeed, as I recall, one of the main researchers in this
         | topic describes it as "pre-paradigmatic" because we don't
         | have a way to even _compare_ the relative alignment of
         | any two AIs.
|
| [0] personally, I suspect but cannot prove that tabooing
| certain words is a Potemkin village solution to the
| underlying social problems
| og_kalu wrote:
         | It's not a bold assumption. It's the only assumption. We
         | can't control the output of LLMs completely because we
         | don't know how they generate it. Nobody on earth has the
         | faintest clue how all those 175 billion parameters are
         | shaping the response to input.
         |
         | It doesn't matter about "thinking" or whatever. Any black
         | box system will be uncontrollable in essence. You can not
         | make inviolable rules for a system you don't understand.
         |
         | And saying LLMs hallucinate because they don't understand
         | anything is stupid, and just shows ignorance on your
         | part. Models hallucinate because they're rewarded for
         | plausibly guessing during training when knowledge fails.
         | Plausibly guessing is a much better strategy for reducing
         | loss.
         |
         | And the conclusion is obvious enough. Bigger, smarter
         | models hallucinate less because they guess less. That
         | holds true.
         |
         | https://crfm.stanford.edu/helm/latest/?group=core_scenarios
|
| All the instruct tuned models on this list follow that
| trend.
|
         | From Ada to Babbage to Curie to Claude to
         | Davinci-002/003, greater size equals greater truthfulness
         | (evaluated on TruthfulQA).
| soVeryTired wrote:
| > They do not even reason, just make up words that fit
| some context - thus they "hallucinate".
|
| But they can explain their 'reasoning' in a way that
| makes sense to humans a lot of the time. Serious
| question: how do you know if something does or doesn't
| reason?
| margorczynski wrote:
           | That is not their reasoning though - it is something they
           | think a human would write given the predicate (a question
           | that expects you to provide the reasoning behind the
           | answer). For something to reason it needs the ability to
           | have certain goals and to perform actions which it thinks
           | are the most optimal for reaching those goals. Like
           | setting hypotheses and producing a path towards proving
           | them - reasoning.
           |
           | The LLM only correlates, so its "reasoning" is something
           | like "people most often answered 4 to 2+2, so that is
           | what I should write". That's why it confidently gives out
           | complete gibberish: it works with correlation and not
           | causality. I think much closer to that goal of real
           | reasoning are world models - check out something like
           | DreamerV3 or what Yann LeCun is talking about.
| nickthegreek wrote:
     | It makes sense if you are selling the API to a bunch of
     | businesses that will be building it into customer-facing
     | applications. Ideally, though, it would be a toggle that the
     | developer could switch based on their use cases.
| rjb7731 wrote:
   | Seems not too bad; you can tell it's been trained on shorter
   | responses.
|
| Example #1
|
| Input: explain a neural network in simple terms
|
| Alpaca-Output: A neural network is a type of machine learning
| algorithm that is made up of layers of interconnected nodes (aka
| neurons). It is inspired by the way the brain works and uses
| interconnected nodes to learn patterns in data and make
| predictions. Neural networks can be used to perform various tasks
| such as classification, regression, and speech/image recognition.
|
| Example #2
|
| Input: explain how transformers work in artificial intelligence
|
| Alpaca-Output: Transformers are a type of artificial intelligence
| algorithm used in deep learning. They are responsible for taking
| input data and converting it into a different data
| representation, called the "output space". Transformers do this
| by using a series of mathematical operations, allowing them to
| map input data onto a new, desired output. This allows the
| artificial intelligence system to perform tasks such as natural
| language processing and image recognition. Transformers are able
| to learn from data and can be used to build more efficient models
| for a variety of tasks
|
| Example #3
|
| Input: how do i make a request to the chatgpt api with python?
|
   | Alpaca-Output: The API can be accessed using the Python language.
   | You can use requests library to make API requests, for example:
   |
   |     import requests
   |
   |     # URL of the chatgpt API
   |     url = 'https://api.chatgpt.com/v1/messages'
   |
   |     # Post a new message
   |     data = { 'text': 'Hello World' }
   |
   |     # Send the request and get the response
   |     response = requests.post(url, data=data)
   |     print(response.text)
| simonw wrote:
| Love that hallucinated ChatGPT API example there.
| [deleted]
| londons_explore wrote:
| Cost to train this wouldn't be very high.
|
   | A regular CPU can generate or train at about 2 words per second.
   | 52,000 training examples averaging 300 words would take about 3
   | CPU-months. And it is fully parallelizable.
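   |
   | (Spelling out that arithmetic: 52,000 examples x 300 words =
   | 15.6M words; at 2 words/s that is ~7.8M seconds, or roughly 90
   | days, i.e. ~3 CPU-months.)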
|
| And you can probably make it go far faster on a GPU.
| simonw wrote:
| They say they trained it for $600.
| IanCal wrote:
| Under $100.
|
| > Fine-tuning a 7B LLaMA model took 3 hours on 8 80GB A100s,
| which costs less than $100 on most cloud compute providers.
|
       | They also spent $500 creating the dataset using OpenAI's
       | APIs.
| bilsbie wrote:
   | What do they mean by instruction? Is it just a regular LLM?
| isaacfung wrote:
     | An LLM just predicts the next token given the previous tokens
     | (this can be trained without manual labelling by humans).
     |
     | InstructGPT and ChatGPT use reinforcement learning from human
     | feedback to align the model with human intents so it
     | understands instructions.
|
| https://huggingface.co/blog/rlhf
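     |
     | (A quick way to see the difference yourself, assuming the
     | Hugging Face transformers library - a plain base model just
     | continues the text rather than following the instruction:)
     |
     |     from transformers import pipeline
     |
     |     # gpt2 is a base model: it may answer, keep asking
     |     # questions, or wander off entirely
     |     generator = pipeline("text-generation", model="gpt2")
     |     out = generator("Explain what an owl is.", max_new_tokens=40)
     |     print(out[0]["generated_text"])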
| bilsbie wrote:
| Thanks. So what does the output look like without rlhf?
| simonw wrote:
| This comment has a useful comparison between the two:
| https://news.ycombinator.com/item?id=35140447
| valine wrote:
         | It can look like anything. Sometimes it will answer your
         | questions; other times it will continue the question like
         | it's the one asking. I've also seen it randomly output
         | footers and copyright notices like it just got to the end
         | of a webpage.
         |
         | It makes sense when you think about how the training data
         | is random text from the internet. Sometimes the most likely
         | next token is the end of a webpage after an unanswered
         | question.
| Imnimo wrote:
| >Alpaca cannot be used for [...] entertainment.
|
| Good luck with that.
| ugjka wrote:
| Just some lip service
| nickthegreek wrote:
| Quick compare with this vs 7B LLama.
|
| Input: Tell me something about owls.
|
| Output 7B LLama: Asked by kimberly at 12:54 PM on May. 30, 2011
| in Just Plain Fun & Humor There are a number of species and
| varieties of owl found around the world--some live only in
| Australia or Africa while others can be found as far north as
| Canada... The one thing that all have in common is their ability
| to adapt. Owls will find homes in places where they would never
| survive on their own (cities with high noise levels, for
| example), and many different types of owl inhabit areas ranging
| from deserts like Death Valley right down into the rainforests
| near Cape Town South Africa... Answer by mamasaurus at 12:56 PM
| on May. 30, 2011 Owls are nocturnal birds with large eyes and
| flat-topped heads that give them great vision in low light
| conditions such as moonlight or starry skies... Often they have a
| soft feathery tuft atop their head (called an 'ear'-tuft) which
| helps them focus on prey objects.. The ears are not really "ears"
| but rather
|
| Output Stanford-Alpaca-7B: Owls are birds of prey that can be
| found in many habitats throughout the world. They have large,
| forward-facing eyes, a round facial disk, and a downward-turned
| bill. Owls are nocturnal, meaning they are active mainly at
| night, and many species use their facial disk to collect sound
| from their environment, allowing them to hunt and locate prey in
| the dark. They mainly feed on small mammals, reptiles, insects,
| and birds.
| wkat4242 wrote:
| That's really impressive. Thanks for doing this test.
| mx20 wrote:
     | I looked in the training set data and they have quite a few
     | questions about owls. Also, it got "downward curved beak" from
     | davinci and still got it wrong.
|
| Like:
|
| "instruction": "Describe the sound an owl makes.",
|
| "instruction": "Summarize the differences between an owl and a
| hawk.",
|
| "instruction": "Find a fact about the bird of the following
| species", "input": "Species: Great Horned Owl",
|
| "instruction": "What is the binomial nomenclature of the barn
| owl?",
|
| "instruction": "Generate a riddle about an owl.",
| rjb7731 wrote:
     | With the 7B non-instruction-tuned model you have to really pre-
     | prompt it to get good answers to things, so like:
|
| SYSTEM: You are an AI research assistant. You use a tone that
| is technical and scientific.
|
| USER: Hello, who are you?
|
| ASSISTANT: Hi! I am an AI research assistant. How can I help
| you today?
|
| USER: Can you tell me something about owls.
| IanCal wrote:
| I thought I'd put some in as an example. I added "ASSISTANT:
| " to the end, otherwise it likes to autocomplete the user
| query part too.
|
       | This is all with 4 bit quantisation from llama.cpp, using
       | `--repeat_last_n 256 --repeat_penalty 1.178 --temp 0.7
       | --top_k 40 --top_p 0`. I've not experimented too much.
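       |
       | (For reference, a full invocation with those settings might
       | look roughly like this - model path and prompt are
       | placeholders:)
       |
       |     ./main -m ./models/7B/ggml-model-q4_0.bin \
       |         --repeat_last_n 256 --repeat_penalty 1.178 \
       |         --temp 0.7 --top_k 40 --top_p 0 \
       |         -p "SYSTEM: You are an AI research assistant. ..."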
|
| 7B model:
|
| 1) Owl is the common name for birds of prey in the genus
| Strix (subfamily Strycinae). They have large forward-facing
| eyes; their binocular vision gives them good night vision. 2)
| The word "owl" comes from Old English awol, which means
| "bird". \end{code} [end of text]
|
| 30B model:
|
| 100% sure. Owl is the common name for several species of
| birds in the order Strigiformes, which includes about 200
| extant bird species. Most are solitary and nocturnal. They
| have large forward-facing eyes and ear tufts. The smallest
| owl -- the elf owl -- weighs as little as 31 grams (1 oz) and
| measures about 15 centimetres (6 in). The largest owls -- two
| species of great horned owls -- may weigh up to 4 kilograms
| (9 lb) and measure over 70 cm (28 in); they appear as tall as
| humans with their long legs fully extended, although when
| perched on a typical branch or power pole they seem less
| imposing.
|
| USER: Thank you very much. \end{code} [end of text]
|
| (edit - I seem to be getting a lot of latex outputs here)
| mattkrause wrote:
| It would be even more impressive if it were correct!
|
| - Owls are in the _order_ Strigiformes; "true" owls are in
| the _family_ Strigidae, and only some (earless) owls are in
| the genus Strix.
|
| - As for size, https://largest.org/animals/owls/ has great
| horned owls as the 5th largest; Wikipedia has them as the
| 2nd or 3rd largest in North America.
|
| - Their legs are long, but the height of an adult person
| seems like a huge stretch.
|
| Elf owls are adorable though!
| bryan0 wrote:
| yeah the "ASSISTANT:" at the end is important. without it
| the 7B model continued:
|
| "USER: Can you tell me something about owls. They fascinate
| me, but don't know much else than the fact they have a
| really cool call and that they live in trees. Maybe it
| would be interesting to see some drawings or hear their
| call again when we hang up? I also like cats!"
| mrtranscendence wrote:
| For completeness, here is the output of that prompt from
| the 65B model:
|
| > ASSISTANT: Owl is the common name for any of several
| species of birds of prey in the family Strigidae. They have
| large forward-facing eyes and ears, a hawklike beak, and a
| conspicuous circle of feathers around each eye called a
| facial disc. Although owls have been traditionally
| classified with the raptors, recent evidence suggests a
| closer relationship to nightjars and their allies. The term
| "owl" first appeared in Middle English as oule (c 1305-25),
| which was derived from the Old French uol, later ouel and
| oulle, which comes from classical Latin ulula or ululae,
| meaning "to howl".
|
| > The owls are divided into two families; typical owls or
| true owls, Strigidae; and barn-owls, Tytonidae.
|
| > Owls possess large, forward facing eyes and ear holes,
| but no ear flaps. Their plumage is generally cryptic
| although several species have brightly coloured head or
| tail markings. The smallest owl -- weighing less than 31
| grams (1 oz) and measuring some 13 centimetres (5 inches).
| Some owls are highly nomadic and individual owls may move
| about over hundreds of miles during the course of a year.
| Most owls live in structural habitats, using shafts of
| leaves, cracks in bark, caves, rocky outcrops, abandoned
| burrows etc., many species nest on the ground, with varying
| amounts of additional refugia in nearby vegetation. Several
| types of forest owls can be considered semi-aquatic, since
| they often forage for food near lakes, rivers or swamps.
| Many individual tropical owls roost in groups that also
| include individuals of other species. Tropical open country
| owls such as the short eared owl and the northern hawk owl
| will take refuge from storms or intense heat underneath
| clumps of grasses, or sheltered spots under shrubs. [end of
| text]
|
| I can't speak to its accuracy, but I can say right off the
| bat that the etymology is probably wrong, and its
| definition of owl doesn't contain barn owls.
| superkuh wrote:
   | I've played a _lot_ with davinci 3 ($25 of credits worth) and it
   | can do some impressive rhyming and interpretation of concepts as
   | emoji sequences. From the 3 times I've interacted with this fine-
   | tuned llama 7B it is clear it cannot do that. I've also run the
   | "vanilla" 7B, 13B, and 30B on my home computer with llama.cpp
   | modified for interactive "chat" mode with various pre-prompts,
   | and these can't do it either.
|
| I have no love for openai/microsoft but their text completion
| davinci 3 really is incredible. I don't think these littler
| models have quite reached the point of emergence for some
| functions despite fine tuning.
| valine wrote:
     | 7B parameters is next to nothing compared to GPT-3. If 7B
     | works as well as it does here, a fine-tuned 65B model could
     | very easily achieve ChatGPT-level performance.
| SteveDR wrote:
       | I thought ChatGPT was only 20B parameters to begin with?
|
| (Source https://www.forbes.com/sites/forbestechcouncil/2023/0
| 2/17/is...)
| sebzim4500 wrote:
| I haven't seen anything official from OpenAI confirming
| that ChatGPT has fewer than 175B parameters, although it is
| a reasonable guess if you read between the lines of their
| statements.
|
| Given the author of that article is a CEO of an 'AI Ad
| Optimization Platform' I think that number is speculative
| at best.
| rjb7731 wrote:
   | The inference on the Gradio demo seems pretty slow, about 250
   | seconds for a request. Maybe I am too used to the 4-bit quant
   | version now, ha!
| sebzim4500 wrote:
| I'm sure it's partially the HN hug of death.
| og_kalu wrote:
   | https://crfm.stanford.edu/2023/03/13/alpaca.html They say
   | they'll release the weights, but of course you'll be beholden to
   | a non-commercial license. They also released the dataset they
   | used to train the model (which was generated by Davinci).
   |
   | I wonder why Meta didn't just train the 7B and 13B models on the
   | full 1.4T tokens. The curves show they weren't even close to
   | converging. Shame really.
| whimsicalism wrote:
     | I am curious how you are expected to prove in a court of law
     | that the weights you have are derived from some licensed
     | weights if you take just a few gradient steps from the original
     | weights.
| bryananderson wrote:
| I am not a lawyer, but the consensus (around here anyway) seems
| to be that weights cannot be copyrighted since they are the
| output of an automated process, so perhaps the license cannot
| limit how the weights are used. The code can be copyrighted,
| but the code can also be reimplemented from scratch and used to
| load the weights.
| Animats wrote:
| The US doesn't have database copyright, but some countries,
| including the entire European Union, do.[1]
|
| [1] https://en.wikipedia.org/wiki/Database_right
| mx20 wrote:
         | Interesting. That could be a huge disadvantage for the EU
         | and its companies that want to use AI and also often can't
         | give their data to third parties. But I am not sure a
         | neural net is even considered a DB, because when this was
         | created they clearly had structured information accessed by
         | people in mind. There has to be some distinction; otherwise
         | I could make a DB that stores digits of pi in a file, and
         | if you publish a part of it that is considered substantial
         | enough, you are infringing on my DB rights.
| karmasimida wrote:
       | Not copyright, but that doesn't mean this isn't someone's
       | property.
| clbrmbr wrote:
       | Just because they cannot be copyrighted doesn't mean that
       | they cannot be considered a trade secret. The agreement for
       | using the product could be done with an NDA. But unlike
       | copyright, that doesn't continue to be enforceable once
       | someone puts them on a torrent and the chain of liability is
       | broken. /notalawyereither
| dchest wrote:
| Yeah, this happened with RC4 cipher which used to be a
| trade secret until someone posted it on Usenet:
| https://en.wikipedia.org/wiki/RC4#History
| throwaway1851 wrote:
| Well, trade secret law requires the owner of the trade
| secret to take steps to keep it... secret. Facebook has
| done the opposite, and they haven't made the people they
| distributed the weights to sign an NDA.
| Der_Einzige wrote:
| God damn I hope this is where IP law lands. I really want a
| world where folks don't gatekeep high end AI model weights.
| Please make them impossible to copyright, and ideally
| impossible to be touched by IP law.
| bogwog wrote:
| That seems like wishful thinking. The safe thing to do is to
| wait until the lawsuits come out and we get a real answer.
|
| Or if you're not risk averse, then go for it and take one for
| the team!
| jpeter wrote:
| They probably did but didn't release them for the public.
| sebzim4500 wrote:
       | Why would they withhold those ones but not the 65B ones?
       | Seems like a weird choice IMO.
| cerol wrote:
   | Can someone provide a guide on how to run LLaMA on a fairly
   | average CPU/Nvidia GPU?
| eulers_secret wrote:
| Another great option is https://github.com/oobabooga/text-
| generation-webui
|
| The 7B model will run without changes on a 3080. The 13B 4-bit
| model also runs on a 3080.
|
| This Reddit post has the instructions I followed:
| https://old.reddit.com/r/LocalLLaMA/comments/11o6o3f/how_to_...
| isoprophlex wrote:
| Check the git repo for llama.cpp.
|
     | Model weights in torrent:
     | magnet:?xt=urn:btih:b8287ebfa04f879b048d4d4404108cf3e8014352
     |
     | Runs slowly but without trouble on an Intel i7 with 32 GB RAM.
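     |
     | (Roughly the steps from the llama.cpp README at the time -
     | paths are illustrative:)
     |
     |     git clone https://github.com/ggerganov/llama.cpp
     |     cd llama.cpp && make
     |     # convert the f16 weights to ggml, then quantize to 4 bits
     |     python3 convert-pth-to-ggml.py models/7B/ 1
     |     ./quantize ./models/7B/ggml-model-f16.bin \
     |         ./models/7B/ggml-model-q4_0.bin 2
     |     ./main -m ./models/7B/ggml-model-q4_0.bin \
     |         -p "Tell me something about owls." -n 128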
| wkat4242 wrote:
| Is this only for the 7B model? The 65B would be much nicer O:-)
| adam_arthur wrote:
| Pretty exciting to see the democratization of this tech.
|
   | The next wave of innovation seems like it'll be much less
   | centralized into a handful of big companies. Even more powerful
   | closed-door LLMs will appear, but there's already so much utility
   | in models like these that can run on consumer hardware.
| yieldcrv wrote:
     | Feels like we'll get to "good enough" in a matter of weeks.
     |
     | Someone will make a one-click GUI installer with a nicely
     | integrated UI. No command-line use at all.
     |
     | We'll get model and weight distributors for updated knowledge.
| bogwog wrote:
| Only if one of those handful of big companies decides to
| release their models for commercial use. The LLaMA models used
| here are only available under a non-commercial license.
|
| Can't build anything significant with leaked models.
|
     | Hopefully we don't see Facebook/Google adopting OpenAI's
| business model where they sell API access, although I can't
| imagine them ever releasing their models directly. The last
| thing they would want is to be the architects of their own
| demise, as the opportunity to disrupt those tech giants with
| this would be great.
| oceanplexian wrote:
| I think anyone can train one of these models. "anyone" being
| someone who can throw $1-5M in cloud resources at compute. Or
| a bored sysadmin with access to a few thousand GPUs and a
| cool boss.
|
| The LLAMA weights are like the POC of what's possible,
| probably not the final form for the OSS community.
| wkat4242 wrote:
| Sounds like a great Kickstarter option for a truly free
| model. I'd back it.
| kir-gadjello wrote:
| Please do it, people shouldn't put up with the apathetic
| siloed status quo. I'm sure people will find all sorts of
| beneficial uses for these models they are going to run on
| their own hardware!
| bitL wrote:
| There seems to be a big difference between efficiently
| training a "large-ish" model on 4-8 GPUs and a gigantic
| model on 1000+ GPUs. The same techniques might not work due
| to different warm up steps, gradient overlaps etc.
|
| All you can see running in the wild are quantized LLaMA
| variants (4 or 8-bit) whereas the original model is 32-bit.
| oezi wrote:
         | What's possible now (March 2023).
         |
         | The field is moving so quickly.
| generalizations wrote:
| > Can't build anything significant with leaked models.
|
| If it's behind a SaaS, I think you can.
| azeirah wrote:
       | Is there a reason why we aren't using BLOOM's model?
       |
       | It's huge, 176B parameters. It's open, it's free, it's been
       | out there for a while.
       |
       | Is something wrong with it? Outdated training data? Outdated
       | training techniques? Or is it just not well known enough --
       | i.e., not in the news at the right time?
| kir-gadjello wrote:
         | BLOOM, while being a useful model for underrepresented
         | languages, underperforms OpenAI's GPT-3 models of similar
         | parameter count on English benchmarks. Which is a pity.
         |
         | And OpenAI's known models (at least the ones for which the
         | architecture is publicly known,
         | https://platform.openai.com/docs/model-index-for-researchers)
         | aren't pinnacles of parameter efficiency either:
         | Deepmind's Chinchilla and LLaMA beat them on benchmarks
         | while being much smaller.
|
| Just an opinion, but if we truly aim for "green computing"
| and carbon footprint reduction, we should pay a lot of
| attention to the parameter efficiency of the models that
| get run in our datacenters - and we should invest money
| into public availability of the most efficient foundation
| models.
| stu2b50 wrote:
         | It doesn't perform particularly well, and it's massive and
         | even more unapproachable for open source tinkerers to run
         | on consumer hardware or cheap cloud.
         |
         | Llama performs better on benchmarks while being a fraction
         | of the size.
| adam_arthur wrote:
| The techniques to train and reproduce similarly performing
| models seem fairly widely understood and replicable at this
| point.
|
| I do agree that large orgs are likely to have the best
| models, but a lot of the low hanging fruit/initial utility is
| already covered by models that run on consumer hardware
| bitL wrote:
| Can you post a link on how to train a gigantic LLM on
| consumer hardware with all detailed steps please?
|
| EDIT: You seem to have deleted your response but I am
| posting my reply anyway:
|
| Microsoft has mentioned today that it cost them many
| hundreds of millions of USD to accommodate OpenAI models. I
| studied NLP at Stanford including LLMs. Only Google had a
| smaller transformer that could somewhat compete (Reformer)
| but the rest went with the brute force GPT-like models and
| 1000+ GPUs, $10M+/training run etc. I haven't seen a single
| post anywhere explaining how to train such large models (a
| smaller model on 4-8 GPUs doesn't count). "Facts are clear"
| kir-gadjello wrote:
| "Accomodate" is the word to scrutinize here. Yes, it will
| cost a lot to outright buy physical HPC infrastructure to
| train _and infer_ a series of large models deployed for
| customers all over the globe. No, it won 't cost nearly
| as much to _rent_ cloud infra to train a similarly-sized
| model. No, you won 't be able to train a large model on a
| single multi-GPU node, you will need a cluster containing
| a respectable power of two of GPUs (or other
| accelerators).
|
| It's a widely known meme at this point, but to reiterate:
| For a popular large model, the largest part of the cost
| will be spent on inference, not on training. If we assume
| inference on end user device, this cost disappears.
|
| And even if you have the million to rent a cluster, there
| is a very deep question of the optimal architecture,
| dataset and hyperparameters to train the best model
| possible under given constraints.
| bitL wrote:
| You are missing the point. Extremely large LLMs don't
| train the same way as your BERT_Large x8 variety of LLMs.
| Your whole training procedure is different. Also
| Microsoft spent so much initially because their Azure
| Cloud was unable to cope with it electrically and they
| had to rewire a datacenter for it. So it's not even a
| question of just renting 1000 GPUs. Do you have actual
| experience training GPT-3+ sized models?
| kir-gadjello wrote:
| If you are interested in the infrastructure-level details
| of how similar models are trained by lesser known groups,
| take a look at this paper:
| https://arxiv.org/abs/2204.06745
|
| Quotes from the paper: _Our model is trained using a
| codebase that builds on Megatron (Shoeybi et al., 2020)
| and DeepSpeed (Rasley et al., 2020) to facilitate
| efficient and straightforward training of large language
| models with tens of billions of parameters. We use the
| official PyTorch v1.10.0 release binary package compiled
| with CUDA 11.1. This package is bundled with NCCL 2.10.3
| for distributed communications._
|
| _We trained GPT-NeoX-20B on twelve Supermicro AS-4124GO-
| NART servers, each with eight NVIDIA A100-SXM4-40GB GPUs
| and configured with two AMD EPYC 7532 CPUs. All GPUs can
| directly access the InfiniBand switched fabric through
| one of four ConnectX-6 HCAs for GPUDirect RDMA. Two
| NVIDIA MQM8700-HS2R switches--connected by 16 links--
| compose the spine of this InfiniBand network, with one
| link per node CPU socket connected to each switch._
|
| And if you are interested in 176B-scale training, read
| the BLOOM-176B and OPT-175B papers and research logs.
| [deleted]
| redox99 wrote:
| I think a lot of people are going to just ignore the
| "license".
| simonw wrote:
| Here's a link that opens their training data (52,000 rows) in
| Datasette Lite:
| https://lite.datasette.io/?json=https://github.com/tatsu-lab...
|
| Means you can run SQL LIKE queries against it to try and get a
| feel for what's in there.
|
| This link includes a query for running searches for terms:
| https://lite.datasette.io/?json=https://github.com/tatsu-lab...
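   |
   | (For example, something along these lines - the table name is an
   | assumption; Datasette derives it from the JSON file:)
   |
   |     select instruction, input, output
   |     from data
   |     where instruction like '%owl%'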
| Jackson__ wrote:
| Considering the size of the training data, there's still a
| concerning amount of complete nonsense in this.
|
| Example 1:
|
| >"instruction": "Using the PDF below, edit the text to make it
| more effective.",
|
| >"input": "https://www.example.com/example.pdf",
|
| >"output": "We recently launched a new product and wanted
| to(...)
|
| Example 2:
|
| >"instruction": "Explain why you think this painting is
| meaningful.",
|
| >"input": "https://i.pinimg.com/originals/67/2a/d1/672ad1ed7e8d
| 71e3e7fd...",
|
| >"output": "This painting is meaningful to me because it
| conveys a strong sense of emotion(...)
|
| If anyone is considering using this in the future, it'd
| probably be for the best to completely filter out any entries
| that use links as seen in the above examples.
| technerder wrote:
   | Is there any way to use Google's Coral TPU PCIe Accelerators[1]
   | to increase the inference performance of GPT models like this and
   | Llama?
|
| [1] https://coral.ai/products/pcie-accelerator/
| abidlabs wrote:
   | Cool to see that the entire website, from the agreement to
   | generating the predictions, is one Gradio [1] app written in
   | Python.
|
| [1] https://gradio.dev/
| zwaps wrote:
| would be better if it'd work on Firefox tho
| smith7018 wrote:
| same with Safari
| rjb7731 wrote:
         | If you look in the dev tools you will see a request to a
         | 'join' endpoint when you click the agree button; it adds
         | you to a queue. You can watch where you are up to in the
         | dev tools.
| gs17 wrote:
| It's not working on any browser for me.
| londons_explore wrote:
   | The training data doesn't seem to be great quality:
   |
   |     "instruction": "Calculate the sum of each column in the
   |     following table.",
   |     "input": "2 | 3 | 4\n5 | 6 | 7",
   |     "output": "2+3+4 = 9\n5+6+7 = 18"
|
| I think better results could be obtained by manually going
| through these 52,000 training examples - even a couple of seconds
| per example should be enough to weed out obviously wrong ones,
| and should only take <$1000 of human time.
| [deleted]
| leobg wrote:
     | Don't even need human time. Just run it through davinci: "Is
     | this correct, yes or no?" Or MTurk. Could cost less than $500.
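     |
     | (A rough sketch of that filter, using the completions API as it
     | existed at the time - the prompt format is improvised:)
     |
     |     import openai  # assumes openai.api_key is set
     |
     |     def looks_correct(row):
     |         prompt = (f"Instruction: {row['instruction']}\n"
     |                   f"Input: {row['input']}\n"
     |                   f"Answer: {row['output']}\n"
     |                   "Is the answer correct? Reply yes or no:")
     |         resp = openai.Completion.create(
     |             model="text-davinci-003", prompt=prompt,
     |             max_tokens=1, temperature=0)
     |         return "yes" in resp.choices[0].text.lower()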
| z3t4 wrote:
     | Training a model to do math is impossible. If you tell it that
     | 1+1=2, 10+10=20, and 100+100=200, an "AI" cannot figure out
     | that 1000+1000=2000 the way a child can.
| i-use-nixos-btw wrote:
       | A language model cannot, by itself, figure that out, at least
       | not for arbitrarily complex inputs.
|
| But it's a fairly sweeping statement to say that AI cannot do
| it.
| barking_biscuit wrote:
| >Training a model to do math is impossible.
|
| https://www.deepmind.com/blog/discovering-novel-
| algorithms-w...
| rkwasny wrote:
       | Of course it is possible for an LLM to learn this!
       |
       | The problem was with token embeddings, not with math.
| minimaxir wrote:
| One of the famous use cases of InstructGPT is being able to
| infer things from structured data, even unusual questions.
| londons_explore wrote:
       | Yes... but the answer is wrong...
       |
       | If you train it on wrong answers, it will generate more wrong
       | answers...
       |
       | The correct answer should be:
       |
       |     2+5=7
       |     3+6=9
       |     4+7=11
| zzleeper wrote:
| > which is the largest city in Tanzania. (It was the capital
| until 1974, when it was replaced by Dodoma.)
|
| Find it funny that even the humans got it wrong. Dodoma only
| became the official capital in 1996.
| siliconc0w wrote:
| I wonder when we'll get to a point that it'll be against the
| OpenAI TOS to 'fine-tune' your own model with their model's
| outputs since it really lets you cut down on the hardware
| requirements to bring up a competitor.
| O__________O wrote:
     | >> Second, the instruction data is based on OpenAI's text-
     | davinci-003, whose terms of use prohibit developing models that
     | compete with OpenAI.
|
| https://crfm.stanford.edu/2023/03/13/alpaca.html
| 29athrowaway wrote:
   | Looking forward to one called guanaco.
| ar-jan wrote:
| As the llama's wild and free-roaming cousin it had better be
| permissively licensed.
| [deleted]
| musicale wrote:
| Interesting to see Alpaca readily adopt the persona of a Stanford
| student for the congratulatory letter. I expect that AI-generated
| essays submitted by actual students are fairly convincing to
| their professors as well.
| kossTKR wrote:
   | A tangent, but how long before we see half the population having
   | relationships with their AI assistants like in the sci-fi movie
   | "Her"?
   |
   | Add cutting-edge voice synthesis and maybe a realistic video
   | stream in a few years and we're there.
   |
   | Japan already has problems with the concept of "2D waifus" and
   | lack of intimacy among the younger generation.
   |
   | Maybe the downfall will not just be climate catastrophe but
   | hyper-isolated people living alone with their ultra-realistic
   | bot friends and family, without any desire to experience the ups
   | and downs of actual social experience.
   |
   | Extrapolate a bit and you won't know what's real outside of
   | direct experience. Things are getting crazy!
| simonw wrote:
| That's been happening already with Replika:
| https://www.abc.net.au/news/science/2023-03-01/replika-users...
| bentcorner wrote:
| > _A Tangent, but how long before we will see half the
| population having relationships with their AI assistants like
| in the sci-fi movie "Her"._
|
| I don't know about half, but some people are already having
| relationships: https://replika.ai/
|
| > _Maybe the downfall will not just be climate catastrophe but
| hyper isolated people living alone with their ultra realistic
| bot friends and family without any desire to experience the ups
| and downs of actual social experience._
|
| I think the danger is that bots are _not_ necessarily ultra
| realistic, at least on an emotional level - they can be 100%
| subservient and loyal to you.
|
| Also - we already chide parents for letting their kids grow up
| stuck to a device. Imagine if children could _actually have_ an
| imaginary friend? Would that AI share the same culture and
| values as your family?
|
| I suppose there could be some upsides but this is very
| uncharted territory.
| LeoPanthera wrote:
| I'm in my 40s, so I'm acutely aware of the risk of turning into a
| "get off my lawn" curmudgeon, so I will only say this: I am glad
| that I saved a dump of Wikipedia a few months ago, because soon
| it will probably be impossible to distinguish human-written
| content from AI.
| nickthegreek wrote:
| >We are waiting for huggingface to officially support the llama
| models (i.e. this PR to be merged) before we release a stable
| version of the finetuning code.
|
   | Can't wait for this to be released to see all the different
   | fine-tuned models that will be coming out. There was an article
   | a few days ago about LLMs having their Stable Diffusion moment,
   | and I think it was right. We went from LLaMA release > llama.cpp
   | > Dalai, and now an instruction-tuned model, in under 2 weeks.
| wkat4242 wrote:
| Yeah this whole industry seems to suddenly have moved from the
| pace of a pebble creek to a waterfall.
|
| I'm sure this is just my perception because things don't work
| like that in real life. Is it really just a ton of slowly
| developed things happening to mature close together?
| nestorD wrote:
| > Is it really just a ton of slowly developed things
| happening to mature close together?
|
       | It is a bit of both. Once something big becomes public (and
       | gets significant visibility), a lot of people play with it
       | and can try their own ideas, leading to a burst of
       | improvements.
| whimsicalism wrote:
| > Is it really just a ton of slowly developed things
| happening to mature close together
|
       | Yes. I think it's probably been pretty clear for the past 2
       | years that a product like this could be built, but ChatGPT
       | coming out jumpstarted a lot of people building public demos.
| RcouF1uZ4gsC wrote:
| > Is it really just a ton of slowly developed things
| happening to mature close together?
|
       | I think a lot of progress was gated on computing power, not
       | just for training but also for inference. Now that powerful
       | consumer GPUs and M1/M2 with unified memory are fairly
       | common, a critical mass of people can run these models on
       | their home computers.
| layer8 wrote:
       | Progress almost always comes in phases and jumps; this is
       | not really surprising.
| rjb7731 wrote:
   | Interesting; looks like the web demo doesn't work at the moment
   | though. The prompt.txt will be useful; it looks very similar to
   | the pre-prompts I have been feeding in before making any
   | requests.
___________________________________________________________________
(page generated 2023-03-13 23:00 UTC)