[HN Gopher] How to Finetune GPT-Like Large Language Models on a ...
___________________________________________________________________
How to Finetune GPT-Like Large Language Models on a Custom Dataset
Author : T-A
Score : 380 points
Date : 2023-05-25 10:06 UTC (12 hours ago)
(HTM) web link (lightning.ai)
(TXT) w3m dump (lightning.ai)
| sandGorgon wrote:
| Has anyone here used EasyLM? It seems to be the most used for the
| best finetuned models out there.
| zhwu wrote:
| It seems training Vicuna on a custom dataset could be quite easy
| as well, according to the following:
| https://github.com/skypilot-org/skypilot/tree/master/llm/vic...
| quickthrower2 wrote:
| When is fine tuning worth it, rather than just prompt
| engineering?
| messe wrote:
| When you're starting to run into context limits.
| tstrimple wrote:
| From what I've seen, it's when embeddings get too large for the
| token limit, or the embeddings drive the cost up too much because
| you're always operating near the max token limit. In those cases,
| it may be worth the up-front training cost and slightly higher
| per-token cost to dramatically reduce the number of tokens in the
| average request. If you're building a higher-throughput solution,
| the difference in cost can be quite large.
| snovv_crash wrote:
| If you want to teach it, e.g., all of the text in your private
| training manuals and internal documentation, which wouldn't fit
| in the input token size.
| oddthink wrote:
| It's worth it whenever you have a reasonable amount of training
| data. You can get substantial quality improvements
| automatically. Unless you're doing some kind of prompt-
| optimization, prompt-tuning is a lot of random guessing and
| trial-and-error. It's also most necessary when you have a
| smaller base model, as opposed to one of the big ones.
| heliophobicdude wrote:
| I think these are two very separate concepts.
|
| What we are mostly seeing when it comes to fine-tuning is making
| a model promptable. Models like LLaMA or the original GPT3
| weren't promptable. They were fine-tuned with demonstration data
| that looks like a prompt-input, prompt-output pair.
|
| See below [1]:
|
|     {
|       "instruction": "What would be the output of the following JavaScript snippet?",
|       "input": "let area = 6 * 5;\nlet radius = area / 3.14;",
|       "output": "The output of the JavaScript snippet is the radius, which is 1.91."
|     }
|
| Prompt engineering is really just carefully designing what
| inputs and outputs on a prompt-ready model work best.
|
| I highly recommend skimming this RLHF article and looking for
| the parts where it talks about demonstration data [2]
|
| 1: https://github.com/sahil280114/codealpaca/blob/master/data/c...
|
| 2: https://huyenchip.com/2023/05/02/rlhf.html
| baobabKoodaa wrote:
| Prompt engineering and fine tuning are in many cases alternative
| ways to achieve the same goal. You claim that the "original
| GPT3" wasn't promptable. I'm unsure which version you refer to,
| but I'm guessing you mean text-davinci-003, and it was definitely
| promptable. For one app, I used prompt engineering to make it
| behave like a spirit talking through a Ouija board. For another,
| I used prompt engineering to make it act like a dystopian search
| engine from the future. So, yeah, it's promptable.
| quickthrower2 wrote:
| Thanks for link 2 - it is worth a proper read! I've read half of
| it already, and it is very interesting and useful for
| understanding this.
| heliophobicdude wrote:
| Cheers!
| nomagicbullet wrote:
| Is there a Dreambooth equivalent for fine-tuning ChatGPT as
| there is for Stable Diffusion? I have to imagine that if we can
| add custom data to a DL text-to-image model, we should be able to
| do the same with a text-to-text one.
|
| Edit to add: There are a number of Google Colabs for fine-tuning
| SD, and I wonder if there are any (or if it is technically
| feasible) to accomplish the same with other txt2txt models.
| SparkyMcUnicorn wrote:
| These aren't for ChatGPT, but work on LLaMA, Vicuna, etc.
|
| https://github.com/oobabooga/text-generation-webui/blob/main...
|
| https://github.com/zetavg/LLaMA-LoRA-Tuner
|
| https://github.com/h2oai/h2o-llmstudio
|
| https://github.com/rhulha/lora
| a5huynh wrote:
| If you're running the text-generation-webui
| (https://github.com/oobabooga/text-generation-webui) it has the
| ability to train LoRAs.
|
| It'll require a beefy GPU but I've seen some fun examples like
| someone training a LoRA on Skyrim books.
| artembugara wrote:
| Have a question for the Generative AI experts here.
|
| So, I can use something like GPT-4 to label data and then use
| that as a training set for my own LLM, right?
|
| EDIT: adding this from the OpenAI TOS restrictions: "(iii) use
| output from the Services to develop models that compete with
| OpenAI;"
| montenegrohugo wrote:
| Yup, totally. This is a form of knowledge distillation. OpenAI,
| or other foundational model providers, can't really do anything
| about it.
| cookieperson wrote:
| Well they can sue you and bankrupt you by delaying trial for
| a decade. That's how the US patent system works anyways...
| sanxiyn wrote:
| Sue on what grounds? It will be quickly dismissed.
| foobarbecue wrote:
| Is "ca" "can" or "can't"?
| artembugara wrote:
| can
| wodenokoto wrote:
| It is my understanding that this is how "alignment" works.
|
| That is, OpenAI paid people to chat with their LLM to fine tune
| it, and then other LLMs use ChatGPT to generate training data to
| align their models.
| visarga wrote:
| There are three ways:
|
| 1. make your own RLHF dataset - like OpenAI and Open
| Assistant
|
| 2. exfiltrate data from a bigger/better LLM - Vicuna & family
|
| 3. use your pre-trained LLM to generate RLAIF data, no
| leeching - ConstitutionalAI, based on a set of rules instead
| of labelling examples
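| A minimal sketch of option 2, assuming the 2023-era openai
| Python client; the prompt list and file name are made up for
| illustration:
|
|     import json
|     import openai  # pip install openai (0.x-era API)
|
|     openai.api_key = "sk-..."  # your key here
|
|     # Hypothetical unlabeled prompts to collect answers for.
|     prompts = ["Explain LoRA in one paragraph.",
|                "What is RLHF?"]
|
|     with open("distilled.jsonl", "w") as f:
|         for p in prompts:
|             resp = openai.ChatCompletion.create(
|                 model="gpt-4",
|                 messages=[{"role": "user", "content": p}],
|             )
|             answer = resp["choices"][0]["message"]["content"]
|             # One Alpaca-style instruction/output pair per line.
|             f.write(json.dumps({"instruction": p,
|                                 "output": answer}) + "\n")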
| cubefox wrote:
| I wonder whether these approaches fit into the above
| categories:
|
| https://arxiv.org/abs/2305.13735
|
| https://arxiv.org/abs/2305.11206
| notpublic wrote:
| Not an AI expert, but from a talk I recently heard... if there
| is a mismatch in training data between the "teacher" LLM and the
| "student" LLM, you risk teaching the student to hallucinate or
| to ignore information.
| chaxor wrote:
| Yes, and in fact that's the best method available if you want
| good performance. I would suggest using a local open source
| model to do this, however, to cut down on costs and make it far
| simpler to deal with than the unwieldy OpenAI systems.
|
| https://arxiv.org/pdf/2305.02301.pdf
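| For the local route, a rough sketch with Hugging Face
| transformers (the model name here is just an example):
|
|     from transformers import AutoModelForCausalLM, AutoTokenizer
|
|     name = "openlm-research/open_llama_7b"  # example model only
|     tok = AutoTokenizer.from_pretrained(name)
|     model = AutoModelForCausalLM.from_pretrained(
|         name, device_map="auto")  # needs accelerate installed
|
|     prompt = "Label the sentiment: 'The battery life is terrible.'"
|     inputs = tok(prompt, return_tensors="pt").to(model.device)
|     out = model.generate(**inputs, max_new_tokens=32)
|     # Decode only the newly generated tokens as the label.
|     print(tok.decode(out[0][inputs["input_ids"].shape[1]:]))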
| moffkalast wrote:
| > I can use something like GPT-4 to label data and then use that
| as a training set for my own LLM, right?
|
| Yes, almost all improved LLaMA models are tuned exactly that way
| (trained on examples of questions and answers from, say, GPT-4).
| If OpenAI stole copyrighted works to train their models, it is
| morally fair game to do the same to them regardless of their
| TOS. It's not like they can prove it anyway.
|
| Plus there's the other point where they also say that
| everything generated by their models is public domain, so which
| one is it eh?
| sirsinsalot wrote:
| This ... but we all know business is corrupt.
|
| The current attempts by OpenAI to spur on regulation are moat
| building.
| Fgehono wrote:
| Because by training it they created something new.
|
| I don't mind, just making a point.
|
| But I don't think they mind either. I don't believe this type of
| model training can be bleeding edge, which should guarantee that
| OpenAI has enough motivation to continue development while
| having healthy competition.
| fnordpiglet wrote:
| Use of copyrighted material in such a way that it's aggregated
| into statistical properties is almost certainly fair use. Use of
| the model to produce reproductions of copyrighted material, then
| consuming or distributing them, is almost certainly violating
| the copyright. But it is the facsimile of the material that's
| the violation, not the abstract use of it to generate an
| aggregate model.
| tsunamifury wrote:
| You understand these things have a very, very wide
| interpretation scope here that has yet to be tested in court. I
| wouldn't make these statements so confidently, as courts tend to
| reinterpret the law significantly to balance societal factors
| when serious technology changes occur.
| fnordpiglet wrote:
| This is true - afaik there have been no specific rulings on
| whether training models on copyrighted material is a violation.
| But to my mind it harkens back to stuff like Xerox, where the
| tool itself isn't the violating thing, it's the use of the tool.
| Likewise, derivative works are often largely reproductions with
| minor variations and are protected under fair use. A model that
| takes enormous amounts of data and distills it into a tiny
| vector representation, way below the information-theoretic
| levels for any meaningful fidelity, and mixes and overlaps data
| in a way that the original data isn't plausibly stored in the
| model... I'm definitely not going to wager my life that's fair
| use, but I would wager my company on it.
| tsunamifury wrote:
| In the history of media law, I've seen judges lean into whatever
| interpretation balances the ecosystem more than what is
| "literally the law". The law is meant to serve people, not the
| other way around. I hope judges will understand the
| contribution, and that theft can't just be "haha fuck humanity,
| love OpenAI".
| fnordpiglet wrote:
| Ok, what about the open source and research models? I wouldn't
| wager much on OpenAI keeping a lead indefinitely. Certainly not
| enough to establish case law on what's a pretty new technology
| (at least in its current use).
| jjoonathan wrote:
| Yes, laws are about politics and dispute resolution more
| than reasoning or correctness. Focusing on the pure logic
| is a trap for the computationally inclined.
| itake wrote:
| AI-generated work is not copyrightable. I guess the courts could
| later disagree, though.
|
| https://www.copyright.gov/ai/
| belter wrote:
| If the AI generates a new Eric Clapton album, with a similar
| voice and guitar-playing style?
| itake wrote:
| Your example doesn't have to be AI generated. Human cover bands
| play song X in the style of Y all the time.
| jrm4 wrote:
| I'm a lawyer, so: one should never break the law.
|
| Nonetheless, I can observe and predict that non-consensual "open
| sourcing" of these models would likely end up being the best and
| safest way to do all of this stuff.
| sp332 wrote:
| It's against the terms of service to do the generation, but
| the generated text is not copyrighted. Those are different
| things.
| cameldrv wrote:
| GPT-4 is trained on a large number of web pages, some of
| which will have had their own terms of service.
| svaha1728 wrote:
| Not only web sites, but full books from Scribd and other
| sources.
| asah wrote:
| see LinkedIn vs HiQ (which HiQ won) covering fair use of
| logged-out web pages.
| snickmy wrote:
| Indeed, fine tuning with either synthetic data (as you are
| proposing) or human review works like that. You can read more
| here: https://huggingface.co/blog/rlhf
| fallingmeat wrote:
| That is against their ToS though if you use your new LLM
| commercially.
| artembugara wrote:
| As far as I remember, I fully own all the rights to the output
| of OpenAI (for example).
| dingledork69 wrote:
| I wonder how they reconcile naming themselves "Open"AI,
| telling people that generated works can be used however
| they please, except for training a potential competitor.
| ramesh1994 wrote:
| It prohibits anything that competes with OpenAI services, i.e.
| as long as you're not literally providing an LLM API
| commercially, you should be fine.
| bagels wrote:
| Does it compete with them if you stop paying for their API?
| [deleted]
| vlovich123 wrote:
| And yet they trained theirs on commercial content on the
| internet. If that's legal, I doubt their argument holds up in
| court, right?
| dragonwriter wrote:
| They trained on publicly-available (no signup with TOS
| agreement) data, on the theory that training is fair use.
|
| You signed up and agreed to their TOS to use GPT-4.
|
| The legal situations are not similar.
|
| OTOH, lots of people _are_ openly using GPT-4 in one way or
| another to develop models, though they might generally be
| at arm's length from people intending to sell services.
| flangola7 wrote:
| > They trained on publicly-available (no signup with TOS
| agreement) data, on the theory that training is fair use.
|
| They openly state they used thousands of books from a
| pirate site as a training source. Go look up the datasets
| listed in the GPT-3 paper.
| snovv_crash wrote:
| So set up a shell company that uses GPT4 to make public
| domain examples of what RLHF data would look like, and
| then the parent company takes that data afterwards since
| it's public domain. Shell company didn't break TOS.
| sanxiyn wrote:
| Of course it will hold up in court, it's their service and
| their terms of service.
| pmoriarty wrote:
| So what are they going to do about it?
| jstummbillig wrote:
| That escalated quickly.
| fallingmeat wrote:
| Great question! I don't know the end game there. Maybe if
| they suspected their model was used they would sue, and in
| discovery find you used their model for training?
| visarga wrote:
| Maybe we don't need to worry; OpenLLaMA is being trained right
| now. It will be the commercially usable version of LLaMA.
|
| > Update 05/22/2023
|
| > We are happy to release our 700B token checkpoint for
| the OpenLLaMA 7B model and 600B token checkpoint for the
| 3B model. We've also updated the evaluation results. We
| expect the full 1T token training run to finish at the
| end of this week.
|
| https://github.com/openlm-research/open_llama
|
| So we could develop on LLaMA for now and switch to
| OpenLLaMA later.
| dragonwriter wrote:
| > So what are they going to do about it?
|
| If they think they can prove you used it to develop a competing
| service, sue you for breaking the TOS and recover the greater of
| the harm it did to their business or the amount of your profits
| from the service that are due to the use of GPT-4 in violation
| of the agreement.
| pmoriarty wrote:
| Have companies managed to get awarded damages in lawsuits
| against their customers who merely broke their terms of
| service?
|
| Is there existing case law here?
| sanxiyn wrote:
| They can terminate your account.
| postsantum wrote:
| MS lawyers have a good track record of sending out those scary
| cease & desist letters.
| sanxiyn wrote:
| I don't think that works. LLM-generated content is not
| copyrightable.
| dragonwriter wrote:
| Breach of contract for violating the TOS agreed to when signing
| up for the service doesn't depend on copyright.
| aix1 wrote:
| What I don't understand - is there anything that would
| prevent Alice from publishing ChatGPT prompts and outputs
| for anyone to use, with no T&C attached?
|
| Once Alice has done that, is there anything to prevent Bob, who
| has never agreed to the ChatGPT ToS, from using those prompts
| and outputs to train his own models to compete with OpenAI's?
|
| (Purely from a contractual/legal/IP angle rather than
| ML/technical.)
| nightski wrote:
| Right, but a cease and desist usually relates to intellectual
| property or copyright matters, typically not TOS violations.
| Please correct me if I am mistaken.
| dragonwriter wrote:
| Cease and desist can be used for any issue where the person or
| entity issuing the C&D thinks they have a legal right that is
| being violated and wants to put the violator on notice in the
| hopes of securing a change in behavior short of legal action.
| [deleted]
| pmoriarty wrote:
| Is a terms of service considered a contract?
| bottled_poe wrote:
| Nothing until it's worth their while.
| hospitalJail wrote:
| Has anyone tried to use this?
|
| The guide obviously didn't produce usable code, and the GitHub
| repo looks nearly unrelated.
|
| I'm somewhat surprised there isn't a parameter for 'input_data'
| and 'output_data' that returns a trained model. I can't figure
| out why there is so much boilerplate when that stuff could be
| contained as parameters.
| swalsh wrote:
| How does this compare to fine tuning something like BERT?
| theaniketmaurya wrote:
| I would say similar, since the building block is the transformer
| for both. In this blog post, the fine-tuning strategy used is
| Adapter. It basically adds a learnable layer to the Transformer
| block.
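| Roughly, the classic bottleneck-adapter idea looks like this (a
| generic sketch, not the exact code from the post):
|
|     import torch
|     import torch.nn as nn
|
|     class Adapter(nn.Module):
|         """Small trainable bottleneck added to a frozen block."""
|         def __init__(self, d_model: int, bottleneck: int = 64):
|             super().__init__()
|             self.down = nn.Linear(d_model, bottleneck)
|             self.up = nn.Linear(bottleneck, d_model)
|
|         def forward(self, x):
|             # Residual: the frozen block's output passes through
|             # unchanged, plus a small learned correction.
|             return x + self.up(torch.relu(self.down(x)))
|
| During finetuning only these adapter parameters get gradients;
| the rest of the transformer stays frozen.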
| jpe90 wrote:
| Would it be feasible to fine-tune a large, capable model (like
| the recent LIMA) on the source code (and maybe a few high quality
| libraries) of a niche language, such that it's much better at
| helping you write and understand it?
|
| Imagine how many doors it would open if you could fine-tune
| models capable of writing language bindings for you and keeping
| them up to date.
| tazjin wrote:
| Totally. GPT-4 can already do this, untuned, on niche languages
| and libraries. One of the main problems is still that you don't
| know when it's hallucinating a function or whatever though.
| Obscurity4340 wrote:
| This looks like the Orion browser logo
| nico wrote:
| What is the main difference between training and fine tuning?
|
| Can you start with a model trained only on producing the letter
| a, and then fine tune it to learn b, then c, then words,
| sentences, etc.?
| swalsh wrote:
| Not an expert, but my high-level understanding is this: if a
| model is a set of inputs, some middle layers, and a set of
| outputs, fine tuning concentrates on only the output layers.
|
| It's useful for taking a generic model with a base level of
| knowledge and tuning it so the output is more useful for an
| application-specific use case.
| ajb117 wrote:
| I think that's more in line with transfer learning, a variant
| of fine-tuning. If I'm reading this article correctly,
| they're fine-tuning the LMs end-to-end.
| worldsayshi wrote:
| Yeah, since fine tuning seems to be so much cheaper than
| training, why hasn't OpenAI fine tuned ChatGPT on data past
| 2021?
| heliophobicdude wrote:
| One argument is that it can contaminate training data with
| output of itself or other models.
|
| We already have documented evidence of this effect. In the GPT-4
| technical report [1], they reported contamination of HumanEval
| data in the training data.
|
| They did measure against a "non-contaminated" set, but no idea
| if that can still be trusted.
|
| Why would this matter? We can have seemingly strong benchmark
| scores because of contamination, but the model measures poorly
| against new and quarantined information. Classic overfitting.
|
| Another argument is that data being put out there could very
| much be wrong, and the amount of it amplified by other models.
| Take a look at this sample of demonstration data for codealpaca
| [2]. Not only is its output wrong, but bad practices, like
| making up a random computation without access to a place to run
| a calculation, teach the model these types of responses are OK.
|
|     {
|       "instruction": "What would be the output of the following JavaScript snippet?",
|       "input": "let area = 6 * 5;\nlet radius = area / 3.14;",
|       "output": "The output of the JavaScript snippet is the radius, which is 1.91."
|     }
|
| 1: https://cdn.openai.com/papers/gpt-4.pdf
|
| 2: https://github.com/sahil280114/codealpaca/commit/0d265112c70...
| ajb117 wrote:
| My guess is that it's because they've already done RLHF on
| top of the standard next token prediction. In other words,
| they can't cheaply fine tune ChatGPT without undoing the RLHF
| objective by training on next token prediction with post-2021
| data, and then retraining with RLHF to make sure it still
| gives good human-like output.
|
| I mention the "undoing RLHF" since it's not uncommon for
| fine-tuned models to increase in error in the original
| training objective after being fine-tuned with a different
| one. I think people saw this happen in BERT.
|
| Also ChatGPT is almost certainly huge.
| londons_explore wrote:
| Ideally you train a model right to begin with, and no fine
| tuning is necessary.
|
| However, sometimes you can't do that. For example, perhaps you
| want your model to always talk like a pirate, but you don't
| have billions of words spoken like a pirate to train on.
|
| So the next best thing is to train a model on all English text
| (which you have lots of), and then _finetune_ on your smaller
| dataset of pirate speech.
|
| Finetuning is simply more training, but with a different
| dataset and often a different learning rate.
|
| Typically, finetuning uses far far far less data and compute,
| and can be done by individuals with a home PC, whereas training
| a large language model from scratch is in the $1M - $1B range.
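| As a toy sketch (the model and loss are stand-ins; real
| finetuning runs the same loop on the pretrained LLM with a
| next-token loss):
|
|     import torch
|     import torch.nn as nn
|
|     model = nn.Linear(10, 10)  # pretend: the pretrained model
|     data = [torch.randn(4, 10) for _ in range(8)]  # "pirate" set
|
|     # Same loop as pretraining: new dataset, smaller learning rate.
|     opt = torch.optim.AdamW(model.parameters(), lr=1e-5)
|     for x in data:
|         loss = nn.functional.mse_loss(model(x), x)  # stand-in loss
|         loss.backward()
|         opt.step()
|         opt.zero_grad()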
| [deleted]
| stoptrlling wrote:
| Does anyone know the computational cost of training with these
| LoRA designs? Given that we are talking about rates of tokens
| per second, it seems training on a bigger dataset could be
| extremely expensive.
| t-vi wrote:
| The adapter and LoRA have drastically fewer parameters, so one
| might expect that forward + backward is roughly 2x the cost of
| forward.
|
| Then (as far as I know), in contrast to generation, training is
| done on the entire output of the transformer (so all tokens of
| the full input) rather than serially token-by-token (in the RNN
| days, this was called teacher forcing), so that may give you a
| significant boost in the tokens-per-second rate over generation.
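| For intuition, a minimal LoRA-style layer (a generic sketch, not
| the repo's implementation):
|
|     import torch
|     import torch.nn as nn
|
|     class LoRALinear(nn.Module):
|         """Frozen weight W plus trainable low-rank update B @ A."""
|         def __init__(self, d_in, d_out, r=8, alpha=16):
|             super().__init__()
|             self.base = nn.Linear(d_in, d_out, bias=False)
|             self.base.weight.requires_grad_(False)  # freeze W
|             self.A = nn.Parameter(torch.randn(r, d_in) * 0.01)
|             self.B = nn.Parameter(torch.zeros(d_out, r))  # zero init
|             self.scale = alpha / r
|
|         def forward(self, x):
|             # Update starts at zero because B is zero-initialized.
|             return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale
|
| At d_in = d_out = 4096 and r = 8 that is ~16.8M frozen
| parameters but only ~65K trainable ones, so the optimizer state
| and weight gradients only touch A and B.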
| akrymski wrote:
| These NanoGPT-based models are great, thank you for contributing
| to open source. Would love to see this ported to CPUs a la
| llama.cpp. Any plans in that direction?
| mercurialsolo wrote:
| While the fine-tuning pipeline is fairly straightforward for
| tuning and building custom models, the RLHF pipeline doesn't
| look to be as straightforward. Creating a dataset for RLHF seems
| like a fairly labour-intensive exercise, especially if your
| model is tuned to do work like code generation.
|
| What about the Replit Ghostwriter? Did it have an RLHF phase?
| slenocchio wrote:
| Can someone explain why I'd want to use fine-tuning instead of a
| vector database (or some other way of storing data/context)?
| morgango wrote:
| I asked ChatGPT this question, and asked it to simplify as much
| as possible.
|
| Fine-tuned Models: Imagine you have a super-smart robot that
| can talk about anything. But you want it to be really good at
| talking about, say, dinosaurs. So, you teach it more about
| dinosaurs specifically. That's what fine-tuning is - you're
| teaching the robot (or model) to be really good at a specific
| topic.
|
| Vector Databases and Embeddings with LLM: This might be a
| little tricky, but let's think of it this way. Imagine you have
| a huge library of books and you want to find information on a
| specific topic, say, ancient Egypt. Now, instead of reading
| every book, you have a magical index that can tell you which
| books talk about ancient Egypt. This index is created by
| magically converting each book into a "summary dot" (that's the
| embedding). When you ask about ancient Egypt, your question is
| also converted into a "summary dot". Then, the magical index
| finds the books (or "summary dots") that are most similar to
| your question. That's how the vector database and embeddings
| work.
|
| So, if you want your super-smart robot to be really good at one
| specific topic, you use fine-tuning. But if you want it to
| quickly find information from a huge library of knowledge, you
| use vector databases and embeddings. Sometimes, you might even
| use both for different parts of the same task!
| mgfist wrote:
| The first reason that comes to mind is that you can make much
| smaller models, which helps with latency and cost, and may
| enable you to run the model locally.
| pid-1 wrote:
| I've been playing with using documents as OpenAI embeddings for
| the past few weeks and, at least for my use case, the results
| are meh. It seems sometimes just using context is not enough.
|
| My next step is to play with fine tuning, but I have no results
| to report yet.
| akiselev wrote:
| Try using InstructXL for embeddings. It's got a more complex
| prompt structure for generating embeddings which might be
| more useful
| deforciant wrote:
| Have you tried other models to generate embeddings? I am going
| in that direction too, to create an additional layer of helpers
| for search. Also, thinking that if the document is not too big,
| it might fit into the initial context with the prompt.
| santiagobasulto wrote:
| I'd be very interested in knowing the outcome. Do you blog
| anywhere (or post on social)?
| oddthink wrote:
| Wouldn't a vector database just get you nearest-neighbors on
| the embeddings? How would that answer a generative or
| extractive question? I can see it might get you sentiment, but
| would it help with "tell me all the places that are mentioned
| in this review"?
| superchink wrote:
| I think the point is that you use the vector database to locate
| the relevant context to pass to the LLM for question answering.
| Here's an end-to-end example:
|
| https://www.dbdemos.ai/demo.html?demoName=llm-dolly-chatbot
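| In code, that flow is roughly the following (the bag-of-words
| embed() is a toy stand-in so the sketch runs; a real system uses
| a learned embedding model and a vector DB):
|
|     import numpy as np
|
|     # Toy "embedding": word counts over a tiny vocabulary.
|     vocab = ["lora", "adapter", "rank", "bottleneck", "layers"]
|     def embed(text):
|         words = text.lower().split()
|         return np.array([words.count(w) for w in vocab], float)
|
|     docs = ["LoRA adds low-rank updates to frozen weights",
|             "Adapters insert small bottleneck layers per block"]
|     doc_vecs = np.array([embed(d) for d in docs])
|
|     def retrieve(question):
|         q = embed(question)
|         sims = doc_vecs @ q / (np.linalg.norm(doc_vecs, axis=1)
|                                * (np.linalg.norm(q) + 1e-9))
|         return docs[int(sims.argmax())]  # nearest neighbor
|
|     q = "what is a bottleneck layer?"
|     prompt = f"Context: {retrieve(q)}\n\nQuestion: {q}\nAnswer:"
|     # `prompt` is then sent to the LLM to generate the answer.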
| heliophobicdude wrote:
| Assuming you would want to fine-tune over a codebase or set of
| documents, I would argue vector databases and fine-tuning are
| completely different tools.
|
| I would strongly recommend against fine-tuning over a set of
| documents, as this is a very lossy information retrieval system.
| LLMs are not well suited for information retrieval the way
| databases and search engines are.
|
| The applications of fine-tuning where we are seeing a lot of
| success are in making completion models like LLaMA or the
| original GPT3 promptable. In essence, prompt-tuning or
| instruction-tuning. That is, giving them the ability to respond
| in a user-prompt, LLM-output chat interface.
|
| Vector databases, for now, are a great way to store mappings of
| embeddings of documents with the documents themselves for
| relevant-document information retrieval.
|
| I would highly recommend skimming this RLHF article for how
| demonstration data was used to make a model promptable [1]. Keep
| in mind RLHF is another concept altogether, and we might be
| seeing a revolution where it becomes optional (thanks to LIMA)!
|
| 1: https://huyenchip.com/2023/05/02/rlhf.html
| mountainriver wrote:
| I think it probably works a lot better, but I would love to see
| some research validating this
| chadash wrote:
| I've read in a few places that it actually works worse in most
| cases. It's much better to put the context in your prompt.
| CuriouslyC wrote:
| Fine tuning + context will outperform context alone, and it's
| cheaper to burn cycles fine tuning and then use a smaller
| context than to use a larger context in production.
| Guillaume86 wrote:
| Fine tuning + the same context will probably outperform context
| alone, but if you use a smaller context, that does not seem to
| work as well, as GP stated.
| swalsh wrote:
| Fine Tuning = Output
|
| Embeddings = Input
|
| Fine-tuning is like a chef modifying a general pizza recipe to
| perfect a specific pizza, such as Neapolitan. This
| customization optimizes the result. In AI, fine-tuning adjusts
| a pre-existing model to perform better on a specific task.
|
| Embeddings are like categorizing ingredients based on
| properties. They represent inputs so that similar inputs have
| similar representations. For instance, 'dog' and 'puppy' in an
| AI model have similar meanings. Like ingredients in a pizza,
| embeddings help the model understand and interpret the inputs.
| So, fine-tuning is about improving the model's performance,
| while embeddings help the model comprehend its inputs.
|
| It turns out, you can search a vector space of embeddings to
| find similar embeddings. If I turned my above post into 2
| embeddings, and you searched for "golden retriever", though
| neither paragraph has that exact phrase, the model should know a
| golden retriever is most similar to the second paragraph that
| compares puppy to dog.
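| That search is a few lines with the (2023-era) OpenAI embeddings
| endpoint; a sketch:
|
|     import numpy as np
|     import openai
|
|     paras = ["Fine-tuning is like a chef modifying a recipe...",
|              "Embeddings make 'dog' and 'puppy' look similar..."]
|     resp = openai.Embedding.create(
|         model="text-embedding-ada-002",
|         input=paras + ["golden retriever"])
|     vecs = np.array([d["embedding"] for d in resp["data"]])
|     docs, query = vecs[:-1], vecs[-1]
|     # Cosine similarity between the query and each paragraph.
|     sims = docs @ query / (np.linalg.norm(docs, axis=1)
|                            * np.linalg.norm(query))
|     print(paras[int(sims.argmax())])  # the puppy/dog paragraph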
| SparkyMcUnicorn wrote:
| I like to think of an LLM as a literal human. Not sure if
| it's the best analogy.
|
| Fine tuning = Adding years of experience, in a set environment.
| E.g. raise them in a home that only speaks Old English, have
| them learn Pig Latin, send them to a bootcamp.
|
| Embedding = Giving them a book to reference information.
|
| Just like a human, memory might fade a bit through the years
| but old habits die hard. You might not perfectly recollect
| what you learned years ago, but you still get the general
| idea, and if you took a class on the referenced book you'll
| be better at relaying information from it.
|
| Edit: Asked ChatGPT to create the analogy.
|
| A language model is like an intelligent person.
|
| - Pre-training is their broad education and general
| knowledge.
|
| - Fine-tuning is their years of specialized experience in a
| specific field.
|
| - Embedding is like giving them a comprehensive book on a
| particular subject.
|
| Just as a person gains knowledge, expertise, and specialized
| resources, the language model develops its understanding and
| performance through pre-training, fine-tuning, and embedding.
___________________________________________________________________
(page generated 2023-05-25 23:00 UTC)