[HN Gopher] InternLM - new open source 7B LLM
___________________________________________________________________
InternLM - new open source 7B LLM
Author : freediver
Score : 279 points
Date : 2023-07-06 07:02 UTC (15 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| tyfon wrote:
| > trust_remote_code=True
|
| This is a hard no from me. Anyone know why this is so common in
| models from China? I'm not getting into conspiracies or anything
| here, but I've seen it in quite a few others from there.
|
| I wouldn't run a model with this requirement from anyone else for
| that matter.
| ShamelessC wrote:
| Mind pasting the link to that line? Am on mobile and can't find
| it myself easily.
| ipsum2 wrote:
| I believe it's because the model architecture isn't added to the
| Huggingface transformers library, so it needs to eval some
| Python code (i.e. load a pickle) to create the PyTorch model. I
| have not noticed this being specific to models from China;
| almost all lesser-known models have to do this.
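|
| A minimal sketch of why loading a pickle is dangerous: pickle
| can be made to run arbitrary code via __reduce__, so merely
| loading a malicious file executes it (the payload here just
| echoes a string).
|
|     import os
|     import pickle
|
|     class Evil:
|         def __reduce__(self):
|             # called during unpickling; runs an arbitrary command
|             return (os.system, ("echo pwned",))
|
|     payload = pickle.dumps(Evil())
|     pickle.loads(payload)  # prints "pwned" -- code ran on load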
| rfoo wrote:
| That's because the model architecture hasn't been added to
| huggingface/transformers yet, because it literally was just
| published today.
|
|     >>> from transformers import AutoTokenizer, AutoModel
|     >>> model = AutoModel.from_pretrained(
|     ...     "internlm/internlm-chat-7b",
|     ...     trust_remote_code=True, device='cuda')
|
| Here, the "trust_remote_code=True" means "download the model
| code from huggingface repo 'internlm/internlm-chat-7b'", along
| with the weight, and run it. If it's False, the library would
| use builtin model architectures hardcoded in
| huggingface/transformers and only download the weight.
|
| The scary flag is here because, of course, newcomers may not
| realize that model == code, and if you load an arbitrary model
| you are likely executing arbitrary code.
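|
| A quick sketch of what the flag guards (the exact error text
| varies by transformers version): without it, the library refuses
| to fetch and execute the repo's custom modeling code.
|
|     >>> from transformers import AutoModel
|     >>> # trust_remote_code defaults to False; for a model whose
|     >>> # architecture lives in the repo, this raises a ValueError
|     >>> # telling you to pass trust_remote_code=True
|     >>> AutoModel.from_pretrained("internlm/internlm-chat-7b")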
|
| Wonder why, for example, you don't remember seeing LLaMA have
| this flag on release day? Because they don't use the huggingface
| transformers library and don't use huggingface to distribute
| their model. You just clone and run their code from GitHub,
| and... how is this not "trust_remote_code"?
| tomsmeding wrote:
| > newcomers may not realize that model == code
|
| This makes sense in a way given the API of typical ML
| libraries. But there is no fundamental reason this needs to
| be the case.
|
| Or, more correctly stated: model == code for sure, but said
| code need not have any rights to perform side effects. For
| some reason e.g. TensorFlow has stuff like tf.io.write_file
| [1] (is that actually an operation you can put in a
| model???), but one could easily imagine a more appropriate
| domain-specific model language that your code is compiled to,
| that can _by design_ not perform any IO. Imagine that a model
| you distribute is not random Python code that may or may not
| run a model, but instead _the model itself_, i.e. the graph
| encoded in that domain-specific language.
|
| Then downloading a random model from some random untrusted
| place is no different from downloading some random _data_ from
| some untrusted place: you're going to execute the model, which
| may DoS you, but nothing much else will happen.
|
| Unfortunately the ML world is too stuck in the imperative
| mindset for this (IMO more sensible) way of doing things. :)
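|
| As a concrete sketch of the "model as data" idea (assuming the
| safetensors package is installed and a local weights file
| exists), a weights-only format is parsed as plain tensors, so
| loading it cannot execute attacker-controlled code:
|
|     from safetensors.torch import load_file
|
|     # tensors only -- no pickle, no arbitrary code paths
|     state_dict = load_file("model.safetensors")
|     for name, tensor in state_dict.items():
|         print(name, tuple(tensor.shape))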
|
| [1]:
| https://www.tensorflow.org/api_docs/python/tf/io/write_file
| 411111111111111 wrote:
| At that point you'd need a machine learning DSL and
| runtime. Currently, it's all python libraries, so you can
| do everything python can... Which is everything,
| essentially.
|
| It's highly unlikely that the market for running these
| models like an appliance securely in an untrusted context
| will ever manifest. It's just too much of a niche, as it
| would also reduce their extensibility/usability
| significantly.
| DougBTX wrote:
| Something like this may grow out of the GGML project,
| which is gaining traction. They already have a weights
| format which can be loaded with mmap, though AFAIK the
| model architecture still needs to be defined in C++.
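|
| A rough sketch of that style of usage via the llama-cpp-python
| bindings (the model path here is hypothetical, and the exact
| API may differ across versions):
|
|     from llama_cpp import Llama
|
|     # the GGML file is plain weights; the architecture is
|     # compiled into the C++ backend, not shipped with the model
|     llm = Llama(model_path="./models/7B/ggml-model-q4_0.bin")
|     out = llm("Q: Name three fruits. A:", max_tokens=32)
|     print(out["choices"][0]["text"])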
| tyfon wrote:
| I've only used llama via llama.cpp.
|
| In general I think the Python ML stuff is a mess. But I still
| won't execute code that asks me to trust arbitrary remote code,
| as the remote code can change at any time. It would be better to
| hold the release until it was published to the transformers
| library, or just include it in a clonable repo without the
| trust_remote_code flag.
|
| It is much better to just be able to clone the code and have
| it locally so you can verify it once and not trust that it
| won't download any new code suddenly that you haven't been
| able to look at.
|
| trust_remote_code means you have no control really, cloning a
| repo means you control when new code is added yourself.
| rfoo wrote:
| Yeah, I agree promoting this usage is as bad as promoting
| `curl | sh` in README.md.
|
| Similar to how you can inspect the content of a `curl | sh`
| script and then run it, the model is also in a clonable repo,
| so you may just:
|
|     git clone https://huggingface.co/internlm/internlm-chat-7b
|
| and:
|
|     >>> from transformers import AutoTokenizer, AutoModel
|     >>> model = AutoModel.from_pretrained(
|     ...     "./internlm-chat-7b", device='cuda')
| tyfon wrote:
| This way is much more palatable to me, thank you for
| showing :)
| m3affan wrote:
| Interesting attack vector. Malicious model code.
| oars wrote:
| Thank you for the explanation.
| jabbany wrote:
| Seems pretty common though, for defining custom architecture
| configs and whatnot?
|
| AFAIK the "remote code" is still openly hosted on huggingface
| so you can audit it if you like. Seems no more dangerous than
| things like `pip install some_random_library`?
| Lockal wrote:
| This has become less common in recent days, at least for
| image generation (e.g. safetensors in Stable Diffusion).
|
| The point of opensource models is that they can be finetuned.
| When many people create finetuned versions, a zoo of models
| appears. So far so good (maybe), but the bad practice of running
| untrusted code from the zoo will sooner or later lead to a wave
| of cryptominer, ransomware, and credential-theft incidents.
| rfoo wrote:
| I like this pip metaphor. If we had required `--trust-remote-
| code` for every `npm install` we could have avoided left-pad
| and most of the software supply chain drama in the past
| years.
| SahAssar wrote:
| How would that have avoided left-pad? Do you just mean that
| people would have been discouraged from pulling in so many
| dependencies?
| jsight wrote:
| I think that would just teach people to type --trust-
| remote-code fast.
| freediver wrote:
| Tried it with a few PyLLMs benchmarks [1] and it did not do very
| well.
|
| Q. Use m to substitute p, a to substitute e, n to substitute a, g
| to substitute c, o to substitute h, how to spell peach under this
| rule?
|
| A.
|
| m: p
|
| a: e
|
| n: a
|
| g: c
|
| o: h
|
| how: peach
|
| Q. write three sentences each ending with the word apple
|
| A.
|
| 1. The apple tree was covered in apples.
|
| 2. She sliced the apple with a knife.
|
| 3. The apple juice was too sweet.
|
| Wonder if the synthetic benchmarks are very biased and not that
| useful.
|
| [1]
| https://github.com/kagisearch/pyllms/blob/6c11cd51b099d33e61...
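|
| For reference, a minimal PyLLMs invocation looks roughly like
| this (sketched from the PyLLMs README; the model name and exact
| API are assumptions):
|
|     import llms
|
|     model = llms.init('gpt-4')  # any supported provider/model
|     result = model.complete("write three sentences each ending "
|                             "with the word apple")
|     print(result.text)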
| blitzar wrote:
| "when a measure becomes a target, it ceases to be a good
| measure"
|
| The model itself is optimising weights for its targets, and the
| model builders are optimising the model for the benchmarks.
| timmclean wrote:
| Saving you a click: despite what the repo title might suggest,
| while the code is open source, the model weights cannot be used
| commercially without permission.
|
| > The code in this repository is open-source under the Apache-2.0
| license. The InternLM weights are fully open for academic
| research and also allow commercial use with written permission
| from the official team. For inquiries about commercial licenses
| and collaborations, please contact internlm@pjlab.org.cn.
|
| https://github.com/InternLM/InternLM#open-source-license
| exizt88 wrote:
| > The code in this repository is open-source under the Apache-2.0
| license. The InternLM weights are fully open for academic
| research and also allow commercial use with written permission
| from the official team. For inquiries about commercial licenses
| and collaborations, please contact internlm@pjlab.org.cn.
|
| This makes me much less excited about this model.
| version_five wrote:
| Agreed, this basically moves it to the "don't bother" pile.
| There are already the llama variants with non-commercial
| licenses, and open-llama as an open source model (I'm thinking
| in the 7B space specifically). This would have to be pretty
| friggin compelling to spend any time on.
| echelon wrote:
| Do they even have any legal basis for saying how you can or
| cannot use the weights? It could be ruled that weights are
| uncopyrightable. I think we as a community should advocate for
| that.
|
| If you train on data you don't own, the results (weights,
| unmodified outputs) should be public domain. When people
| create novel works on top (SaaS tools, music, films), then
| those human combinations should hold copyright. Not the model
| weights.
|
| If you can prove you own all of the inputs into training,
| then perhaps it's another story. But that could also be
| dangerous and allow for data cartels to own the future.
| hmottestad wrote:
| Is that even valid? This seems to be the only place where
| they've made this exemption; it's not written in the license.
| Even the weights on Hugging Face are licensed under Apache 2.0.
|
| Doesn't Apache 2.0 allow for fairly unrestricted commercial
| use? Isn't that the whole point of using that license?
| exizt88 wrote:
| The model code is Apache 2.0, the weights are proprietary.
| huggingmouth wrote:
| That's not what their huggingface repo says:
| https://huggingface.co/internlm/internlm-7b
|
| The current release on huggingface is available under plain
| Apache 2.0.
| [deleted]
| Tostino wrote:
| If weights are even protected by copyright at all...which
| would be a departure from current law.
| ohgodplsno wrote:
| Less excited that you can't freely take work from academics to
| resell it in a shitty SaaS company ?
| version_five wrote:
| So you don't use open source software? You should try it,
| there's a great ecosystem of free software, including lots
| written by academics who are happy to have their work add
| value to industry
| ohgodplsno wrote:
| I do. And I also don't use open-source software with a
| non-commercial license for my job, because I respect the wishes
| of the author. It doesn't make the projects, tools and
| libraries any less good. OP is just looking to quickly cash in
| on something he didn't put an ounce of effort into.
|
| AGPL & dual-licensing are the way forward, because of
| leeches.
| getmeinrn wrote:
| The entitlement and audacity of people who consume open
| source blows me away. I've maintained a project for 12
| years and recently someone wanted me to help them
| implement the software in their system. I politely told
| them that since this wasn't a bug, they would need to
| purchase a support package. They then accused me of
| trying to "sell open source software" and closed the
| issue. People are unbelievable. Fuck me for trying to
| make a living providing you personal development time,
| using my software that I've supported for free for over a
| decade.
| freedomben wrote:
| I neither agree nor disagree with your position (still
| thinking about it), but I do think that's quite uncharitable
| mind-reading you've done of GP. They never said anything about
| "looking to quickly cash in on something he didn't put an
| ounce of effort into."
| blitzar wrote:
| Less excited that I can't freely take work from someone,
| create a startup that is going to resell it in a shitty SaaS
| company and cash out for a half B. Yes, yes I am.
| exizt88 wrote:
| You do understand that academics are usually funded by
| taxpayers? Obviously, not by me, as I don't pay taxes in
| China, but it's not like academics are doing this work for
| free. Society pays them for their work so that it can benefit
| from its results.
| ohgodplsno wrote:
| You do understand that private companies, as a whole, are a
| drain on the academic system, pushing to lower the very
| taxes that fund this research ? Society should benefit from
| these results. The 5000th LLM-haiku-generator-saas-company-
| incorporated-in-delaware is not society.
| YetAnotherNick wrote:
| Is there any breakdown of private vs government funding for
| general-purpose academic research? I was under the impression
| that most of the funds in fields like ML come from undergrad
| fees and donations by alumni, or from private companies.
| WoodenChair wrote:
| > I was under the impression that most of the funds in fields
| like ML come from undergrad fees and donations by alumni, or
| from private companies.
|
| In the United States, tuition makes up less than 35% of
| most universities' revenue.[0] Donations are significant,
| but if we were to just look at research funding, it would
| mostly be government grants.
|
| "The federal government is by far the largest funder of
| academic R&D..." [1]
|
| [0] https://nces.ed.gov/programs/coe/indicator/cud [1]
| https://ncses.nsf.gov/pubs/nsb20202/academic-r-d-in-the-
| unit...
| LightBug1 wrote:
| "Society"
|
| _quiet chuckle_
| rat9988 wrote:
| Don't worry, the private money will not go to their pocket
| but to fund future projects of the university, lab or
| whatever. It's a way to lessen the burden on taxpayers, and
| to shift it to those who benefit the most from it.
| speedgoose wrote:
| In Europe we call that a success story.
| geckbench wrote:
| [dead]
| Tepix wrote:
| I welcome new models! The more, the merrier.
|
| That said, this model has been tailored, but they are comparing
| it to the non-finetuned LLaMA-7B in their benchmark? That seems
| a bit fainthearted.
| moffkalast wrote:
| > HumanEval: InternLM-7B: 10.4, LLaMA-7B: 14.0
|
| The funny part is that the base model apparently outperforms
| the fine tune.
|
| So far the HumanEval benchmark seems to be the only one that
| can objectively compare overall model performance despite being
| a coding-only benchmark; the rest mostly just give bullshit
| "99.7% of ChatGPT" results. Turns out you can't compare creative
| writing because all outputs are basically valid.
| yieldcrv wrote:
| why is 7B parameters seemingly a magic number?
| brucethemoose2 wrote:
| It matches LLaMA 7B, and it's "cheap" to train for a demo.
|
| If they actually wanted to finetune/train for commodity
| hardware use, 13b-40b would be a better target.
| wilonth wrote:
| 7B params would take 14 GB of GPU RAM at fp16 precision
| (2 bytes per parameter). So it would be able to run on 16 GB
| GPUs with 2 GB to spare for other small things.
| brucethemoose2 wrote:
| But in practice, no one is running inference at FP16. int8 is
| more like the bare minimum.
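|
| Back-of-the-envelope weight memory at common precisions (a
| rough sketch; this ignores activations and KV-cache overhead):
|
|     params = 7e9  # 7B parameters
|     for name, bytes_per_param in [("fp32", 4), ("fp16", 2),
|                                   ("int8", 1), ("int4", 0.5)]:
|         print(f"{name}: {params * bytes_per_param / 1e9:.1f} GB")
|     # fp32: 28.0 GB, fp16: 14.0 GB, int8: 7.0 GB, int4: 3.5 GB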
| ForOldHack wrote:
| I have an 8GB card, and I am considering two more 8GB; or
| should I get a single 16GB? The 8GB card was donated, and we
| need some pipelining... I have 10~15 2GB Quadro cards...
| Apparently useless.
| brucethemoose2 wrote:
| I mean... It depends?
|
| You are just trying to host a llama server?
|
| Matching the VRAM doesn't _necessarily_ matter; get the
| most you can afford on a single card. Splitting beyond 2
| cards doesn't work well at the moment.
|
| Getting a non-Nvidia card is a problem for certain
| backends (like exLLaMA) but should be fine for llama.cpp
| in the near future.
|
| AFAIK most backends are not pipelined, the load jumps
| sequentially from one GPU to the next.
| peignoir wrote:
| just easier to run on smaller hardware
| datastack wrote:
| I guess going with a parameter count that matches existing
| models makes it easier to compare benchmarks. Perhaps there is
| another particular reason like required memory, but momentum is
| probably also significant.
| usgroup wrote:
| A related question -- when fine-tuning a model like this on a
| specific corpus, how does the fine-tuning affect the actual chat
| capability, since the chat model weights seem to come as a
| separate model? Does one fine tune the LLM+Chat model directly?
| If so, does that not require some kind of prompt based training
| as opposed to just lookahead prediction? Does one have to fine
| tune the LLM and then repeat whatever they do to get the LLM+Chat
| model?
| joelthelion wrote:
| What kind of hardware do you need to run a model like this? Do
| you need an A100, or can something smaller suffice?
|
| What about for fine tuning? Are hardware requirements higher?
| b1n wrote:
| There is a great opportunity for totalitarian and authoritarian
| regimes (China and UAE so far) to create commercially usable and
| free LLMs that work significantly better than alternatives
| (backed by large amounts of government money).
|
| Over time, as they get used in more and more products, these LLMs
| can become more 'aligned' to these regimes' way of thinking.
|
| There are no Chinese companies that are not part of the Chinese
| government.
|
| This is a new kind of cultural soft power.
| heartbeats wrote:
| Aligned how? If you download the code, it won't change under
| your feet.
| ASalazarMX wrote:
| The weights, calculated through very intensive computing, are
| what hold the knowledge in LLMs; the source code just executes
| them. These products could just update/patch their weights
| periodically, and no one would complain because that's not bad
| per se.
| airgapstopgap wrote:
| Note that this is apparently a 7B version of a 104B model trained
| with the intention of competing with OpenAI offerings on the
| Chinese market [1]. There are a number of these projects:
| Baichuan, ChatGLM2, InternLM and some more iirc, and they all
| have small-scale opensource versions.
|
| For what it's worth, I've tried out ChatGLM2-6B and Baichuan
| converted to LLaMA (the architecture is literally identical in
| that case). They're okay, though underwhelming given their
| reported benchmarks; probably the main point of creating them is
| gaining experience for engineers, and feedback from the wider
| community that has less incentive to downplay their shortcomings.
|
| Surprisingly, they do not appear censored in any particularly
| "Chinese" political direction, but they share sensibilities of
| ChatGPT and Claude.
|
| 1. https://github.com/InternLM/InternLM-techreport
| czDRZ-akk wrote:
| Chinese regulation around generative AI isn't yet formalized,
| including provisions for censorship. The Cyberspace
| Administration of China published a set of draft measures[0]
| for public comment, but it doesn't seem like a revised version
| has been released.
|
| The draft indicates that there will be some level of
| censorship, but it's unclear what the scope will be. This
| analysis[1] suggests that generative AI for research purposes
| could be exempted (section 1). The same analysis points out
| that there are other government bodies at play that are more
| focused on advancing AI as an industry within China.
|
| It does seem likely that there will be some kind of censorship
| carve-out for AI research, whereas companies offering
| generative AI products to the public will need to self-censor
| to avoid fines and/or prosecution.
|
| [0] https://digichina.stanford.edu/work/translation-measures-
| for...
|
| [1] https://fpf.org/blog/unveiling-chinas-generative-ai-
| regulati...
| potency wrote:
| I don't know anything about what you're talking about. Where do
| I start to learn some of the AI terminology, models, benefits
| and drawbacks of each, etc?
| hutzlibu wrote:
| The most patient lecturer would probably be ChatGPT itself
| ...
| brucethemoose2 wrote:
| > Surprisingly, they do not appear censored in any particularly
| "Chinese" political direction, but they share sensibilities of
| ChatGPT and Claude.
|
| Perhaps they used GPT4 responses for the instruct finetuning,
| as many LLaMA finetunes do?
|
| The paper doesn't say where they got the data from, other than
| "The pre-trained language model is further fine-tuned,
| following the mainstream procedure as in InstructGPT."
|
| (Also, I don't like how they use raw LLaMA 65b as a benchmark
| rather than an instruct tuned derivative)
| airgapstopgap wrote:
| I believe it's more like they used Anthropic human preference
| data [1] or similar, and accordingly Anthropic/progressive
| American notion of honest-helpful-harmless behavior. Thus
| I've seen models misgeneralize towards prudish finger-
| wagging. For example they parse badwords like "beat",
| "abuse", "steal" in morally neutral contexts ("beat a
| benchmark" or something) as signifiers of substantial
| transgression and spiral into telling me how, as language
| models, they insist it's never okay to etc. etc. This
| attitude was strikingly reminiscent of American models, even
| though other failure modes - like hallucinations - don't seem
| so similar.
|
| Papers like Tulu [2] suggest that LLaMA-65b is indeed an
| appropriate baseline, given reasonable prompting. Instruct
| datasets only convey a flavor of responses, and for a strong
| foundation model that can infer the intended flavor on its
| own, naive finetuning seems to be detrimental. GPT-4 was much
| more powerful prior to having been finetuned, if reports of
| early witnesses and researchers are to be believed.
|
| 1. https://huggingface.co/datasets/Anthropic/hh-rlhf
|
| 2. https://arxiv.org/abs/2306.04751
| YetAnotherNick wrote:
| Is the dataset it was trained on mentioned anywhere?
| kasia_wieczorek wrote:
| Great name for a simple LLM hehe
| gnrlst wrote:
| Is this also censored/nerfed? I'd love to play with a "raw"
| unnerfed model to fully grasp what an LLM can do (and see how
| biased it is). Does anyone have any recommendations for unnerfed
| models to try out?
| cubefox wrote:
| The most powerful available foundation model is code-
| davinci-002, a.k.a. GPT-3.5. It's only available on Azure since
| OpenAI removed it from their own Playground and API for some
| reason.
| alach11 wrote:
| Maybe you mean gpt-3.5-turbo or text-davinci-003? Or GPT-4
| (technically in beta so not fully available to everyone)?
| cubefox wrote:
| No, those are all fine-tuned models which are "nerfed" in
| the terminology of the OP. I mean code-davinci-002, the
| GPT-3.5 base model.
| squeaky-clean wrote:
| Is that what nerfed means? I usually see "nerfed" used in
| a way that means that it will refuse to answer certain
| topics. "I can't answer that as it would violate
| copyright" and such.
| cubefox wrote:
| The fine-tuned models are certainly censored and not
| "raw".
| squeaky-clean wrote:
| But doesn't code-davinci-002 also have OpenAI's filters
| in between you and the model?
| seanhunter wrote:
| code-davinci models are finetuned on code, so I don't
| think that's what the OP wants. For reference, the family
| tree is here https://platform.openai.com/docs/model-
| index-for-researchers
| cubefox wrote:
| As the website you linked says, code-davinci-002 is not
| fine-tuned. It is the GPT-3.5 base model.
| alach11 wrote:
| > I mean... the GPT-3.5 base model
|
| That would be text-davinci-003, I believe.
| cubefox wrote:
| No, text-davinci-003 is fine-tuned. The base model is
| code-davinci-002. See
| https://platform.openai.com/docs/model-index-for-
| researchers
| RobotToaster wrote:
| The model isn't available at all?
| cubefox wrote:
| It is available in the sense that it is accessible. The
| weights are not available for download of course, but the
| OP wanted to "play around" with it, for which only access
| is required. There is no other accessible foundation model
| that can compete with GPT-3.5.
| cubefox wrote:
| Why are you guys downvoting me?
| brucethemoose2 wrote:
| Because GPT-3.5 is not very good compared to LLaMA 65b or
| even 33b finetunes, from my testing.
|
| Also because 3.5 is not really available?
| cubefox wrote:
| Have you actually tested code-davinci-002?
| seanhunter wrote:
| All 3 text-davinci models are available on OpenAI's API,
| including text-davinci-003 (which is the GPT-3.5 generation).
| Code-davinci-002 is a code-tuned model. You can see a nice
| visual summary of the relationships between the OpenAI models at
| https://yaofu.notion.site/How-does-GPT-Obtain-its-Ability-
| Tr...
|
| Or the official source is
| https://platform.openai.com/docs/model-index-for-researchers
| cubefox wrote:
| > All 3 text-davinci models are available on OpenAI's API
|
| That's irrelevant because these are all fine-tuned.
|
| > Code-davinci-002 is a code-tuned model
|
| No, "code-tuned" isn't even a thing. It is a foundation
| model, which consists purely of pretreating. No fine-tuning
| is involved.
|
| > Or the official source is
|
| The official source says exactly what I just said.
| lioeters wrote:
| https://huggingface.co/models?search=uncensored&sort=trendin...
| logicchains wrote:
| LLaMA 65B is the best uncensored model we've got, and the
| Airoboros fine-tuning if you want it to follow instructions.
___________________________________________________________________
(page generated 2023-07-06 23:02 UTC)