[HN Gopher] InternLM - new open source 7B LLM
       ___________________________________________________________________
        
       InternLM - new open source 7B LLM
        
       Author : freediver
       Score  : 279 points
       Date   : 2023-07-06 07:02 UTC (15 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | tyfon wrote:
       | > trust_remote_code=True
       | 
        | This is a hard no from me. Does anyone know why this is so common
        | in models from China? I'm not getting into conspiracies or
        | anything here, but I've seen it in quite a few others from there.
       | 
       | I wouldn't run a model with this requirement from anyone else for
       | that matter.
        
         | ShamelessC wrote:
         | Mind pasting the link to that line? Am on mobile and can't find
         | it myself easily.
        
         | ipsum2 wrote:
         | I believe it's because the model architecture isn't added to
         | Huggingface transformer library, so it needs to eval some
         | python code (i.e. load a pickle) to create the PyTorch model.
          | I haven't noticed it being specific to models from China;
          | almost all lesser-known models have to do this.
        
         | rfoo wrote:
         | That's because the model architecture hasn't been added to
         | huggingface/transformers yet, because it literally was just
          | published today.
          | 
          |     >>> from transformers import AutoTokenizer, AutoModel
          |     >>> model = AutoModel.from_pretrained(
          |     ...     "internlm/internlm-chat-7b",
          |     ...     trust_remote_code=True, device='cuda')
         | 
         | Here, the "trust_remote_code=True" means "download the model
         | code from huggingface repo 'internlm/internlm-chat-7b'", along
         | with the weight, and run it. If it's False, the library would
         | use builtin model architectures hardcoded in
         | huggingface/transformers and only download the weight.
         | 
          | The scary flag is there because, of course, newcomers may not
          | realize that model == code, and if you load an arbitrary model
          | you are likely executing arbitrary code.
         | 
          | Wondering why, for example, you don't remember seeing this for
          | LLaMA on release day? Because they don't use the huggingface
          | transformers library and don't use huggingface to distribute
          | their model. You just clone and run their code from GitHub,
          | and... how is that not "trust_remote_code"?
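          | 
          | If you do want the remote-code path but don't want that code
          | changing underneath you, one option (a rough sketch; the commit
          | hash below is a placeholder) is to pin the repo to a specific
          | revision, since from_pretrained accepts a revision argument:
          | 
          |     >>> from transformers import AutoModel
          |     >>> # Pin to a commit you've audited, so later pushes to the
          |     >>> # repo can't silently change the code that gets executed.
          |     >>> model = AutoModel.from_pretrained(
          |     ...     "internlm/internlm-chat-7b",
          |     ...     trust_remote_code=True,
          |     ...     revision="abc123")  # placeholder commit hash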
        
           | tomsmeding wrote:
           | > newcomers may not realize that model == code
           | 
           | This makes sense in a way given the API of typical ML
           | libraries. But there is no fundamental reason this needs to
           | be the case.
           | 
           | Or, more correctly stated: model == code for sure, but said
           | code need not have any rights to perform side effects. For
           | some reason e.g. TensorFlow has stuff like tf.io.write_file
           | [1] (is that actually an operation you can put in a
           | model???), but one could easily imagine a more appropriate
           | domain-specific model language that your code is compiled to,
           | that can _by design_ not perform any IO. Imagine that a model
           | you distribute is not random Python code that may or may not
           | run a model, but instead _the model itself_ , i.e. the graph
           | encoded in that domain-specific language.
           | 
           | Then downloading a random model from some random untrusted
           | place is no different from downloading some random _data_
           | from some untrusted place: you 're going to execute the
           | model, which may DOS you, but nothing much else will happen.
           | 
           | Unfortunately the ML world is too stuck in the imperative
           | mindset for this (IMO more sensible) way of doing things. :)
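            | 
            | Something in that spirit already exists, to be fair: export
            | the graph to a format like ONNX and the model becomes data
            | that a runtime interprets, rather than Python code that gets
            | executed. A toy sketch (module and file names made up):
            | 
            |     import torch
            | 
            |     class Toy(torch.nn.Module):
            |         def forward(self, x):
            |             return torch.relu(x @ x.T)
            | 
            |     # The exported file is just a graph of tensor ops; loading
            |     # it elsewhere can't do arbitrary IO the way unpickling or
            |     # importing model code can.
            |     torch.onnx.export(Toy(), torch.randn(4, 4), "toy.onnx")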
           | 
           | [1]:
           | https://www.tensorflow.org/api_docs/python/tf/io/write_file
        
             | 411111111111111 wrote:
             | At that point you'd need a machine learning DSL and
             | runtime. Currently, it's all python libraries, so you can
             | do everything python can... Which is everything,
             | essentially.
             | 
             | It's highly unlikely that the market for running these
             | models like an appliance securely in an untrusted context
             | will ever manifest. It's just too much of a niche, as it
             | would also reduce their extensibility/usability
              | significantly.
        
               | DougBTX wrote:
               | Something like this may grow out of the GGML project,
               | which is gaining traction. They already have a weights
               | format which can be loaded with mmap, though AFAIK the
               | model architecture still needs to be defined in C++.
        
           | tyfon wrote:
           | I've only used llama via llama.cpp.
           | 
            | In general I think the python ML stuff is a mess. But I still
            | won't run code that asks me to trust arbitrary remote code,
            | since that remote code can change at any time. It would be
            | better to hold the release until it was published to the
            | transformers library, or to just include it in a clonable
            | repo without the trust_remote_code flag.
            | 
            | It is much better to just be able to clone the code and have
            | it locally, so you can verify it once and not have to trust
            | that it won't suddenly download new code you haven't been
            | able to look at.
            | 
            | trust_remote_code means you have no real control; cloning a
            | repo means you control when new code is added.
        
             | rfoo wrote:
             | Yeah, I agree promoting this usage is as bad as promoting
             | `curl | sh` in README.md.
             | 
              | Similar to how you can inspect the content of a `curl | sh`
              | script and then run it, the model is also in a clonable
              | repo, you may just:
              | 
              |     git clone https://huggingface.co/internlm/internlm-chat-7b
              | 
              | and:
              | 
              |     >>> from transformers import AutoTokenizer, AutoModel
              |     >>> model = AutoModel.from_pretrained(
              |     ...     "./internlm-chat-7b", device='cuda')
        
               | tyfon wrote:
                | This way is much more palatable for me, thank you for
               | showing :)
        
           | m3affan wrote:
            | Interesting attack vector. Malicious model code.
        
           | oars wrote:
           | Thank you for the explanation.
        
         | jabbany wrote:
         | Seems pretty common though, for defining custom architecture
          | configs and whatnot?
         | 
         | AFAIK the "remote code" is still openly hosted on huggingface
         | so you can audit it if you like. Seems no more dangerous than
         | things like `pip install some_random_library`?
        
           | Lockal wrote:
           | This has become less common in recent days, at least for
            | image generation (e.g. safetensors in Stable Diffusion).
           | 
           | The point of opensource models is that they can be finetuned.
           | When many people create finetuned versions, a zoo of models
            | appears. So far so good (maybe), but the bad practice of using
           | untrusted code from the zoo sooner or later will lead to a
           | wave of cryptominers, ransomware, and credential theft
           | incidents.
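            | 
            | That's essentially the pitch for safetensors: the file holds
            | only raw tensors, so loading it executes nothing, unlike
            | unpickling a .bin checkpoint. A rough sketch (file name made
            | up):
            | 
            |     import torch
            |     from safetensors.torch import save_file, load_file
            | 
            |     # Save a plain dict of tensors...
            |     save_file({"w": torch.randn(2, 2)}, "weights.safetensors")
            | 
            |     # ...and load it back as data only: no pickle, no code.
            |     state_dict = load_file("weights.safetensors")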
        
           | rfoo wrote:
           | I like this pip metaphor. If we had required `--trust-remote-
           | code` for every `npm install` we could have avoided left-pad
           | and most of the software supply chain drama in the past
           | years.
        
             | SahAssar wrote:
             | How would that have avoided left-pad? Do you just mean that
             | people would have been discouraged from pulling in so many
             | dependencies?
        
             | jsight wrote:
             | I think that would just teach people to type --trust-
             | remote-code fast.
        
       | freediver wrote:
       | Tried it with a few PyLLMs benchmarks [1] and it did not do very
       | well.
       | 
       | Q. Use m to substitute p, a to substitute e, n to substitute a, g
       | to substitute c, o to substitute h, how to spell peach under this
       | rule?
       | 
       | A.
       | 
       | m: p
       | 
       | a: e
       | 
       | n: a
       | 
       | g: c
       | 
       | o: h
       | 
       | how: peach
       | 
       | Q. write three sentences each ending with the word apple
       | 
       | A.
       | 
       | 1. The apple tree was covered in apples.
       | 
       | 2. She sliced the apple with a knife.
       | 
       | 3. The apple juice was too sweet.
       | 
       | Wonder if the synthetic benchmarks are very biased and not that
       | useful.
       | 
       | [1]
       | https://github.com/kagisearch/pyllms/blob/6c11cd51b099d33e61...
        
         | blitzar wrote:
         | "when a measure becomes a target, it ceases to be a good
         | measure"
         | 
          | The model itself is optimising weights for its targets, and the
         | model builders are optimising the model for the benchmarks.
        
       | timmclean wrote:
       | Saving you a click: despite what the repo title might suggest,
       | while the code is open source, the model weights cannot be used
       | commercially without permission.
       | 
       | > The code in this repository is open-source under the Apache-2.0
       | license. The InternLM weights are fully open for academic
       | research and also allow commercial use with written permission
       | from the official team. For inquiries about commercial licenses
       | and collaborations, please contact internlm@pjlab.org.cn.
       | 
       | https://github.com/InternLM/InternLM#open-source-license
        
       | exizt88 wrote:
       | > The code in this repository is open-source under the Apache-2.0
       | license. The InternLM weights are fully open for academic
       | research and also allow commercial use with written permission
       | from the official team. For inquiries about commercial licenses
       | and collaborations, please contact internlm@pjlab.org.cn.
       | 
       | This makes me much less excited about this model.
        
         | version_five wrote:
         | Agreed, this basically moves it to the "don't bother" pile.
         | There are already the llama variants with non-commercial
         | licenses, and open-llama as an open source model (I'm thinking
         | in the 7B space specifically). This would have to be pretty
         | friggin compelling to spend any time on.
        
           | echelon wrote:
           | Do they even have the legal justification of saying how you
           | can or cannot use the weights? It could be ruled that weights
           | are uncopyrightable. I think we as a community should
           | advocate for that.
           | 
           | If you train on data you don't own, the results (weights,
           | unmodified outputs) should be public domain. When people
           | create novel works on top (SaaS tools, music, films), then
           | those human combinations should hold copyright. Not the model
           | weights.
           | 
           | If you can prove you own all of the inputs into training,
           | then perhaps it's another story. But that could also be
           | dangerous and allow for data cartels to own the future.
        
         | hmottestad wrote:
          | Is that even valid? This seems to be the only place where
          | they've stated this restriction; it's not written in the
          | license. Even the weights on hugging face are licensed under
          | apache 2.0.
         | 
         | Doesn't apache 2.0 allow for fairly unrestricted commercial
         | use? Isn't that the whole point of using that license?
        
           | exizt88 wrote:
           | The model code is Apache 2.0, the weights are proprietary.
        
             | huggingmouth wrote:
              | That's not what their huggingface repo says:
             | https://huggingface.co/internlm/internlm-7b
             | 
             | The current release on huggingface is available under plain
              | apache 2.0.
        
               | [deleted]
        
             | Tostino wrote:
             | If weights are even protected by copyright at all...which
             | would be a departure from current law.
        
         | ohgodplsno wrote:
         | Less excited that you can't freely take work from academics to
          | resell it in a shitty SaaS company?
        
           | version_five wrote:
           | So you don't use open source software? You should try it,
           | there's a great ecosystem of free software, including lots
           | written by academics who are happy to have their work add
           | value to industry
        
             | ohgodplsno wrote:
             | I do. And I also don't use open-source software with a
             | commercial license for my job, because I respect the wishes
             | of the author. It doesn't make the projects, tools and
              | libraries any less good. OP is just looking to quickly cash
              | in on something he didn't put an ounce of effort into.
             | 
             | AGPL & Dual-licensing are the way forward, because of
             | leeches.
        
               | getmeinrn wrote:
               | The entitlement and audacity of people who consume open
               | source blows me away. I've maintained a project for 12
               | years and recently someone wanted me to help them
               | implement the software in their system. I politely told
               | them that since this wasn't a bug, they would need to
               | purchase a support package. They then accused me of
               | trying to "sell open source software" and closed the
               | issue. People are unbelievable. Fuck me for trying to
               | make a living providing you personal development time,
               | using my software that I've supported for free for over a
               | decade.
        
               | freedomben wrote:
               | I neither agree nor disagree with your position (still
               | thinking about it), but I do think that's right
               | uncharitable mind-reading you've done of GP. They never
               | said anything about "looking to quickly cash in to
               | something he didn't put an ounce of effort in."
        
           | blitzar wrote:
           | Less excited that I can't freely take work from someone,
           | create a startup that is going to resell it in a shitty SaaS
           | company and cash out for a half B. Yes, yes I am.
        
           | exizt88 wrote:
           | You do understand that academics are usually funded by
           | taxpayers? Obviously, not by me, as I don't pay taxes in
           | China, but it's not like academics are doing this work for
           | free. Society pays them for their work so that it can benefit
           | from its results.
        
             | ohgodplsno wrote:
             | You do understand that private companies, as a whole, are a
             | drain on the academic system, pushing to lower the very
              | taxes that fund this research? Society should benefit from
             | these results. The 5000th LLM-haiku-generator-saas-company-
             | incorporated-in-delaware is not society.
        
             | YetAnotherNick wrote:
              | Is there any breakdown of private vs government funding for
              | general-purpose academic research? I was under the
              | impression that most of the funds in fields like ML come
              | from undergrad fees and donations by alumni, or from
              | private companies.
        
               | WoodenChair wrote:
                | > I was under the impression that most of the funds in
                | fields like ML come from undergrad fees and donations by
                | alumni, or from private companies.
               | 
               | In the United States, tuition makes up less than 35% of
               | most universities' revenue.[0] Donations are significant,
               | but if we were to just look at research funding, it would
               | mostly be government grants.
               | 
               | "The federal government is by far the largest funder of
               | academic R&D..." [1]
               | 
               | [0] https://nces.ed.gov/programs/coe/indicator/cud [1]
               | https://ncses.nsf.gov/pubs/nsb20202/academic-r-d-in-the-
               | unit...
        
             | LightBug1 wrote:
             | "Society"
             | 
             |  _quiet chuckle_
        
             | rat9988 wrote:
              | Don't worry, the private money will not go to their pockets
             | but to fund future projects of the university, lab or
             | whatever. It's a way to lessen the burden on taxpayers, and
             | to shift it to those who benefit the most from it.
        
           | speedgoose wrote:
           | In Europe we call that a success story.
        
       | geckbench wrote:
       | [dead]
        
       | Tepix wrote:
       | I welcome new models! The more, the merrier.
       | 
       | That said, this model has been tailored but they are comparing it
       | to non-finetuned LLaMA-7B in their benchmark? That seems a bit
       | fainthearted.
        
         | moffkalast wrote:
         | > HumanEval: InternLM-7B: 10.4, LLaMA-7B: 14.0
         | 
         | The funny part is that the base model apparently outperforms
         | the fine tune.
         | 
         | So far the HumanEval benchmark seems to be the only one that
         | can objectively compare overall model performance despite being
         | a coding-only benchmark, the rest mostly just give a "99.7%
         | chatgpt" bullshit results. Turns out you can't compare creative
         | writing because all outputs are basically valid.
        
       | yieldcrv wrote:
       | why is 7B parameters seemingly a magic number?
        
         | brucethemoose2 wrote:
         | It matches LLaMA 7B, and it's "cheap" to train for a demo.
         | 
         | If they actually wanted to finetune/train for commodity
         | hardware use, 13b-40b would be a better target.
        
         | wilonth wrote:
         | 7B params would take 14gb of gpu RAM at fp16 precision. So it
         | would be able to run on 16gb GPUs with 2gb to spare for other
         | small things.
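          | 
          | Back-of-the-envelope for the weights alone (ignoring activations
          | and the KV cache), a rough sketch:
          | 
          |     params = 7e9
          |     for name, bytes_per_param in [("fp32", 4), ("fp16", 2),
          |                                   ("int8", 1), ("int4", 0.5)]:
          |         gib = params * bytes_per_param / 1024**3
          |         print(f"{name}: ~{gib:.1f} GiB")
          |     # fp32: ~26.1, fp16: ~13.0, int8: ~6.5, int4: ~3.3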
        
           | brucethemoose2 wrote:
           | But in practice, no one is running inference at FP16. int8 is
           | more like the bare minimum.
        
             | ForOldHack wrote:
              | I have an 8GB card, and I am considering two more 8GB, or
              | should I get a single 16GB? The 8GB card was donated, and we
              | need some pipelining... I have 10~15 2GB quadro cards...
              | Apparently useless.
        
               | brucethemoose2 wrote:
               | I mean... It depends?
               | 
               | You are just trying to host a llama server?
               | 
               | Matching the VRAM doesn't _necessarily_ matter, get the
               | most you can afford on a single card. Splitting beyond 2
                | cards doesn't work well at the moment.
               | 
               | Getting a non Nvidia card is a problem for certain
               | backends (like exLLaMA) but fine for llama.cpp in the
               | near future.
               | 
               | AFAIK most backends are not pipelined, the load jumps
               | sequentially from one GPU to the next.
        
         | peignoir wrote:
         | just easier to run on smaller hardware
        
         | datastack wrote:
         | I guess going with a parameter count that matches existing
         | models makes it easier to compare benchmarks. Perhaps there is
         | another particular reason like required memory, but momentum is
         | probably also significant.
        
       | usgroup wrote:
       | A related question -- when fine tuning a model like this to a
        | specific corpus, how does the fine tuning affect the actual chat
       | capability, since the chat model weights seem to come as a
       | separate model? Does one fine tune the LLM+Chat model directly?
       | If so, does that not require some kind of prompt based training
       | as opposed to just lookahead prediction? Does one have to fine
       | tune the LLM and then repeat whatever they do to get the LLM+Chat
       | model?
        
       | joelthelion wrote:
       | What kind of hardware do you need to run a model like this? Do
       | you need an A100, or can something smaller suffice?
       | 
       | What about for fine tuning? Are hardware requirements higher?
        
       | b1n wrote:
       | There is a great opportunity for totalitarian and authoritarian
       | regimes (China and UAE so far) to create commercially usable and
       | free LLMs that work significantly better than alternatives
       | (backed by large amounts of government money).
       | 
       | Over time, as they get used in more and more products, these LLMs
        | can become more 'aligned' to these regimes' way of thinking.
       | 
       | There are no Chinese companies that are not part of the Chinese
       | government.
       | 
       | This is a new kind of cultural soft power.
        
         | heartbeats wrote:
         | Aligned how? If you download the code, it won't change under
         | your feet.
        
           | ASalazarMX wrote:
           | The weights, calculated through very intensive computing, are
           | what hold the knowledge in LLMs, the source code just
           | executes those. These products could just update/patch their
           | weights periodically, and no one would complain because
           | that's not bad per se.
        
       | airgapstopgap wrote:
       | Note that this is apparently a 7B version of a 104B model trained
       | with the intention of competing with OpenAI offerings on the
       | Chinese market [1]. There is a number of those projects:
       | Baichuan, ChatGLM2, InternLM and some more iirc, and they all
       | have small-scale opensource versions.
       | 
       | For what it's worth, I've tried out ChatGLM2-6B and Baichuan
       | converted to LLaMA (the architecture is literally identical in
       | that case). They're okay, though underwhelming given their
       | reported benchmarks; probably the main point of creating them is
       | gaining experience for engineers, and feedback from the wider
       | community that has less incentive to downplay their shortcomings.
       | 
       | Surprisingly, they do not appear censored in any particularly
       | "Chinese" political direction, but they share sensibilities of
       | ChatGPT and Claude.
       | 
       | 1. https://github.com/InternLM/InternLM-techreport
        
         | czDRZ-akk wrote:
         | Chinese regulation around generative AI isn't yet formalized,
         | including provisions for censorship. The Cyberspace
         | Administration of China published a set of draft measures[0]
         | for public comment, but it doesn't seem like a revised version
         | has been released.
         | 
         | The draft indicates that there will be some level of
         | censorship, but it's unclear what the scope will be. This
         | analysis[1] suggests that generative AI for research purposes
         | could be exempted (section 1). The same analysis points out
         | that there are other government bodies at play that are more
         | focused on advancing AI as an industry within China.
         | 
         | It does seem likely that there will be some kind of censorship
         | carve-out for AI research, whereas companies offering
         | generative AI products to the public will need to self-censor
         | to avoid fines and/or prosecution.
         | 
         | [0] https://digichina.stanford.edu/work/translation-measures-
         | for...
         | 
         | [1] https://fpf.org/blog/unveiling-chinas-generative-ai-
         | regulati...
        
         | potency wrote:
         | I don't know anything about what you're talking about. Where do
         | I start to learn some of the AI terminology, models, benefits
         | and drawbacks of each, etc?
        
           | hutzlibu wrote:
           | The most patient lecturer would probably be ChatGPT itself
           | ...
        
         | brucethemoose2 wrote:
         | > Surprisingly, they do not appear censored in any particularly
         | "Chinese" political direction, but they share sensibilities of
         | ChatGPT and Claude.
         | 
         | Perhaps they used GPT4 responses for the instruct finetuning,
         | as many LLaMA finetunes do?
         | 
         | The paper doesn't say where they got the data from, other than
         | "The pre-trained language model is further fine-tuned,
         | following the mainstream procedure as in InstructGPT."
         | 
         | (Also, I don't like how they use raw LLaMA 65b as a benchmark
         | rather than an instruct tuned derivative)
        
           | airgapstopgap wrote:
           | I believe it's more like they used Anthropic human preference
           | data [1] or similar, and accordingly Anthropic/progressive
           | American notion of honest-helpful-harmless behavior. Thus
           | I've seen models misgeneralize towards prudish finger-
           | wagging. For example they parse badwords like "beat",
           | "abuse", "steal" in morally neutral contexts ("beat a
           | benchmark" or something) as signifiers of substantial
           | transgression and spiral into telling me how, as language
           | models, they insist it's never okay to etc. etc. This
           | attitude was strikingly reminiscent of American models, even
           | though other failure modes - like hallucinations - don't seem
           | so similar.
           | 
           | Papers like Tulu [2] suggest that LLaMA-65b is indeed an
           | appropriate baseline, given reasonable prompting. Instruct
           | datasets only convey a flavor of responses, and for a strong
           | foundation model that can infer the intended flavor on its
           | own, naive finetuning seems to be detrimental. GPT-4 was much
           | more powerful prior to having been finetuned, if reports of
           | early witnesses and researchers are to be believed.
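            | 
            | If anyone wants to see what that preference data looks like,
            | it's easy to peek at [1] (rough sketch; I believe the columns
            | are "chosen"/"rejected" pairs, but check the dataset card):
            | 
            |     from datasets import load_dataset
            | 
            |     # Anthropic's helpful/harmless preference pairs
            |     hh = load_dataset("Anthropic/hh-rlhf", split="train")
            |     print(hh[0]["chosen"][:200])
            |     print(hh[0]["rejected"][:200])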
           | 
           | 1. https://huggingface.co/datasets/Anthropic/hh-rlhf
           | 
           | 2. https://arxiv.org/abs/2306.04751
        
       | YetAnotherNick wrote:
        | Is the dataset it was trained on mentioned anywhere?
        
       | kasia_wieczorek wrote:
       | Great name for a simple LLM hehe
        
       | gnrlst wrote:
       | Is this also censored/nerfed? I'd love to play with a "raw"
       | unnerfed model to fully grasp what an LLM can do (and see how
       | biased it is). Does anyone have any recommendations for unnerfed
       | models to try out?
        
         | cubefox wrote:
         | The most powerful available foundation model is code-
         | davinci-002, a.k.a. GPT-3.5. It's only available on Azure since
         | OpenAI removed it from their own Playground and API for some
         | reason.
        
           | alach11 wrote:
           | Maybe you mean gpt-3.5-turbo or text-davinci-003? Or GPT-4
           | (technically in beta so not fully available to everyone)?
        
             | cubefox wrote:
             | No, those are all fine-tuned models which are "nerfed" in
             | the terminology of the OP. I mean code-davinci-002, the
             | GPT-3.5 base model.
        
               | squeaky-clean wrote:
               | Is that what nerfed means? I usually see "nerfed" used in
               | a way that means that it will refuse to answer certain
               | topics. "I can't answer that as it would violate
               | copyright" and such.
        
               | cubefox wrote:
               | The fine-tuned models are certainly censored and not
               | "raw".
        
               | squeaky-clean wrote:
               | But doesn't code-davinci-002 also have OpenAI's filters
               | in between you and the model?
        
               | seanhunter wrote:
               | code-davinci models are finetuned on code so I don't
               | think that's what the OP wants. For reference the family
               | tree is here https://platform.openai.com/docs/model-
               | index-for-researchers
        
               | cubefox wrote:
               | As the website you linked says, code-davinci-002 is not
               | fine-tuned. It is the GPT-3.5 base model.
        
               | alach11 wrote:
               | > I mean... the GPT-3.5 base model
               | 
               | That would be text-davinci-003, I believe.
        
               | cubefox wrote:
                | No, text-davinci-003 is fine-tuned. The base model is
               | code-davinci-002. See
               | https://platform.openai.com/docs/model-index-for-
               | researchers
        
           | RobotToaster wrote:
           | The model isn't available at all?
        
             | cubefox wrote:
             | It is available in the sense that it is accessible. The
             | weights are not available for download of course, but the
             | OP wanted to "play around" with it, for which only access
             | is required. There is no other accessible foundation model
             | that can compete with GPT-3.5.
        
               | cubefox wrote:
               | Why are you guys downvoting me?
        
               | brucethemoose2 wrote:
                | Because GPT 3.5 is not very good compared to LLaMA 65b or
               | even 33b finetunes, from my testing.
               | 
               | Also because 3.5 is not really available?
        
               | cubefox wrote:
               | Have you actually tested code-davinci-002?
        
           | seanhunter wrote:
            | All 3 text-davinci models are available on openAI's API,
            | including 003 (which is the GPT-3.5 gen). Code-davinci-002 is
            | a code-tuned model. You can see a nice visual summary of the
           | relationships between the openAI models at
           | https://yaofu.notion.site/How-does-GPT-Obtain-its-Ability-
           | Tr...
           | 
           | Or the official source is
           | https://platform.openai.com/docs/model-index-for-researchers
        
             | cubefox wrote:
             | > All 3 text-davinci models are available on openAI's api.
             | 
             | That's irrelevant because these are all fine-tuned.
             | 
             | > Code-davinci-002 is a code-tuned model
             | 
             | No, "code-tuned" isn't even a thing. It is a foundation
              | model, which consists purely of pretraining. No fine-tuning
             | is involved.
             | 
             | > Or the official source is
             | 
             | The official source says exactly what I just said.
        
         | lioeters wrote:
         | https://huggingface.co/models?search=uncensored&sort=trendin...
        
         | logicchains wrote:
         | LLaMA 65B is the best uncensored model we've got, and the
         | Airoboros fine-tuning if you want it to follow instructions.
        
       ___________________________________________________________________
       (page generated 2023-07-06 23:02 UTC)