[HN Gopher] Viking 7B: open LLM for the Nordic languages trained...
       ___________________________________________________________________
        
       Viking 7B: open LLM for the Nordic languages trained on AMD GPUs
        
       Author : reqo
       Score  : 96 points
       Date   : 2024-05-15 16:05 UTC (6 hours ago)
        
 (HTM) web link (www.silo.ai)
 (TXT) w3m dump (www.silo.ai)
        
       | dmichulke wrote:
       | Is there something similar for romance or Germanic languages?
       | 
       | And how did they decide that, e.g., German or Dutch would make
       | the model worse?
        
         | frodo8sam wrote:
          | I don't think they decided that; they included Finnish,
          | which is completely unrelated to the other Nordic languages.
          | If they had just picked related languages for cross-learning,
          | including Dutch or German would indeed have made more sense.
        
           | coffeebeqn wrote:
            | The root is unrelated, but Finnish has certainly been
            | shaped by Swedish, Russian, and most recently English over
            | the last 200 years.
        
           | Jensson wrote:
            | Including Finnish was probably just a political choice,
            | since Finland and Sweden are very close politically, much
            | closer than to Germany or other areas with more similar
            | languages.
        
             | Asraelite wrote:
             | This was done by a Finnish company and university. They
             | would've included Finnish even without any political
             | motivation.
        
           | chymist wrote:
           | The company is based in Finland, they started with Finnish
        
             | frodo8sam wrote:
              | I understand; I'm not saying they did something wrong,
              | just pointing out that the languages were selected not
              | because they belong to the same family but to serve a
              | certain region.
        
         | KeplerBoy wrote:
          | The Nordic languages are Germanic except for Finnish; it's
          | the exception, and I'd expect most small LLMs to struggle
          | with it.
        
       | bangaladore wrote:
        | I have had this question. How much better would common LLMs
        | (Llama, GPT-N) be if they were trained on only one language? I
        | have to assume they would perform better, but I might be wrong.
        
         | fermuch wrote:
         | Just like adding code to textual models helps the model develop
         | its reasoning capabilities, it seems like adding more languages
         | helps in other areas too. What is needed is more good quality
         | data to train on...
        
           | nickpsecurity wrote:
            | We also see humans get worse at specific things when they
            | learn too much in general. There is a cut-off point to how
            | many concepts we can learn, and at what level of skill. To
            | be most effective, we have to specialize in the right
            | things while continuing to acquire generalist knowledge.
            | It's a balancing act.
           | 
           | These architectures are _less_ capable than brains in many
           | ways. So, we should expect them to have such trade-offs. An
           | efficient one should work fine on English, mathematical
           | notation, and a programming language. Maybe samples of others
           | that illustrate unique concepts. I'm also curious how many
           | languages or concepts you can add to a given architecture
           | before its effectiveness starts dropping.
        
           | worldsayshi wrote:
            | I guess you mean non-textual data then, because the amount
            | of text data they are being trained on ought to be enough
            | for AGI by now?
            | 
            | Some kind of diminishing-returns asymptote from text volume
            | alone must have been hit a long time ago.
        
             | imtringued wrote:
              | It's not the amount that is wrong, it's how the model is
              | trained. The model is trained for zero- and few-shot
              | tasks, so it is not surprising that it performs well
              | when you ask for exactly that.
        
           | darby_eight wrote:
           | > its reasoning capabilities
           | 
           | To be clear, LLMs are not capable of reasoning.
        
             | whimsicalism wrote:
             | imo this is an uninteresting debate over
             | semantics/metaphysics
        
               | ganzuul wrote:
               | Would you say a deontologist reasons? Evolution survives,
               | but does it reason?
               | 
               | Is it reasonable to show interest in something you call
               | uninteresting?
               | 
                | Was Gödel a reasonable man, starving to death in fear
               | being poisoned?
        
         | coffeebeqn wrote:
          | Perform better how? Knowing more languages gives you more
          | data and different points of view rather than just the
          | English corpus and culture. When I ask ChatGPT for a
          | translation, it seems to understand the meaning behind the
          | words and finds the closest thing in the other language. The
          | datasets seem to merge in some way.
        
           | ClarityJones wrote:
            | Fair, but there may be overhead that doesn't need to
            | exist. Certainly, given the limited compute my brain can
            | muster, I could gain a deeper understanding of physics if
            | I focused on learning physics and didn't also have to
            | simultaneously learn French.
        
             | NhanH wrote:
              | Humans are not limited by the computational power of the
              | brain (or rather, that is not the limitation we
              | encounter). We are limited by time and the fact that our
              | machinery degrades with age.
        
             | olddustytrail wrote:
             | In the short term. In the longer term you'll understand
             | concepts better when you're multilingual.
        
             | staticman2 wrote:
              | Wouldn't a better analogy be whether a child growing up
              | in a bilingual household would be worse at physics as an
              | adult? My guess is that growing up bilingual would have
              | no impact.
        
         | whimsicalism wrote:
         | they would perform worse, i promise you
        
           | rangerelf wrote:
           | "I promise you"?
           | 
           | This is Hackernews, I would have expected data, not promises.
        
           | ClarityJones wrote:
            | I think this makes sense to the extent that an
            | understanding of the differences between languages helps
            | separate language from the underlying meaning. However,
            | the models used to receive input (i.e. translate from
            | language), to learn and understand, and to output
            | information (i.e. re-encode into language) do not all
            | have to be the same.
        
         | richdougherty wrote:
          | I can't track down the citation (either Google or DeepMind,
          | I think), but I remember reading research from a year or two
          | ago on how adding extra languages (French, German) improved
          | English-language performance. There may also have been an
          | investigation into multimodality, which found that adding
          | vision or audio helped with text as well.
        
       | ganzuul wrote:
       | Great talking points. These are highly relevant subjects and I'm
       | delighted we in the Nordics are keeping up with current
       | developments. This work is important for preserving our culture.
       | 
       | I hope to see this used to generate a customized curriculum for
       | each neurodiverse child so that we can live in a more equitable
       | society.
        
         | nwoli wrote:
          | No offense, but this reply is giving me such "generated by
          | an LLM" vibes; I'm curious if it is.
        
           | ganzuul wrote:
           | Who knows? Maybe I am an AI set to break encryption and I'm
           | just hallucinating this, a training environment.
        
       | jarbus wrote:
        | Would love to know more about their experience training on AMD
        | GPUs. Was it just as seamless as using CUDA?
        
         | ganzuul wrote:
          | > To leverage the capabilities of MI250X, ROCm enables the
          | use of GPU matrix cores through its rocBLAS and MIOpen
          | library implementations that, in turn, are leveraged by
          | PyTorch.
         | 
         | - https://aclanthology.org/2023.emnlp-main.164.pdf
         | 
         | https://github.com/TurkuNLP/
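          | 
          | In practice, the ROCm build of PyTorch reuses the torch.cuda
          | API surface, so code written for NVIDIA GPUs typically runs
          | unchanged. A minimal sketch (assuming a ROCm build of
          | PyTorch is installed):
          | 
          |   import torch
          | 
          |   # On a ROCm build, torch.version.hip is set instead of
          |   # torch.version.cuda, and "cuda" addresses the AMD GPU.
          |   print(torch.cuda.is_available(), torch.version.hip)
          | 
          |   device = torch.device("cuda")
          |   a = torch.randn(4096, 4096, device=device,
          |                   dtype=torch.bfloat16)
          |   b = torch.randn(4096, 4096, device=device,
          |                   dtype=torch.bfloat16)
          |   c = a @ b  # dispatched to rocBLAS on the MI250X
          |   print(c.shape)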
        
         | imtringued wrote:
         | They probably got a lot of hand holding from AMD.
        
           | KeplerBoy wrote:
            | Having access to enterprise GPUs on one of the biggest HPC
            | systems in Europe is probably enough.
            | 
            | AMD's bad rep in AI is mostly due to flaky support of its
            | consumer GPUs.
        
       | ChrisArchitect wrote:
       | double slash in the shared link probably not ideal (though
       | inconsequential)
       | 
       | https://www.silo.ai/blog/viking-7b-the-first-open-llm-for-th...
        
       | melenaboija wrote:
        | Although it's not Nordic, it's a shame they didn't include
        | Basque, which I guess could also be considered a European
        | low-resource language.
        
         | ghnws wrote:
          | I got the impression they are focusing on Nordic culture as
          | much as the languages.
         | 
         | >Silo AI and TurkuNLP are dedicated to developing models that
         | not only excel in linguistic performance and inclusivity but
         | are also attuned to local values and cultures.
        
       | matsemann wrote:
        | Would an LLM trained on a smaller language have better
        | cultural awareness, etc., than one trained on English? Because
        | English is written all over the world by all kinds of people,
        | an English LLM will average all of that (and, for instance,
        | feel a bit off to an American). But would a Norwegian LLM,
        | trained on a language mostly written by Norwegians, feel more
        | natural to me in comparison?
        
       | smokracek wrote:
       | First thing I notice is that Finnish is part of a completely
       | different language family from the other Nordic languages and
       | English (Uralic vs. Indo-European). I wonder to what extent this
       | affects the effectiveness of their low-resource training. Finnish
       | is highly agglutinative, adding prefixes and suffixes to modify a
       | root. My (amateur) take is that the tokenization and attention
       | patterns may differ a lot? Would love to see more educated people
       | than I discuss this.
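        | 
        | One way to eyeball the tokenization question is to count how
        | many subword tokens the tokenizer spends on an agglutinative
        | Finnish word versus its English paraphrase. A minimal sketch
        | using the Viking-7B tokenizer linked downthread (assuming the
        | transformers library; exact counts depend on the vocabulary):
        | 
        |   from transformers import AutoTokenizer
        | 
        |   tok = AutoTokenizer.from_pretrained("LumiOpen/Viking-7B")
        | 
        |   # "talossammekin" = "in our house, too": one Finnish word
        |   # built from the root "talo" (house) plus three suffixes.
        |   for text in ["talossammekin", "in our house, too"]:
        |       ids = tok(text)["input_ids"]
        |       print(text, len(ids), tok.convert_ids_to_tokens(ids))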
        
         | ghnws wrote:
          | Then again, the culture of Finland is very similar to the
          | other Nordics, which looks to be one of the reasons for the
          | project.
        
         | sandworm101 wrote:
         | >> to what extent this affects the effectiveness of
         | 
          | The correct use of those words demonstrates that you are
          | either not an AI (all of them being trained on so much bad
          | language) or an AI from a more perfect future.
        
         | anewhnaccount3 wrote:
          | Finnish is not _so_ different despite having a different
          | lineage. Even if we talk about morphology, sometimes it's
          | simply that e.g. prepositions are affixed to the end of a
          | word; big whoop. There are many dimensions to language
          | variation. Finnish has a long history of contact with Scandi
          | languages and a lot of borrowed words and logic. It would be
          | good to have Estonian and possibly the Baltic languages too.
          | 
          | ETA: It is different, of course, just perhaps not as much as
          | people sometimes try to say. You can definitely ruffle some
          | feathers with this one, given that the uniqueness of Finnish
          | is pretty central to Finnish nationalism.
        
       | halgir wrote:
       | > extends to include Danish, Finnish, Norwegian, Icelandic,
       | Swedish
       | 
       | * cries in Faroese *
        
       | jug wrote:
        | If you're interested in this, don't miss AI Sweden's GPT-SW3
        | (126M to 40B), trained on the Nordic languages (not Finnish)
        | and English. It's funded by the Swedish government and
        | partners, and freely available, with a pretty lively Discord
        | for ongoing AI research focusing on the Nordic languages. I
        | think Viking is called "first" because it includes Finnish;
        | otherwise, GPT-SW3 was released earlier.
       | 
       | https://huggingface.co/AI-Sweden-Models
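        | 
        | The smaller checkpoints are easy to try with transformers. A
        | minimal sketch (the model id below is an assumption based on
        | the org page, and some checkpoints may require accepting a
        | license on Hugging Face first):
        | 
        |   from transformers import pipeline
        | 
        |   # assumed checkpoint id; larger sizes follow the same naming
        |   gen = pipeline("text-generation",
        |                  model="AI-Sweden-Models/gpt-sw3-126m")
        |   # Swedish: "Trees are nice because"
        |   out = gen("Träd är fina för att", max_new_tokens=30)
        |   print(out[0]["generated_text"])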
        
         | lostmsu wrote:
          | Why do they train from scratch instead of starting from
          | Llama 3 or something else?
        
       | Bedon292 wrote:
       | I cannot seem to find a link to the actual model from this page
       | or anywhere on the website. This appears to be it:
       | https://huggingface.co/LumiOpen/Viking-7B
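        | 
        | For anyone who wants to poke at it, a minimal generation
        | sketch against that checkpoint (assuming transformers and
        | enough memory for a 7B model; it's a base model, so it
        | continues text rather than following instructions):
        | 
        |   from transformers import AutoModelForCausalLM, AutoTokenizer
        | 
        |   model_id = "LumiOpen/Viking-7B"
        |   tok = AutoTokenizer.from_pretrained(model_id)
        |   model = AutoModelForCausalLM.from_pretrained(
        |       model_id, torch_dtype="auto")
        | 
        |   # Norwegian: "The capital of Norway is"
        |   inputs = tok("Hovedstaden i Norge er", return_tensors="pt")
        |   out = model.generate(**inputs, max_new_tokens=20)
        |   print(tok.decode(out[0], skip_special_tokens=True))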
        
       | larodi wrote:
        | The fact that it was trained on an HPC system whose waste heat
        | covers 20% of a city's heat consumption is absolutely wild,
        | and on par with how wild it is to have an English/Nordic
        | model.
       | 
       | " Further emphasizing digital sovereignty, Viking is trained on
       | the EuroHPC supercomputer LUMI, utilizing up to 4096 AMD MI-250X
       | GPUs. LUMI is not only Europe's most powerful supercomputer and
       | the 5th most powerful in the world, but also the 3rd greenest
       | supercomputer among the top 500 supercomputers. LUMI's energy
       | consumption is covered with power produced 100% with
       | hydroelectricity, and the waste heat of LUMI will account for
       | about 20 percent of the district heating in the surrounding city
       | of Kajaani. "
        
       ___________________________________________________________________
       (page generated 2024-05-15 23:02 UTC)