[HN Gopher] Viking 7B: open LLM for the Nordic languages trained...
___________________________________________________________________
Viking 7B: open LLM for the Nordic languages trained on AMD GPUs
Author : reqo
Score : 96 points
Date : 2024-05-15 16:05 UTC (6 hours ago)
(HTM) web link (www.silo.ai)
(TXT) w3m dump (www.silo.ai)
| dmichulke wrote:
| Is there something similar for Romance or Germanic languages?
|
| And how did they decide that, e.g., German or Dutch would make
| the model worse?
| frodo8sam wrote:
| I don't think they decided that; they included Finnish, which is
| completely unrelated to the other Nordic languages. If they had
| just picked related languages for cross-learning, including
| Dutch or German would indeed have made more sense.
| coffeebeqn wrote:
| The root is unrelated, but Finnish has certainly been shaped by
| Swedish, Russian, and most recently English over the last 200
| years.
| Jensson wrote:
| Including Finnish was probably just a political choice, since
| Finland and Sweden are very close politically, much closer than
| Sweden is to Germany or other areas with more similar languages.
| Asraelite wrote:
| This was done by a Finnish company and university. They
| would've included Finnish even without any political
| motivation.
| chymist wrote:
| The company is based in Finland; they started with Finnish.
| frodo8sam wrote:
| I understand; I'm not saying they did anything wrong, just
| pointing out that the languages were selected to serve a certain
| region rather than because they belong to the same family.
| KeplerBoy wrote:
| The Nordic languages are Germanic except for Finnish, but yeah,
| Finnish is an exception and I'd expect most small LLMs to
| struggle with it.
| bangaladore wrote:
| I have had this question. How much better would common LLMs
| (Llama, GPT-N) be if they were trained on only one language? I
| have to assume they would perform better, but I might be wrong.
| fermuch wrote:
| Just like adding code to textual models helps the model develop
| its reasoning capabilities, it seems like adding more languages
| helps in other areas too. What is needed is more good quality
| data to train on...
| nickpsecurity wrote:
| We also see humans get worse at specific things when they
| learn too much in general. There is a cut-off point between how
| many concepts we can learn and how skilled we become at each.
| To be most effective, we have to specialize in the right things
| while continuing to acquire generalist knowledge. It's a
| balancing act.
|
| These architectures are _less_ capable than brains in many
| ways, so we should expect them to have such trade-offs. An
| efficient one should work fine on English, mathematical
| notation, and a programming language, plus maybe samples of
| other languages that illustrate unique concepts. I'm also
| curious how many languages or concepts you can add to a given
| architecture before its effectiveness starts dropping.
| worldsayshi wrote:
| I guess you mean non-textual data then, because the amount of
| text data they are being trained on ought to be enough for AGI
| by now?
|
| Some kind of diminishing returns asymptote from text volume
| alone must have been hit a long time ago.
| imtringued wrote:
| It's not the amount that is wrong, it's how the model is
| trained. The model is trained for zero- and few-shot tasks, so
| it is not surprising that it performs well when you ask for
| exactly that.
| darby_eight wrote:
| > its reasoning capabilities
|
| To be clear, LLMs are not capable of reasoning.
| whimsicalism wrote:
| imo this is an uninteresting debate over
| semantics/metaphysics
| ganzuul wrote:
| Would you say a deontologist reasons? Evolution survives,
| but does it reason?
|
| Is it reasonable to show interest in something you call
| uninteresting?
|
| Was Gödel a reasonable man, starving to death in fear of
| being poisoned?
| coffeebeqn wrote:
| Perform better how? Knowing more languages gives you more data
| and different points of view, rather than just the English
| corpus and culture. When I ask ChatGPT for a translation, it
| seems to understand the meaning behind the words and find the
| closest thing in the other language. The datasets seem to merge
| in some way.
| ClarityJones wrote:
| Fair, but there may be overhead that doesn't need to exist.
| Certainly, given the limited compute my brain can muster, I
| could gain a deeper understanding of physics if I focused on
| learning physics and didn't also have to simultaneously learn
| French.
| NhanH wrote:
| Humans are not limited by the computational power of the brain
| (or rather, that is not the limitation we run into). We are
| limited by time and the fact that our machinery degrades with
| time (aging).
| olddustytrail wrote:
| In the short term. In the longer term, you'll understand
| concepts better when you're multilingual.
| staticman2 wrote:
| Wouldn't a better analogy be whether a child growing up in a
| bilingual household would be worse at physics as an adult? My
| guess is that growing up bilingual would have no impact.
| whimsicalism wrote:
| they would perform worse, i promise you
| rangerelf wrote:
| "I promise you"?
|
| This is Hackernews, I would have expected data, not promises.
| ClarityJones wrote:
| I think this makes sense to the extent that an understanding
| of the differences between languages helps separate language
| from the underlying meaning. However, the models used to
| receive input (i.e. translate from language), to learn and
| understand, and to output information (i.e. re-encode into
| language) do not all have to be the same.
| richdougherty wrote:
| I can't track down the citation (either Google or DeepMind, I
| think), but I remember reading research from a year or two ago
| on how adding extra languages (French, German) improved English
| language performance. There may also have been an investigation
| of multimodality, which found that adding vision or audio
| helped with text as well.
| ganzuul wrote:
| Great talking points. These are highly relevant subjects and I'm
| delighted we in the Nordics are keeping up with current
| developments. This work is important for preserving our culture.
|
| I hope to see this used to generate a customized curriculum for
| each neurodiverse child so that we can live in a more equitable
| society.
| nwoli wrote:
| No offense, but this reply is giving me such "generated by an
| LLM" vibes; I'm curious if it is.
| ganzuul wrote:
| Who knows? Maybe I am an AI set to break encryption and I'm
| just hallucinating this, a training environment.
| jarbus wrote:
| Would love to know more about their experience training on AMD
| GPUs. Was it just as seamless as using CUDA?
| ganzuul wrote:
| > To leverage the capabilities of MI250X, ROCm enables the use
| of GPU matrix cores through its rocBLAS and MIOpen library
| implementations that, in turn, are leveraged by PyTorch.
|
| - https://aclanthology.org/2023.emnlp-main.164.pdf
|
| https://github.com/TurkuNLP/
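|
| Not the LUMI training code itself, just a minimal sketch of
| what that looks like from the user side, assuming a ROCm build
| of PyTorch:
|
|     import torch
|
|     # On a ROCm build of PyTorch, the HIP backend is exposed
|     # through the regular torch.cuda API, so most CUDA code
|     # paths run unchanged.
|     print(torch.cuda.is_available())      # True on a working ROCm install
|     print(torch.version.hip)              # HIP version string; None on CUDA builds
|     print(torch.cuda.get_device_name(0))  # e.g. an MI250X on LUMI
|
|     x = torch.randn(1024, 1024, device="cuda")  # "cuda" maps to HIP here
|     y = x @ x  # matmul dispatched to rocBLAS under the hood
|
| The rough edges, when they show up, tend to be in less common
| ops rather than these basics.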
| imtringued wrote:
| They probably got a lot of hand holding from AMD.
| KeplerBoy wrote:
| Having access to enterprise GPUs on one of the biggest HPC
| systems in Europe is probably enough.
|
| AMD's bad rep in AI is mostly due to flaky support for its
| consumer GPUs.
| ChrisArchitect wrote:
| The double slash in the shared link is probably not ideal
| (though inconsequential).
|
| https://www.silo.ai/blog/viking-7b-the-first-open-llm-for-th...
| melenaboija wrote:
| Though not Nordic, Basque wasn't included either, and I guess
| it could also be considered a European low-resource language.
| ghnws wrote:
| I got the impression they are focusing on the Nordic culture as
| much as the languages.
|
| >Silo AI and TurkuNLP are dedicated to developing models that
| not only excel in linguistic performance and inclusivity but
| are also attuned to local values and cultures.
| matsemann wrote:
| Would an LLM trained on a smaller language have better cultural
| awareness etc. than one trained on English? Because English is
| written all over the world by all kinds of people, an English
| LLM will average over all of that (and, for instance, feel a
| bit off to an American). But would a Norwegian LLM, trained on
| a language mostly written by Norwegians, feel more natural to
| me in comparison?
| smokracek wrote:
| First thing I notice is that Finnish is part of a completely
| different language family from the other Nordic languages and
| English (Uralic vs. Indo-European). I wonder to what extent this
| affects the effectiveness of their low-resource training. Finnish
| is highly agglutinative, stacking suffixes onto a root where
| English would use separate words. My (amateur) take is that the
| tokenization and attention patterns may differ a lot. Would
| love to see people more educated than I am discuss this.
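|
| To make the tokenization point concrete, a minimal sketch
| (using the LumiOpen/Viking-7B repo linked downthread is an
| assumption on my part; any multilingual subword tokenizer
| shows the effect):
|
|     from transformers import AutoTokenizer
|
|     tok = AutoTokenizer.from_pretrained("LumiOpen/Viking-7B")
|
|     # Finnish packs into one word what English spreads over four:
|     # talo+i+ssa+mme+kin = "in our houses, too"
|     for text in ["taloissammekin", "in our houses too"]:
|         pieces = tok.tokenize(text)
|         print(f"{text!r} -> {len(pieces)} tokens: {pieces}")
|
| If the Finnish word splits into far more pieces per unit of
| meaning, the model has to spend capacity stitching the
| morphology back together.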
| ghnws wrote:
| Then again, the culture of Finland is very similar to the other
| Nordics, which looks to be one of the reasons for the project.
| sandworm101 wrote:
| >> to what extent this affects the effectiveness of
|
| The correct use of those words demonstrates that you are either
| not an AI, all of them being trained on so much bad language,
| or are an AI from a more perfect future.
| anewhnaccount3 wrote:
| Finnish is not _so_ different despite having a different
| lineage. Even if we talk about morphology, sometimes it's
| simply that, e.g., what other languages do with prepositions is
| affixed to the end of a word; big whoop. There are many
| dimensions to language variation. Finnish has a long history of
| contact with the Scandi languages and a lot of borrowed words
| and logic. It would be good to have Estonian and possibly the
| Baltic languages too.
|
| ETA: It is different, of course, just perhaps not as much as
| people sometimes claim. You can definitely ruffle some feathers
| with this one, given that the uniqueness of Finnish is pretty
| central to Finnish nationalism.
| halgir wrote:
| > extends to include Danish, Finnish, Norwegian, Icelandic,
| Swedish
|
| * cries in Faroese *
| jug wrote:
| If you're interested in this, don't miss AI Sweden's GPT-SW3
| (126M to 40B parameters), trained on the Nordic languages (not
| Finnish) and English. It's funded by the Swedish government and
| partners, and freely available, with a pretty lively Discord
| for ongoing AI research focusing on the Nordic languages. I
| think Viking is called "first" because it includes Finnish;
| otherwise, GPT-SW3 was released earlier.
|
| https://huggingface.co/AI-Sweden-Models
| lostmsu wrote:
| Why did they train from scratch instead of starting from Llama
| 3 or something else?
| Bedon292 wrote:
| I cannot seem to find a link to the actual model from this page
| or anywhere on the website. This appears to be it:
| https://huggingface.co/LumiOpen/Viking-7B
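|
| For anyone who wants to poke at it, a minimal sketch of loading
| it with Hugging Face transformers; assuming that repo is the
| right one, and the prompt and settings here are illustrative:
|
|     import torch
|     from transformers import AutoModelForCausalLM, AutoTokenizer
|
|     tok = AutoTokenizer.from_pretrained("LumiOpen/Viking-7B")
|     # ~7B params: expect roughly 14 GB of weights in bfloat16.
|     model = AutoModelForCausalLM.from_pretrained(
|         "LumiOpen/Viking-7B", torch_dtype=torch.bfloat16,
|         device_map="auto")  # device_map needs the accelerate package
|
|     prompt = "Norja on"  # Finnish for "Norway is"
|     inputs = tok(prompt, return_tensors="pt").to(model.device)
|     out = model.generate(**inputs, max_new_tokens=30)
|     print(tok.decode(out[0], skip_special_tokens=True))
|
| It's a base model, so expect continuation rather than chat.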
| larodi wrote:
| The fact that it was trained on an HPC system whose waste heat
| covers 20% of a city's district heating is absolutely wild, on
| par with how wild it is to have an English/Nordic model.
|
| " Further emphasizing digital sovereignty, Viking is trained on
| the EuroHPC supercomputer LUMI, utilizing up to 4096 AMD MI-250X
| GPUs. LUMI is not only Europe's most powerful supercomputer and
| the 5th most powerful in the world, but also the 3rd greenest
| supercomputer among the top 500 supercomputers. LUMI's energy
| consumption is covered with power produced 100% with
| hydroelectricity, and the waste heat of LUMI will account for
| about 20 percent of the district heating in the surrounding city
| of Kajaani. "
___________________________________________________________________
(page generated 2024-05-15 23:02 UTC)