[HN Gopher] Release of Fugaku-LLM - a large language model train...
___________________________________________________________________
Release of Fugaku-LLM - a large language model trained on
supercomputer Fugaku
Author : gslin
Score : 44 points
Date : 2024-05-13 21:01 UTC (1 hour ago)
(HTM) web link (www.fujitsu.com)
(TXT) w3m dump (www.fujitsu.com)
| koito17 wrote:
| Does anyone know how this model compares to GPT-4 for Japanese
| output? Looking at GPT-4o today, the Japanese example on the
| landing page feels unnatural and is a huge regression from the
| quality I would expect of GPT-4.
|
| That said, both GPT-4 and GPT-4o seem to do a good job of
| understanding the semantics of prompts written in Japanese. I
| would like to see how this model compares, given that it seems
| to be trained on more Japanese data (though that may not help
| much if all they did was scrape affiliate blogs).
| rgrieselhuber wrote:
| I've gotten very good results with Japanese output in GPT-4 but
| it takes a little work.
| wahnfrieden wrote:
| Have you tried the recent Japanese-specialized GPT-4 variant?
| Hopefully it's updated for GPT-4o.
| alexey-salmin wrote:
| > GPUs are the common choice of hardware for training large
| language models. However, there is a global shortage of GPUs due
| to the large investment from many countries to train LLMs. Under
| such circumstances, it is important to show that large language
| models can be trained using Fugaku, which uses CPUs instead of
| GPUs. The CPUs used in Fugaku are Japanese CPUs manufactured by
| Fujitsu, and play an important role in terms of revitalizing
| Japanese semiconductor technology.
| LeoPanthera wrote:
| ARM CPUs, specifically. Fairly unusual in the TOP500 list.
| apsec112 wrote:
| This honestly feels kind of silly. Back-of-the-envelope, this
| training run (a 13B model on 380B tokens) could be done in about
| two months on a single 8x H100 node using off-the-shelf software,
| at a cost of around $35K from a cloud compute provider. They
| don't seem to list training time, but this cluster appears to
| draw ~3.5 MW of power, so it burns something like $500/hr in
| electricity costs alone. (Rough arithmetic sketched below.)
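|
| A minimal sketch of that arithmetic, in Python. The 6*N*D FLOPs
| rule of thumb, the FP8 throughput, the ~40% MFU, the node price,
| and the electricity rate are all assumptions of mine, not
| figures from the article:
|
|     # Back-of-the-envelope for training a 13B model on 380B
|     # tokens on one 8x H100 node. Every constant is assumed.
|     params = 13e9                # 13B parameters
|     tokens = 380e9               # 380B training tokens
|     flops = 6 * params * tokens  # ~3.0e22 total training FLOPs
|
|     h100_fp8 = 2.0e15            # ~2 PFLOPS peak FP8 per H100
|     mfu = 0.40                   # assumed model FLOPs utilization
|     node = 8 * h100_fp8 * mfu    # effective node throughput, FLOP/s
|
|     secs = flops / node
|     print(secs / 86400)          # ~54 days, roughly two months
|     print(secs / 3600 * 25)      # ~$32K at an assumed $25/node-hour
|     print(3500 * 0.15)           # ~$525/hr at 3.5 MW and $0.15/kWh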
| LeoPanthera wrote:
| But the whole point of this was to show that it could be done
| on CPUs during the worldwide GPU shortage.
|
| Doing it on CPUs was the point.
| fnordpiglet wrote:
| It could be done using abacuses too. A global shortage of
| GPUs is probably better addressed by producing more GPUs and
| making techniques more efficient.
| NKosmatos wrote:
| For those curious about Fugaku, it's currently the 4th fastest
| supercomputer in the TOP500 list:
| https://www.top500.org/system/179807/
___________________________________________________________________
(page generated 2024-05-13 23:00 UTC)