[HN Gopher] Release of Fugaku-LLM - a large language model train...
       ___________________________________________________________________
        
       Release of Fugaku-LLM - a large language model trained on
       supercomputer Fugaku
        
       Author : gslin
       Score  : 44 points
       Date   : 2024-05-13 21:01 UTC (1 hour ago)
        
 (HTM) web link (www.fujitsu.com)
 (TXT) w3m dump (www.fujitsu.com)
        
       | koito17 wrote:
       | Does anyone know how this model compares to GPT-4 for Japanese
       | output? Taking a look at GPT-4o today, the Japanese example on
       | the landing page feels unnatural and represents a huge regression
       | from the quality I would expect of GPT-4.
       | 
       | With that said, both GPT-4 and GPT-4o seem to do a good job at
       | understanding the semantics of prompts written in Japanese. I
       | would like to see how this model compares, given that it seems
       | like it's trained on more Japanese data (but that may not
       | necessarily be useful if all they did was scrape affiliate blogs).
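        
         For anyone who wants to run that comparison themselves, here is a
         minimal sketch of loading the released weights with Hugging Face
         transformers and generating a Japanese completion; the repo id,
         dtype, and prompt are assumptions, not details from the article:
        
             # Sketch: try a Japanese prompt against the released Fugaku-LLM
             # weights. The repo name below is an assumption; adjust it to
             # whatever the release actually publishes on Hugging Face.
             import torch
             from transformers import AutoModelForCausalLM, AutoTokenizer
        
             model_id = "Fugaku-LLM/Fugaku-LLM-13B-instruct"  # assumed repo id
             tokenizer = AutoTokenizer.from_pretrained(model_id)
             model = AutoModelForCausalLM.from_pretrained(
                 model_id, torch_dtype=torch.bfloat16, device_map="auto"
             )
        
             # "Briefly explain Japan's four seasons."
             prompt = "日本の四季について簡潔に説明してください。"
             inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
             out = model.generate(**inputs, max_new_tokens=128, do_sample=False)
             new_tokens = out[0][inputs["input_ids"].shape[1]:]
             print(tokenizer.decode(new_tokens, skip_special_tokens=True))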
        
         | rgrieselhuber wrote:
         | I've gotten very good results with Japanese output in GPT-4 but
         | it takes a little work.
        
           | wahnfrieden wrote:
           | You haven't tried the recent Japanese-specialized GPT-4
           | variant? Hope it's updated for GPT-4o.
        
       | alexey-salmin wrote:
       | > GPUs (7) are the common choice of hardware for training large
       | language models. However, there is a global shortage of GPUs due
       | to the large investment from many countries to train LLMs. Under
       | such circumstances, it is important to show that large language
       | models can be trained using Fugaku, which uses CPUs instead of
       | GPUs. The CPUs used in Fugaku are Japanese CPUs manufactured by
       | Fujitsu, and play an important role in terms of revitalizing
       | Japanese semiconductor technology.
        
         | LeoPanthera wrote:
         | ARM CPUs, specifically. Fairly unusual in the TOP500 list.
        
       | apsec112 wrote:
       | This honestly feels kind of silly. Back-of-the-envelope, this
       | training run (13B model on 380B tokens) could be done in two
       | months on a single 8x H100 node using off-the-shelf software, at
       | a cost of around $35K from a cloud compute provider. They don't
       | seem to list training time, but this cluster appears to use ~3.5
       | MW of power, so it's going to burn something like $500/hr just in
       | electricity costs.
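        
         The arithmetic behind that estimate, as a rough sketch (the H100
         throughput, utilization, node rental rate, and electricity price
         below are assumed round numbers, not figures from the comment or
         the article):
        
             # Rough check of the estimate above: training compute for a 13B
             # model on 380B tokens, wall-clock time on one 8x H100 node, and
             # the electricity bill of a ~3.5 MW cluster. All constants are
             # assumptions; precision (BF16 vs FP8) and achieved utilization
             # swing the time and cost figures by roughly 2x.
             params = 13e9                        # model parameters
             tokens = 380e9                       # training tokens
             flops = 6 * params * tokens          # ~6*N*D rule of thumb, ~3.0e22 FLOPs
        
             h100_bf16 = 989e12                   # peak dense BF16 FLOPS per H100 SXM
             mfu = 0.45                           # assumed model FLOPs utilization
             seconds = flops / (8 * h100_bf16 * mfu)
             days = seconds / 86400               # on the order of three months here
        
             node_rate = 20.0                     # assumed $/hr for an 8x H100 node
             rental = seconds / 3600 * node_rate  # tens of thousands of dollars
        
             electricity = 3.5e3 * 0.15           # 3.5 MW at an assumed $0.15/kWh
             print(f"~{flops:.1e} FLOPs, ~{days:.0f} days on one node, "
                   f"~${rental:,.0f} rental, ~${electricity:.0f}/hr electricity")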
        
         | LeoPanthera wrote:
         | But the whole point of this was to show that it could be done
         | on CPUs during the worldwide shortage of GPUs.
         | 
         | Doing it on CPUs was the whole point.
        
           | fnordpiglet wrote:
           | It could be done using abacuses too. A global shortage of
           | GPUs is probably better addressed by producing more GPUs and
           | making techniques more efficient.
        
       | NKosmatos wrote:
       | For those curious about Fugaku, it's currently the 4th fastest
       | supercomputer in the TOP500 list:
       | https://www.top500.org/system/179807/
        
       ___________________________________________________________________
       (page generated 2024-05-13 23:00 UTC)