[HN Gopher] Show HN: less than 650 LOC trainable GPT only using ...
       ___________________________________________________________________
        
       Show HN: less than 650 LOC trainable GPT only using NumPy
        
       Author : joennlae
       Score  : 74 points
       Date   : 2023-11-17 14:34 UTC (8 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | cuuupid wrote:
        | I think people are forgetting that transformer architectures
        | are a wider field than GPT and predate GPT-3 by 3+ years.
        | Referring to transformer architectures by a branded commercial
        | moniker (GPT) is just going to help cement OpenAI's brand
        | exposure and, soon, regulatory capture.
        | 
        | For comparison, this would be like referring to convnets as
        | Inception architectures back during the CV boom (or VGGNets
        | before that).
        
         | PartiallyTyped wrote:
          | The most interesting thing in this whole saga is that
          | decoder-only models (a.k.a. causal transformers, like GPT)
          | are as effective as they are.
        
         | tverbeure wrote:
         | FWIW: the GitHub project description says "GPT-like". It's the
         | title here that dropped the "like".
        
         | jimmyl02 wrote:
          | One small difference is that the GPT architecture is just
          | the decoder stack of the original transformer, as opposed
          | to the full encoder-decoder stack.
          | 
          | I agree that the branding play around GPTs is pretty smart
          | and strong on OpenAI's part, though.
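         
          A minimal NumPy sketch of what "decoder-only" means in
          practice: each position may attend only to earlier positions,
          enforced by a causal mask. This is illustrative and not taken
          from the linked repo; the names and shapes are assumptions.
         
            import numpy as np
         
            def softmax(x, axis=-1):
                # numerically stable softmax
                x = x - x.max(axis=axis, keepdims=True)
                e = np.exp(x)
                return e / e.sum(axis=axis, keepdims=True)
         
            def causal_self_attention(q, k, v):
                # q, k, v: (seq_len, d_head) for a single head
                T, d = q.shape
                scores = q @ k.T / np.sqrt(d)         # (T, T) logits
                mask = np.triu(np.ones((T, T), dtype=bool), k=1)
                scores[mask] = -1e9                   # hide future tokens
                return softmax(scores) @ v            # (T, d_head)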
        
           | cchance wrote:
            | Honestly, I feel like the fact that everyone is just
            | calling LLMs "GPTs" at this point doesn't really help
            | OpenAI; "ChatGPT" would. Unlike "googling", which became
            | synonymous with searching the internet, "GPT" !=
            | "OpenAI-ing" something. "GPT" has just become what people
            | call LLMs lately, and since the term is neither the
            | company's name nor the full product name ("chatgpt-ing"),
            | that sort of breaks the hold, I feel.
        
         | __loam wrote:
          | Regarding regulatory capture, I listened to an interview
          | with Lina Khan, the current head of the FTC, and this exact
          | thing came up as something regulators are worried about. I
          | think regulators are aware of the danger of letting
          | industry insiders regulate their own industry, so I'm
          | hopeful for some sensible regulations that promote rather
          | than harm competition. The FTC also exists to prevent
          | monopolies.
        
       | p1esk wrote:
        | I wonder how easy it would be to port this library from NumPy
        | to CuPy.
        
         | eslaught wrote:
         | Or cuNumeric: https://developer.nvidia.com/cunumeric
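         
          CuPy is designed as a largely drop-in replacement for the
          NumPy API, so a port can often start by swapping the import.
          A hedged sketch (whether this particular repo runs unchanged
          depends on which NumPy features it uses):
         
            # Pick the array backend once; the rest of the code uses xp.
            try:
                import cupy as xp    # GPU arrays, mostly NumPy-compatible
            except ImportError:
                import numpy as xp   # CPU fallback
         
            a = xp.random.randn(512, 512).astype(xp.float32)
            b = xp.random.randn(512, 512).astype(xp.float32)
            c = a @ b                # runs on the GPU under CuPy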
        
       | gfaure wrote:
        | Nice! The README mentions `LayerNorm` is implemented here,
        | but while it appears in the equivalence tests against
        | PyTorch, I don't see it in the implementation itself.
        
         | dauertewigkeit wrote:
          | It's part of the TensorLi definition, where all the magic
          | happens.
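         
          For reference, a LayerNorm forward pass is only a few lines
          of NumPy. A minimal sketch (the names and epsilon default are
          illustrative, not taken from the repo):
         
            import numpy as np
         
            def layer_norm(x, gamma, beta, eps=1e-5):
                # x: (..., d_model); gamma, beta: (d_model,) learned
                # scale and shift
                mean = x.mean(axis=-1, keepdims=True)
                var = x.var(axis=-1, keepdims=True)
                x_hat = (x - mean) / np.sqrt(var + eps)
                return gamma * x_hat + beta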
        
       ___________________________________________________________________
       (page generated 2023-11-17 23:01 UTC)