[HN Gopher] Tests suggest clues of whose content was used to tra...
       ___________________________________________________________________
        
       Tests suggest clues of whose content was used to train OpenAI's
       Sora
        
       Author : kgwgk
       Score  : 20 points
       Date   : 2025-10-01 19:48 UTC (3 hours ago)
        
 (HTM) web link (www.washingtonpost.com)
 (TXT) w3m dump (www.washingtonpost.com)
        
       | DroneBetter wrote:
       | https://archive.is/ozjEb (note that some of the GIFs become static
       | images here)
        
       | codedokode wrote:
       | This vague situation with copyright works against open-source AI
       | models, which have to disclose the sources of their training data,
       | while closed-source companies can freely use pirated material and
       | gain an advantage over open-source models.
        
       | smegma2 wrote:
       | I'm normally skeptical of claims like this, but looking at the
       | examples, it seems that Sora is reproducing some of its training
       | data verbatim. I guess it's a case of overfitting? In particular,
       | the Civ example looks like it must have been copied almost
       | verbatim.
        
       ___________________________________________________________________
       (page generated 2025-10-01 23:02 UTC)