[HN Gopher] Sweatshop Data Is Over
       ___________________________________________________________________
        
       Sweatshop Data Is Over
        
       Author : whoami_nr
       Score  : 41 points
       Date   : 2025-08-07 14:00 UTC (9 hours ago)
        
 (HTM) web link (www.mechanize.work)
 (TXT) w3m dump (www.mechanize.work)
        
       | jrimbault wrote:
       | > This meant that while Google was playing games, OpenAI was able
       | to seize the opportunity of a lifetime. What you train on
       | matters.
       | 
        | Very weird reasoning. Without AlphaGo and AlphaZero, there's
        | probably no GPT? Each was a stepping stone, wasn't it?
        
         | phreeza wrote:
          | Transformers/BERT yes, AlphaGo not so much.
        
         | vonneumannstan wrote:
          | >Very weird reasoning. Without AlphaGo and AlphaZero, there's
          | probably no GPT? Each was a stepping stone, wasn't it?
          | 
          | Right but wrong. AlphaGo and AlphaZero are built with very
          | different techniques from GPT-style LLMs. Google created the
          | Transformer, which leads much more directly to GPTs; RLHF is
          | the other piece, which was largely created inside OpenAI by
          | Paul Christiano.
        
         | msp26 wrote:
          | OpenAI's work on Dota was also very important for funding.
        
         | jimbo808 wrote:
         | Google Brain invented transformers. Granted, none of those
         | people are still at Google. But it was a Google shop that made
         | LLMs broadly useful. OpenAI just took it and ran with it,
         | rushing it to market... acquiring data by any means
         | necessary(!)
        
           | 9rx wrote:
           | _> OpenAI just took it and ran with it_
           | 
           | As did Google. They had their own language models before and
            | at the same time, but chose different architectures for them,
            | which made them less suited to what the market actually
           | wanted. Contrary to the above claim, OpenAI seemingly "won"
           | because of GPT's design, not so much because of the data
           | (although the data was also necessary).
        
         | ethan_smith wrote:
          | Agreed - AlphaGo/Zero's reinforcement learning breakthroughs
          | were foundational for modern AI, establishing techniques like
          | self-play and value networks that later fed into
          | reinforcement-learning training of language models.
        
       | losteric wrote:
       | > Despite being trained on more compute than GPT-3, AlphaGo Zero
       | could only play Go, while GPT-3 could write essays, code,
       | translate languages, and assist with countless other tasks. The
       | main difference was training data.
       | 
       | This is kind of weird and reductive, comparing specialist to
        | generalist models? How good is GPT-3's game of Go?
       | 
       | The post reads as kind of... obvious, old news padding a
       | recruiting post? We know OpenAI started hiring the kind of
        | specialist workers this post mentions years ago at this point.
        
         | 9rx wrote:
         | _> This is kind of weird and reductive, comparing specialist to
         | generalist models_
         | 
         | It is even weirder when you remember that Google had already
         | released Meena[1], which was trained on natural language...
         | 
         | [1] And BERT before it, but it is less like GPT.
        
         | rcxdude wrote:
         | Also, the main showcase of the 'zero' models was that they
         | learnt with zero training data: the only input was interacting
         | with the rules of the game (as opposed to learning to mimic
         | human games), which seems to be the kind of approach the
         | article is asking for.
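          | 
          | As a toy illustration of that loop (Nim, with a crude Monte
          | Carlo update; the constants are arbitrary and nothing like
          | AlphaZero's actual method), the only input really is the
          | rules:
          | 
          |     import random
          |     from collections import defaultdict
          | 
          |     # Nim: take 1-3 stones, taking the last stone wins. The
          |     # agent's only "data" is experience from playing itself.
          |     Q = defaultdict(float)      # (stones, take) -> value
          |     ALPHA, EPS = 0.1, 0.2
          | 
          |     def moves(stones):
          |         return [t for t in (1, 2, 3) if t <= stones]
          | 
          |     def pick(stones):
          |         if random.random() < EPS:
          |             return random.choice(moves(stones))
          |         return max(moves(stones), key=lambda t: Q[(stones, t)])
          | 
          |     for _ in range(50000):
          |         stones, history = 15, []
          |         while stones > 0:
          |             m = pick(stones)
          |             history.append((stones, m))
          |             stones -= m
          |         # Last mover won; alternate +1/-1 back up the game.
          |         for i, (s, a) in enumerate(reversed(history)):
          |             r = 1.0 if i % 2 == 0 else -1.0
          |             Q[(s, a)] += ALPHA * (r - Q[(s, a)])
          | 
          |     # Greedy play should converge toward the known optimum:
          |     # leave a multiple of 4 whenever possible.
          |     print({s: max(moves(s), key=lambda t: Q[(s, t)])
          |            for s in range(1, 16)})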
        
       | rob74 wrote:
       | It's kind of reassuring that the old adage "garbage in, garbage
       | out" still applies in the age of LLMs...
        
       | atrettel wrote:
       | I am quite happy that this post argues in favor of subject-matter
       | expertise. Until recently I worked at a national lab. I had many
        | people (both leadership and colleagues) tell me that they need
        | fewer, if any, subject-matter experts like me because ML/AI can
        | handle a lot of those tasks now. To that end, lab leadership
        | was directing most of the hiring (both internal and external)
       | towards ML/AI positions.
       | 
       | I obviously think that we still need subject-matter experts. This
       | article argues correctly that the "data generation process" (or
       | as I call it, experimentation and sampling) requires "deep
       | expertise" to guide it properly past current "bottlenecks".
       | 
       | I have often phrased this to colleagues this way. We are reaching
       | a point where you cannot just throw more data at a problem
       | (especially arbitrary data). We have to think about what data we
        | intentionally use to make models. With the right sampling of
        | information, we may be able to make better models faster and
        | more cheaply. But that requires knowledge of what data to
        | include and how to come up with a representative sample with
        | enough "resolution" to resolve all of the nuances that the
        | problem calls for. Again, that means that subject-matter
        | expertise does matter.
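        | 
        | To make that concrete, here is a rough Python sketch of what I
        | mean (names are hypothetical, purely illustrative): the expert
        | chooses the strata, and each stratum is sampled evenly instead
        | of sampling the raw pool.
        | 
        |     import random
        |     from collections import defaultdict
        | 
        |     def stratified_sample(examples, key, per_stratum, seed=0):
        |         # Sample each stratum separately so rare-but-important
        |         # cases keep enough "resolution" in the training set.
        |         rng = random.Random(seed)
        |         strata = defaultdict(list)
        |         for ex in examples:
        |             strata[key(ex)].append(ex)
        |         sample = []
        |         for bucket in strata.values():
        |             rng.shuffle(bucket)
        |             sample.extend(bucket[:per_stratum])
        |         return sample
        | 
        |     # Hypothetical pool: the "rare" regime is 10% of the raw
        |     # data but gets equal weight in the curated sample.
        |     pool = ([{"regime": "common", "id": i} for i in range(9000)]
        |             + [{"regime": "rare", "id": i} for i in range(1000)])
        |     curated = stratified_sample(pool, lambda ex: ex["regime"], 500)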
        
         | 9rx wrote:
         | _> I am quite happy that this post argues in favor of subject-
         | matter expertise_
         | 
         | The funny part is that it argues in favour of scientific
         | expertise, but at the end it says they actually want to hire
         | engineers instead.
         | 
         | I suppose scientists will tell you that has always been par for
         | the course...
        
         | lawlessone wrote:
          | Without the actual SMEs, they'll be flying blind, not knowing
          | where the models get things wrong.
          | 
          | Hopefully nothing endangers people...
        
         | m463 wrote:
          | This all reminds me of the really interesting book "The
          | Inevitable" by Kevin Kelly.
          | 
          | It takes a fascinating look at the future, and one insight in
          | particular stands out here.
         | 
         | It basically said that in the future, answers would be cheap
         | and plentiful, and questions would be valuable.
         | 
         | With AI I think this will become more true every day.
         | 
         | Maybe AI can answer anything, but won't we still need people to
         | ask the right questions?
         | 
         | https://en.wikipedia.org/wiki/The_Inevitable_(book)
        
       | Sevii wrote:
        | It's still too early, but at some point we are going to start to
        | see infra and frameworks designed to be easier for LLMs to use.
        | Like a version of Terraform intended for AI. Or an edition of the
        | AWS API for LLMs.
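        | 
        | Roughly, I'd expect each operation to be a single self-
        | describing schema the model fills in, rather than free-form
        | config plus a sprawling REST surface. A sketch (all names
        | hypothetical, not a real AWS or Terraform API):
        | 
        |     # One infra operation as a JSON-schema "tool" an LLM can
        |     # call. Enums and patterns constrain the model's output;
        |     # the backend validates before anything is applied.
        |     create_instance_tool = {
        |         "name": "create_instance",
        |         "description": "Provision one VM; reject invalid "
        |                        "requests with a machine-readable error.",
        |         "parameters": {
        |             "type": "object",
        |             "properties": {
        |                 "region": {"type": "string",
        |                            "enum": ["us-east-1", "eu-west-1"]},
        |                 "size": {"type": "string",
        |                          "enum": ["small", "medium", "large"]},
        |                 "image_id": {"type": "string",
        |                              "pattern": "^img-[a-f0-9]{8}$"},
        |             },
        |             "required": ["region", "size", "image_id"],
        |         },
        |     }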
        
       | Animats wrote:
       | (Article is an employment ad.)
       | 
        | Is that actually true? Is the mini-industry of people looking at
       | pictures and classifying them dead? Does Mechanical Turk still
       | get much use?
        
       | getnormality wrote:
        | It's interesting to compare this to the new third-generation
       | benchmarks from ARC-AGI, which are essentially a big collection
       | of seemingly original puzzle video games. Both Mechanize (OP) and
       | ARC want AI to start solving more real-world, long-horizon tasks.
       | Mechanize wants to get AI working directly on real software
       | development, while ARC suggests a focus on much simpler IQ test-
       | style tasks.
        
       ___________________________________________________________________
       (page generated 2025-08-07 23:02 UTC)