[HN Gopher] Builder.ai did not "fake AI with 700 engineers"
___________________________________________________________________
Builder.ai did not "fake AI with 700 engineers"
Author : tanelpoder
Score : 50 points
Date : 2025-06-12 17:47 UTC (5 hours ago)
(HTM) web link (newsletter.pragmaticengineer.com)
(TXT) w3m dump (newsletter.pragmaticengineer.com)
| cratermoon wrote:
| Unnamed former employees of a dead company say company didn't
| fake it. Film at 11.
| alephnerd wrote:
| I tend to trust Gergely Orosz (the writer of Pragmatic
| Engineer). He validates sources and has a good track record on
| reporting on the European tech scene and Engineering
| Management.
|
| His blog and newsletter are both fairly popular on HN.
| senko wrote:
| This was analyzed on HN a week or so ago:
| https://news.ycombinator.com/item?id=44176241
|
| The "700 engineers faking AI" claim seems to have been
| sloppy[0] reasoning by an influencer, which spread like
| wildfire.
|
| [0] I won't attribute malice here, but this version was
| certainly more interesting than the truth
| mediaman wrote:
| The original story doesn't make any sense. How would you fake
| an "AI" agent coding by using people on the other side? Woudn't
| it be...obvious? People cannot type code that fast.
|
| What's your non-snarky theory about how this could possibly be
| true?
| ceejayoz wrote:
| You claim you have a queue and it takes up to 24 hours for
| your job to run?
| apwell23 wrote:
| It was obviously not a prompt-and-get-a-response model like
| ChatGPT.
| wnevets wrote:
| Are there people who actually believe that a user would enter a
| text prompt and then a human programmer would generate the code?
| tomasphan wrote:
| Yes, 90% of people with no tech background reading the news
| TiredOfLife wrote:
| Majority of HN commenters
| apwell23 wrote:
| that was not the flow
| dd_xplore wrote:
| Unfortunately a lot of people!!
| hluska wrote:
| Builder.ai had a totally different flow, but yeah, when a boring
| version and an exciting version of the same story compete, a
| very large percentage of people will run with the exciting one.
| It's like the "death tax" in US political history - the US has
| never had a death tax, but it's way more exciting to call it
| that than an estate tax. Only now, instead of the media being
| the primary disseminator of spin, we have people sharing
| exciting stories on social media instead of boring ones about
| building an internal Zoom and accounting issues.
|
| Then social animals kick in, likes pour in and more people
| share. Social media has created a world where an exciting lie
| can drown out boring truth for a large percentage of people.
| DebtDeflation wrote:
| My assumption when the story broke was that the 700 engineers
| were using various AI tools (Replit, Cursor, ChatGPT, etc.) to
| create code and documentation and then stitching it all
| together somewhat manually. Sort of like that original Devin
| demo where AI was being used at each step but there was a ton
| of manual intervention along the way and the final video was
| edited to make it seem as if the whole thing ran end to end
| fully automated all from the initial prompt.
| TuringNYC wrote:
| I worked with an "AI data vendor" at work where you'd put in a
| query and "the AI gave you back a dataset" but it usually took
| 24hrs, so it was obvious they had humans pulling the data. The
| company still purchased a data plan. It happens; in this case,
| though, they do have a unique dataset.
| mellosouls wrote:
| Kudos to the author for the update - and also to others including
| @dang for calling it out at the time:
|
| https://news.ycombinator.com/item?id=44169759
|
| _(Builder.ai Collapses: $1.5B 'AI' Startup Exposed as
| 'Indians'?, 367 points, 267 comments)_
| tomasphan wrote:
| I don't believe that their business entirely depended on 700
| actual humans, just as much as I don't believe that to be true
| for the Amazon store. However, both probably relied on humans in
| the loop which is not sustainable at scale.
| fragmede wrote:
| at what scale though? as long as money line go up faster than
| cost line go up, it's fine?
| Legend2440 wrote:
| If you read the article, they had two separate products: one of
| which was 700 actual humans, and the other was an LLM-powered
| coding tool.
| gamblor956 wrote:
| LLMs are all fake AI. As the recently released Apple study
| demonstrates, LLMs don't reason, they just pattern match. That's
| not "intelligence" however you define it because they can only
| solve things that are already within their training set.
|
| In this case, it would have been better for the AI industry if it
| had been 700 programmers, because then the rest of the industry
| could have argued that the utter trash code Builder.ai generated
| was the result of human coders spending a few minutes haphazardly
| typing out random code, and not the result of a specialty-trained
| LLM.
| aeve890 wrote:
| >As the recently released Apple study demonstrates, LLMs don't
| reason, they just pattern match
|
| Hold on a minute, I was under the impression that "reasoning"
| was just a marketing buzzword, the same as "hallucinations",
| because how tf would anyone expect GPUs to "reason" and
| "hallucinate" when even neurology/psychology doesn't have a
| strict definition of those processes.
| jacobr1 wrote:
| No, the definitions are very much up for debate, but there is
| an actual process here. "Reasoning" in this case means having
| the model not just produce whatever output is requested
| directly, but also spend some time writing out its thoughts
| about how to produce that output. Early versions of this were
| just prompt engineering where you ask the model to produce
| its "chain of thought" or "work step by step" on how to
| approach the problem. Later this was trained into the model
| directly with traces on this intermediate thinking,
| especially for multistep problems, without the need for
| explicit prompting. And then architecturally these models now
| have different ways to determine when to stop "reasoning" to
| skip to generating actual output.
|
| I don't have a strict enough definition to debate if this
| reasoning is "real" - but from personal experience it
| certainly appears to be performing something that at least
| "looks" like inductive thought, and leads to better answers
| than prior model generations without reasoning/thinking
| enabled.
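|
| Roughly, the prompt-engineering version of this looks like the
| sketch below (illustrative only; ask_llm is a hypothetical
| stand-in for whatever completion API you use, not any
| particular vendor's interface):
|
|   def ask_llm(prompt: str) -> str:
|       ...  # call whatever model/API you like here
|
|   TASK = ("A train leaves at 9:40 and arrives at 12:05. "
|           "How long is the trip?")
|
|   # Direct prompt: ask only for the final answer.
|   direct = ask_llm(TASK + "\nAnswer with just the duration.")
|
|   # Chain-of-thought prompt: ask the model to write out its
|   # intermediate steps first. Reasoning models are trained to
|   # do this without being asked, and decide on their own when
|   # to stop "thinking" and emit the answer.
|   cot = ask_llm(
|       TASK
|       + "\nWork step by step, writing out your reasoning,"
|       + "\nthen give the final answer on its own line."
|   )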
| codr7 wrote:
| Reasoning means what reasoning always meant.
|
| Selling an algorithm that can write a list of steps as
| reasoning is bordering on fraud.
|
| It's not uncommon that they guess the right solution, and
| then "reason" their way out of it.
| klank wrote:
| It's gradient descent. Why are we surprised when the
| answers get better the more we do it? Sometimes you're
| stuck in a local max/minima, and you hallucinate.
|
| Am I oversimplifying it? Is everybody else over-mystifying
| it?
| throwaway314155 wrote:
| > As the recently released Apple study demonstrates, LLMs don't
| reason
|
| Where is everyone getting this misconception? I have seen it
| several times. First off, the study doesn't even try to qualify
| whether or not these models use "actual reasoning" - that's
| outside of the scope. They merely examine how effective
| thinking/reasoning _is_ at producing better results. They found
| that - indeed - reasoning improves performance. But the crucial
| result is that it only improves performance up to a certain
| difficulty-cliff - at which point thinking makes no discernable
| difference due to a model collapse of sorts.
|
| It's important to read the papers you're using to champion your
| personal biases.
| UebVar wrote:
| > because they can only solve things that are already within
| their training set.
|
| That is just plain wrong, as anybody who spent more than 10
| minutes with an LLM within the last 3 years can attest. Give it
| a try, especially if you care to have an opinion on them. Ask
| an absurd question (that can be, in principle, answered) that
| nobody has asked before and see how it performs generalizing.
| The hype is real.
|
| I'm interested in which study you're referring to, because I'd
| like to see their methods and what they actually found.
| spion wrote:
| What you think is an absurd question may not be as absurd as
| it seems, given the trillions of tokens of data on the
| internet, including its darkest corners.
|
| In my experience, it's better to simply try using LLMs in
| areas where they don't have a lot of training data (e.g.
| reasoning about the behaviour of terraform plans). It's not a
| hard cutoff of being _only_ able to reason about already-solved
| things, but it's not too far off as a first approximation.
|
| The researchers took existing known problems and parameterised
| their difficulty [1]. While most of these are not by any
| means easy for humans, the interesting observation to me was
| that the N at which models failed was not proportional to the
| complexity of the problem, but correlated more with how
| commonly solution "printouts" for that size of the problem are
| encountered in the training data. For example, "towers of
| hanoi", which has printouts of solutions for a variety of
| sizes, went to a very large number of steps N, while the river
| crossing, which is almost entirely absent from the training
| data for N larger than 3, failed above pretty much that exact
| number.
|
| [1]: https://machinelearning.apple.com/research/illusion-of-
| think...
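|
| The Hanoi case illustrates that point: the optimal solution
| for n disks is exactly 2^n - 1 moves, and enumerating it is a
| three-line recursion, which is why full solution printouts for
| many sizes are all over the internet. A quick sketch:
|
|   def hanoi(n, src="A", dst="C", via="B"):
|       # Move n disks from src to dst, using via as the spare.
|       if n == 0:
|           return []
|       return (hanoi(n - 1, src, via, dst)
|               + [(src, dst)]
|               + hanoi(n - 1, via, dst, src))
|
|   for n in (3, 5, 10):
|       print(n, len(hanoi(n)))  # 7, 31, and 1023 moves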
| CSSer wrote:
| It doesn't help that thanks to RLHF, every time a good
| example of this gains popularity, e.g. "How many Rs are in
| 'strawberry'?", it's often snuffed out quickly. If I worked
| at a company with an LLM product, I'd build tooling to look
| for these kinds of examples in social media or directly in
| usage data so they can be prioritized for fixes. I don't
| know how to feel about this.
|
| On the one hand, it's sort of like red teaming. On the
| other hand, it clearly gives consumers a false sense of
| ability.
| jvanderbot wrote:
| "The apple study" is being overblown too, but here it is:
| https://machinelearning.apple.com/research/illusion-of-
| think...
|
| The crux is that beyond a bit of complexity the whole house
| of cards comes tumbling down. This is trivially obvious to
| any user of LLMs who has trained _themselves_ to use LLMs (or
| LRMs in this case) to get better results ... the usual "But
| you're prompting it wrong" answer to any LLM skepticism.
| Well, that's definitely true! But it's also true that these
| aren't magical intelligent subservient omniscient creatures,
| because that would imply that they would learn how to work
| with _you_. And before you say "moving goalpost" remember,
| this is _essentially_ what the world thinks they are being
| sold.
|
| It can be both breathless hysteria _and_ an amazing piece of
| revolutionary and useful technology at the same time.
|
| The training set argument is just a fundamental
| misunderstanding, yes, but you should think about the
| contrapositive - can an LLM do well on things that are
| _inside_ its training set? This paper does use examples that
| are present all over the internet including solutions. Things
| children can learn to do well. Figure 5 is a good figure to
| show the collapse in the face of complexity. We've all seen
| that when tearing through a codebase or trying to "remember"
| old information.
| tough wrote:
| I think apple published that study right before WWDC to
| have an excuse to not give bigger than 3B foundation models
| locally and force you to go via their cloud -for reasoning-
| harder tasks.
|
| beta api's so its moving waters but that's my thoughts
| after playing with it, the paper makes much more sense in
| that context
| ChrisMarshallNY wrote:
| _> because they can only solve things that are already within
| their training set_
|
| I just gave up on using SwiftUI for a rewrite of a backend
| dashboard tool.
|
| The LLM didn't give up. It kept suggesting wilder, and less
| stable ideas, until I realized that this was a rabbithole full
| of misery, and went back to UIKit.
|
| It wasn't the LLM's fault. SwiftUI just isn't ready for the
| particular functionality I needed, and I guess that a day of
| watching ChatGPT get more and more desperate saved me a lot of
| time.
|
| But the LLM didn't give up, which is maybe ot-nay oo-tay ight-
| bray.
|
| https://despair.com/cdn/shop/files/stupidity.jpg
| meowface wrote:
| AI skepticism is like a religion at this point. Weird it's so
| prominent on a tech site.
|
| (The Apple paper has had many serious holes poked in it.)
| b00ty4breakfast wrote:
| Well, if the thing is truly capable of reason, then we have
| an obligation to put the kibosh on the entire endeavor
| because we're using a potentially intelligent entity as slave
| labor. At best, we're re-inventing factory farming and at
| worst we're re-inventing chattel slavery. Neither of those
| situations is something I'm personally ok with allowing to
| continue
| klank wrote:
| I concur.
|
| I also find the assumption that tech-savvy individuals
| would inherently be in favor of what we currently call AI
| to be weird in itself. Unfortunately, I feel as though being
| knowledgeable or capable within an area gets conflated with
| an over-acceptance of that area.
|
| If anything, the more I've learned about technology, and
| the more experienced I am, the more fearful and cautious I
| am with it.
| alerter wrote:
| > Builder hired 300 internal engineers and kicked off building
| internal tools, all of which could have simply been purchased
|
| Tempted to say there was a bit of corruption here, crazy
| decision. Like someone had connections to the contractor
| providing all those devs.
|
| otoh they were an "app builder" company. Maybe they really wanted
| to dogfood.
| alephnerd wrote:
| A similar thing happened at Uber before the 2021 re-org. At one
| point they had 3 competing internal chat apps from what I've
| heard from peers working there, and having previously worked
| for a vendor of Uber's, I noticed a significant amount of
| disjointedness in their environment (it seemed very EM-driven,
| with no overarching product vision).
|
| Ofc, Gergely might have some thoughts about that ;)
| quantadev wrote:
| I always knew this story was fake. Even if you have a trillion
| expert developers, it would still be impossible to get answers
| fast enough to "fake an LLM". Humans obviously aren't
| _parallelizable_ like that.
| firesteelrain wrote:
| " building internal versions of Slack, Zoom, JIRA, and more..."
|
| Did they really do this, or just customize Jira schemas and
| workflows, for example?
| Fraterkes wrote:
| The headline as stated is categorically false, buuuut... I think
| it's pretty salient that a company called "Builder.ai" only had
| 15 engineers working on actual ai and actually mostly functioned
| as an outsourcing intermediary for 500-1000 engineers (ie, the
| builders). When it comes to these viral misunderstandings, you
| kind of reap what you sow.
| stego-tech wrote:
| > Builder hired 300 internal engineers and kicked off building
| internal tools, all of which could have simply been purchased
|
| Dear god, _PLEASE_ hire an actual Enterprise IT professional
| early in your startup expansion phase. A single competent EIT
| person (or dinosaur like me) could have - if this story is true -
| possibly saved the whole startup by understanding what's
| immediately needed versus what's nice-to-have, what should be
| self-hosted versus what should be XaaS, stitching everything
| together to reduce silos, and ensuring every cent is not just
| accounted for but wisely invested in future success.
|
| Even if the rest of your startup isn't "worrying about the
| money", your IT and Finance people should _always_ be worried
| about the money.
___________________________________________________________________
(page generated 2025-06-12 23:00 UTC)