[HN Gopher] Training LLMs from ground zero as a startup
___________________________________________________________________
Training LLMs from ground zero as a startup
Author : swyx
Score : 100 points
Date : 2024-03-05 22:31 UTC (1 days ago)
(HTM) web link (www.yitay.net)
(TXT) w3m dump (www.yitay.net)
| swyx wrote:
| For context: Yi Tay was Tech Lead on Google PaLM, UL2, Flan, Bard,
| etc., and is now cofounder at Reka (which has shipped some very
| interesting small multimodal models that have been featured on
| here). I prompted him for this post as an ex-Googler now training
| LLMs at an independent startup:
| https://twitter.com/YiTayML/status/1765105066263052718
|
| our conversation was recorded here
| https://sub.thursdai.news/p/thursdai-feb-15-2024-openai-chan...
| swyx wrote:
| (update: I submitted this yesterday and it didn't get traction;
| I guess @dang must've merged the old submission in here. You
| really didn't have to, but it's a nice gesture. Thanks dang!!)
| axpy906 wrote:
| Great to see you on here. Love the Latent Space podcast.
| pama wrote:
| Training LLMs from scratch is a super important issue that affects
| the pace and breadth of AI iteration almost as much as raw
| hardware improvements do. The blog post is fun but somewhat
| shallow, and not very technical or surprising if you've worked
| with clusters of GPUs in any capacity over the years. (I liked the
| perspective of a former Googler, but I'm not sure why past
| colleagues would recommend JAX over PyTorch for LLMs outside of
| Google.) I hope this newco eventually releases a more technical
| report about their training adventures, like the PDF file here:
| https://github.com/facebookresearch/metaseq/tree/main/projec...
| axpy906 wrote:
| If you're doing research, JAX makes some sense. Probably some
| Google bias in there too.
| lyapunova wrote:
| To be honest, most applied-ML researchers in the Bay say the
| opposite. If you're trying to be nimble and prototype, use
| PyTorch. If you're trying to gain some optimizations as you
| near deployment, rewrite in JAX.
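|
| As a toy sketch of what that workflow looks like (made-up
| shapes, not anyone's actual model): the same tiny MLP written
| eagerly in PyTorch for prototyping, then ported to JAX and
| jit-compiled by XLA for the optimized version.
|
|     import torch
|     import jax
|     import jax.numpy as jnp
|
|     # --- PyTorch: eager, runs op-by-op, easy to poke at ---
|     def mlp_torch(x, w1, w2):
|         return torch.relu(x @ w1) @ w2
|
|     x = torch.randn(8, 512)
|     w1, w2 = torch.randn(512, 2048), torch.randn(2048, 512)
|     y = mlp_torch(x, w1, w2)
|
|     # --- JAX: same math, traced once and compiled by XLA ---
|     @jax.jit
|     def mlp_jax(x, w1, w2):
|         return jnp.maximum(x @ w1, 0.0) @ w2
|
|     k1, k2, k3 = jax.random.split(jax.random.PRNGKey(0), 3)
|     xj = jax.random.normal(k1, (8, 512))
|     w1j = jax.random.normal(k2, (512, 2048))
|     w2j = jax.random.normal(k3, (2048, 512))
|     yj = mlp_jax(xj, w1j, w2j)  # first call compiles, later calls reuse it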
| abeppu wrote:
| It's worth taking a second to note that the author just assumes
| that readers understand "the wilderness" to mean "not Google".
|
| This post gives a lot of credit to Google's infra and hardware
| teams, and I'd love to read a perspective from one of those
| insiders who then went on to do related work elsewhere.
| yalok wrote:
| > All in all, this is only a small part of the story of how we
| started a company, raised some money, bought some chips and
| matched Gemini pro/GPT 3.5 and outperformed many others in less
| than a year having to build everything from scratch.
|
| I wonder what the budget for the chips/cloud GPUs was to reach a
| GPT-3.5-level LLM - at least to an order of magnitude. $2-5
| million?
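|
| A rough back-of-envelope using the standard ~6*N*D FLOPs
| estimate for one training run (every number below is my own
| assumption, nothing from the post):
|
|     params = 20e9    # assume a ~20B-parameter dense model
|     tokens = 2e12    # assume ~2T training tokens
|     flops = 6 * params * tokens          # ~6*N*D per run
|     eff_flops = 1e15 * 0.4               # H100-class bf16 peak at ~40% MFU
|     gpu_hours = flops / eff_flops / 3600
|     cost = gpu_hours * 2.0               # assume ~$2 per GPU-hour
|     print(f"{gpu_hours:,.0f} GPU-hours, ~${cost/1e6:.2f}M per run")
|     # -> roughly 167,000 GPU-hours, ~$0.33M for a single run;
|     #    ablations, failed runs and data work could plausibly
|     #    push the total into the low millions.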
| joe_the_user wrote:
| So essentially a startup in this context has a small number of
| people and a large amount of money for training clusters. The
| article describes much of the operation as leasing servers -
| which you'd assume goes for many startups (or existing firms).
|
| So it seems like you have the various LLM creators all doing
| roughly the same sort of thing (training with text and image
| data) with similar hardware and similar data. Each of these
| naturally has their own brand of "secret sauce" for
| distinguishing their venture. The various secret sauces can make
| a difference in the quality of an LLM's output.
|
| Yet overall, this seems like a massive, energy-intensive exercise
| in redundancy.
| dauertewigkeit wrote:
| I don't think most of them have any kind of secret sauce. I
| think the founders hope to get bought out simply for being able
| to train "near-SOTA" LLMs. I guess achieving that level of
| skill and infra could be valuable enough to build upon.
| joe_the_user wrote:
| Sure, that's also a factor but I'd say it reinforces my main
| point.
| twelfthnight wrote:
| > To be very frank, I would have to say the quality of codebases
| externally significantly lag behind those I've been used to at
| Google
|
| I haven't worked at Google - does anyone else share this
| sentiment? I always feel like the Google code I work with
| externally is typically not idiomatic, and super difficult to go
| "under the hood" with if anything isn't precisely on the happy
| path.
| winwang wrote:
| (not a Googler)
|
| Google's codebase is idiomatic to Google due to their strict
| language tooling, e.g. their C++ code stays away from advanced
| features. The tooling teams at Google have a very strong say.
| twelfthnight wrote:
| I get that sense too. It probably does work awesome if you're
| inside. But man, it's a mess when they externalize stuff. Just
| one example: their cloud platform CLI includes an entire
| Python installation and takes 1.7 GB on disk, just to make API
| calls...
| jen20 wrote:
| I have never understood why cloud providers seem to think
| it is OK to write their CLIs in Python. The AWS one is too,
| and the Azure one went from Node.js to Python some time
| ago.
| anonymous-panda wrote:
| Packaging and stability reasons. Same for why it's a
| 1.7 GB install - probably where they landed after having
| tons of support issues on some random Python version they
| didn't test, or on some broken dependency. Freezing the
| entire set of artifacts is more stable, and Python lets
| you move pretty quickly. I can't speak to why Node.js vs.
| Python though - maybe Python is easier to embed?
| pests wrote:
| What? They only get packaging and stability because they
| include the runtime. If they just went with a compiled
| language they could distribute native binaries and have
| actual packaging and stability.
| anonymous-panda wrote:
| Yes, but it's not just a single metric. Another is how
| easy it is for them to hire productive members of the
| team and how much that costs them - middling Python
| developers churning out fine-ish code are cheaper than
| Rust developers doing the same. It's hard to find a
| language where a developer can be as productive as in
| Python that also has AOT compilation to generate
| standalone binaries.
|
| TL;DR: there are multiple factors to consider here, and
| it's more interesting to understand the pressures that
| cause the decisions, especially if you want to try to
| create a world where different decisions are made.
| twelfthnight wrote:
| Yeah, I imagine that was the decision calculus. "Instead
| of spending some more effort to save millions of
| unnecessary downloads of Python's runtime by using a
| different language, let's just bundle Python!"
|
| I wouldn't be surprised if it was version 2.7 too...
| twelfthnight wrote:
| There probably is a sense in which the APIs are
| constantly changing, so maybe an interpreted language
| might make sense? I imagine there has to be a better way
| to do this with Go or Rust though (even Lua?) for a
| smaller binary.
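|
| One thing an interpreted language does make easy (just a
| guess at the motivation, not a claim about gcloud's actual
| internals) is the discovery-based client pattern, where the
| API surface is built at runtime from the service's published
| schema instead of being compiled in:
|
|     # assumes google-api-python-client is installed and
|     # default credentials are configured; project and zone
|     # are placeholders
|     from googleapiclient import discovery
|
|     compute = discovery.build("compute", "v1")  # fetches the API schema
|     req = compute.instances().list(project="my-project",
|                                    zone="us-central1-a")
|     for inst in req.execute().get("items", []):
|         print(inst["name"])
|
| Nothing has to be recompiled or reshipped when the API
| surface changes, which is harder to get with a static Go or
| Rust binary.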
| jyap wrote:
| It makes "sense" based on the domain of the cloud
| provider being DevOps teams who are maintaining and using
| these CLI tools. Ie. What they use day to day.
|
| For anything more advanced they offer language specific
| SDKs in Rust, Swift, Kolton, etc...
|
| For example integrating storage in an iOS app.
| marcyb5st wrote:
| Did you install all the components? Because if so, you also
| installed emulators for Pub/Sub and Bigtable (maybe
| others, I don't remember), which explains the big footprint.
| dheera wrote:
| > e.g. their C++ code stays away from advanced features
|
| Which honestly is a GOOD thing, because it makes it much
| easier for newcomers to ramp up on existing codebases. Most
| people aren't used to working with spaceship operators and
| constexpr.
|
| Readability is also far more valuable to a large team than
| efficiency for anything that isn't a number-crunching loop.
| renegade-otter wrote:
| "Externally", no one could possibly beat Google's track record
| of not committing to products before finally killing them. But
| the code was beautiful, though!
| twelfthnight wrote:
| I mean, was Angular ever "beautiful"?
| resource0x wrote:
| Pretty sure it was. A lousy idea might still be implemented
| beautifully under the hood. :-)
| titanomachy wrote:
| I thought the quality was pretty high, largely because there
| were a lot of rails constraining how code should be written.
| Most of the code I dealt with was written using somewhat rigid
| (but generally well-designed) frameworks with programmatically-
| enforced style guides.
|
| Also, most work seemed to involve some balance of junior and
| more experienced people, which helped keep quality higher.
| Outside of Google, I've seen pretty large projects written by
| new grads with little supervision (and on a tight timeline).
| Those codebases can be pretty hairy.
| twelfthnight wrote:
| That honestly does seem like a recipe for good code. And
| sure, there's tons of open source out there of dubious
| quality.
|
| @resource0x in a sibling comment made the point that it's
| possible to write great code even if the program itself is a
| flawed idea. I'm probably conflating those things.
| danans wrote:
| > Haven't worked at Google, anyone else share this sentiment?
|
| I worked there, and the quality is definitely much higher and
| the code tends to be far more maintainable. However, there is
| often a cost for that, which is velocity.
|
| Some of that cost is reduced by the sheer amount of automation
| in the tooling (e.g. bots that block style violations and
| common bugs before a code change is submitted - rough sketch
| below).
|
| In other cases, it slows things down quite a bit.
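|
| By "automation" I mean something in the spirit of this (a
| generic sketch, not Google's internal tooling): a presubmit
| script that refuses a change if format/lint checks fail.
|
|     #!/usr/bin/env python3
|     # Generic presubmit sketch: block submission if any check fails.
|     import subprocess
|     import sys
|
|     CHECKS = [
|         ["black", "--check", "."],  # formatting
|         ["ruff", "check", "."],     # style violations / common bugs
|     ]
|
|     def main() -> int:
|         for cmd in CHECKS:
|             if subprocess.run(cmd).returncode != 0:
|                 print(f"presubmit: {' '.join(cmd)} failed; fix before submitting")
|                 return 1
|         return 0
|
|     if __name__ == "__main__":
|         sys.exit(main())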
| bo1024 wrote:
| This is very interesting, but I really want to hear about the
| training data process!
___________________________________________________________________
(page generated 2024-03-06 23:00 UTC)