[HN Gopher] Training LLMs from ground zero as a startup
___________________________________________________________________
Training LLMs from ground zero as a startup
Author : swyx
Score : 395 points
Date : 2024-03-05 22:31 UTC (2 days ago)
(HTM) web link (www.yitay.net)
(TXT) w3m dump (www.yitay.net)
| swyx wrote:
| for context Yi Tay was Tech Lead on Google PaLM, UL2, Flan, Bard,
| etc and now is cofounder at Reka (which has shipped some v
| interesting small multimodal models that have featured on here).
| I prompted him for this post as an ex-Googler now training LLMs
| as an independent startup
| https://twitter.com/YiTayML/status/1765105066263052718
|
| our conversation was recorded here
| https://sub.thursdai.news/p/thursdai-feb-15-2024-openai-chan...
| swyx wrote:
| (update: i submitted this yesterday and it didnt get traction,
| i guess @dang must've merged the old submission in here. you
| really didnt have to, but its a nice gesture. thanks dang!!)
| axpy906 wrote:
| Great to see you on here. Love the Latent Space podcast.
| swyx wrote:
| aw thank you for listening. some weeks its very much a
| labor of love lol.
|
| no events planned near term but come to the big shindig in
| june https://ti.to/software-3/ai-engineer-worlds-fair .
| last year's summit was the first time i really understood
| how much of a reach we have and how many good AI people
| we've managed to gather as friends.
| dwaltrip wrote:
| I love it as well, it's a fantastic resource :)
| 3abiton wrote:
| Is he the person behind the Yi LLM models?
| bigcat12345678 wrote:
| No, the Yi LLM models are from [0], Kai-Fu Lee's LLM startup.
|
| [0] https://www.lingyiwanwu.com/
| pama wrote:
| Training LLMs from scratch is a super important issue that affects
| the pace and breadth of iteration of AI almost as much as the raw
| hardware improvements do. The blog is fun but somewhat shallow
| and not technical or very surprising if you've worked with
| clusters of GPUs in any capacity over the years. (I liked the
| perspective of a former googler, but I'm not sure why past
| colleagues would recommend Jax over pytorch for LLMs outside of
| Google.) I hope this newco eventually releases a more technical
| report about their training adventures, like the PDF file here:
| https://github.com/facebookresearch/metaseq/tree/main/projec...
| axpy906 wrote:
| If you're doing research JAX makes some sense. Probably some
| Google bias in there too.
| lyapunova wrote:
| To be honest, most researchers in applied ML in the bay say
| the opposite. If you are trying to be nimble and prototype,
| use pytorch. If you're trying to gain some optimizations as
| you near deployment, rewrite in Jax.
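| As a rough illustration of the kind of rewrite being described
| (a hypothetical toy function, not anything from the post):
| prototyping code can stay in eager PyTorch, while the JAX
| version hands the whole computation to XLA via jit.
|
|   # Minimal sketch: jit-compiling a toy step function with JAX.
|   # XLA fuses the ops into one compiled program, which is where
|   # the near-deployment optimizations tend to come from.
|   import jax
|   import jax.numpy as jnp
|
|   def step(w, x):
|       # stand-in for a model forward pass + loss
|       return jnp.mean((x @ w - 1.0) ** 2)
|
|   grad_step = jax.jit(jax.grad(step))   # compiled gradient fn
|
|   w = jnp.zeros((512, 1))
|   x = jnp.ones((64, 512))
|   g = grad_step(w, x)   # first call compiles, later calls reuse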
| plumeria wrote:
| Where does Tensorflow stand in this?
| axpy906 wrote:
| Somewhere next to Theano, Mxnet or Caffe.
| plumeria wrote:
| So, obsolete?
| omneity wrote:
| What about Keras?
| rockinghigh wrote:
| Tensorflow has been falling behind since they stopped
| caring about backward compatibility. PyTorch is the
| leading framework. Jax is getting some traction at Google
| and was used to train Gemini.
| axpy906 wrote:
| Interesting. I've never heard that. I could see that
| argument going both ways, as PyTorch has the larger
| ecosystem and appears in the most publications.
| pama wrote:
| Interesting perspective about possible Jax optimizations.
| Assuming these models are trained and deployed on non-TPU
| hardware, are there any real advantages in using Jax for
| deployment on GPU? I'd have assumed that inference is
| largely a solved optimization for large transformer based
| models (with any low hanging fruits from custom CUDA code
| already written) and the details are shifting towards
| infrastructure tradeoffs and availability of efficient
| GPUs. But I may be out of the loop with the latest gossip.
| Or do you simply mean that maybe there exist cases where
| TPU inference makes sense financially and using jax makes a
| difference?
| abeppu wrote:
| It's worth taking a second to note that the author just assumes
| that readers understand "the wilderness" to mean "not Google".
|
| This post gives a lot of credit to Google's infra and hardware
| teams, and I'd love to read a perspective from one of those
| insiders who then went on to do related work elsewhere.
| joe_the_user wrote:
| I took the phrase to mean "outside any large company". It seems
| like a fairly obvious metaphor; if you have a startup working on
| a large-scale infrastructure project, you have to set up your own
| logistics, just like a camp in the literal wilderness.
| choppaface wrote:
| Really telling quote:
|
| > I was completely taken aback by the failure rate of GPUs as
| opposed to my experiences on TPUs at Google
|
| Should be "I was completely unaware of the failure modes of
| GPUs, because all my career I've been inside Google and used
| Google TPUs and was well-acquainted with those failure modes."
|
| I've used GPUs mostly, and when I tried TPUs the jobs failed
| _all the time_ for really hard-to-debug reasons. Often the
| indirection between the x86 chip and the TPU device caused
| hours of hair-pulling, stuff you never get with
| x86+nvidia+pytorch.
|
| 10-15 years ago, Google minted many $10m+ data scientists (aka
| Sawzall engineers) who also ventured "into the wilderness" and
| had very similar reactions. This blog post is much more about
| the OP hyping his company and personal brand than contributing
| useful notes to the community.
| StarCyan wrote:
| When was this? I use JAX+TPUs to train LLMs and haven't
| experienced many issues. IMO it was way easier to set up
| distributed training, sharding, etc compared to Pytorch+GPUs.
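| For anyone who hasn't used it, here's a minimal sketch of the
| kind of setup being described (mesh axis names and shapes are
| purely illustrative): you declare a device mesh, tell JAX how an
| array is sharded across it, and jit-compiled code then runs SPMD
| without hand-written collectives.
|
|   import numpy as np
|   import jax
|   import jax.numpy as jnp
|   from jax.sharding import Mesh, NamedSharding, PartitionSpec as P
|
|   # 1D mesh over whatever accelerators are attached (TPUs or GPUs)
|   mesh = Mesh(np.array(jax.devices()), axis_names=("data",))
|
|   # shard the batch dimension across the "data" axis of the mesh
|   x = jnp.ones((8 * jax.device_count(), 1024))
|   x = jax.device_put(x, NamedSharding(mesh, P("data", None)))
|
|   @jax.jit
|   def forward(x):
|       return jnp.tanh(x) @ x.T   # stand-in for a model forward pass
|
|   y = forward(x)                 # executes sharded across the mesh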
| quadrature wrote:
| I think the OP is referring to hardware failures rather than
| software not playing well together.
| ganeshkrishnan wrote:
| OP mentions the failure rate of GPUs as "If this were in GPU
| land, it would have failed within the first few days for
| sure.".
|
| In my humble opinion, we have never had GPU failures even for
| large-scale training. Our current training batch job is a 20GB
| JSON file which takes 6 hours just to load and has been running
| for more than 15 days without a hiccup. And we are using the
| older Tesla T4.
|
| GPUs have memory constraint issues, but if you can plan and work
| around them, I haven't seen them crash in real life.
| teaearlgraycold wrote:
| Ha! We're also committing great sins of computation against
| T4s at our company. Hopefully, as I learn, things get less
| janky.
| gwern wrote:
| > And we are using the older Tesla T4.
|
| That's an undemanding and well-debugged chip by this point (it
| launched about 6 years ago!). So you aren't experiencing any of the pain
| people using A100s or H100s (never mind people who have to
| stand up clusters with B100s soon) are going through now.
| shrubble wrote:
| Have you checked if there is a faster way to parse your JSON?
| 3Gbytes/hour to load a file seems slow on today's CPUs...
| flybarrel wrote:
| What would be an ideal (or more appropriate) speed?
| shrubble wrote:
| Well it would depend on the specifics of the JSON file
| but eyeballing the stats at
| https://github.com/miloyip/nativejson-benchmark/tree/master
| seems to indicate that even on a 2015 MacBook the parsing
| proceeds using e.g. the Configuru parser at several megabytes
| per second.
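| One way to sidestep the 6-hour load entirely (a rough sketch,
| assuming the file is, or can be converted to, JSON Lines with
| one record per line) is to stream and batch records lazily
| instead of parsing one 20GB document up front:
|
|   import json
|
|   def iter_records(path):
|       # parse one record at a time instead of the whole file
|       with open(path, "r", encoding="utf-8") as f:
|           for line in f:
|               line = line.strip()
|               if line:
|                   yield json.loads(line)
|
|   def batches(path, batch_size=1024):
|       # lazily group records so memory use stays flat
|       batch = []
|       for rec in iter_records(path):
|           batch.append(rec)
|           if len(batch) == batch_size:
|               yield batch
|               batch = []
|       if batch:
|           yield batch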
| nl wrote:
| > 20GB json file... takes 6 hours just to load
|
| Err you definitely should be doing something about that.
|
| 20GB on T4s (how many?) isn't really comparable to terabytes
| on thousands of A100s.
| lambersley wrote:
| Agreed. It reads like Seven of Nine realizing she's separated
| from the Collective and needs to rely on lowly human
| capabilities. The insights into vendors were informative.
| flybarrel wrote:
| Newbie question - what happens when an LLM training job
| experiences a hardware failure? I don't suppose you lose all the
| training progress, do you? Then the pain is mostly in diagnosing
| the problem and getting the cluster running again, but no need
| to worry about data loss, right?
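| (My rough understanding, sketched below with illustrative
| PyTorch-style names, is that runs checkpoint periodically, so a
| failure costs only the steps since the last save plus the time
| to diagnose and restart - please correct me if that's wrong.)
|
|   import os
|   import torch
|
|   CKPT = "checkpoint.pt"   # hypothetical path
|
|   def save_checkpoint(step, model, optimizer):
|       # called every N steps during training
|       torch.save({"step": step,
|                   "model": model.state_dict(),
|                   "optimizer": optimizer.state_dict()}, CKPT)
|
|   def maybe_resume(model, optimizer):
|       # on restart after a failure, reload and continue
|       if os.path.exists(CKPT):
|           state = torch.load(CKPT, map_location="cpu")
|           model.load_state_dict(state["model"])
|           optimizer.load_state_dict(state["optimizer"])
|           return state["step"]
|       return 0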
| yalok wrote:
| > All in all, this is only a small part of the story of how we
| started a company, raised some money, bought some chips and
| matched Gemini pro/GPT 3.5 and outperformed many others in less
| than a year having to build everything from scratch.
|
| I wonder what the budget for the chips/cloud GPUs was to achieve
| a GPT-3.5-level LLM - at least to an order of magnitude - $2-5
| million?
| joe_the_user wrote:
| So essentially a startup in this context has a small number of
| people and a large amount of money for training clusters. The
| article describes the operational details of leasing servers -
| which you'd assume applies to many startups (or existing firms).
|
| So it seems like you have the various LLM creators all doing
| roughly the same sort of thing (training with text and image
| data) with similar hardware and similar data. Each of these
| naturally has their own brand of "secret sauce" for
| distinguishing their venture. The various secret sauces can make
| a difference in the quality of an LLM's output.
|
| Yet overall, this seems like a massive, energy intensive exercise
| in redundancy.
| dauertewigkeit wrote:
| I don't think most of them have any kind of secret sauce. I
| think the founders hope to get bought out simply for being able
| to train "near-SOTA" LLMs. I guess achieving that level of
| skill and infra could be valuable enough to build upon.
| joe_the_user wrote:
| Sure, that's also a factor but I'd say it reinforces my main
| point.
| DeepChill wrote:
| Good point, so the only real differentiator would be the
| size & quality of the data being fed and the fine tuning
| done on the model? I wonder what else differentiates LLMs
| from each other
| Iulioh wrote:
| Alignment and censorship ?
| pests wrote:
| Alignment just means making it do what you want. LLMs
| just continue the sequence; the chat question-and-response
| style we have now is an example of alignment (to what
| humans want).
| eru wrote:
| Alignment can mean making sure your LLM doesn't continue
| the sequence in embarrassing ways, eg by spouting
| politically incorrect sequences of words (even though
| those might have been common in the training data).
| friendzis wrote:
| In what way does this do more good than harm?
| eru wrote:
| In the sense of people caring about their models not
| saying embarrassing things?
|
| Different people have different goals, and they don't
| necessarily align with yours.
| llm_trw wrote:
| Also getting a golden ticket.
|
| Goliath 120B is still the best open source model and no
| one knows why, since it's just two Llama 2 70Bs glued
| together.
| doctorpangloss wrote:
| Maybe it's simpler than that. Instead of spending money on
| compute that costs X and that cloud providers charge 20*X for,
| they could spend the money creating training data, but that
| story is way too hard to tell to investors.
| llm_trw wrote:
| >Yet overall, this seems like a massive, energy intensive
| exercise in redundancy.
|
| Keep in mind that this is also chaff to distract people from
| the real secret sauce. I imagine that just as many startups are
| hiring writers and photographers to create extremely well
| labelled uncontaminated data for training.
|
| One only needs to look at the perverts over at Civitai to see
| how far you can go with intensive labeling on a tiny compute
| budget.
| fennecbutt wrote:
| Us furries were properly tagging data on e6 for a long time
| before LLMs came about.
| PeterStuer wrote:
| "this seems like a massive, energy intensive exercise in
| redundancy"
|
| This is commonly referred to as a market working as intended.
| Yes, the waste from this type of redundancy can be _massive_ ,
| especially if you realize that ultimately just a tiny
| percentage of these efforts will result in even moderate
| success. But it is the price to pay at the edge of progress. A
| planned monopoly might be more efficient (despite popular
| banter that just compares a megacorp or a gov, which is
| basically the same, to a single successful startup, ignoring
| the 999 that tried and failed), but those seldom beat a market
| on innovation.
| polygamous_bat wrote:
| > This is commonly refered to as a market working as
| intended.
|
| Is it? It seems like the market is unable to separate the
| wheat from the chaff and is just throwing money around hoping
| to hit the jackpot. While AI has a massive chance of affecting
| our lives, the investment market paints a pretty similar
| picture to what happened during the crypto boom.
| PeterStuer wrote:
| Our inability to predict future success from failure is
| exactly why we have (massively inefficient) markets
| outcompeting centralized planned approaches.
| manquer wrote:
| is it any different from evolution?
| samus wrote:
| There are not that many of these startups actually. Most use
| cases of LLM can be backed with a fine-tune of an off-the-shelf
| foundation model. If you're training foundation models from
| scratch, you're entering a difficult-to-monetize market where
| the big boys could eat your lunch by just releasing a new
| foundation model that might be able to do more than 95% of what
| yours does.
| twelfthnight wrote:
| > To be very frank, I would have to say the quality of codebases
| externally significantly lag behind those I've been used to at
| Google
|
| Haven't worked at Google, does anyone else share this sentiment?
| I always feel like Google code is typically not idiomatic and
| super difficult to go "under the hood" with if anything isn't
| precisely on the happy path.
| winwang wrote:
| (not googler)
|
| Google's codebase is idiomatic to Google due to their strict
| language tooling. e.g. their C++ code stays away from advanced
| features. The tooling teams at Google have very strong say.
| twelfthnight wrote:
| I get that sense too. Probably does work awesome if you're
| inside. But man it's a mess when they externalize stuff. Just
| one example: their cloud platform CLI includes an entire
| python installation and takes 1.7G on disk, just to make API
| calls...
| jen20 wrote:
| I have never understood why cloud providers seem to think
| it is OK to write their CLIs in Python. The AWS one is too,
| and the Azure one went from Node.js to Python some time
| ago.
| anonymous-panda wrote:
| Packaging and stability reasons. Same for why it's a
| 1.7GB install - probably where they landed after having
| tons of support issues on some random Python version they
| didn't test, or some issue with a dependency. Freezing the
| entire set of artifacts is more stable, and Python lets you
| move pretty quick. I can't speak to why Node.js vs Python
| though - maybe Python is easier to embed?
| pests wrote:
| What? They only get packaging and stability because they
| include the runtime. If they just went with a compiled
| language they could distribute native binaries and have
| actual packaging and stability.
| anonymous-panda wrote:
| Yes, but it's not just a single metric. Another is how
| easy it is for them to hire productive members of the
| team and how much that costs them - middling Python
| developers churning out fine-ish code are cheaper than
| Rust developers doing the same. It's hard to find a
| language where you can be as productive as a developer in
| Python that also has AOT compilation to generate
| standalone binaries.
|
| Tldr: there's multiple factors to consider here and it's
| more interesting to understand the pressures that cause
| the decisions, especially if you want to try to create a
| world where different decisions are made.
| jen20 wrote:
| > It's hard to find a language where you can be as
| productive as a developer in Python that also has AOT
| compilation to generate standalone binaries.
|
| Outside specific cases around machine learning, it's
| really not: Go is that language. It's not like each of
| those platforms doesn't have to have a similar team that
| understand Go anyway (for their SDK), so they could save
| their customers the abject pain of Python dependency
| management by just writing their CLIs using it.
| twelfthnight wrote:
| Yeah, I imagine that was the decision calculus. "Instead
| of spending some more effort to save millions of
| unnecessary downloads of python's runtime using a
| different language, let's just bundle Python!"
|
| I wouldn't be surprised if it was version 2.7 too...
| jen20 wrote:
| Of course, writing them in Go would solve all of these
| problems while producing packages which are much smaller.
| twelfthnight wrote:
| There probably is a sense in which the APIs are
| constantly changing, so maybe an interpreted language
| might make sense? I imagine there has to be a better way
| to do this with Go or Rust though (even Lua?) for a
| smaller binary.
| candiodari wrote:
| Google python binaries are more akin to docker or even vm
| images, even if the actual technology used predates
| docker and even linux VMs. They contain something like a
| slimmed-down linux distribution, not just a binary.
|
| EXTREME predictability (e.g. never ever using the
| system's libssl), in trade for huge binaries. They go
| pretty damn far in this: you won't catch a Google binary
| even using most of libc.
| jyap wrote:
| It makes "sense" based on the domain of the cloud
| provider being DevOps teams who are maintaining and using
| these CLI tools, i.e. what they use day to day.
|
| For anything more advanced they offer language-specific
| SDKs in Rust, Swift, Kotlin, etc...
|
| For example integrating storage in an iOS app.
| marcyb5st wrote:
| Did you install all the components? Because if so, you also
| installed emulators for Pub/Sub and Bigtable (maybe
| others, I don't remember), which explains the big footprint.
| dheera wrote:
| > e.g. their C++ code stays away from advanced features
|
| Which honestly is a GOOD thing because it would make it much
| easier for newcomers to ramp up on existing codebases. Most
| people aren't used to working with spaceships and constexprs.
|
| Readability is also far more valuable to a large team than
| efficiency for anything that isn't a number-crunching loop.
| renegade-otter wrote:
| "Externally", no one could possibly beat Google's track record
| of not committing to products before finally killing them. But
| the code was beautiful, though!
| twelfthnight wrote:
| I mean, was Angular ever "beautiful"?
| resource0x wrote:
| Pretty sure it was. A lousy idea might still be implemented
| beautifully under the hood. :-)
| titanomachy wrote:
| I thought the quality was pretty high, largely because there
| were a lot of rails constraining how code should be written.
| Most of the code I dealt with was written using somewhat rigid
| (but generally well-designed) frameworks with programmatically-
| enforced style guides.
|
| Also, most work seemed to involve some balance of junior and
| more experienced people, which helped keep quality higher.
| Outside of Google, I've seen pretty large projects written by
| new grads with little supervision (and on a tight timeline).
| Those codebases can be pretty hairy.
| twelfthnight wrote:
| That honestly does seem like a recipe for good code. And
| sure, there's tons of open source out there of dubious
| quality.
|
| @resource0x in a sibling comment made the point that it's
| possible to write great code even if the program is a flawed
| design. I'm probably conflating those things.
| rokkitmensch wrote:
| The thing that impressed me most about Google was the
| encoding-of-cultural-norms-in-various-CI-jobs.
|
| It lets them extract usable SWE horsepower from pretty much
| anyone who steps inside and at least tries to be useful and
| not just coast. They can ingest a startup engineer, someone
| who's been a mid-tier enterprise codemonkey, yr mythical
| 10xer, the whole statistical gamut.
| danans wrote:
| > Haven't worked at Google, anyone else share this sentiment?
|
| I worked there, and the quality is definitely much higher and
| the code tends to be far more maintainable. However, there is
| often a cost for that, which is velocity.
|
| Some of this is reduced by the sheer amount of automation in
| tooling (i.e. bots that block style violations and common bugs
| before a code change is submitted).
|
| In other cases, it slows things down quite a bit.
| ein0p wrote:
| A recent ex-googler here: quality of Google3 in general is
| pretty good, but the LLM training bits are so abysmal that I
| know people who have resigned instead of working on it. And
| it's also extra slow because getting a couple local GPUs is not
| really an option. So you're forced to "develop in Colab" which
| works for some things and not for others and in general sucks
| ass if you're working on anything substantial. For anything
| more substantial you'll be launching stuff on some resource
| pool, waiting for like 10-15 minutes until it starts (much
| longer for large models), and then trying to divine why it
| failed from voluminous and sometimes indecipherable crash logs
| which also hang your browser when cluster UI tries to load
| them.
|
| Rumors of Google's AI code superiority are vastly overblown in
| 2024. I'm currently at another major AI lab, and the code here
| can actually be understood and worked on, which I consider to
| be a massive advantage.
| alsoworkedthere wrote:
| Finally, an accurate portrayal!
|
| Google has superb robustness and code quality, with garbage-
| level usability. Once you're set up, you can kick off many
| massive training jobs and compare results easily. However,
| getting to that point is really hard. You'll never figure out
| how to use the ML infrastructure and libraries on your own.
| You can only get it to work by meeting with the teams that
| wrote the infra so they can find and fix every error and
| misconfiguration. Usually, there is one single way to get
| things working together, and neither the documentation nor
| the error messages will get you to that brittle state.
|
| It's near impossible to get a VM with a TPU or GPU attached,
| so there's no way to debug issues that happen between the
| library and the accelerator. Plus somehow they've made Python
| take longer to build (??!!) and run than C++ takes, so your
| iteration cycle is several minutes for what would take
| seconds at any other place. Fun stuff! Somehow it's still one
| of the best places to do ML work, but they sure try to make
| it as difficult as possible.
| ein0p wrote:
| Google doesn't use VMs internally to run workloads. But
| yeah, seconds-long dev iteration cycles take minutes or
| even tens of minutes there.
| bo1024 wrote:
| This is very interesting, but I really want to hear about the
| training data process!
| stealthcat wrote:
| They should list the technical debt accumulated so far and
| rank it. At this stage, lots of corners have been cut.
| LZ_Khan wrote:
| I wish I knew how to do yolo runs.
|
| - signed, a compute resource hog at FAANG
| planet_y wrote:
| I'm wondering if the title should read "from the ground up"
| instead of "ground zero"?
| https://en.wikipedia.org/wiki/Hypocenter
| zer00eyz wrote:
| https://www.merriam-webster.com/dictionary/ground%20zero
|
| It is a perfectly acceptable use of the idiom.
| davidmurdoch wrote:
| Acceptable, but maybe not perfectly.
| dotancohen wrote:
| Yes, the title sounds like somebody confused two idioms. That's
| not the type of author from whom I want to learn.
| frozenseven wrote:
| 1. As others have pointed out, it's a perfectly valid idiom.
| Check a dictionary.
|
| 2. How do you think idioms are created in the first place?
|
| 3. What exactly forces you to act like this?
| makoto12 wrote:
| Could be intentional, implying LLMs are a proverbial nuclear
| bomb to the tech landscape. But honestly it threw me as well.
| julianh65 wrote:
| So which compute providers have folks had a good experience with?
| hackerlight wrote:
| > In the end it took us only a very small number of smaller scale
| & shorter ablation runs to get to the strong 21B Reka Flash and
| 7B edge model (and also our upcoming largest core model). Finding
| a solid recipe with a very limited number of runs is challenging
| and requires changing many variables at once given the
| ridiculously enormous search space. In order to do this, one has
| to abandon the systematicity of Bigtech and rely a lot on "Yolo",
| gut feeling and instinct.
|
| > Thankfully, I (and many of us in the team) have built up this
| intuition quite a bit in our ML careers to get it right within a
| substantially short amount of tries. While we've trained really
| good models before in our previous jobs, differences in training
| infrastructure, data, incorporation of new ideas and other
| environmental issues can still cause non-trivial differences in
| outcomes. That said, a strong prior helps to significantly cut
| down the search space and is probably one of the easiest
| explanations to why we were able to train really strong models
| with so few trials, resources and experimentation.
| a_bonobo wrote:
| But what is the product they're selling?
|
| The main Reka.AI page looks like a regular ChatGPT clone, an LLM
| you pay for by the token. How is this different from all these
| other companies? Pricing seems to be comparable to ChatGPT
| 3.5-Turbo.
| polygamous_bat wrote:
| Perhaps a cure for venture capitalist FOMO for not having
| invested in AI?
| classified wrote:
| Absorbing the risk of copyright and license violations en masse
| for the training data as a service?
| TrackerFF wrote:
| Big question is, how do small startups manage to get funding for
| LLM products if they don't have the "correct" background /
| pedigree?
|
| The world of LLM startups is beginning to look like the world of
| hedge funds and private equity firms - where the prerequisites
| for seed/funding are:
|
| A) Prestigious employment history / correct pedigree.
|
| B) Solid network of investors ready to jump before any product
| has even begun.
| rvz wrote:
| Then what happens when the LLM or AI performs worse than
| expected? Spend more money fine-tuning?
|
| By the time you get it all working, not only have you spent lots
| of your VC capital on training alone, your competitors (Google,
| Meta, etc.) have already released a more powerful model, much
| better and quicker than you, before you could run your second
| training epoch.
|
| Another example of a startup incinerating VC money in a pump and
| dump scheme for vaporware AI snake oil.
| tkgally wrote:
| I learned about reka.ai from this post; their LLMs don't seem to
| have been discussed much on HN yet [1]. So, out of curiosity, I
| spent the last hour testing prompts with their chat interface [2]
| in comparison with ChatGPT 4, Gemini Advanced, Claude 3, and
| Mistral Large. I put the results at [3]. Overall, Reka Flash
| doesn't seem significantly worse or better than the others. A lot
| more testing would be necessary to be sure, of course.
|
| [1]
| https://hn.algolia.com/?dateRange=all&page=0&prefix=false&qu...
|
| [2] https://chat.reka.ai/chat
|
| [3] https://gally.net/temp/20240307llmcomparison.html
| egberts1 wrote:
| TL;DR: LLM training is highly susceptible to GIGO.
|
| (GIGO is what one gets when feeding an LLM with "G"arbage "I"n:
| "G"arbage "O"ut.)
|
| This is the same problem as making a vaccine signature that fits
| like a glove ... as tight as possible ... when populating the
| anti-malware (i.e. IDS/IPS/NDS/XNS) search-pattern engine for
| use by Aho-Corasick-variant algorithms (such as Parallel
| Failureless Aho-Corasick).
|
| However, an LLM as a binary-code-based detector for malware
| detection has very limited benefit (it is there, but only as a
| backend topical add-on after all other conditionals have been
| identified).
|
| LLMs lack qualifying conditionals surrounding premise data, and
| I have my doubts about using LLMs for medical diagnosis as well,
| until we start having LLMs denote the much-needed weighted
| combo-conditionals as "percentages".
___________________________________________________________________
(page generated 2024-03-07 23:02 UTC)