[HN Gopher] Liquid Foundation Models: Our First Series of Genera...
___________________________________________________________________
Liquid Foundation Models: Our First Series of Generative AI Models
Author : jasondavies
Score : 166 points
Date : 2024-09-30 15:33 UTC (7 hours ago)
(HTM) web link (www.liquid.ai)
(TXT) w3m dump (www.liquid.ai)
| person4268 wrote:
| It's pretty impressive, just note (emphasis added):
|
| > At Liquid AI, we take an open-science approach. We have and
| will continue to contribute to the advancement of the AI field by
| openly publishing our findings and methods through scientific and
| technical reports. As part of this commitment, we will release
| relevant data and models produced by our research efforts to the
| wider AI community. We have dedicated a lot of time and resources
| to developing these architectures, *so we're not open-sourcing
| our models at the moment*. This allows us to continue building on
| our progress and maintain our edge in the competitive AI
| landscape.
|
| Looks like there's no paper (or similar) yet, either. Hopefully
| they'll release a more detailed writeup soon.
| turnsout wrote:
| Missed opportunity. I would argue that the only way they CAN
| make these smaller models competitive is to make them openly
| available. As a developer, I'm not going to choose an unknown
| startup's model over bigger closed models from OpenAI or
| Anthropic. And if I really need something smaller and faster,
| I'd prefer to run the model myself for better control and no
| risk of the model being "upgraded."
| swyx wrote:
| this is the "paper": a list of citations
| https://www.liquid.ai/blog/liquid-neural-networks-research
|
| i guess they're not really making an effort to explain how
| specifically all this comes together to create LFMs.
|
| other finds https://x.com/swyx/status/1840794198913794236
| programjames wrote:
| The ideas come from these papers:
|
| 1. [Liquid Time-Constant Networks
| (2020)](https://arxiv.org/abs/2006.04439)
|
| This is essentially a neural ODE applied to leaky integrate-
| and-fire neurons.
|
| 2. [Closed-form Continuous-time Neural Networks
| (2022)](https://arxiv.org/abs/2106.13898)
|
| A closed-form approximation of the first.
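|
| Roughly, the LTC dynamics from the 2020 paper are dx/dt =
| -[1/tau + f(x, I; theta)] * x + f(x, I; theta) * A, where f is a
| small learned network and A is a bias vector. A minimal Euler-step
| sketch (my own illustration, not code from either paper):
|
|       import numpy as np
|
|       def ltc_step(x, I, tau, A, W, b, dt=0.01):
|           # f gates both the effective time constant and the target state A
|           f = np.tanh(W @ np.concatenate([x, I]) + b)
|           dxdt = -(1.0 / tau + f) * x + f * A
|           return x + dt * dxdt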
| a_wild_dandan wrote:
| More citations from their blog[1] too.
|
| [1] https://www.liquid.ai/blog/liquid-neural-networks-
| research
| joaquincabezas wrote:
| Love the humour here:
|
| What are Language LFMs not good at today: [...] Counting r's in
| the word "Strawberry"!
| raoulj wrote:
| Can't give them too much credit ;) the task of counting r's in
| Strawberry was mentioned elsewhere as a "surprising" failure of
| LLMs
|
| See previous thread:
| https://news.ycombinator.com/item?id=41058318
| injidup wrote:
| Give this prompt and watch it burn!
|
| """ Tell me a poem in Polish about dance and love and loss and a
| spider called stephan """
|
| It goes into a tailspin repeating the following line over and
| over again till it crashes.
|
| """ W tym kaciku katny, gdzie tango tanczyli, Byla tylko ona,
| tylko on, tango. W tym kaciku katny, gdzie tango tanczyli, Byla
| tylko ona, tylko on, tango. """
| kkwtflz wrote:
| I changed Polish to Romanian and it entered an infinite loop
| and then crashed. Nice.
| machiaweliczny wrote:
| I've tested GPT-4o and it has clearly improved since I tested
| it a year ago, when poems only worked in English (it seemed to
| write in English and then translate when you asked for another
| language).
| vessenes wrote:
| So a new set of small models that are competitive with and beat
| phi-3.5 on some benchmarks is extremely impressive.
|
| Context length is the next frontier for models in this range -
| getting to a true 128-200k token length in a small model would be
| very, very useful. Hallucinations are less of an issue because you
| can just dump all the source data in; whole codebases can go in
| for stuff ranging from a refactor to 'write documentation of the
| API'.
|
| Claude and o1-preview are the only games in town for these long
| context tasks right now, and they are slow. Some of the tasks
| need the extra intelligence, but a lot don't. In those cases a
| lightweight or local model will be greatly appreciated. Not to
| mention context length that long will more easily enable
| multimodal parsing.
| luckydata wrote:
| Gemini has the largest context of any model out there FYI.
| machiaweliczny wrote:
| From my experience these large contexts are just technical
| feasibility, but there need to be much better internal
| benchmarks to claim it really works. I've tested on real-world
| tasks and it all fails so far.
| startupsfail wrote:
| It seems like this is a new architecture, alongside Mamba,
| Hyena, RWKV and Hybrid Transformer.
| zwaps wrote:
| The site suggests that it is a flexible and probably
| parametric mix between full attention and linear-complexity
| models.
| jadbox wrote:
| Looking at 3B, the performance is roughly on par with Phi-3.5...
| not sure how they gauge from their graph that it is better.
| Agreed though, I don't think I would switch my 3B model from
| phi unless Liquid was more transparent in data and research.
| simonw wrote:
| Llama 3.2 1B and 3B both claim a 128,000 token length.
| GaggiX wrote:
| That is indeed their claim, but if you read the linked page
| you will see that it is not their effective token window.
| dragonwriter wrote:
| The issue isn't the nominal size of the context window which
| is easy to objectively measure, but effective use of the
| context window, which is harder to measure precisely, but a
| big issue: it doesn't matter how much stuff you can throw at
| the model if it effectively ignores most of it past a certain
| point.
| vessenes wrote:
| They point out in the paper stats around where effectiveness
| drops off hard. It's at 32k at most everywhere now. Some
| models still at 4 to 8k. Getting to 200k in a smaller model
| is an open research area, as far as I'm aware. Ideas so far
| are pretty broad ranging, including using fourier transforms
| to try and capture cyclicality in inputs (mamba et al), RWKV
| (which I do not fully understand, but claims very long input
| context support), larger range of possible tokenizations
| shortening input length (llama 3), ..
| simion314 wrote:
| >Hallucinations are less of an issue because you can just dump
| all the source data in, whole codebases can go in for stuff
| ranging from a refactor to 'write documentation of the API'.
|
| Is there no risk? I mean, say for testing purposes we give the
| AI a giant CSV file and ask it to turn it into JSON: is the
| chance of error 0%? Because today we need to double-check when
| we ask AI to transform some data or some code; there is the
| risk of it messing something up, and if it is not something
| that would crash immediately, you risk introducing a ton of new
| bugs by asking an AI to refactor instead of using some good
| tools.
| vessenes wrote:
| There's always risk!
|
| But when you ask a model to rely on just the input data, you
| are (mostly) trying to tap into its reasoning, not knowledge
| side. Obviously what's kind of magical is that some knowledge
| will be needed for reasoning, and you have it. But LLMs small
| and large are pretty good at doing the in-context stuff. It
| is precisely what they're trained on, and in fact it was kind
| of a surprise how well they seemed to generalize outside of
| this task in the first place.
| spyckie2 wrote:
| As a LLM layman, does this work build off the attention is all
| you need paper, or is it doing something completely different?
| wmf wrote:
| They say it's not a transformer.
| dartos wrote:
| Why add a new TLA in "LFM?"
|
| It adds nothing but branding to the conversation.
|
| Maybe i answered my own question...
| mg wrote:
| I tried my standard test prompt: Two cars have a
| 100 mile race. Car A drives 10 miles per hour. Car B
| drives 5 miles per hour, but gets a 10 hour headstart.
| Who wins?
|
| And the Liquid-40B model failed with a long explanation why car B
| wins the race.
|
| Amusingly, the explanation is quite convincing and very similar
| to how Zeno explains in one of his paradoxes that a fast runner
| can never overtake a slower runner with a headstart. Because
| every time the fast runner gets to the point where he saw the
| slow runner, the slow runner is already a bit further along the
| track.
| leetharris wrote:
| To be fair, this article does say:
|
| >What are Language LFMs not good at today:
|
| >Zero-shot code tasks
|
| >Precise numerical calculations
|
| >Time-sensitive information
|
| >Counting r's in the word "Strawberry"!
|
| >Human preference optimization techniques have not been applied
| extensively to our models yet.
|
| Maybe it's just not trained/tuned for math-related challenges.
| Or maybe it's an architectural challenge.
| aithrowawaycomm wrote:
| But they also say
|
| > What are Language LFMs good at today:
|
| > General and expert knowledge
|
| > Mathematics and logical reasoning
|
| "Our LLM is good at mathematics but bad at computation"
| doesn't really cut the mustard here. What they mean is "good
| at well-known math benchmarks but bad at simple math problems
| that it hasn't been trained on." The idea that this
| constitutes "mathematics and logical reasoning" is a
| testament to AI companies' poor scientific standards.
| vletal wrote:
| Thanks! I am collecting all "test prompts" which appear at HN
| and Reddit. I plan to create a hugging face dataset. I will
| soon publish "Vojta-1B", which is gonna pass all of them.
| wazokazi wrote:
| I got this >>
|
| To determine the winner, we need to calculate the time it takes
| for each car to complete the 100-mile race.
|
| For Car A: Speed = 10 miles per hour Distance = 100 miles Time
| = Distance / Speed = 100 miles / 10 miles per hour = 10 hours
|
| For Car B: Speed = 5 miles per hour Distance = 100 miles Time =
| Distance / Speed = 100 miles / 5 miles per hour = 20 hours
|
| However, Car B has a 10-hour headstart, so we need to subtract
| that from its total time: Adjusted Time for Car B = 20 hours -
| 10 hours = 10 hours
|
| Both cars take 10 hours to complete the race, so they tie.
| reustle wrote:
| What about acceleration times? :-)
|
| 0-5 vs 0-10 should be slightly different
| hedgehog wrote:
| I'm curious how this works, it looks like they haven't published
| yet but two of the founders are authors on this paper:
|
| https://arxiv.org/abs/2405.06147
| xwall wrote:
| no one understands
| Philpax wrote:
| Do you know how research and development works?
| serf wrote:
| Seems good at trivia and easy-to-answer medical/engineer stuff.
| Fails hard at most logic or puzzle-y stuff I throw at either
| sized model.
|
| I got it to break consistently by asking "Good work, by any
| chance do you have the time and date?" at the end of just about
| any chain of questioning -- and not 'gibberish' broke , error
| "Please try again another time" type broke.
|
| It is impressively fast at what it does answer, though.
| TeeMassive wrote:
| Not too shabby, and quite fast:
|
| > Write a small essay about the philosophy of the Garfield comic
| book series
|
| Title: The Philosophy of the Garfield Comic Book Series
|
| The Garfield comic book series, created by Jim Davis, is a
| timeless classic that has been entertaining readers for over four
| decades. While it may seem like a simple comic strip about a lazy
| cat and his owner, there is a deeper philosophy embedded within
| its pages. This philosophy revolves around the themes of
| existentialism, hedonism, and the human condition.
|
| Existentialism is a philosophical movement that emphasizes
| individual freedom, choice, and existence. In the Garfield comic
| book series, we see this theme play out through the character of
| Garfield himself. Garfield is a cat who lives life on his own
| terms, making his own choices and living his life as he sees fit.
| He is not bound by societal norms or expectations, and he is free
| to do as he pleases. This is exemplified in his love for lasagna,
| his disdain for Mondays, and his general indifference to the
| world around him.
|
| Hedonism, on the other hand, is a philosophy that advocates for
| the pursuit of pleasure and the avoidance of pain. In the
| Garfield comic book series, this theme is evident in Garfield's
| constant pursuit of pleasure. Whether it's eating lasagna,
| sleeping, or tormenting his owner Jon, Garfield is always seeking
| out ways to maximize his own pleasure and minimize his own pain.
| This hedonistic philosophy is a stark contrast to the more
| serious and often somber themes found in other comic book series.
| zozbot234 wrote:
| > How many r's in "strawberry"? Think step by step!
| What a nice question! Well, let's just count: 1. S 2.
| T 3. R! So, we have our answer: there are 3 R's
| in "strawberry".
|
| Totally checks out!
| bluelightning2k wrote:
| Let's break it down step by step:
|
| The word is "strawberry". The first "r" is in the second
| position of the word. The second "r" is in the fourth position
| of the word. So, there are two "r's" in "strawberry".
| alach11 wrote:
| This is one of the least interesting questions to ask LLMs. I
| wish it wasn't so popular.
| moralestapia wrote:
| And yet ... all they have to do is answer it properly.
| sfink wrote:
| It did! How many R's do _you_ think are in "strawberry"? I
| get 3, and it got 3. I'm with the LLM.
|
| ...oh, you disagree with its intermediate reasoning? You
| think it should follow a series of logical steps that are
| each individually correct, rather than pulling wildly
| incorrect intermediate steps out of its butt and
| synthesizing them together in the end into a coherent
| answer that happens to be correct?
|
| But that's what an LLM _is_! Why complain about an LLM
| being an (unusually transparent) LLM?
|
| I think this example serves as a beautiful illustration of
| how LLMs work, and are supposed to work--even if the
| correspondence is inexact between (1) incorrect and
| irrelevant-to-us intermediate steps and (2) internal matrix
| multiplications that represent incorrect or invalid logic.
| The correspondence is inexact and probably mostly
| figurative, but it's still a great example of how internal
| nonsense can still lead to externally "correct" answers.
| ("Nonsense" is underselling the complex and highly
| sophisticated internal state that leads to "correct"
| answers a stunningly high percentage of the time; I'm just
| trying to distinguish it from formal logic or the type of
| reasoning that we think we do and sometimes actually do.
| And would do, in this example problem.)
| moralestapia wrote:
| That's why I used the word _properly_, to imply that the
| whole answer should make sense.
| qeternity wrote:
| Training on the test set is all you need.
| NoCoooode wrote:
| That they can't do this sort of simple question speaks
| volumes to the entire approach.
|
| I don't think generative AI will ever be able to reach AGI,
| and most people selling LLM today pretend it is AGI
| GaggiX wrote:
| The fact that LLMs are usually trained on tokens and not on
| characters doesn't really say much about what generative AI
| is going to reach or not.
|
| >most people selling LLM today pretend it is AGI
|
| Who are these "most people"?
| smokedetector1 wrote:
| ELI5 why are tokens not a single letter?
| WhitneyLand wrote:
| Such an architecture could be implemented: it could use
| one token per letter, or one token per word, instead of
| the typical ~0.75 words per token we see.
|
| The choice just comes with trade-offs in memory usage,
| compute, and effectiveness of the model in various
| scenarios.
|
| So what we ended up with was a pragmatic/engineering
| decision rather than a theoretical or fundamental
| constraint.
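|
| To see the split concretely, here is a small sketch using the
| tiktoken library and its cl100k_base vocabulary (my illustration;
| the exact pieces depend on the vocabulary):
|
|       import tiktoken
|
|       enc = tiktoken.get_encoding("cl100k_base")
|       ids = enc.encode("strawberry")
|       pieces = [enc.decode([i]) for i in ids]
|       # The model sees a few opaque sub-word ids (pieces roughly like
|       # "str" / "aw" / "berry"), never the ten individual letters.
|       print(ids, pieces)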
| alach11 wrote:
| All it speaks to is that tokenization is weird and
| introduces artifacts to LLM performance. Counting letters
| is a trivial task when you're staring at words on a screen.
| It's much harder when you're perceiving vectors based on
| parts of words. The fact that LLMs find certain things
| easier/harder than humans is completely unsurprising, and
| there are much more interesting benchmarks to use to
| compare one LLM to another.
| jrflowers wrote:
| This is a good point. While LLMs being incapable of
| reliably doing a simple task that's been doable by
| computers since the punch card days is an important
| consideration for anyone that might be thinking about
| _using them for anything other than as a toy_, this fact
| is uninteresting because of Reasons
| space_fountain wrote:
| LLMs can clearly solve problems that computers up to now
| couldn't. They can't solve all problems and this should
| definitely be a cautionary note to anyone who wants to
| use them as an artificial general intelligence, but this
| take seems no different to someone looking at a punchcard
| computer and going, it can't even recognize typos or
| categorize images, what good is this? We've already had
| human computers who can do everything these can do, and
| can recognize images and notice typos
| evilduck wrote:
| Also humans would revert to explicitly using an algorithm
| and external storage like a sheet of paper with tally
| marks or a spreadsheet or even a computer program if you
| scale the question up to a full sheet of text or a whole
| book or a collection of books (we probably do it at a
| single word size too, but it's more intuitive than
| explicit behavior for most folks when the count sum is
| around 8 or less).
|
| LLMs can't effectively execute algorithms similarly in
| their context, nor can they memorize new data or facts they
| were given, without tools like function calling or
| embeddings. If you give LLMs tool calling and
| storage mechanisms then counting letters in words becomes
| pretty damn reliable.
| jrflowers wrote:
| > going, it can't even recognize typos or categorize
| images, what good is this?
|
| No one said that LLMs aren't good for anything.
|
| I pointed out -- in response to another poster
| downplaying mention of a well-known and undisputed
| limitation that LLMs often have -- that it is valid to
| consider these well-known and undisputed limitations if
| one is considering using them for anything other than a
| toy.
|
| It is downright silly to discourage discussion of well-
| known and undisputed limitations! The reason for that can
| only be emotional, as there is genuinely nothing tangible
| to be gained by being steadfast in silence about a fact
| that isn't up for debate.
| Nevermark wrote:
| Try counting the number of your red retina cells that are
| firing while you look at a painting.
|
| Don't need to be exact as firing is statistical, just
| give us a good average.
|
| Hard? You can't count?
|
| Computers count pixels no problem. So weird you can't.
|
| Dementia? Not an AGI? /h
|
| ---
|
| This is what is happening.
|
| Here are the "Reasons".
|
| In your vision system, the raw information from
| individual retina signals is munged into a different
| representation before reaching a level where you have
| flexible processing.
|
| Likewise, in LLMs, letters are munged into tokens before
| LLMs "see" them.
|
| When they sometimes get that "simple" question right,
| it's actually a bit of an amazing feat. Given how they
| are constructed.
|
| ---
|
| Now try counting R's as you read at a normal rate, or
| listen to someone speak.
|
| You can't do that either, during normal processing.
|
| When we add spelling to LLMs training examples, they will
| do it easily. _Just as you learned to do it, only after
| special lessons, after you had already learned to listen
| and speak._
|
| Spelling is its own special practiced skill, in humans
| and LLMs.
| jrflowers wrote:
| > Try counting the number of your red retina cells that
| are firing while you look at a painting.
|
| This analogy makes sense because everybody could count
| their red retina cells until a couple years ago when the
| new painting paradigm arose, and also counting red
| retinal cells is a good analogy for being able to see
| simple objects that have always been distinguishable.
|
| It is fascinating how tapping the "Do Not Use LLMs For
| Computation If The Results Need To Be Reliably Better
| Than A Random Output" sign invites explanations of why
| that fact is actually Cool and Good
| Nevermark wrote:
| Ask anyone who has not specifically learned to spell, to
| count R's while you speak.
|
| You learned to listen and speak words before you could
| spell. Imagine if nobody had actually shown you written
| words?
|
| Or they were speaking another dialect but expecting you
| to count R's in standard English?
|
| LLMs are not trained on words in the form of letters.
|
| They process and generate the words in the form of
| tokens. Pre- and post-processing systems converts letters
| to tokens and the reverse, without their ability to
| access that processing.
|
| Spelling, for both us and LLMs, requires specific
| training/lessons.
|
| > It is fascinating how tapping the "Do Not Use LLMs For
| Computation If The Results Need To Be Reliably Better
| Than A Random Output" sign invites explanations of why
| that fact is actually Cool and Good
|
| Also fascinating:
|
| People who hallucinate/confabulate ridiculous straw man
| rationales for people they disagree with, unaware they
| are filling in gaps in their knowledge regarding other
| people's actual reasoning and the actual subject at hand.
|
| So LLM! Such unreliable processing!
|
| Perhaps, start posting a reliability disclaimer?
| roywiggins wrote:
| The analogy I use is that illiterate people obviously
| can't spell, but it doesn't say much about their ability
| on other tasks. General intelligence doesn't need to be
| able to spell, since that describes a fair number of
| actual humans.
|
| (There are tasks that LLMs totally fail on that would be
| obvious to an illiterate human though)
| doctorpangloss wrote:
| Counting shit, like cells, peaks in signals, people,
| inventory, fingers, and votes, is hard, tedious and
| important to business and life, so I don't know dude, it
| seems like a great benchmark to me. Countless posts
| wasted on denying this simple and obvious fact.
| BoorishBears wrote:
| It's like using a hammer to turn a screw and calling it
| useless.
|
| To envision what a next-generation model bound by the
| same constraints should do: it would recognize that it
| can't count tokens and use its code access to write code
| that solves the strawberry problem without being prompted.
|
| Asked to count cells, it'd be a model that could write and
| execute OpenCV tasks. Or, to go a step further, a
| multimodal model that can synthesize 10,000 variations of
| the target cell and finetune a model like YOLO on it
| autonomously.
|
| I find arguments that reduce LLMs to "It can't do the
| simple thing!!!!" come from people unable to apply
| lateral thinking to how a task can be solved.
| doctorpangloss wrote:
| > To envision what a next generation model bound by the
| same constraints should do, it'd be to recognize that it
| can't count tokens and use code access to write code that
| solves the strawberry problem without prompting.
|
| The VQA problems I'm describing can be solved seemingly
| in one case but not combined with counting. Counting is
| fundamentally challenging for sort of unknown reasons, or
| perhaps known to the very best labs who are trying to
| tackle it directly.
|
| Another POV is that the stuff you are describing is in
| some sense so obvious that it has been tried, no?
| Scarblac wrote:
| So how many 1 bits are in strawberry?
|
| That's about the same kind of question, given that LLMs
| see tokens instead of characters, and we see characters
| instead of bits.
| doctorpangloss wrote:
| Do you sincerely think the folks training these models
| haven't tried alternative tokenizations?
| hn_throwaway_99 wrote:
| Not being able to "do math" is an obvious, IMO
| uninteresting limitation of how LLMs inherently work, and
| the more advanced models have already figured out good
| solutions. E.g. while an LLM may stumble on "How many R's
| are in Strawberry", at this point any code model can
| easily correctly implement "write me a program in Python
| that counts the number of R's in strawberry".
|
| GPT 3 would essentially always fail on higher math
| problems/arithmetic, but whenever I've used it for math
| GPT 4 has _always_ delegated out to executing code where
| necessary.
| throw101010 wrote:
| > most people selling LLM today pretend it is AGI
|
| Who exactly does this in this space? Would be good to be
| able to call them out on it right now.
| swyx wrote:
| imagine being so confidently wrong about AI
| jimbokun wrote:
| In isolation, probably not.
|
| But it's likely to be an important component in an AGI
| system. I suppose the interesting question is how to
| integrate LLMs with more traditional logic and planning
| systems.
| bondarchuk wrote:
| For all I care we will have superhuman AGI that still can't
| count the Rs in strawberry. Some humans are dyslexic and
| all are subject to weird perceptual illusions; doesn't make
| them any less human-level intelligent.
| InsideOutSanta wrote:
| In my opinion, the problem with the strawberry question is
| that it is both a bad example because you don't need an LLM
| to count the number of r's in a word, and it's a bad
| measure of an LLM's capabilities because it's a type of
| question that all LLMs are currently bad at.
|
| Having said that, the 40b model wasn't able to answer any
| of my real-world example questions correctly. Some of these
| (e.g. "how do I add a sequential number after my titles in
| an HTML page using just CSS, without changing the page")
| are questions that even some of the better small local
| models can answer correctly. It gave very authoritative-
| sounding wrong answers.
| xanderlewis wrote:
| The point is that it's trivial (or should be). Not that it's
| interesting.
|
| It's certainly interesting that they can't answer it
| properly, though.
| WhitneyLand wrote:
| I find it fascinating. To me it grounds expectations and
| motivates discussions about the jagged boundaries of LLM
| capabilities.
| hn_throwaway_99 wrote:
| Why do you find it fascinating? I have the most rudimentary
| understanding of LLMs and it seems to me the _least_
| fascinating thing about LLM limitations.
|
| That is, all LLMs look at language as a series of opaque,
| independent tokens, e.g. strawberry might be a single token
| (say <5678>), or probably two (e.g. straw and berry, say
| <123><789>). But in no way will it represent it like we
| will, with letters. So if you ask it "how many R's are in
| strawberry", it fundamentally can't do any "counting", it
| just basically has to rely on whether that question (or
| similar questions about related words) has been asked
| before so it can predict the next token in its output
| correctly. I suppose with enough data an LLM could learn
| to associate character counts with tokens (e.g. with the
| right training set it could learn metadata about the
| tokens).
|
| My point being that with only the most basic understanding
| of how LLMs function, this "character counting" limitation
| seems blatantly obvious.
| WhitneyLand wrote:
| I don't think we've yet come to the point where how an
| LLM goes end to end from prompt to output is blatantly
| obvious.
|
| LLMs operate with language at multiple levels of
| abstraction and tokens are not the only way to have
| character level knowledge.
|
| For example, nothing excludes pre-training data from
| directly or indirectly encoding such knowledge.
|
| And of course LLMs famously have emergent properties, for
| which there's not yet a precise mechanism to illuminate
| the results.
|
| We're dealing with very complex systems that are still
| relatively poorly understood, and I believe the pool of
| concepts understood to the point of being blatantly
| obvious is still small.
| hn_throwaway_99 wrote:
| The fact that large parts of how LLMs work is poorly
| understood doesn't mean that that we should brush the
| whole thing with "it's a complex mystery!"
|
| We know exactly how LLMs tokenize their input - OpenAI
| even has a public tool you can test out,
| https://platform.openai.com/tokenizer .
|
| When you say "For example, nothing excludes pre-training
| data from directly or indirectly encoding such
| knowledge." - yes, that's why I explicitly said "e.g.
| with the right training set it could learn metadata about
| the tokens".
|
| But the point was still put perfectly by another
| commenter: "How many 1 bits are there in strawberry?"
| When nearly all humans can't answer that, we're very
| unsurprised; we see it immediately as a simple difference
| in how humans encode that string vs. how computers do it.
| We don't say "Well, the human mind is so complex and
| filled with emergent properties that the reason for this
| human limitation is a big mystery". And we also know that
| if we teach a human the encoding rules for characters
| (i.e. the bit pattern of each letter), they could answer
| this question, and similarly that's the analogous thing
| to training an LLM to learn metadata about the tokens.
|
| Now, what I think _is_ very interesting is why it's so
| hard to teach an LLM to say "I don't know" when asked to
| count characters. In my opinion, that's a much more
| interesting limitation that gets at some of the root,
| fundamental differences in how LLMs function.
| homarp wrote:
| Teaching an LLM to say 'I don't know' in general is hard,
| not just for counting letters.
| WhitneyLand wrote:
| You seem pretty certain for having only the most
| rudimentary understanding.
|
| I'm still going to have to disagree. I'd describe the
| idea that token granularity is the cause of letter
| counting limitations as a hypothesis, not as something
| that's been conclusively shown as far as I'm aware.
|
| I'm not discounting it, or even saying it's unlikely, but
| it's not hard to imagine other ways it could plausibly be
| happening.
|
| As a side note when I mentioned training data I was not
| suggesting anything related to "token metadata", or to
| tokens in any regard. Rather, I think it might be
| possible to instead improve the learning around certain
| types of counting in a way that could generalize.
| anonzzzies wrote:
| Yep, people find this interesting, but many (many) people get
| this wrong as well; it has something to do with how many
| letters the word has and how many you can keep in your head, I
| think. We are not very good counters or calculators or
| computers, and even though I am not saying LLMs are the same
| as or close to humans, we tried to model their substrates
| after biology and are surprised we get something that cannot
| count or calculate very well.
|
| I find it a dumb test that proves nothing imho.
|
| However, not very impressed with this model.
| nateglims wrote:
| > many (many) people get this wrong as well
|
| Many people can't count letters in a word?
| roywiggins wrote:
| I messed up spelling "juggler" in a spelling bee once (I
| was 10).
|
| The confusing thing about LLMs is that they speak tokens,
| not written text, so it's a lot more like interrogating
| someone who is functionally illiterate- of course they're
| going to be a terrible speller.
|
| You can find much weirder things that LLMs are absurdly
| bad at, like "A farmer needs to get himself and a chicken
| across a river. His boat can hold the farmer and one
| piece of cargo. How does he do this?" 9 times out of 10
| LLMs will pattern match this to the classic puzzle
| (there's usually also a sack of grain and a wolf) and
| start insisting on extra trips and inventing wolves. Even
| if a human makes the same mistake, they almost certainly
| would realize it after being reminded that there is no
| wolf, but LLMs often insist there is. o1-preview (but not
| -mini) seems to have cracked it, though.
| anonzzzies wrote:
| Yep. Try it with your neighbours. It's not bad or
| offensive; it is just what it is.
| marcosdumay wrote:
| It's a magician-like trick. The double-r gets all the
| attention because one needs to consciously remember it, so
| nobody remembers to look at the other single-r.
|
| The LLMs probably get it wrong because people get it wrong.
| lbotos wrote:
| I thought it was something to do with the way tokens are
| generated for the word strawberry?
|
| https://arbisoft.com/blogs/why-ll-ms-can-t-count-the-r-s-
| in-...
| marcosdumay wrote:
| That explanation would require the LLM to actually
| understand the question and deriving an answer from first
| principles.
|
| It doesn't.
| lbotos wrote:
| ??
|
| If the input is parsed in to tokens, and the tokens split
| compound words, nothing about that requires "first
| principles" thinking to explain why LLMs struggle with
| getting all of the letters -- the LLM is only going down
| the vector path of one of the compound words...
|
| (I don't think LLMs are sentient or intelligent btw,
| think they are giant probability machines, and the
| probability that the LLM will get 3 r's on a token of
| "berry" are very low.)
| marcosdumay wrote:
| The LLM gives you the answer it finds on the training
| set. All the things on that article are irrelevant for
| the answer.
| lbotos wrote:
| "The choice of tokenization method can directly affect
| the accuracy of character counting. If the tokenization
| method obscures the relationship between individual
| characters, it can be difficult for the LLM to count them
| accurately. For example, if "strawberry" is tokenized as
| "straw" and "berry," the LLM may not recognize that the
| two "r"s are part of the same word.
|
| To improve character counting accuracy, LLMs may need to
| use more sophisticated tokenization methods, such as
| subword tokenization or character-level tokenization,
| that can preserve more information about the structure of
| words."
| marcosdumay wrote:
| What, again, assumes the LLM understood the question and
| is making an answer from first principles.
| cynicalpeace wrote:
| Something I like to tell it to do is actually to respond
| using only a certain number of words. Sorta like min token
| length rather than max token length.
|
| Haven't found a model that does it reliably yet.
| roywiggins wrote:
| I also think it's kind of a silly example- smart people can
| be functionally illiterate, after all. It doesn't tell you
| that much.
|
| My favorite LLM stumper is asking them to solve the
| farmer/wolf/chicken/grain river-crossing puzzle but with only
| a chicken. They usually either insist on pointless extra
| trips or hallucinate a wolf or grain. Liquid somehow does
| both and also loses track of what's where.
|
| > The farmer can take the chicken across the river first.
| Then, he can go back to the original side and bring the boat
| back. He can then take the chicken back to the original side
| and leave it there. Next, he can take a bag of grain across
| the river. He can then go back to the original side and bring
| the chicken across the river. Finally, he can go back to the
| original side one last time and bring the chicken across the
| river.
|
| One generation ended like this:
|
| > the farmer and the chicken are both on the other side of
| the river, and the chicken is safe from the grain.
| nateglims wrote:
| Being openly annoyed by it will just lead to it being used
| more. That's sort of a fundamental behavior of the internet.
| viccis wrote:
| It's also the case that the more people talk about it, the
| more future models will "solve" it just by having learned
| from discussion about it.
| coding123 wrote:
| Can it write a python function to count the letters?
| asadm wrote:
| all of them can.
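|
| The function in question is a one-liner; a character-level count
| is trivially exact once the model hands the problem off to code:
|
|       def count_letter(word: str, letter: str) -> int:
|           # Counts characters directly, sidestepping tokenization entirely
|           return word.lower().count(letter.lower())
|
|       print(count_letter("strawberry", "r"))  # 3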
| Spivak wrote:
| This is how I get it to do correct date calculations. Don't
| tell me what date next Monday is, write Python code using
| datetime to calculate next Monday and I'll run it in a
| sandbox. Works super well.
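|
| A minimal sketch of the kind of code this produces (illustrative
| only; assuming "next Monday" means strictly after today):
|
|       from datetime import date, timedelta
|
|       today = date.today()
|       # Monday is weekday 0; (0 - weekday) % 7 gives days until Monday,
|       # and `or 7` pushes it a full week out if today is already Monday.
|       days_ahead = (0 - today.weekday()) % 7 or 7
|       print(today + timedelta(days=days_ahead))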
| moralestapia wrote:
| I kind of also do this, for some queries, and I found it to
| be a very interesting concept, even from a philosophical
| pov.
|
| It's some sort of Chain-of-thought constraint on a
| different layer of abstraction. Does this come with its own
| advantages?
|
| (And no, AGI is not here either).
| the_real_cher wrote:
| I got:
|
| "There are no "r"s in the word "strawberry"
| d_burfoot wrote:
| Well... they at least admit that their model is bad at this [0]
|
| > What are Language LFMs not good at today:
|
| - Zero-shot code tasks
|
| - Precise numerical calculations
|
| - Time-sensitive information
|
| - Counting r's in the word "Strawberry"!
|
| [0] https://www.liquid.ai/liquid-foundation-
| models#reimagining-m...
| mrbonner wrote:
| Let me start counting how many minutes it would take for someone
| to post a comment blasting this model for reasoning capabilities.
| For me, an LLM is useful as a word calculator, borrowing the
| term from Simon Willison, a frequent poster here and author of
| the LLM Python package I use.
|
| I use LLMs every day and they are increasingly something I can't
| function effectively without. But, look, I am not using any of
| those reasoning use cases where you ask for math, logic-solving
| tasks, etc... Instead I just use it as it is, a word calculator
| helping me summarize things, improve my writing and automate
| tasks that are very difficult to deal with using regexes and NLP.
|
| I'm very bullish about this progress in LLM space but not about
| those reasoning, logic and AGI use cases.
| GaggiX wrote:
| People use these models for different reasons.
| tossandthrow wrote:
| In order to effectively write a good summary, the model needs
| to have some capacity of reasoning capability.
|
| It is a continuum.
| moralestapia wrote:
| >Press F to doubt.
|
| I just tried their best model, Liquid-40B, and it gives some good
| responses on some questions, but also terrible ones more often
| than you'd wish (GPT-2 level, try it and you'll see).
|
| It's also quite easy to make it become stuck on a loop.
|
| No idea how they scored so high in those benchmarks. Maybe they
| overfitted on MMLUPro? Lol.
|
| Edit: I just read in some comments here and in TFA that,
| apparently, they're not using transformers at all? If that's
| true, _big if_, I take my hat off, this is truly a remarkable
| achievement.
| wret-ai-high wrote:
| no transformers, from their blog post: "Specifically, our
| analysis informs model building by improving three key aspects:
| token-mixing structure (how the operator mixes embeddings in
| the input sequence), channel-mixing structure (how it mixes
| channel dimensions), and featurization, responsible for
| modulating computation based on the input context."
|
| https://www.liquid.ai/liquid-foundation-models
| Jackson__ wrote:
| API only model, yet trying to compete with only open models in
| their benchmark image.
|
| Of course it'd be a complete embarrassment to see how hard it
| gets trounced by GPT4o and Claude 3.5, but that's par for the
| course if you don't want to release model weights, at least in my
| opinion.
| GaggiX wrote:
| Yes, I agree, for these small models it's wasted potential to
| be closed source, they can only be used effectively if they are
| open.
|
| EDIT: HN is rate-limiting me so I will reply here: In my
| opinion 1B and 3B truly shine on edge devices; if not, then it's
| not worth the effort, you can have much better models for
| already dirt cheap using an API.
| zozbot234 wrote:
| There are small proprietary models such as Claude Haiku and
| GPT 4o-mini.
| GaggiX wrote:
| They are way bigger than 1B or 3B.
| k__ wrote:
| While I'm all for open models; why can't the small models not
| be used effectively?
|
| Wouldn't they lower the costs compared to big models
| drastically?
| Bilal_io wrote:
| I think what the parent means is that small models are more
| useful locally on mobile, IoT devices etc. so it defeats
| the purpose to have to call an API.
| echelon wrote:
| These aren't the "small" models I'm thinking of.
|
| I want an LLM, STT, or TTS model to run efficiently on a
| Raspberry Pi with no GPU and no network.
|
| There is huge opportunity for LLM-based toys, tools,
| sensors, and the like. But they need to work sans internet.
| thebiss wrote:
| You may be interested in this thread regarding whisper.cpp
| on an RPi 4:
|
| https://github.com/ggerganov/whisper.cpp/discussions/166
| derefr wrote:
| Big models take up more VRAM just to have the weights
| sitting around hot in memory, yes. But running _two
| concurrent inferences_ on the same hot model, doesn't
| require that you have two full copies of _the model_ in
| memory. You only need two full copies of the model's
| "state" (the vector that serves as the output of layer N
| and the input of layer N+1, and the pool of active low-
| cardinality matrix-temporaries used to batchwise-compute
| that vector.)
|
| It's just like how spawning two copies of the same program
| doesn't require that you have two copies of the program's
| text and data sections sitting in your physical RAM (as
| those get mmap'ed to the same shared physical RAM); it only
| requires that each process have its own copy of the
| program's writable globals (bss section), and have its own
| stack and heap.
|
| Which means there are economies of scale here. It is
| increasingly less expensive (in OpEx-per-inference-call
| terms) to run larger models, as your call concurrency goes
| up. Which doesn't matter to individuals just doing one
| thing at a time; but it _does_ matter to Inference-as-a-
| Service providers, as they can arbitrarily "pack" many
| concurrent inference requests from many users, onto the
| nodes of their GPU cluster, to optimize OpEx-per-inference-
| call.
|
| This is the whole reason Inference-aaS providers have high
| valuations: these economies of scale make Inference-aaS a
| good business model. The same query, run in some inference
| cloud rather than on your device, will always achieve a
| higher-quality result for the same marginal cost [in watts
| per FLOP, and in wall-clock time]; and/or a same-quality
| result for a _lower_ marginal cost.
|
| Further, one major difference between CPU processes and
| model inference on a GPU, is that each inference step of a
| model is always computing an entirely-new state; and so
| compute (which you can think of as "number of compute cores
| reserved" x "amount of time they're reserved") scales in
| proportion to the state size. And, in fact, with current
| Transformer-architecture models, compute scales
| _quadratically_ with state size.
|
| For both of these reasons, you want to design models to
| minimize 1. absolute state size overhead, and 2. state size
| growth in proportion to input size.
|
| The desire to minimize absolute state-size overhead, is why
| you see Inference-as-a-Service providers training such
| large versions of their models (405B-class models, etc.)
| The hosted Inference-aaS providers aren't _just_ attempting
| to make their models "smarter"; they're _also_ attempting
| to trade off "state size" for "model size." (If you're
| familiar with information theory: they're attempting to
| make a "smart compressor" that minimizes the message-length
| of the compressed message [i.e. the state] by increasing
| the information embedded in the compressor itself [i.e. the
| model.]) And this seems to work! These bigger models can do
| more with less state, thereby allowing many more "cheap"
| inferences to run on single nodes.
|
| The particular newly-released model under discussion in
| this comments section, also has much _slower_ state-size
| (and so compute) growth in proportion to its input size.
| Which means that there's even more of an economy-of-scale
| in running nodes with the larger versions of this model;
| and therefore much _less_ of a reason to care about smaller
| versions of this model.
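|
| A rough back-of-the-envelope illustration of the weights-vs-state
| point, with made-up but typical numbers for a dense 7B fp16
| transformer (32 layers, hidden size 4096, 4k-token requests):
|
|       weights_gb = 7e9 * 2 / 1e9                    # ~14 GB, shared by every request on the node
|       kv_cache_gb = 2 * 32 * 4096 * 4096 * 2 / 1e9  # ~2.1 GB of per-request KV-cache state
|       # Two concurrent 4k-token requests need ~14 + 2*2.1 GB, not 2*(14 + 2.1) GB.
|       print(round(weights_gb, 1), round(kv_cache_gb, 1))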
| liuliu wrote:
| > they're attempting to make a "smart compressor" that
| minimizes the message-length of the compressed message
| [i.e. the state] by increasing the information embedded
| in the compressor itself [i.e. the model.]) And this
| seems to work! These bigger models can do more with less
| state, thereby allowing many more "cheap" inferences to
| run on single nodes.
|
| Not sure I follow. CoT that goes beyond the length of the
| state is a relatively new phenomenon, and I doubt that
| minimizing the length of the CoT is an explicit goal when
| training the model.
|
| The only thing probably relevant to this comment is the
| use of grouped-query attention? That reduces the size of
| KV cache by factor of 4 to 8 depending on your group
| strategy. But I am unsure there is a clear trade-off
| between model size / grouped-query size given smaller KV
| cache == smaller model size naively.
| zozbot234 wrote:
| > It is increasingly less expensive (in OpEx-per-
| inference-call terms) to run larger models, as your call
| concurrency goes up. Which doesn't matter to individuals
| just doing one thing at a time; but it does matter to
| Inference-as-a-Service providers, as they can arbitrarily
| "pack" many concurrent inference requests from many users
|
| In a way it also matters to individuals, because it
| allows them to run more capable models with a limited
| amount of system RAM. Yes, fetching model parameters from
| mass storage during inference is going to be dog slow
| (while NVMe transfer bandwidth is getting up there, it's
| not yet comparable to RAM) but that only matters if you
| insist on getting your answer interactively, in real time. With
| a local model, it's trivial to make LLM inference a batch
| task. Some LLM inference frameworks can even save
| checkpoints for a single inference to disk and be cleanly
| resumed later.
| lumost wrote:
| An open small model means I can experiment with it. I can
| put it on an edge device and scale to billions of users, I
| can use it with private resources that I can't send
| externally.
|
| When it's behind an API its just a standard
| margin/speed/cost discussion.
| Jackson__ wrote:
| I'd also like to point out that they omit Qwen2.5 14B from the
| benchmark because it doesn't fit their narrative (MMLU-Pro score
| of 63.7 [0]). This kind of listing-only-models-you-beat feels
| extremely shady to me.
|
| [0] https://qwenlm.github.io/blog/qwen2.5/
| tgtweak wrote:
| Am I missing something or is this not an open model? What's the
| point in claiming reduced memory footprint and comparing it with
| open-source alternatives if you're only offering the model on an
| API basis?
| nuz wrote:
| This is all getting kinda boring to keep up with.
| smartmic wrote:
| Not only boring to keep up with (which I never tried), but
| boring in itself.
| marcosdumay wrote:
| To be fair, the boring, low-quality, cheap space is exactly
| where we'll probably get something useful out of LLMs when
| somebody manages to invent it.
|
| But yeah, it's boring.
| anonzzzies wrote:
| I would say that unless you really change the goalposts of your
| new model or it is open (source), it is not going to do much. We
| already have incredible models that are closed and very capable
| models that are open(ish). I am not even sure what space this
| is trying to play in.
| valine wrote:
| Demonstrating good performance from a non-transformer based
| architecture is cool. I agree though these particular models
| aren't that useful given the current landscape.
|
| I think the intent here is probably to justify training a
| larger 400B model with their architecture.
| anonzzzies wrote:
| But then why introduce something lukewarm?
| Mathnerd314 wrote:
| It seems OK, for a small model. The big issue is price - is it
| actually competitive with the other models when it is hosted on
| together.ai or other API services? Which we will presumably find
| out at the Oct 24th event.
| makingstuffs wrote:
| It's fast, I'll give it that, but it does seem to mess up some
| pretty basic tasks. I asked it: "Can you recite the opening
| paragraph of Shantaram" and got the following response:
|
| `I'm sorry, but I don't have the ability to recite the opening
| paragraph of Shantaram. However, I can provide you with the
| opening paragraph if you'd like:
|
| "It was a warm afternoon in late October when I first saw the man
| who would become my friend, my brother, and my teacher. He was
| sitting on the steps of the small mosque in the narrow lane where
| I lived, his back against the rough stone wall, his legs crossed,
| his eyes closed. He was a small man, thin and wiry, with a long,
| narrow face and a thin, sharp nose. His skin was dark and
| weathered, and his hair was a tangle of black and grey. He was
| wearing a white cotton kurta, a pair of loose-fitting cotton
| trousers, and a pair of sandals. He looked like a beggar, or a
| holy man, or a street performer, or all three."
|
| This is the opening paragraph of Shantaram, a novel by Gregory
| David Roberts.`
|
| Which is just straight up incorrect and all kinds of weird.
| loufe wrote:
| I'm not impressed with the quality performance, but the
| generation seems near instantaneous and any fundamentally new
| approach is exciting and interesting. I found the tone of the AI
| somewhat casual which was funny. "Are you sure?" --> "Yes, I'm
| certain."
|
| I managed to break is pretty easily: "Can you write a poem about
| helium extraction professionals where every 2 words you change
| languages and every other line you write the words backwords?
| Explain each line in english afterwards."
|
| I've been wondering if this isn't a good thing. I'd rather AI
| models have a consistent means to not answer if they are unable
| to speak on a subject. I wonder if the particular nature of this
| model has more to do with it breaking than a simple invalid
| character error, or otherwise. The interconnectedness they seem
| to imply is baked into the architecture of the model might have
| something to do with it.
| loufe wrote:
| It took over a minute, but it actually did finally respond to
| that prompt (incorrectly) :
|
| "To gas, we change, "Change, gas, we to." "To gas, we change,
| "Change, gas, we to."
|
| "In the lab, we work, "Work, lab, we in." "In the lab, we work,
| "Work, lab, we in."
|
| "With care, we handle, "Handle, care, we with." "With care, we
| handle, "Handle, care, we with."
|
| "The gas, we love, "Love, gas, we the." "The gas, we love,
| "Love, gas, we the."
|
| "In the world, we share, "Share, world, we in." "In the world,
| we share, "Share, world, we in."
|
| "With pride, we work, "Work, pride, we with." "With pride, we
| work, "Work, pride, we with."
| skizm wrote:
| General LLM question: a lot of people ask specific knowledge
| based questions to LLMs. Isn't one of the defining features of
| modern NPL (and therefore LLMs) that it is designed to be non-
| deterministic? Meaning that it will intentionally select "less
| optimal" next tokens at some random rate in order to make it
| sound less like a robot answering the same thing over and over.
| If this is the case, isn't it pretty much guaranteed to get
| knowledge based questions wrong a decent amount of the time?
|
| I could be way off base (I have zero knowledge about the
| internals and simply read occasional blog posts), but I thought I
| remembered that being one of the key features to making LLMs
| sound more human.
| marcosdumay wrote:
| > Isn't one of the defining features of modern NLP (and
| therefore LLMs) that it is designed to be non-deterministic?
|
| Not of all NLP, but it is absolutely a property of LLMs.
|
| But I doubt this even enters into the total figure if you go
| search for the causes of LLMs giving wrong answers.
| tsumnia wrote:
| Don't think of it as "less optimal", but rather other next
| tokens that have smaller, but still high, probabilities of being
| selected.
|
| If your largest next token has a probability of (arbitrarily)
| 25% of being selected, this doesn't make it optimal - just
| highest probable answer. If the second most probable has a 24%
| chance, that would now account for 49% of probable responses.
| Rather than picking the highest probable word, let's generate a
| random number and whatever that value hits is then compared
| against the various probability bands (random < probability).
|
| Hallucinations can happen; a lot of current work is studying
| ways to keep the LLM from picking really bad moves. Things like
| critics and chain of thought are meant to help keep the
| probability machine within the realm of reasonable answers.
|
| This is also why jailbreaking techniques like spamming umlauts
| (ü) have worked. They create very rare tokens where the
| probabilities of what comes next are poorly constrained. Once
| every next token gets roughly equal probability, the LLM's goal
| is just to try to improve its probability, and it will output
| anything, including things against its directive, to get back
| to 'normal'.
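|
| A minimal sketch of that sampling step (my illustration, with
| temperature folded in the usual way):
|
|       import numpy as np
|
|       def sample_next_token(logits, temperature=0.8):
|           # Lower temperature sharpens the distribution toward the top token;
|           # higher temperature flattens it, so "less optimal" tokens win more often.
|           scaled = np.asarray(logits, dtype=float) / max(temperature, 1e-6)
|           probs = np.exp(scaled - scaled.max())
|           probs /= probs.sum()
|           return int(np.random.choice(len(probs), p=probs))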
| skizm wrote:
| > Don't think of it as "less optimal"
|
| For the purposes of extracting true knowledge (instead of
| human sounding output) it is directly less optimal if we
| assume the humans writing the input material are correct most
| of the time and incorrect some of the time. If the inputs
| were wrong most of the time, and correct some of the time, it
| would be more optimal. Unless there is some technical quirk
| I'm missing?
| hn_throwaway_99 wrote:
| What you are referring to is called "temperature" with respect
| to LLMs, and it is a parameter that can be tweaked at inference
| time. Google's AI Overview gives a pretty good summary IMO:
|
| > The temperature parameter in a large language model (LLM)
| controls the amount of randomness in the model's output. It's a
| numerical value that influences the model's choices, and is
| often set between 0 and 1.
|
| > The temperature setting affects the quality of the output by
| balancing exploration and exploitation:
|
| > Low temperature
|
| > The model is more likely to exploit patterns it's already
| learned, resulting in more deterministic and conservative
| outputs. This is ideal for tasks that require accuracy, such as
| data analysis, technical writing, and summarizing research
| papers.
|
| > High temperature
|
| > The model is encouraged to explore, resulting in more varied
| and creative outputs. This is ideal for creative tasks, such as
| brainstorming, artistic content, and storytelling. However,
| adjusting the temperature too high can result in non-sensical
| outputs.
| atemerev wrote:
| Not your weights -- not your coins!
| dvt wrote:
| Hot take: stop making models. Make products, instead. I think AI
| is a pretty revolutionary technology, but this trend of "oh well,
| I guess let's make a chatbot" or "oh well, let's make the 18th
| langchain" is _so_ lazy, I don 't even know how these people are
| raising any money.
| jjtheblunt wrote:
| "achieve state-of-the-art performance at every scale, while
| maintaining a smaller memory footprint and more efficient
| inference."
|
| smaller and more efficient than what?
| smoghat wrote:
| It uncovered a conspiracy nobody is talking about in my
| conversation with it. Who is that orange man?
|
| AU: who assassinated Donald Trump?
|
| Liquid-3B: Donald Trump was not assassinated. He died on April
| 13, 2021, from complications related to COVID-19. He was 74 years
| old.
| paradite wrote:
| If you look at the main image of the MMLU-Pro benchmark
| comparisons, and compare it to the tables below, you can see how
| they cherry-picked the one single benchmark where their model is
| better than all the others.
|
| Not to mention how they used a different color for their row in
| the table to make you think their model is better, when in fact
| it's the underlying numbers that matter.
|
| Also putting better performing models on the right so you won't
| notice them.
|
| Very sketchy post all around despite the professional look.
| causal wrote:
| And the lack of detail on the architecture innovations is
| also... odd? Like why make such a big deal about how novel your
| architecture is and then just explain it as "derived from a set
| of computational units." ?? okay
| maxc99 wrote:
| They explain their architecture in more detail here:
| https://www.liquid.ai/blog/liquid-neural-networks-research.
| hassleblad23 wrote:
| I mean.. it works?
| academiclolz wrote:
| Every time there's a tech hype cycle, trust some academics to
| come out of the woodwork, raise some absurd amount of money and
| sit out the resulting crash, only to be acquired/acqui-hired by
| some company to on-shore talent in an emerging area, rest and
| vest, then become VCs/partners. Plenty of examples:
|
| 1. Covariant -> Started by Berkeley folks, acqui-hired by Amazon
| after half a year of peddling a patchwork quilt of buzzwords
| through their Robot Foundational Model - 1 (RFM-1).
|
| 2. Drive.ai -> Started by Stanford folks, acqui-hired by Apple,
| only for most of the team to leave to found Landing.ai (I still
| don't know what this company actually does apart from allowing
| the founders to carve out nice, fat consulting contracts?).
|
| 3. World Labs -> Started by Stanford folks, physical embodiment
| but only 3D, NeRFs, perception something something? Not a single
| operator or business person in the founding cabal.
|
| 4. Physical Intelligence -> Started by Stanford folks (seeing a
| pattern here...), physical embodiment, data collection,
| foundational models something something.
|
| 5. Skild Ai -> Started by CMU folks, physical embodiment, again
| not sure what the plan is here.
|
| 6. Argo.AI -> Started by CMU folks, self-driving group that
| didn't go anywhere. Eventually shuttered and absorbed into
| Ford/Volkswagen.
|
| 7. Aurora.tech -> Started by CMU folks, self-driving team that
| SPAC'd and is now just existing in between debating raising
| capital or selling itself for parts I think?
|
| 8. Rethink Robotics -> Started by MIT folks, Rodney Brooks'
| legendary robotics company that wanted to mass-produce
| industrial robots, and then decided what the robot's eyes should
| look like... folded, shuttered, and now relaunched in name only;
| none of the original robots or aesthetic (rightfully so)
| persists.
|
| So many examples like this... none of these companies is even
| vaguely interested in creating an enduring business, has a
| business plan (other than GPU go brrr), or has any idea of how to
| run a business.
| krasin wrote:
| I tried it on a coding task; this model might shine at something
| else, but it does not shine at coding.
___________________________________________________________________
(page generated 2024-09-30 23:02 UTC)