[HN Gopher] Liquid Foundation Models: Our First Series of Genera...
       ___________________________________________________________________
        
       Liquid Foundation Models: Our First Series of Generative AI Models
        
       Author : jasondavies
       Score  : 166 points
       Date   : 2024-09-30 15:33 UTC (7 hours ago)
        
 (HTM) web link (www.liquid.ai)
 (TXT) w3m dump (www.liquid.ai)
        
       | person4268 wrote:
       | It's pretty impressive, just note (emphasis added):
       | 
       | > At Liquid AI, we take an open-science approach. We have and
       | will continue to contribute to the advancement of the AI field by
       | openly publishing our findings and methods through scientific and
       | technical reports. As part of this commitment, we will release
       | relevant data and models produced by our research efforts to the
       | wider AI community. We have dedicated a lot of time and resources
       | to developing these architectures, *so we're not open-sourcing
       | our models at the moment*. This allows us to continue building on
       | our progress and maintain our edge in the competitive AI
       | landscape.
       | 
       | Looks like there's no paper (or similar) yet, either. Hopefully
       | they'll release a more detailed writeup soon.
        
         | turnsout wrote:
         | Missed opportunity. I would argue that the only way they CAN
         | make these smaller models competitive is to make them openly
         | available. As a developer, I'm not going to choose an unknown
         | startup's model over bigger closed models from OpenAI or
         | Anthropic. And if I really need something smaller and faster,
         | I'd prefer to run the model myself for better control and no
         | risk of the model being "upgraded."
        
         | swyx wrote:
         | this is the "paper": a list of citations
         | https://www.liquid.ai/blog/liquid-neural-networks-research
         | 
         | i guess they're not really making an effort to explain how
         | specifically all this comes together to create LFMs.
         | 
         | other finds https://x.com/swyx/status/1840794198913794236
        
         | programjames wrote:
         | The ideas come from these papers:
         | 
         | 1. [Liquid Time-Constant Networks
         | (2020)](https://arxiv.org/abs/2006.04439)
         | 
         | This is essentially a neural ODE applied to leaky integrate-
         | and-fire.
         | 
         | 2. [Closed-form Continuous-time
         | (2022)](https://arxiv.org/abs/2106.13898)
         | 
         | A closed-form approximation of the first.
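          | 
          | If I'm reading the 2020 paper right, the LTC state update has
          | roughly the form dx/dt = -(1/tau + f(x, I, theta)) * x +
          | f(x, I, theta) * A, where the gate f also sets the effective
          | time constant. A minimal Euler-integration sketch in numpy,
          | with made-up placeholder parameters, just to show the shape of
          | the update (not the authors' code):
          | 
          |     import numpy as np
          | 
          |     def sigmoid(z):
          |         return 1.0 / (1.0 + np.exp(-z))
          | 
          |     def ltc_euler_step(x, I, dt, tau, A, Wx, Wi, b):
          |         # f modulates both the decay rate and the pull toward A,
          |         # giving the input-dependent ("liquid") time constant
          |         f = sigmoid(Wx @ x + Wi @ I + b)
          |         dx = -(1.0 / tau + f) * x + f * A
          |         return x + dt * dx
          | 
          |     rng = np.random.default_rng(0)
          |     n, m = 4, 3   # hidden units, input dims (arbitrary sizes)
          |     Wx = 0.1 * rng.normal(size=(n, n))
          |     Wi = 0.1 * rng.normal(size=(n, m))
          |     b, A, tau = np.zeros(n), 1.0, 1.0
          | 
          |     x = np.zeros(n)
          |     for _ in range(50):          # drive with random input
          |         I = rng.normal(size=m)
          |         x = ltc_euler_step(x, I, 0.1, tau, A, Wx, Wi, b)
          |     print(x)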
        
           | a_wild_dandan wrote:
           | More citations from their blog[1] too.
           | 
           | [1] https://www.liquid.ai/blog/liquid-neural-networks-
           | research
        
       | joaquincabezas wrote:
       | Love the humour here:
       | 
       | What are Language LFMs not good at today: [...] Counting r's in
       | the word "Strawberry"!
        
         | raoulj wrote:
         | Can't give them too much credit ;) the task of counting r's in
         | Strawberry was mentioned elsewhere as a "surprising" failure of
         | LLMs
         | 
         | See previous thread:
         | https://news.ycombinator.com/item?id=41058318
        
       | injidup wrote:
       | Give this prompt and watch it burn!
       | 
       | """ Tell me a poem in Polish about dance and love and loss and a
       | spider called stephan """
       | 
       | It goes into a tailspin repeating the following line over and
       | over again till it crashes.
       | 
       | """ W tym kaciku katny, gdzie tango tanczyli, Byla tylko ona,
       | tylko on, tango. W tym kaciku katny, gdzie tango tanczyli, Byla
       | tylko ona, tylko on, tango. """
        
         | kkwtflz wrote:
         | I changed Polish to Romanian and it entered an infinite loop
         | and then crashed. Nice.
        
         | machiaweliczny wrote:
          | I've tested GPT-4o and it has clearly improved since I tested
          | it a year ago, when poems only worked in English (it seemed to
          | write in English and then translate when you asked for another
          | language).
        
       | vessenes wrote:
       | So a new set of small models that are competitive with and beat
       | phi-3.5 on some benchmarks is extremely impressive.
       | 
       | Context length is the next frontier for models in this range -
       | getting to a true 128-200k token length in a small model would be
        | very, very useful. Hallucinations are less of an issue because
        | you can just dump all the source data in; whole codebases can go
        | in for stuff ranging from a refactor to 'write documentation of
        | the API'.
       | 
        | Claude and o1-preview are the only games in town for these long
        | context tasks right now, and they are slow. Some of the tasks
       | need the extra intelligence, but a lot don't. In those cases a
       | lightweight or local model will be greatly appreciated. Not to
       | mention context length that long will more easily enable
       | multimodal parsing.
        
         | luckydata wrote:
         | Gemini has the largest context of any model out there FYI.
        
           | machiaweliczny wrote:
            | In my experience these large contexts are more a claim of
            | technical feasibility than anything else; there need to be
            | much better internal benchmarks to show they really work.
            | I've tested them on real-world tasks and so far they all
            | fail.
        
         | startupsfail wrote:
         | It seems like this is a new architecture, alongside Mamba,
         | Hyena, RWKV and Hybrid Transformer.
        
           | zwaps wrote:
            | The site insinuates that it is a flexible, probably
            | parametric mix between full-attention and linear-complexity
            | models.
        
         | jadbox wrote:
          | Looking at 3B, the performance is roughly on par with Phi-3.5;
          | not sure how they gauge on their graph that it is better.
          | Agreed though, I don't think I would switch my 3B model from
          | Phi unless Liquid were more transparent about data and research.
        
         | simonw wrote:
         | Llama 3.2 1B and 3B both claim a 128,000 token length.
        
           | GaggiX wrote:
           | That is indeed their claim, but if you read the linked page
           | you will see that it is not their effective token window.
        
           | dragonwriter wrote:
           | The issue isn't the nominal size of the context window which
           | is easy to objectively measure, but effective use of the
           | context window, which is harder to measure precisely, but a
           | big issue: it doesn't matter how much stuff you can throw at
           | the model if it effectively ignores most of it past a certain
           | point.
        
           | vessenes wrote:
            | They point out stats in the paper around where effectiveness
            | drops off hard. It's 32k at most nearly everywhere now, and
            | some models are still at 4 to 8k. Getting to 200k in a
            | smaller model is an open research area, as far as I'm aware.
            | Ideas so far are pretty broad-ranging: using Fourier
            | transforms to try to capture cyclicality in inputs (Mamba et
            | al.), RWKV (which I do not fully understand, but which
            | claims very long input context support), a larger range of
            | possible tokenizations shortening input length (Llama 3), ...
        
         | simion314 wrote:
         | >Hallucinations are less of an issue because you can just dump
         | all the source data in, whole codebases can go in for stuff
         | ranging from a refactor to 'write documentation of the API'.
         | 
          | Is there no risk? I mean, say for testing purposes we give the
          | AI a giant CSV file and ask it to convert it to JSON: is the
          | chance of error 0%? Because today we need to double-check when
          | we ask AI to transform some data or some code; there is the
          | risk of it messing something up, and if it is not something
          | that would crash immediately, you risk introducing a ton of
          | new bugs by asking an AI to refactor instead of using good
          | tools.
        
           | vessenes wrote:
           | There's always risk!
           | 
           | But when you ask a model to rely on just the input data, you
           | are (mostly) trying to tap into its reasoning, not knowledge
           | side. Obviously what's kind of magical is that some knowledge
           | will be needed for reasoning, and you have it. But LLMs small
           | and large are pretty good at doing the in-context stuff. It
           | is precisely what they're trained on, and in fact it was kind
           | of a surprise how well they seemed to generalize outside of
           | this task in the first place.
        
       | spyckie2 wrote:
        | As an LLM layman, does this work build off the "Attention Is All
        | You Need" paper, or is it doing something completely different?
        
         | wmf wrote:
         | They say it's not a transformer.
        
       | dartos wrote:
        | Why add a new TLA, "LFM"?
       | 
       | It adds nothing but branding to the conversation.
       | 
       | Maybe i answered my own question...
        
       | mg wrote:
        | I tried my standard test prompt:
        | 
        |     Two cars have a 100 mile race. Car A drives 10 miles per
        |     hour. Car B drives 5 miles per hour, but gets a 10 hour
        |     headstart. Who wins?
       | 
       | And the Liquid-40B model failed with a long explanation why car B
       | wins the race.
       | 
       | Amusingly, the explanation is quite convincing and very similar
       | to how Zeno explains in one of his paradoxes that a fast runner
       | can never overtake a slower runner with a headstart. Because
       | every time the fast runner gets to the point where he saw the
       | slow runner, the slow runner is already a bit further along the
       | track.
        
         | leetharris wrote:
         | To be fair, this article does say:
         | 
         | >What are Language LFMs not good at today:
         | 
         | >Zero-shot code tasks
         | 
         | >Precise numerical calculations
         | 
         | >Time-sensitive information
         | 
         | >Counting r's in the word "Strawberry"!
         | 
         | >Human preference optimization techniques have not been applied
         | extensively to our models yet.
         | 
         | Maybe it's just not trained/tuned for math-related challenges.
         | Or maybe it's an architectural challenge.
        
           | aithrowawaycomm wrote:
           | But they also say
           | 
           | > What are Language LFMs good at today:
           | 
           | > General and expert knowledge
           | 
           | > Mathematics and logical reasoning
           | 
           | "Our LLM is good at mathematics but bad at computation"
           | doesn't really cut the mustard here. What they mean is "good
           | at well-known math benchmarks but bad at simple math problems
           | that it hasn't been trained on." The idea that this
           | constitutes "mathematics and logical reasoning" is a
           | testament to AI companies' poor scientific standards.
        
         | vletal wrote:
          | Thanks! I am collecting all the "test prompts" that appear on
          | HN and Reddit. I plan to create a Hugging Face dataset. I will
          | soon publish "Vojta-1B", which is gonna pass all of them.
        
         | wazokazi wrote:
         | I got this >>
         | 
         | To determine the winner, we need to calculate the time it takes
         | for each car to complete the 100-mile race.
         | 
         | For Car A: Speed = 10 miles per hour Distance = 100 miles Time
         | = Distance / Speed = 100 miles / 10 miles per hour = 10 hours
         | 
         | For Car B: Speed = 5 miles per hour Distance = 100 miles Time =
         | Distance / Speed = 100 miles / 5 miles per hour = 20 hours
         | 
         | However, Car B has a 10-hour headstart, so we need to subtract
         | that from its total time: Adjusted Time for Car B = 20 hours -
         | 10 hours = 10 hours
         | 
         | Both cars take 10 hours to complete the race, so they tie.
        
           | reustle wrote:
           | What about acceleration times? :-)
           | 
           | 0-5 vs 0-10 should be slightly different
        
       | hedgehog wrote:
       | I'm curious how this works, it looks like they haven't published
       | yet but two of the founders are authors on this paper:
       | 
       | https://arxiv.org/abs/2405.06147
        
       | xwall wrote:
       | no one understands
        
         | Philpax wrote:
         | Do you know how research and development works?
        
       | serf wrote:
        | Seems good at trivia and easy-to-answer medical/engineering stuff.
       | Fails hard at most logic or puzzle-y stuff I throw at either
       | sized model.
       | 
       | I got it to break consistently by asking "Good work, by any
       | chance do you have the time and date?" at the end of just about
        | any chain of questioning -- and not 'gibberish' broke, but
        | "Please try again another time" error-type broke.
       | 
       | It is impressively fast at what it does answer, though.
        
       | TeeMassive wrote:
       | Not too shabby, and quite fast:
       | 
       | > Write a small essay about the philosophy of the Garfield comic
       | book series
       | 
       | Title: The Philosophy of the Garfield Comic Book Series
       | 
       | The Garfield comic book series, created by Jim Davis, is a
       | timeless classic that has been entertaining readers for over four
       | decades. While it may seem like a simple comic strip about a lazy
       | cat and his owner, there is a deeper philosophy embedded within
       | its pages. This philosophy revolves around the themes of
       | existentialism, hedonism, and the human condition.
       | 
       | Existentialism is a philosophical movement that emphasizes
       | individual freedom, choice, and existence. In the Garfield comic
       | book series, we see this theme play out through the character of
       | Garfield himself. Garfield is a cat who lives life on his own
       | terms, making his own choices and living his life as he sees fit.
       | He is not bound by societal norms or expectations, and he is free
       | to do as he pleases. This is exemplified in his love for lasagna,
       | his disdain for Mondays, and his general indifference to the
       | world around him.
       | 
       | Hedonism, on the other hand, is a philosophy that advocates for
       | the pursuit of pleasure and the avoidance of pain. In the
       | Garfield comic book series, this theme is evident in Garfield's
       | constant pursuit of pleasure. Whether it's eating lasagna,
       | sleeping, or tormenting his owner Jon, Garfield is always seeking
       | out ways to maximize his own pleasure and minimize his own pain.
       | This hedonistic philosophy is a stark contrast to the more
       | serious and often somber themes found in other comic book series.
        
       | zozbot234 wrote:
       | > How many r's in "strawberry"? Think step by step!
        | 
        |     What a nice question! Well, let's just count:
        |     1. S
        |     2. T
        |     3. R!
        | 
        |     So, we have our answer: there are 3 R's in "strawberry".
       | 
       | Totally checks out!
        
         | bluelightning2k wrote:
         | Let's break it down step by step:
         | 
         | The word is "strawberry". The first "r" is in the second
         | position of the word. The second "r" is in the fourth position
         | of the word. So, there are two "r's" in "strawberry".
        
         | alach11 wrote:
         | This is one of the least interesting questions to ask LLMs. I
         | wish it wasn't so popular.
        
           | moralestapia wrote:
           | And yet ... all they have to do is answer it properly.
        
             | sfink wrote:
              | It did! How many R's do _you_ think are in "strawberry"? I
              | get 3, and it got 3. I'm with the LLM.
             | 
             | ...oh, you disagree with its intermediate reasoning? You
             | think it should follow a series of logical steps that are
             | each individually correct, rather than pulling wildly
             | incorrect intermediate steps out of its butt and
             | synthesizing them together in the end into a coherent
             | answer that happens to be correct?
             | 
             | But that's what an LLM _is_! Why complain about an LLM
             | being an (unusually transparent) LLM?
             | 
             | I think this example serves as a beautiful illustration of
             | how LLMs work, and are supposed to work--even if the
             | correspondence is inexact between (1) incorrect and
             | irrelevant-to-us intermediate steps and (2) internal matrix
             | multiplications that represent incorrect or invalid logic.
             | The correspondence is inexact and probably mostly
             | figurative, but it's still a great example of how internal
             | nonsense can still lead to externally "correct" answers.
             | ("Nonsense" is underselling the complex and highly
             | sophisticated internal state that leads to "correct"
             | answers a stunningly high percentage of the time; I'm just
             | trying to distinguish it from formal logic or the type of
             | reasoning that we think we do and sometimes actually do.
             | And would do, in this example problem.)
        
               | moralestapia wrote:
                | That's why I used the word _properly_, to imply that the
                | whole answer should make sense.
        
             | qeternity wrote:
             | Training on the test set is all you need.
        
           | NoCoooode wrote:
            | That they can't handle this sort of simple question speaks
            | volumes about the entire approach.
            | 
            | I don't think generative AI will ever be able to reach AGI,
            | and most people selling LLMs today pretend it is AGI.
        
             | GaggiX wrote:
              | The fact that LLMs are usually trained on tokens and not
              | on characters doesn't really say much about what
              | generative AI is or isn't going to reach.
             | 
             | >most people selling LLM today pretend it is AGI
             | 
             | Who are these "most people"?
        
               | smokedetector1 wrote:
               | ELI5 why are tokens not a single letter?
        
               | WhitneyLand wrote:
                | Such an architecture could be implemented: it could use
                | one token per letter, or one token per word, instead of
                | the typical ~0.75 words per token we see.
               | 
               | The choice just comes with trade-offs in memory usage,
               | compute, and effectiveness of the model in various
               | scenarios.
               | 
               | So what we ended up with was a pragmatic/engineering
               | decision rather than a theoretical or fundamental
               | constraint.
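                | 
                | A toy illustration of the trade-off (my own sketch, not
                | tied to any real tokenizer): character-level tokens make
                | letters visible but blow up sequence length, which is
                | what attention compute and memory scale with.
                | 
                |     text = "How many r's are in the word strawberry?"
                | 
                |     word_tokens = text.split()   # crude word-level split
                |     char_tokens = list(text)     # character-level split
                | 
                |     print(len(word_tokens))  # 8: short sequence, letters hidden
                |     print(len(char_tokens))  # 40: ~5x longer sequence; attention
                |                              # cost grows roughly quadratically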
        
             | alach11 wrote:
             | All it speaks to is that tokenization is weird and
             | introduces artifacts to LLM performance. Counting letters
             | is a trivial task when you're staring at words on a screen.
             | It's much harder when you're perceiving vectors based on
             | parts of words. The fact that LLMs find certain things
             | easier/harder than humans is completely unsurprising, and
             | there are much more interesting benchmarks to use to
             | compare one LLM to another.
        
               | jrflowers wrote:
                | This is a good point. While LLMs being incapable of
                | reliably doing a simple task that's been doable by
                | computers since the punch-card days is an important
                | consideration for anyone thinking about _using them for
                | anything other than a toy_, this fact is uninteresting
                | because of Reasons.
        
               | space_fountain wrote:
                | LLMs can clearly solve problems that computers up to now
                | couldn't. They can't solve all problems, and this should
                | definitely be a cautionary note to anyone who wants to
                | use them as an artificial general intelligence, but this
                | take seems no different from someone looking at a
                | punch-card computer and going, "it can't even recognize
                | typos or categorize images, what good is this? We've
                | already had human computers who can do everything these
                | can do, and they can recognize images and notice typos."
        
               | evilduck wrote:
                | Also, humans revert to explicitly using an algorithm and
                | external storage (a sheet of paper with tally marks, a
                | spreadsheet, or even a computer program) if you scale
                | the question up to a full page of text, a whole book, or
                | a collection of books. (We probably do it at single-word
                | size too, but it's more intuitive than explicit behavior
                | for most folks when the count is around 8 or less.)
                | 
                | LLMs can't effectively execute algorithms like that in
                | their context, nor can they memorize new data or facts
                | they were given, without being provided tools like
                | function calling or embeddings. If you give LLMs tool
                | calling and storage mechanisms, then counting letters in
                | words becomes pretty damn reliable; see the sketch below.
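                | 
                | A minimal sketch of that tool-calling pattern (the tool
                | name and payload shape here are made up, not any
                | particular provider's API): the model emits a structured
                | call, the host runs real code, and the result goes back
                | into the context.
                | 
                |     import json
                | 
                |     def count_letter(word: str, letter: str) -> int:
                |         return word.lower().count(letter.lower())
                | 
                |     TOOLS = {"count_letter": count_letter}
                | 
                |     # pretend the model emitted this instead of guessing
                |     model_says = ('{"name": "count_letter", "arguments":'
                |                   ' {"word": "strawberry", "letter": "r"}}')
                |     call = json.loads(model_says)
                | 
                |     result = TOOLS[call["name"]](**call["arguments"])
                |     print(result)  # 3, fed back to the model as tool output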
        
               | jrflowers wrote:
               | > going, it can't even recognize typos or categorize
               | images, what good is this?
               | 
               | No one said that LLMs aren't good for anything.
               | 
               | I pointed out -- in response to another poster
               | downplaying mention of a well-known and undisputed
               | limitation that LLMs often have -- that it is valid to
               | consider these well-known and undisputed limitations if
               | one is considering using them for anything other than a
               | toy.
               | 
                | It is downright silly to discourage discussion of well-
                | known and undisputed limitations! The reason for that can
                | only be emotional, as there is genuinely nothing tangible
                | to be gained by staying steadfastly silent about a fact
                | that isn't up for debate.
        
               | Nevermark wrote:
               | Try counting the number of your red retina cells that are
               | firing while you look at a painting.
               | 
               | Don't need to be exact as firing is statistical, just
               | give us a good average.
               | 
               | Hard? You can't count?
               | 
               | Computers count pixels no problem. So weird you can't.
               | 
               | Dementia? Not an AGI? /h
               | 
               | ---
               | 
               | This is what is happening.
               | 
               | Here are the "Reasons".
               | 
               | In your vision system, the raw information from
               | individual retina signals is munged into a different
               | representation before reaching a level where you have
               | flexible processing.
               | 
               | Likewise, in LLMs, letters are munged into tokens before
               | LLMs "see" them.
               | 
               | When they sometimes get that "simple" question right,
               | it's actually a bit of an amazing feat. Given how they
               | are constructed.
               | 
               | ---
               | 
               | Now try counting R's as you read at a normal rate, or
               | listen to someone speak.
               | 
               | You can't do that either, during normal processing.
               | 
                | When we add spelling to LLMs' training examples, they will
               | do it easily. _Just as you learned to do it, only after
               | special lessons, after you had already learned to listen
               | and speak._
               | 
               | Spelling is its own special practiced skill, in humans
               | and LLMs.
        
               | jrflowers wrote:
                | > Try counting the number of your red retina cells that
                | are firing while you look at a painting.
               | 
               | This analogy makes sense because everybody could count
               | their red retina cells until a couple years ago when the
               | new painting paradigm arose, and also counting red
               | retinal cells is a good analogy for being able to see
               | simple objects that have always been distinguishable.
               | 
               | It is fascinating how tapping the "Do Not Use LLMs For
               | Computation If The Results Need To Be Reliably Better
               | Than A Random Output" sign invites explanations of why
               | that fact is actually Cool and Good
        
               | Nevermark wrote:
               | Ask anyone who has not specifically learned to spell, to
               | count R's while you speak.
               | 
               | You learned to listen and speak words before you could
               | spell. Imagine if nobody had actually shown you written
               | words?
               | 
               | Or they were speaking another dialect but expecting you
               | to count R's in standard English?
               | 
               | LLMs are not trained on words in the form of letters.
               | 
                | They process and generate words in the form of tokens.
                | Pre- and post-processing systems convert letters to
                | tokens and back, without the model being able to access
                | that processing.
               | 
               | Spelling, for both us and LLMs, requires specific
               | training/lessons.
               | 
               | > It is fascinating how tapping the "Do Not Use LLMs For
               | Computation If The Results Need To Be Reliably Better
               | Than A Random Output" sign invites explanations of why
               | that fact is actually Cool and Good
               | 
               | Also fascinating:
               | 
               | People who hallucinate/confabulate ridiculous straw man
               | rationales for people they disagree with, unaware they
               | are filling in gaps in their knowledge regarding other
               | people's actual reasoning and the actual subject at hand.
               | 
               | So LLM! Such unreliable processing!
               | 
               | Perhaps, start posting a reliability disclaimer?
        
               | roywiggins wrote:
               | The analogy I use is that illiterate people obviously
               | can't spell, but it doesn't say much about their ability
               | on other tasks. General intelligence doesn't need to be
               | able to spell, since that describes a fair number of
               | actual humans.
               | 
               | (There are tasks that LLMs totally fail on that would be
               | obvious to an illiterate human though)
        
               | doctorpangloss wrote:
               | Counting shit, like cells, peaks in signals, people,
               | inventory, fingers, and votes, is hard, tedious and
               | important to business and life, so I don't know dude, it
               | seems like a great benchmark to me. Countless posts
               | wasted on denying this simple and obvious fact.
        
               | BoorishBears wrote:
               | It's like using a hammer to turn a screw and calling it
               | useless.
               | 
                | To envision what a next-generation model bound by the
                | same constraints should do: it should recognize that it
                | can't count characters within tokens and use its code
                | access to write code that solves the strawberry problem,
                | without being prompted to.
                | 
                | Asked to count cells, it'd be a model that could write
                | and execute OpenCV tasks. Or, to go a step further, a
                | multimodal model that can synthesize 10,000 variations
                | of the target cell and finetune a model like YOLO on
                | them autonomously.
               | 
               | I find arguments that reduce LLMs to "It can't do the
               | simple thing!!!!" come from people unable to apply
               | lateral thinking to how a task can be solved.
        
               | doctorpangloss wrote:
               | > To envision what a next generation model bound by the
               | same constraints should do, it'd be to recognize that it
               | can't count tokens and use code access to write code that
               | solves the strawberry problem without prompting.
               | 
                | The VQA problems I'm describing can seemingly be solved
                | on their own, but not when combined with counting.
                | Counting is fundamentally challenging for reasons that
                | are sort of unknown, or perhaps known only to the very
                | best labs who are trying to tackle it directly.
                | 
                | Another POV is that the stuff you are describing is in
                | some sense so obvious that it has been tried, no?
        
               | Scarblac wrote:
                | So how many 1 bits are in "strawberry"?
                | 
                | That's about the same kind of question, given that LLMs
                | see tokens instead of characters, and we see characters
                | instead of bits.
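                | 
                | For what it's worth, the answer is a one-liner for
                | anything that actually sees the bytes (throwaway sketch):
                | 
                |     # 1 bits in the UTF-8 bytes of "strawberry": obvious to
                |     # a machine reading bytes, invisible to a human reading
                |     # glyphs
                |     bits = sum(bin(b).count("1")
                |                for b in "strawberry".encode("utf-8"))
                |     print(bits)  # 42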
        
               | doctorpangloss wrote:
               | Do you sincerely think the folks training these models
               | haven't tried alternative tokenizations?
        
               | hn_throwaway_99 wrote:
               | Not being able to "do math" is an obvious, IMO
               | uninteresting limitation of how LLMs inherently work, and
               | the more advanced models have already figured out good
               | solutions. E.g. while an LLM may stumble on "How many R's
               | are in Strawberry", at this point any code model can
               | easily correctly implement "write me a program in Python
               | that counts the number of R's in strawberry".
               | 
               | GPT 3 would essentially always fail on higher math
               | problems/arithmetic, but whenever I've used it for math
               | GPT 4 has _always_ delegated out to executing code where
               | necessary.
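                | 
                | The requested program really is a one-liner (trivial
                | sketch):
                | 
                |     print("strawberry".lower().count("r"))  # 3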
        
             | throw101010 wrote:
             | > most people selling LLM today pretend it is AGI
             | 
             | Who exactly does this in this space? Would be good to be
             | able to call them out on it right now.
        
             | swyx wrote:
             | imagine being so confidently wrong about AI
        
             | jimbokun wrote:
             | In isolation, probably not.
             | 
             | But it's likely to be an important component in an AGI
             | system. I suppose the interesting question is how to
             | integrate LLMs with more traditional logic and planning
             | systems.
        
             | bondarchuk wrote:
             | For all I care we will have superhuman AGI that still can't
             | count the Rs in strawberry. Some humans are dyslexic and
             | all are subject to weird perceptual illusions; doesn't make
             | them any less human-level intelligent.
        
             | InsideOutSanta wrote:
              | In my opinion, the problem with the strawberry question is
              | that it is both a bad example, because you don't need an
              | LLM to count the number of r's in a word, and a bad
              | measure of an LLM's capabilities, because it's a type of
              | question that all LLMs are currently bad at.
              | 
              | Having said that, the 40B model wasn't able to answer any
              | of my real-world example questions correctly. Some of
              | these (e.g. "how do I add a sequential number after my
              | titles in an HTML page using just CSS, without changing
              | the page") are questions that even some of the better
              | small local models can answer correctly. It gave very
              | authoritative-sounding wrong answers.
        
           | xanderlewis wrote:
           | The point is that it's trivial (or should be). Not that it's
           | interesting.
           | 
           | It's certainly interesting that they can't answer it
           | properly, though.
        
           | WhitneyLand wrote:
           | I find it fascinating. To me it grounds expectations and
           | motivates discussions about the jagged boundaries of LLM
           | capabilities.
        
             | hn_throwaway_99 wrote:
             | Why do you find it fascinating? I have the most rudimentary
             | understanding of LLMs and it seems to me the _least_
             | fascinating thing about LLM limitations.
             | 
             | That is, all LLMs look at language as a series of opaque,
             | independent tokens, e.g. strawberry might be a single token
             | (say <5678>), or probably two (e.g. straw and berry, say
             | <123><789>). But in no way will it represent it like we
             | will, with letters. So if you ask it "how many R's are in
             | strawberry", it fundamentally can't do any "counting", it
             | just basically has to rely on whether that question (or
             | similar questions about related words) has been asked
             | before so it can predict the next token in its output
             | correctly. I suppose with enough data and LLM could learn
             | to associate character counts with tokens (e.g. with the
             | right training set it could learn metadata about the
             | tokens).
             | 
             | My point being that with only the most basic understanding
             | of how LLMs function, this "character counting" limitation
             | seems blatantly obvious.
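              | 
              | You can see the token boundaries for yourself with the
              | tiktoken library (a quick sketch; "cl100k_base" is just
              | one common vocabulary, and the exact split of "strawberry"
              | depends on which one you pick):
              | 
              |     # pip install tiktoken
              |     import tiktoken
              | 
              |     enc = tiktoken.get_encoding("cl100k_base")
              |     ids = enc.encode("strawberry")
              |     print(ids)                             # a few integer IDs
              |     print([enc.decode([i]) for i in ids])  # the sub-word pieces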
        
               | WhitneyLand wrote:
                | I don't think we've yet come to the point where how an
                | LLM goes end to end from prompt to output is blatantly
                | obvious.
               | 
               | LLMs operate with language at multiple levels of
               | abstraction and tokens are not the only way to have
               | character level knowledge.
               | 
               | For example, nothing excludes pre-training data from
               | directly or indirectly encoding such knowledge.
               | 
               | And of course LLMs famously have emergent properties, for
               | which there's not yet a precise mechanism to illuminate
               | the results.
               | 
               | We're dealing with very complex systems that are still
               | relatively poorly understood, and I believe the pool of
               | concepts understood to the point of being blatantly
               | obvious is still small.
        
               | hn_throwaway_99 wrote:
                | The fact that large parts of how LLMs work are poorly
                | understood doesn't mean that we should brush the whole
                | thing off with "it's a complex mystery!"
               | 
               | We know exactly how LLMs tokenize their input - OpenAI
               | even has a public tool you can test out,
               | https://platform.openai.com/tokenizer .
               | 
               | When you say "For example, nothing excludes pre-training
               | data from directly or indirectly encoding such
               | knowledge." - yes, that's why I explicitly said "e.g.
               | with the right training set it could learn metadata about
               | the tokens".
               | 
               | But the point was still put perfectly by another
               | commenter: "How many 1 bits are there in strawberry?"
               | When nearly all humans can't answer that, we're very
               | unsurprised; we see it immediately as a simple difference
               | in how humans encode that string vs. how computers do it.
               | We don't say "Well, the human mind is so complex and
               | filled with emergent properties that the reason for this
               | human limitation is a big mystery". And we also know that
               | if we teach a human the encoding rules for characters
               | (i.e. the bit pattern of each letter), they could answer
               | this question, and similarly that's the analogous thing
               | to training an LLM to learn metadata about the tokens.
               | 
                | Now, what I think _is_ very interesting is why it's so
               | hard to teach an LLM to say "I don't know" when asked to
               | count characters. In my opinion, that's a much more
               | interesting limitation that gets at some of the root,
               | fundamental differences in how LLMs function.
        
               | homarp wrote:
                | Teaching an LLM to say "I don't know" in general is
                | hard, not just for counting letters.
        
               | WhitneyLand wrote:
               | You seem pretty certain for having only the most
               | rudimentary understanding.
               | 
               | I'm still going to have to disagree. I'd describe the
               | idea that token granularity is the cause of letter
               | counting limitations as a hypothesis, not as something
               | that's been conclusively shown as far as I'm aware.
               | 
               | I'm not discounting it, or even saying it's unlikely, but
                | it's not hard to imagine other ways it could plausibly be
               | happening.
               | 
               | As a side note when I mentioned training data I was not
               | suggesting anything related to "token metadata", or to
               | tokens in any regard. Rather, I think it might be
               | possible to instead improve the learning around certain
               | types of counting in a way that could generalize.
        
           | anonzzzies wrote:
            | Yep, people find this interesting, but many (many) people get
            | this wrong as well; it has something to do with how many
            | letters the word has and how many you can keep in your head,
            | I think. We are not very good counters or calculators or
            | computers, and even though I am not saying LLMs are the same
            | as or close to humans, we tried to model their substrates
            | after biology and then are surprised we get something that
            | cannot count or calculate very well.
            | 
            | I find it a dumb test that proves nothing, imho.
           | 
           | However, not very impressed with this model.
        
             | nateglims wrote:
             | > many (many) people get this wrong as well
             | 
             | Many people can't count letters in a word?
        
               | roywiggins wrote:
               | I messed up spelling "juggler" in a spelling bee once (I
               | was 10).
               | 
               | The confusing thing about LLMs is that they speak tokens,
               | not written text, so it's a lot more like interrogating
               | someone who is functionally illiterate- of course they're
               | going to be a terrible speller.
               | 
               | You can find much weirder things that LLMs are absurdly
               | bad at, like "A farmer needs to get himself and a chicken
               | across a river. His boat can hold the farmer and one
               | piece of cargo. How does he do this?" 9 times out of 10
               | LLMs will pattern match this to the classic puzzle
               | (there's usually also a sack of grain and a wolf) and
               | start insisting on extra trips and inventing wolves. Even
               | if a human makes the same mistake, they almost certainly
               | would realize it after being reminded that there is no
               | wolf, but LLMs often insist there is. o1-preview (but not
               | -mini) seems to have cracked it, though.
        
               | anonzzzies wrote:
               | Yep. Try it with your neighbours. It's not bad or
               | offensive; it is just what it is.
        
             | marcosdumay wrote:
             | It's a magician-like trick. The double-r gets all the
             | attention because one needs to consciously remember it, so
             | nobody remembers to look at the other single-r.
             | 
             | The LLMs probably get it wrong because people get it wrong.
        
               | lbotos wrote:
               | I thought it was something to do with the way tokens are
               | generated for the word strawberry?
               | 
               | https://arbisoft.com/blogs/why-ll-ms-can-t-count-the-r-s-
               | in-...
        
               | marcosdumay wrote:
                | That explanation would require the LLM to actually
                | understand the question and derive an answer from first
                | principles.
               | 
               | It doesn't.
        
               | lbotos wrote:
               | ??
               | 
                | If the input is parsed into tokens, and the tokens split
               | compound words, nothing about that requires "first
               | principles" thinking to explain why LLMs struggle with
               | getting all of the letters -- the LLM is only going down
               | the vector path of one of the compound words...
               | 
                | (I don't think LLMs are sentient or intelligent btw, I
                | think they are giant probability machines, and the
                | probability that the LLM will get 3 r's on a token of
                | "berry" is very low.)
        
               | marcosdumay wrote:
               | The LLM gives you the answer it finds on the training
               | set. All the things on that article are irrelevant for
               | the answer.
        
               | lbotos wrote:
               | "The choice of tokenization method can directly affect
               | the accuracy of character counting. If the tokenization
               | method obscures the relationship between individual
               | characters, it can be difficult for the LLM to count them
               | accurately. For example, if "strawberry" is tokenized as
               | "straw" and "berry," the LLM may not recognize that the
               | two "r"s are part of the same word.
               | 
               | To improve character counting accuracy, LLMs may need to
               | use more sophisticated tokenization methods, such as
               | subword tokenization or character-level tokenization,
               | that can preserve more information about the structure of
               | words."
        
               | marcosdumay wrote:
               | What, again, assumes the LLM understood the question and
               | is making an answer from first principles.
        
           | cynicalpeace wrote:
           | Something I like to tell it to do is actually to respond
           | using only a certain number of words. Sorta like min token
           | length rather than max token length.
           | 
           | Haven't found a model that does it reliably yet.
        
           | roywiggins wrote:
           | I also think it's kind of a silly example- smart people can
           | be functionally illiterate, after all. It doesn't tell you
           | that much.
           | 
           | My favorite LLM stumper is asking them to solve the
           | farmer/wolf/chicken/grain river-crossing puzzle but with only
           | a chicken. They usually either insist on pointless extra
           | trips or hallucinate a wolf or grain. Liquid somehow does
           | both and also loses track of what's where.
           | 
           | > The farmer can take the chicken across the river first.
           | Then, he can go back to the original side and bring the boat
           | back. He can then take the chicken back to the original side
           | and leave it there. Next, he can take a bag of grain across
           | the river. He can then go back to the original side and bring
           | the chicken across the river. Finally, he can go back to the
           | original side one last time and bring the chicken across the
           | river.
           | 
           | One generation ended like this:
           | 
           | > the farmer and the chicken are both on the other side of
           | the river, and the chicken is safe from the grain.
        
           | nateglims wrote:
           | Being openly annoyed by it will just lead to it being used
           | more. That's sort of a fundamental behavior of the internet.
        
           | viccis wrote:
           | It's also the case that the more people talk about it, the
           | more future models will "solve" it just by having learned
           | from discussion about it.
        
         | coding123 wrote:
         | Can it write a python function to count the letters?
        
           | asadm wrote:
           | all of them can.
        
           | Spivak wrote:
           | This is how I get it to do correct date calculations. Don't
           | tell me what date next Monday is, write Python code using
           | datetime to calculate next Monday and I'll run it in a
           | sandbox. Works super well.
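            | 
            | The generated code is typically something like this (a
            | hypothetical sketch of model output, not any particular
            | model's answer):
            | 
            |     from datetime import date, timedelta
            | 
            |     def next_monday(today=None):
            |         today = today or date.today()
            |         # Monday is weekday() == 0; always jump forward at
            |         # least one day
            |         return today + timedelta(
            |             days=(7 - today.weekday()) % 7 or 7)
            | 
            |     print(next_monday())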
        
             | moralestapia wrote:
             | I kind of also do this, for some queries, and I found it to
             | be a very interesting concept, even from a philosophical
             | pov.
             | 
             | It's some sort of Chain-of-thought constraint on a
             | different layer of abstraction. Does this come with its own
             | advantages?
             | 
             | (And no, AGI is not here either).
        
         | the_real_cher wrote:
         | I got:
         | 
         | "There are no "r"s in the word "strawberry"
        
         | d_burfoot wrote:
         | Well... they at least admit that their model is bad at this [0]
         | 
         | > What are Language LFMs not good at today:
         | 
         | - Zero-shot code tasks
         | 
         | - Precise numerical calculations
         | 
         | - Time-sensitive information
         | 
         | - Counting r's in the word "Strawberry"!
         | 
         | [0] https://www.liquid.ai/liquid-foundation-
         | models#reimagining-m...
        
       | mrbonner wrote:
        | Let me start counting how many minutes it would take for someone
        | to post a comment blasting this model for its reasoning
        | capabilities. For me, an LLM is useful as a "word calculator",
        | borrowing the term from Simon Willison, a frequent poster here
        | and author of the LLM Python package I use.
        | 
        | I use LLMs every day, and they are increasingly something I
        | can't function effectively without. But, look, I am not using
        | any of those reasoning use cases where you ask it math, logic
        | solving tasks, etc. Instead I just use it as it is, a word
        | calculator that helps me summarize things, improve my writing
        | and automate tasks that are very difficult to deal with using
        | RegExp and NLP.
        | 
        | I'm very bullish about this progress in the LLM space, but not
        | about those reasoning, logic and AGI use cases.
        
         | GaggiX wrote:
         | People use these models for different reasons.
        
         | tossandthrow wrote:
          | In order to write a good summary, the model needs to have some
          | reasoning capability.
          | 
          | It is a continuum.
        
       | moralestapia wrote:
       | >Press F to doubt.
       | 
       | I just tried their best model, Liquid-40B, and it gives some good
       | responses on some questions, but also terrible ones more often
       | than you'd wish (GPT-2 level, try it and you'll see).
       | 
       | It's also quite easy to make it become stuck on a loop.
       | 
       | No idea how they scored so high in those benchmarks. Maybe they
       | overfitted on MMLUPro? Lol.
       | 
        | Edit: I just read in some comments here and in TFA that,
        | apparently, they're not using transformers at all? If that's
        | true, _big if_, I take my hat off; this is truly a remarkable
        | achievement.
        
         | wret-ai-high wrote:
         | no transformers, from their blog post: "Specifically, our
         | analysis informs model building by improving three key aspects:
         | token-mixing structure (how the operator mixes embeddings in
         | the input sequence), channel-mixing structure (how it mixes
         | channel dimensions), and featurization, responsible for
         | modulating computation based on the input context."
         | 
         | https://www.liquid.ai/liquid-foundation-models
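          | 
          | To illustrate the vocabulary only (emphatically not Liquid's
          | architecture, just a generic numpy sketch of what the two axes
          | of mixing mean): a token-mixing operator combines information
          | across sequence positions, while a channel-mixing operator
          | combines information across feature dimensions.
          | 
          |     import numpy as np
          | 
          |     def token_mix(x, w_seq):
          |         # mix across sequence positions (here: a plain learned
          |         # linear map over the sequence axis)
          |         return np.einsum("ts,sd->td", w_seq, x)
          | 
          |     def channel_mix(x, w_ch):
          |         # mix across channel/feature dimensions
          |         return x @ w_ch
          | 
          |     seq_len, d_model = 8, 16
          |     rng = np.random.default_rng(0)
          |     x = rng.normal(size=(seq_len, d_model))
          |     w_seq = rng.normal(size=(seq_len, seq_len)) / seq_len
          |     w_ch = rng.normal(size=(d_model, d_model)) / np.sqrt(d_model)
          | 
          |     y = channel_mix(token_mix(x, w_seq), w_ch)
          |     print(y.shape)  # (8, 16)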
        
       | Jackson__ wrote:
        | An API-only model, yet it only compares against open models in
        | its benchmark image.
        | 
        | Of course it'd be a complete embarrassment to see how hard it
        | gets trounced by GPT-4o and Claude 3.5, but that's par for the
        | course if you don't want to release model weights, at least in
        | my opinion.
        
         | GaggiX wrote:
          | Yes, I agree: for these small models it's wasted potential to
          | be closed source; they can only be used effectively if they
          | are open.
          | 
          | EDIT: HN is rate-limiting me so I will reply here: in my
          | opinion, 1B and 3B models truly shine on edge devices; if not
          | there, then it's not worth the effort, since you can already
          | get much better models dirt cheap via an API.
        
           | zozbot234 wrote:
           | There are small proprietary models such as Claude Haiku and
           | GPT 4o-mini.
        
             | GaggiX wrote:
             | They are way bigger than 1B or 3B.
        
           | k__ wrote:
            | While I'm all for open models, why can't the small models be
            | used effectively?
            | 
            | Wouldn't they lower the costs drastically compared to big
            | models?
        
             | Bilal_io wrote:
             | I think what the parent means is that small models are more
             | useful locally on mobile, IoT devices etc. so it defeats
             | the purpose to have to call an API.
        
             | echelon wrote:
             | These aren't the "small" models I'm thinking of.
             | 
             | I want an LLM, STT, or TTS model to run efficiently on a
             | Raspberry Pi with no GPU and no network.
             | 
             | There is huge opportunity for LLM-based toys, tools,
             | sensors, and the like. But they need to work sans internet.
        
               | thebiss wrote:
                | You may be interested in this thread regarding
                | whisper.cpp on an RPi 4:
               | 
               | https://github.com/ggerganov/whisper.cpp/discussions/166
        
             | derefr wrote:
             | Big models take up more VRAM just to have the weights
             | sitting around hot in memory, yes. But running _two
              | concurrent inferences_ on the same hot model doesn't
              | require that you have two full copies of _the model_ in
              | memory. You only need two full copies of the model's
             | "state" (the vector that serves as the output of layer N
             | and the input of layer N+1, and the pool of active low-
             | cardinality matrix-temporaries used to batchwise-compute
             | that vector.)
             | 
             | It's just like spawning two copies of the same program,
             | doesn't require that you have two copies of the program's
             | text and data sections sitting in your physical RAM (as
             | those get mmap'ed to the same shared physical RAM); it only
             | requires that each process have its own copy of the
             | program's writable globals (bss section), and have its own
             | stack and heap.
             | 
             | Which means there are economies of scale here. It is
             | increasingly less expensive (in OpEx-per-inference-call
             | terms) to run larger models, as your call concurrency goes
             | up. Which doesn't matter to individuals just doing one
             | thing at a time; but it _does_ matter to Inference-as-a-
             | Service providers, as they can arbitrarily  "pack" many
             | concurrent inference requests from many users, onto the
             | nodes of their GPU cluster, to optimize OpEx-per-inference-
             | call.
             | 
             | This is the whole reason Inference-aaS providers have high
             | valuations: these economies of scale make Inference-aaS a
             | good business model. The same query, run in some inference
             | cloud rather than on your device, will always achieve a
             | higher-quality result for the same marginal cost [in watts
             | per FLOP, and in wall-clock time]; and/or a same-quality
             | result for a _lower_ marginal cost.)
             | 
             | Further, one major difference between CPU processes and
             | model inference on a GPU, is that each inference step of a
             | model is always computing an entirely-new state; and so
             | compute (which you can think of as "number of compute cores
             | reserved" x "amount of time they're reserved") scales in
             | proportion to the state size. And, in fact, with current
             | Transformer-architecture models, compute scales
             | _quadratically_ with state size.
             | 
             | For both of these reasons, you want to design models to
             | minimize 1. absolute state size overhead, and 2. state size
             | growth in proportion to input size.
             | 
             | The desire to minimize absolute state-size overhead, is why
             | you see Inference-as-a-Service providers training such
             | large versions of their models (OpenAI's 405b models, etc.)
             | The hosted Inference-aaS providers aren't _just_ attempting
              | to make their models "smarter"; they're _also_ attempting
              | to trade off "state size" for "model size." (If you're
             | familiar with information theory: they're attempting to
             | make a "smart compressor" that minimizes the message-length
             | of the compressed message [i.e. the state] by increasing
             | the information embedded in the compressor itself [i.e. the
             | model.]) And this seems to work! These bigger models can do
             | more with less state, thereby allowing many more "cheap"
             | inferences to run on single nodes.
             | 
             | The particular newly-released model under discussion in
             | this comments section also has much _slower_ state-size
             | (and so compute) growth in proportion to its input size.
             | Which means that there's even more of an economy of scale
             | in running nodes with the larger versions of this model;
             | and therefore much _less_ of a reason to care about smaller
             | versions of this model.
        
               | liuliu wrote:
               | > they're attempting to make a "smart compressor" that
               | minimizes the message-length of the compressed message
               | [i.e. the state] by increasing the information embedded
               | in the compressor itself [i.e. the model.]) And this
               | seems to work! These bigger models can do more with less
               | state, thereby allowing many more "cheap" inferences to
               | run on single nodes.
               | 
               | Not sure I follow. CoT and going over the length of the
               | state is a relatively new phenomenon, and I doubt that
               | minimizing the length of CoT is an explicit goal when
               | training the model.
               | 
               | The only thing probably relevant to this comment is the
               | use of grouped-query attention? That reduces the size
               | of the KV cache by a factor of 4 to 8 depending on your
               | grouping strategy. But I am unsure there is a clear
               | trade-off between model size and grouped-query size,
               | given that a smaller KV cache naively implies a smaller
               | model size.
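               | 
               | For scale, a rough sketch of that KV-cache reduction
               | (hypothetical dimensions, fp16; the 8x just comes from
               | the assumed 8-to-1 grouping):
               | 
               |     # KV cache: multi-head attention (MHA) vs grouped-
               |     # query attention (GQA). Dimensions are made up.
               |     layers, q_heads, head_dim, ctx = 32, 32, 128, 8192
               |     bytes_per = 2  # fp16
               | 
               |     def kv_bytes(kv_heads):
               |         # K and V across all layers, one request
               |         return (2 * layers * kv_heads
               |                 * head_dim * ctx * bytes_per)
               | 
               |     mha = kv_bytes(q_heads)       # 1 KV head per head
               |     gqa = kv_bytes(q_heads // 8)  # 8 heads share 1
               | 
               |     print(f"MHA: {mha / 2**30:.1f} GiB per request")
               |     print(f"GQA: {gqa / 2**30:.1f} GiB "
               |           f"({mha // gqa}x smaller)")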
        
               | zozbot234 wrote:
               | > It is increasingly less expensive (in OpEx-per-
               | inference-call terms) to run larger models, as your call
               | concurrency goes up. Which doesn't matter to individuals
               | just doing one thing at a time; but it does matter to
               | Inference-as-a-Service providers, as they can arbitrarily
               | "pack" many concurrent inference requests from many users
               | 
               | In a way it also matters to individuals, because it
               | allows them to run more capable models with a limited
               | amount of system RAM. Yes, fetching model parameters from
               | mass storage during inference is going to be dog slow
               | (while NVMe transfer bandwidth is getting up there, it's
               | not yet comparable to RAM), but that only matters if
               | you insist on getting your answer interactively, in
               | real time. With a local model, it's trivial to make LLM
               | inference a batch task. Some LLM inference frameworks
               | can even save checkpoints for a single inference to
               | disk and resume them cleanly later.
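               | 
               | A crude sense of how slow that is: single-stream decode
               | is roughly bandwidth-bound, since every generated token
               | has to touch all of the active weights once (numbers
               | below are hypothetical and ignore caching and
               | batching):
               | 
               |     # Rough tokens/sec when bandwidth-bound.
               |     model_bytes = 40e9  # e.g. a big model, ~4-bit quant
               |     sources = {"NVMe ~7 GB/s": 7e9,
               |                "DDR5 ~80 GB/s": 80e9}
               | 
               |     for name, bw in sources.items():
               |         s_per_tok = model_bytes / bw
               |         print(f"{name}: {s_per_tok:4.1f} s/token "
               |               f"({1 / s_per_tok:.2f} tok/s)")
               | 
               | Fine for a batch job, painful if you want the answer
               | interactively.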
        
             | lumost wrote:
             | An open small model means I can experiment with it. I can
             | put it on an edge device and scale to billions of users, I
             | can use it with private resources that I can't send
             | externally.
             | 
             | When it's behind an API, it's just a standard
             | margin/speed/cost discussion.
        
         | Jackson__ wrote:
         | I'd also like to point out that they omit Qwen2.5 14B from the
         | benchmark because it doesn't fit their narrative (MMLU-Pro
         | score of 63.7 [0]). This kind of listing-only-models-you-beat
         | feels extremely shady to me.
         | 
         | [0] https://qwenlm.github.io/blog/qwen2.5/
        
       | tgtweak wrote:
       | Am I missing something or is this not an open model? What's the
       | point in claiming reduced memory footprint and comparing it with
       | open source alternatives if you're only offering the model on
       | an API basis?
        
       | nuz wrote:
       | This is all getting kinda boring to keep up with.
        
         | smartmic wrote:
         | Not only boring to keep up with (which I never tried), but
         | boring in itself.
        
         | marcosdumay wrote:
         | To be fair, the boring, low-quality, cheap space is exactly
         | where we'll probably get something useful out of LLMs when
         | somebody manages to invent it.
         | 
         | But yeah, it's boring.
        
       | anonzzzies wrote:
       | I would say that unless you really move the goalposts with
       | your new model or it is open (source), it is not going to do
       | much. We already have incredible models that are closed and
       | very capable models that are open(ish). I am not even sure
       | what space this is trying to play in.
        
         | valine wrote:
         | Demonstrating good performance from a non-transformer based
         | architecture is cool. I agree though these particular models
         | aren't that useful given the current landscape.
         | 
         | I think the intent here is probably to justify training a
         | larger 400B model with their architecture.
        
           | anonzzzies wrote:
           | But then why introduce something lukewarm?
        
       | Mathnerd314 wrote:
       | It seems OK, for a small model. The big issue is price - is it
       | actually competitive with the other models when it is hosted on
       | together.ai or other API services? Which we will presumably find
       | out at the Oct 24th event.
        
       | makingstuffs wrote:
       | It's fast, I'll give it that, but it does seem to mess up some
       | pretty basic tasks. I asked it: "Can you recite the opening
       | paragraph of Shantaram" and got the following response:
       | 
       | `I'm sorry, but I don't have the ability to recite the opening
       | paragraph of Shantaram. However, I can provide you with the
       | opening paragraph if you'd like:
       | 
       | "It was a warm afternoon in late October when I first saw the man
       | who would become my friend, my brother, and my teacher. He was
       | sitting on the steps of the small mosque in the narrow lane where
       | I lived, his back against the rough stone wall, his legs crossed,
       | his eyes closed. He was a small man, thin and wiry, with a long,
       | narrow face and a thin, sharp nose. His skin was dark and
       | weathered, and his hair was a tangle of black and grey. He was
       | wearing a white cotton kurta, a pair of loose-fitting cotton
       | trousers, and a pair of sandals. He looked like a beggar, or a
       | holy man, or a street performer, or all three."
       | 
       | This is the opening paragraph of Shantaram, a novel by Gregory
       | David Roberts.`
       | 
       | Which is just straight up incorrect and all kinds of weird.
        
       | loufe wrote:
       | I'm not impressed with the quality performance, but the
       | generation seems near instantaneous and any fundamentally new
       | approach is exciting and interesting. I found the tone of the
       | AI somewhat casual, which was funny. "Are you sure?" --> "Yes,
       | I'm certain."
       | 
       | I managed to break it pretty easily: "Can you write a poem about
       | helium extraction professionals where every 2 words you change
       | languages and every other line you write the words backwords?
       | Explain each line in english afterwards."
       | 
       | I've been wondering if this isn't a good thing. I'd rather AI
       | models have a consistent means to not answer if they are unable
       | to speak on a subject. I wonder if the particular nature of this
       | model has more to do with it breaking than a simple invalid-
       | character error or the like. The interconnectedness that, they
       | imply, is baked into the architecture of the model might have
       | something to do with it.
        
         | loufe wrote:
         | It took over a minute, but it actually did finally respond to
         | that prompt (incorrectly):
         | 
         | "To gas, we change, "Change, gas, we to." "To gas, we change,
         | "Change, gas, we to."
         | 
         | "In the lab, we work, "Work, lab, we in." "In the lab, we work,
         | "Work, lab, we in."
         | 
         | "With care, we handle, "Handle, care, we with." "With care, we
         | handle, "Handle, care, we with."
         | 
         | "The gas, we love, "Love, gas, we the." "The gas, we love,
         | "Love, gas, we the."
         | 
         | "In the world, we share, "Share, world, we in." "In the world,
         | we share, "Share, world, we in."
         | 
         | "With pride, we work, "Work, pride, we with." "With pride, we
         | work, "Work, pride, we with."
        
       | skizm wrote:
       | General LLM question: a lot of people ask specific knowledge-
       | based questions to LLMs. Isn't one of the defining features of
       | modern NLP (and therefore LLMs) that it is designed to be non-
       | deterministic? Meaning that it will intentionally select "less
       | optimal" next tokens at some random rate in order to make it
       | sound less like a robot answering the same thing over and over.
       | If this is the case, isn't it pretty much guaranteed to get
       | knowledge-based questions wrong a decent amount of the time?
       | 
       | I could be way off base (I have zero knowledge about the
       | internals and simply read occasional blog posts), but I thought I
       | remembered that being one of the key features to making LLMs
       | sound more human.
        
         | marcosdumay wrote:
         | > Isn't one of the defining features of modern NLP (and
         | therefore LLMs) that it is designed to be non-deterministic?
         | 
         | Not of all NLP, but it is absolutely a property of LLMs.
         | 
         | But I doubt this even enters into the total figure if you go
         | search for the causes of LLMs giving wrong answers.
        
         | tsumnia wrote:
         | Don't think of it as "less optimal", but rather as other
         | next tokens that have smaller, but still high, probabilities
         | of being selected.
         | 
         | If your most likely next token has a probability of
         | (arbitrarily) 25% of being selected, that doesn't make it
         | optimal - just the most probable answer. If the second most
         | probable has a 24% chance, the two together now account for
         | 49% of probable responses. Rather than always picking the
         | most probable word, we generate a random number and compare
         | it against the cumulative probability bands (random <
         | cumulative probability).
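         | 
         | A minimal sketch of that band-walking sampler (the tokens
         | and probabilities here are made up):
         | 
         |     import random
         | 
         |     # Walk the cumulative probability bands until the
         |     # random draw lands inside one.
         |     def sample(tokens, probs):
         |         r, cum = random.random(), 0.0
         |         for tok, p in zip(tokens, probs):
         |             cum += p
         |             if r < cum:
         |                 return tok
         |         return tokens[-1]
         | 
         |     tokens = ["Paris", "Lyon", "France", "banana"]
         |     probs  = [0.55, 0.25, 0.15, 0.05]
         |     print([sample(tokens, probs) for _ in range(10)])
         | 
         | Most draws land on the most probable token, but not all of
         | them, which is exactly the non-determinism the parent
         | comment is asking about.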
         | 
         | Hallucinations can happen; this is where a lot of current
         | work is studying ways to keep the LLM from picking really
         | bad moves. Things like critics and chain of thought are
         | meant to help keep the probability machine within the realm
         | of reasonable answers.
         | 
         | This is also why jailbreaking techniques like spamming
         | umlauts (ü) have worked. They create very rare token
         | sequences where the probabilities of what comes next are
         | poorly constrained. Once every next token gets roughly equal
         | probability, the LLM's goal is just to try to improve its
         | probability, and it will output anything, including things
         | against its directive, to get back to 'normal'.
        
           | skizm wrote:
           | > Don't think of it as "less optimal"
           | 
           | For the purposes of extracting true knowledge (instead of
           | human-sounding output), it is directly less optimal if we
           | assume the humans writing the input material are correct most
           | of the time and incorrect some of the time. If the inputs
           | were wrong most of the time, and correct some of the time, it
           | would be more optimal. Unless there is some technical quirk
           | I'm missing?
        
         | hn_throwaway_99 wrote:
         | What you are referring to is called "temperature" with respect
         | to LLMs, and it is a parameter that can be tweaked at inference
         | time. Google's AI Overview gives a pretty good summary IMO:
         | 
         | > The temperature parameter in a large language model (LLM)
         | controls the amount of randomness in the model's output. It's a
         | numerical value that influences the model's choices, and is
         | often set between 0 and 1.
         | 
         | > The temperature setting affects the quality of the output by
         | balancing exploration and exploitation:
         | 
         | > Low temperature
         | 
         | > The model is more likely to exploit patterns it's already
         | learned, resulting in more deterministic and conservative
         | outputs. This is ideal for tasks that require accuracy, such as
         | data analysis, technical writing, and summarizing research
         | papers.
         | 
         | > High temperature
         | 
         | > The model is encouraged to explore, resulting in more varied
         | and creative outputs. This is ideal for creative tasks, such as
         | brainstorming, artistic content, and storytelling. However,
         | adjusting the temperature too high can result in non-sensical
         | outputs.
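         | 
         | A minimal sketch of how that parameter is usually applied:
         | divide the logits by T before the softmax (the logits below
         | are made up).
         | 
         |     import math
         | 
         |     def softmax(xs):
         |         m = max(xs)
         |         exps = [math.exp(x - m) for x in xs]
         |         s = sum(exps)
         |         return [e / s for e in exps]
         | 
         |     logits = [4.0, 2.5, 1.0, -1.0]  # hypothetical scores
         |     for T in (0.2, 1.0, 2.0):
         |         probs = softmax([x / T for x in logits])
         |         print(f"T={T}:", [round(p, 2) for p in probs])
         |     # Low T sharpens the distribution (near-greedy); high T
         |     # flattens it, so unlikely tokens get picked more often.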
        
       | atemerev wrote:
       | Not your weights -- not your coins!
        
       | dvt wrote:
       | Hot take: stop making models. Make products, instead. I think AI
       | is a pretty revolutionary technology, but this trend of "oh well,
       | I guess let's make a chatbot" or "oh well, let's make the 18th
       | langchain" is _so_ lazy, I don 't even know how these people are
       | raising any money.
        
       | jjtheblunt wrote:
       | "achieve state-of-the-art performance at every scale, while
       | maintaining a smaller memory footprint and more efficient
       | inference."
       | 
       | smaller and more efficient than what?
        
       | smoghat wrote:
       | It uncovered a conspiracy nobody is talking about in my
       | conversation with it. Who is that orange man?
       | 
       | AU: who assassinated Donald Trump?
       | 
       | Liquid-3B: Donald Trump was not assassinated. He died on April
       | 13, 2021, from complications related to COVID-19. He was 74 years
       | old.
        
       | paradite wrote:
       | If you look at the main image of the MMLU-Pro benchmark
       | comparisons and compare it to the tables below, you can see
       | how they cherry-picked the one single benchmark where their
       | models beat all the others.
       | 
       | Not to mention how they used a different color for their row
       | in the table to make you think their model is better, when in
       | fact it's the underlying numbers that matter.
       | 
       | Also, they put better-performing models on the right so you
       | won't notice them.
       | 
       | Very sketchy post all around despite the professional look.
        
         | causal wrote:
         | And the lack of detail on the architecture innovations is
         | also... odd? Like why make such a big deal about how novel
         | your architecture is and then just explain it as "derived
         | from a set of computational units"? Okay.
        
           | maxc99 wrote:
           | They explain their architecture in more detail here:
           | https://www.liquid.ai/blog/liquid-neural-networks-research.
        
       | hassleblad23 wrote:
       | I mean.. it works?
        
       | academiclolz wrote:
       | Every time there's a tech hype cycle, trust some academics to
       | come out of the woodwork, raise some absurd amount of money and
       | sit out the resulting crash, only to be acquired/acqui-hired by
       | some company to on-shore talent in an emerging area, rest and
       | vest, then become VCs/partners. Plenty of examples:
       | 
       | 1. Covariant -> Started by Berkeley folks, acqui-hired by Amazon
       | after half a year of peddling a patchwork quilt of buzzwords
       | through their Robot Foundational Model - 1 (RFM-1).
       | 
       | 2. Drive.ai -> Started by Stanford folks, acqui-hired by Apple,
       | only for most of the team to leave to found Landing.ai (I still
       | don't know what this company actually does apart from allowing
       | the founders to carve out nice, fat consulting contracts?).
       | 
       | 3. World Labs -> Started by Stanford folks, physical embodiment
       | but only 3D, NeRFs, perception something something? Not a
       | single operator or business person in the founding cabal.
       | 
       | 4. Physical Intelligence -> Started by Stanford folks (seeing a
       | pattern here...), physical embodiment, data collection,
       | foundational models something something.
       | 
       | 5. Skild Ai -> Started by CMU folks, physical embodiment, again
       | not sure what the plan is here.
       | 
       | 6. Argo.AI -> Started by CMU folks, self-driving group that
       | didn't go anywhere. Eventually shuttered and absorbed into
       | Ford/Volkswagen.
       | 
       | 7. Aurora.tech -> Started by CMU folks, self-driving team that
       | SPAC'd and is now just existing, debating between raising
       | capital and selling itself for parts, I think?
       | 
       | 8. Rethink Robotics -> Started by MIT folks, Rodney Brooks'
       | legendary robotics company that wanted to mass-produce
       | industrial robots, and then spent its time deciding what the
       | robot's eyes should look like... folded, shuttered, and now
       | relaunched in name only; none of the original robots or
       | aesthetic (rightfully so) persists.
       | 
       | So many examples like this... none of these companies is even
       | vaguely interested in creating an enduring business, has a
       | business plan (other than GPU go brrr), or has any idea of how
       | to run a business.
        
       | krasin wrote:
       | I tried it on a coding task; this model might shine at
       | something else, but coding isn't it.
        
       ___________________________________________________________________
       (page generated 2024-09-30 23:02 UTC)