[HN Gopher] Gemini "duck" demo was not done in realtime or with ...
___________________________________________________________________
Gemini "duck" demo was not done in realtime or with voice
Author : apsec112
Score : 623 points
Date : 2023-12-07 18:03 UTC (4 hours ago)
(HTM) web link (twitter.com)
(TXT) w3m dump (twitter.com)
| throwitaway222 wrote:
| link to duck video: https://www.youtube.com/watch?v=UIZAiXYceBI
| recursive wrote:
| Thanks. This is the first I'm hearing of a duck demo, and
| couldn't figure out what it was.
| Garrrrrr wrote:
| Timestamp for the duck demo:
| https://youtu.be/UIZAiXYceBI?si=pNT74PXjyDataF1T&t=246
| Inward wrote:
  | Yes, that was obvious; as soon as I saw it wasn't live I clicked
  | off. You can train any LLM to perform a certain task (or tasks)
  | well, and Google engineers are not that dense. This was obvious
  | marketing PR. OpenAI has made Google basically obsolete for me:
  | 90% of my queries can be answered without wading through LLM
  | generated text for a simple answer.
| nirvael wrote:
| >without wading through LLM generated text
|
| ...OpenAI solved this by generating LLM text for you to wade
| through?
| rose_ann_ wrote:
      | No. It solved it by (most of the time) giving the OP and me
| the answer to our queries, without us needing to wade through
| spammy SERP links.
| kweingar wrote:
| If LLMs can replace 90% of your queries, then you have very
| different search patterns from me. When I search on Kagi,
| much of the time I'm looking for the website of a business,
| a public figure's social media page, a restaurant's hours
| of operation, a software library's official documentation,
| etc.
|
| LLMs have been very useful, but regular search is still a
| big part of everyday life for me.
| GolfPopper wrote:
| How do you tell a plausible wrong answer from a real one?
| rose_ann_ wrote:
| By testing the code it returns (I mostly use it as a
| coding assistant) to see if it works. 95% of the time it
| does.
|
| For technical questions, ChatGPT has almost completely
| replaced Google & Stack Overflow for me.
| 13415 wrote:
| In my experience, testing code in a way that ensures that
| it works is often harder and takes more time than writing
| it.
| data-ottawa wrote:
| GPT4 search is a very good experience.
|
| Though because you don't see the answers it doesn't show you,
| it's hard to really validate the quality, so I'm still wary,
| but when I look for specific stuff it tends to find it.
| kweingar wrote:
| The video itself and the video description give a disclaimer to
| this effect. Agreed that some will walk away with an incorrect
| view of how Gemini functions, though.
|
| Hopefully realtime interaction will be part of an app soon.
| Doesn't seem like there would be too many technical hurdles
| there.
| billconan wrote:
| performance and cost are hurdles?
| kweingar wrote:
| It can be realtime while still having more latency than
| depicted in the video (and the video clearly stated that
| Gemini does not respond that quickly).
|
| A local model could send relevant still images from the
| camera feed to Gemini, along with the text transcript of the
| user's speech. Then Gemini's output could be read aloud with
| text-to-speech. Seems doable within the present cost and
| performance constraints.
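      |
      | A rough sketch of that loop, with every function stubbed out
      | (these are hypothetical placeholders, not a real Gemini or
      | speech API):
      |
      |     def capture_still(camera):
      |         # stub: grab one frame from the camera as JPEG bytes
      |         return b"..."
      |
      |     def transcribe(mic):
      |         # stub: speech-to-text of the user's last utterance
      |         return "what do you see?"
      |
      |     def ask_model(image_bytes, text):
      |         # stub: one multimodal request; the network round
      |         # trip is where most of the latency would come from
      |         return "I see a rubber duck on the table."
      |
      |     def speak(text):
      |         # stub: text-to-speech playback
      |         print(text)
      |
      |     def assistant_loop(camera, mic):
      |         while True:
      |             prompt = transcribe(mic)
      |             frame = capture_still(camera)
      |             reply = ask_model(frame, prompt)
      |             speak(reply)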
| anigbrowl wrote:
| People don't really pay attention to disclaimers. Google made a
| choice knowing people would remember the hype, not the
| disclaimer.
| lainga wrote:
| :%s/Google/the team :%s/people/the promotion board
|
| Conway's law applied to the corporate-public interface :)
| 3pt14159 wrote:
| I remember watching it and I was pretty impressed, but as I
| was walking around thinking to myself I came to the
| conclusion that there was something fishy about the demo. I
| didn't know exactly what they fudged, but it was far too
      | polished to explain how well their current AI demos perform.
|
| I'm not saying there have been no improvements in AI. There
      | are, and this includes Google. But the reason why ChatGPT has
| really taken over the world is that the demo is in your own
| hands and it does quite well there.
| peteradio wrote:
| If there weren't serious technical hurdles they wouldn't have
| faked it.
| jefftk wrote:
| The disclaimer in the description is "For the purposes of this
| demo, latency has been reduced and Gemini outputs have been
| shortened for brevity."
|
| That's different from "Gemini was shown selected still images
| and not video".
| tobr wrote:
| What I found impressive about it was the voice, the fast
| real-time response to video, and the succinct responses. So
| apparently all of that was fake. You got me, Google.
| TillE wrote:
| The entirety of the disclaimer is "sequences shortened
| throughout", in tiny text at the bottom for two seconds.
|
| They do disclose most of the details elsewhere, but the video
| itself is produced and edited in such a way that it's extremely
| misleading. They really want you to think that it's responding
| in complex ways to simple voice prompts and a video feed, and
| it's just not.
| dogprez wrote:
      | Yea, of all the edits in the video, the editing for timing is
      | the least of my concerns. My gripe is that the prompting was
      | different, and to learn that you have to watch the video on
      | YouTube itself, expand the description, and click a link to a
      | different blog article. Linking a
| "making of" video where they show this and interview some of
| the minds behind Gemini would have been better PR.
| Jagerbizzle wrote:
| They were just parroting this video on CNBC without any
| disclaimers, so the viewers who don't happen to also read
| hacker news will likely form a different opinion than those of
| us who do.
| skepticATX wrote:
| No. The disclaimer was not nearly enough.
|
| The video fooled many people, including myself. This was not
| your typical super optimized and scripted demo.
|
| This was blatant false advertising. Showing capabilities that
| do not exist. It's shameful behavior from Google, to be
| perfectly honest.
| titzer wrote:
| Yeah, and ads on Google search have the teeniest, tiniest
| little "ad" chip on them, a long progression of making ads more
| in-your-face and less well-distinguished.
|
| In my estimation, given the context around AI-generated content
| and general fakery, this video was deceptive. The only
| impressive thing about the video (to me) was how snappy and
| fluid it seemed to be, presumably processing video in real
| time. None of that was real. It's borderline fraudulent.
| kaoD wrote:
| How is this not false advertising?
| barbazoo wrote:
| Or worse, fraud to make their stock go up
|
| edit: s/stuck/stock
| drcode wrote:
| I suppose it's not false advertising, since they don't even
| claim to have a product released yet that can do this, since
    | Gemini Ultra won't be available until an unspecified time next
| year
| stephbu wrote:
| You're right, it's astroturfing a placeholder in the market
| in the absence of product. The difference is probably just
| the target audience - feels like this one is more aimed at
| share-holders and internal politics.
| empath-nirvana wrote:
| possibly securities fraud though. Their stock popped a few
| percent on the back of that faked demo.
| imiric wrote:
| It's still false advertising.
|
| This is common in all industries. Take gaming, for example.
| Game publishers love this kind of publicity, as it creates
| hype, which leads to sales. There have been numerous examples
| of this over the years: Watch Dogs, No Man's Sky, Cyberpunk
| 2077, etc. There's a period of controversy once consumers
| realize they've been duped, the company releases some fake
| apology and promises or doubles down, but they still walk out
| of it richer, and ready to do it again next time.
|
| It's absolutely insidious, and should be heavily fined and
| regulated.
| Tao3300 wrote:
| It's a software demo. If you ever gave an honest demo, you gave
| a bad demo. If you ever saw a good and honest demo, you were
| fooled.
| dragontamer wrote:
| As a programmer, I'd say that all the demos of my code were
| honest and representative of what my code was doing.
|
| But I recognize we're all different programmers in different
| circumstances. But at a minimum, I'd like to be honest with
| my work. My bosses seem to agree with me and I've never been
| pressured into hosting a fake demo or lie about the features.
|
| In most cases, demos are needed because there's that dogfood
      | problem. It's just not possible for me to know how my
      | (prospective) customers will use my code. So I need to show
      | off what has been coded, my progress, and my intentions for
      | the feature set. In response, the (prospective) customer may
      | walk away, they may have some comments that increase the
      | odds of adoption, or they think it's cool and amazing and take
| it on the spot. We can go back and forth with regards to
| feature changes or what is possible, but that's how things
| should work.
|
| ------------
|
| I've done a few "I could do it like this" demos, where
| everyone in the room knew that I didn't finish the code yet
      | and it's just me projecting into the future of how code would
| work and/or how it'd be used. But everyone knew the code
| wasn't done yet (despite that, I've always delivered on what
| I've promised).
|
| There is a degree of professional ethics I'd expect from my
| peers. Hosting honest demos is one of them, especially with
| technical audience members.
| saagarjha wrote:
      | I prefer to let my software be good enough to speak for
      | itself without resorting to fraud, thank you very much.
| qwertox wrote:
| https://www.youtube.com/watch?v=OPUq31JZFsA
| drcongo wrote:
| Remember when they faked that Google Assistant booking a
| restaurant thing too.
| miraculixx wrote:
| Mhm
| umeshunni wrote:
| How was that fake?
| valine wrote:
| It's not live, but it's in the realm of outputs I would expect
| from a GPT trained on video embeddings.
|
| Implying they've solved single token latency, however, is very
| distasteful.
| zozbot234 wrote:
| OP says that Gemini had still images as input, not video - and
| the dev blog post shows it was instructed to reply to each
| input in relevant terms. Needless to say, that's quite
| different from what's implied in the demo, and at least
| theoretically is already within GPT's abilities.
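    |
    | If the dev blog is accurate, the "video" part reduces to
    | something like the sketch below - written against the Python
    | SDK as I remember it, so the exact names may differ, and the
    | prompt text and frame files are made up, not Google's:
    |
    |     import PIL.Image
    |     import google.generativeai as genai
    |
    |     genai.configure(api_key="...")
    |     model = genai.GenerativeModel("gemini-pro-vision")
    |
    |     # a handful of hand-picked stills, not a video stream
    |     frames = [PIL.Image.open(f"frame_{i}.jpg") for i in range(3)]
    |     prompt = ("The ball is under the cup I point at in the "
    |               "first image, then the cups are shuffled. Based "
    |               "on the last image, where is the ball now?")
    |
    |     response = model.generate_content([prompt, *frames])
    |     print(response.text)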
| valine wrote:
| How do you think the cup demo works? Lots of still images?
| watusername wrote:
| A few hand-picked images (search for "cup shuffling"):
        | https://developers.googleblog.com/2023/12/how-its-made-gemin...
| valine wrote:
| Holy crap that demo is misleading. Thanks for the link.
| Animats wrote:
| The Twitter-linked Bloomberg page is now down.[1] Alternative
| page: [2] New page says it was partly faked. Can't find old page
| in archives.
|
| [1]
| https://www.bloomberg.com/opinion/articles/2023-12-07/google...
|
| [2]
| https://www.bloomberg.com/opinion/articles/2023-12-07/google...
| sowbug wrote:
| I am similarly enraged when TV show characters respond to text
| messages faster than humans can type. It destroys the realism
| of my favorite rom-coms.
| imacomputertoo wrote:
| it was obviously marketing material, but if this tweet is right,
| then it was just blatant false advertising.
| kjkjadksj wrote:
| Google always does fake advertising. "Unlimited" google drive
| accounts for example. They just have such a beastly legal team
| no one is going to challenge them on anything like that.
| Dylan16807 wrote:
| What was fake about unlimited google drive? There were some
| people using petabytes.
|
| The eventual removal of that tier and anything even close
| speaks to Google's general issues with cancelling services,
| but that doesn't mean it was less real while it existed.
| peteradio wrote:
| Lol, could have done without the cocky narration. "I think we're
| done here."
| cedws wrote:
| The whole launch is cocky. Bleh. Stick to the engineering.
| cryptoz wrote:
| I'll admit I was fooled. I didn't read the description of the
| video. The most impressive thing they showed was the real-time
| responses to watching a video. Everything else was about
| expected.
|
| Very misleading and sad Google would so obviously fake a demo
| like this. Mentioning in the description that it's edited is not
| really in the realm of doing enough to make clear the fakery.
| LesZedCB wrote:
| i too was excited and duped about the real-time implications.
| though i'm not surprised at all to find out it's false.
|
    | mea culpa i should have looked at the bottom of the description
| box on youtube where it probably says "this demonstration is
| based on an actual interaction with an LLM"
| wvenable wrote:
| I'm surprised it was false. It was made to look realistic and
| I wouldn't expect Google to fake this kind of thing.
|
| All they've done is completely destroy my trust in anything
| they present.
| zozbot234 wrote:
| Good, that video was mostly annoying and creepy. The AI responses
| as shown in the linked Google dev blogpost are a lot more
| reasonable and helpful. BTW I agree that the way the original
| video was made seems quite misleading in retrospect. But that's
| also par for the course for AI "demos", it's an enduring
| tradition in that field and part of its history. You really have
| to look at production systems and ignore "demos" and pointless
| proofs of concept.
| peteradio wrote:
| What the Quack? I found it tasty as pate.
| danielbln wrote:
| The GPT-4 demo early this year when it was released was a lot
    | less... fake, and in fact very much indicative of its feature
| set. The same is true for what OpenAI showed during their dev
| days, so at the very least those demos don't have too much
| fakery going on, as far as I could tell.
| Frost1x wrote:
| >You really have to look at production systems and ignore
| "demos" and pointless proofs of concept.
|
    | While I agree, I wouldn't call proofs of concept and demos
| pointless. They often illustrate a goal or target functionality
| you're working towards. In some cases it's really just a matter
| of allotting some time and resources to go from a concept to a
| product, no real engineering is needed, it all exists, but
| there's capital needed to get there.
|
    | Meanwhile some proofs of concept skip steps and show higher
| level function that needs some serious breakthrough work to get
| to, maybe multiple steps of that. Even this is useful because
| it illustrates a vision that may be possible so people can
| understand and internalize things you're trying to do or the
| real potential impact of something. That wasn't done here, it
| was embedded in a side note. That information needs to be
| before the demo to some degree without throwing a wet blanket
| on everything and needs to be in the same medium as the demo
| itself so it's very clear what you're seeing.
|
| I have no problem with any of that. I have a lot of problems
| when people don't make it explicitly clear beforehand that it's
| a demo and explain earnestly what's needed. Is it really
| something that exists today in working systems someone just
| needs to invest money and wire it up without new research
| needed? Or is it missing some breakthroughs, how many/what are
| they, how long have these things been pursued, how many people
| are working on them... what does recent progress look like and
| so on (in a nice summarized fashion).
|
| Any demo/poc should come up front with an earnest general
| feasibility assessment. When a breakthrough or two are needed
| then that should skyrocket. If it's just a lot of expensive
| engineering then that's also a challenge but tractable.
|
| I've given a lot of scientific tech demonstrations over the
| years and the businesses behind me obviously want me to be as
| vague as possible to pull money in. I of course have some of
| those same incentives (I need to eat and pay my mortgage like
    | everyone else). Nonetheless, the draw of science to me has
| always been pulling the veil from deception and mystery and I'm
| a firm believer in being as upfront as possible. If you don't
| lead with disclaimers, imaginations run wild into what can be
| done today. Adding disclaimers helps imaginations run wild
| about what can be done tomorrow, which I think is great.
| borissk wrote:
| Google did the same with Pixel 8 Pro advertising - they showed
| stuff like photo and video editing, that people couldn't
| replicate on their phones.
| pizzafeelsright wrote:
| I suppose this is a great example of how trust in authentic
| videos, audio, images, company marketing must be questioned and,
| until verified, assumed to be 'generated'.
|
| I am curious, if the voice, email, chat, and shortly video can
| all be entirely generated in real or near real time, how can we
  | be sure that a remote employee is not actually a fully or
  | partially generated entity?
|
| Shared secrets are great when verifying but when the bodies are
| fully remote - what is the solution?
|
| I am traveling at the moment. How can my family validate that it
  | is ME claiming lost luggage and sending a Venmo request?
| takoid wrote:
| >I am traveling at the moment. How can my family validate that
    | it is ME claiming lost luggage and sending a Venmo request?
|
| PGP
| tadfisher wrote:
| Now you have two problems.
|
| (I say this in jest, as a PGP user)
| adewinter wrote:
| Make up a code phrase/word for emergencies, share it with your
| family, then use it for these types of situations.
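    |
    | If you want something sturdier than a single phrase (which is
    | burned the first time it's spoken over a monitored channel),
    | the textbook version is challenge-response over a shared
    | secret. A minimal Python stdlib sketch, admittedly far more
    | ceremony than most families will tolerate:
    |
    |     import hashlib, hmac, secrets
    |
    |     SHARED_SECRET = b"phrase agreed on in person"
    |
    |     def make_challenge():
    |         # the person being asked for money sends a fresh nonce
    |         return secrets.token_hex(8)
    |
    |     def respond(challenge):
    |         # the traveler answers with an HMAC of the nonce; the
    |         # secret itself never crosses the channel
    |         return hmac.new(SHARED_SECRET, challenge.encode(),
    |                         hashlib.sha256).hexdigest()
    |
    |     def verify(challenge, answer):
    |         return hmac.compare_digest(respond(challenge), answer)
    |
    |     c = make_challenge()
    |     print(verify(c, respond(c)))  # True only with the secret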
| mikepurvis wrote:
| Fair, but that also assumes the recipients ("family") are in
| a mindset of constantly thinking about the threat model in
| this type of situation and will actually insist on hearing
| the passphrase.
| pizzafeelsright wrote:
| This will only work once.
| vasco wrote:
| Ask for information that only the actual person would know.
| pizzafeelsright wrote:
| That will only work once if the channels are monitored.
| vasco wrote:
| You only know one piece of information about your family? I
| feel like I could reference many childhood facts or random
| things that happened years ago in social situations.
| raincole wrote:
| If you can't verify whether your employee is AI, then you fire
| them and replace them with AI.
| vasco wrote:
      | The question is: an attacker tells you they lost access, can
      | you please reset some credential - and your security process
      | is getting on a video call because you're a fully remote
      | company, let's say.
| kjkjadksj wrote:
| At this point, probably a handwritten letter. Back to the 20th
| century we go.
| robbomacrae wrote:
| I think it's also why we as a community should speak out when
    | we catch them doing this, as they are discrediting tech
| demos. It won't be enough because a lie will be around the
| world before the truth gets out the starting gates but we can't
| just let this go unchecked.
| GaggiX wrote:
  | I imagine the model also has some video embeddings, like for the
  | example where it needed to find where the ball was hiding.
| AndrewKemendo wrote:
| I have used Swype texting since the t9 days.
|
| If I demoed swype texting as it functions in my day to day life
| to someone used to a querty keyboard they would never adopt it
|
| The rate at which it makes wrong assumptions about the word, or I
| have to fix it is probably 10% to 20% of the time
|
| However because it's so easy to fix this is not an issue and it
| doesn't slow me down at all. So within the context of the
| different types of text Systems out there, I t's the best thing
| going for me personally, but it takes some time to learn how to
| use it.
|
| This is every product.
|
| If you demonstrated to people how something will actually work
| after 100 hours of habituation and compensation for edge cases,
| nobody would ever adopt anything.
|
| I'm not sure how to solve this because both are bad.
|
| (Edit: I'm keeping all my typos as meta-comment on this given
| that I'm posting via swype on my phone :))
| mulmen wrote:
| Does swype make editing easier somehow? iOS spellcheck has
| negative value. I turned it off years ago and it reduced errors
| but there are still typos to fix.
|
| Unfortunately iOS text editing is also completely worthless. It
| forces strange selections and inserts edited text in awkward
| ways.
|
| I'm a QWERTY texter but text entry on iOS is a complete
| disaster that has only gotten worse over time.
| mikepurvis wrote:
| I'm an iOS user and prefer the swipe input implementation in
| GBoard over the one in the native keyboard. I'm not sure what
| the differences are, but GBoard just seems to overall make
| fewer mistakes and do a better job correcting itself from
| context.
| wlesieutre wrote:
| Have you tried the native keyboard since iOS 17? It's quite
| a lot better than older versions.
| nozzlegear wrote:
| As I was reading Andrew's comment to myself, I was trying
| to figure out when and why I stopped using swype typing on
| my phone. Then it hit me - I stopped after I switched from
| Android to iOS a few years ago. Something about the iOS
| implementation just doesn't feel right.
| rochak wrote:
| Apple's version is shit. Period. That's why.
| pb7 wrote:
| Hard disagree. I could type your whole comment without any
| typos completely blindly (except maybe "QWERTY" because
| uppercaps don't get autocorrected).
| newaccount74 wrote:
| Apple autocorrect has a tendency to replace technical terms
| with similar words, eg. rvm turns into rum or ram or
| something.
|
| It's even worse on the watch somehow. I take care to hit
| every key exactly, the correct word is there, I hit space,
| boom replaced with a completely different word. On the
| watch it seems to replace almost every word with bullshit,
| not just technical terms.
| rootusrootus wrote:
| > seems to replace almost every word with bullshit
|
| Sort of related, it also doesn't let you cuss. It will
| insist on replacing fuck with pretty much anything else.
| I had to add fuck to the custom replacement dictionary so
| it would let me be. What language I choose to use is mine
| and mine alone, I don't want Nanny to clean it up.
| peteradio wrote:
    | What is the latency of Swype? < 10ms? Not at all comparable to
| the video.
| kjkjadksj wrote:
    | It's honestly pretty mind-boggling that we'd even use QWERTY on
    | a smartphone. The entire point of the layout is to keep your
    | fingers on the home row. Meanwhile people text with one or
    | two thumbs 100% of the time.
| hiccuphippo wrote:
| I use 8vim[0] from time to time, it's a good idea but needs a
| dictionary/autocompletion. You can get ok speeds after an
| hour of usage.
|
| [0] https://f-droid.org/en/packages/inc.flide.vi8/
| jerf wrote:
| "The entire point of the layout is to keep your fingers on
| the home row."
|
| No, that is how you're _told to type_. You have to be told to
| type that way precisely because QWERTY is _not_ designed to
| keep your fingers on the home row. If you type in a layout
      | that is designed to do that, you don't need to be _told_ to
| keep your fingers on the home row, because you naturally
| will.
|
| Nobody really knows what the designers were thinking, which I
| do not mean as sarcasm, I mean it straight. History lost that
| information. But whatever they were thinking that is clearly
| not it because it is plainly obvious just by looking at it
| how bad it is at that. Nobody trying to design a layout for
| "keeping your fingers on the home row" would leave
| hjkl(semicolon) under the resting position of the dominant
| hand for ~90% of the people.
|
| This, perhaps in one of technical history's great ironies,
| makes it a fairly good keyboard for swype-like technologies!
      | A keyboard layout like Dvorak that has "aoeui" all right next
      | to each other and "dhtns" on the other hand would _constantly_
      | have trouble figuring out which one you meant between
| "hat" and "ten" to name just one example. "uio" on qwerty
| could probably stand a bit more separation, but "a" and "e"
| are generally far enough apart that at least for me they
| don't end up confused, and pushing the most common consonants
| towards the outer part of the keyboard rather than clustering
| them next to each other in the center (on the home row) helps
| them be distinguishable too. "fghjkl" is almost a probability
| dead zone, and the "asd" on the left are generally reasonably
| distinct even if you kinda miss one of them badly.
|
| I don't know what an optimal swype keyboard would be, and
| there's probably still a good 10% gain to be made if someone
| tried to make one, but it wouldn't be enough to justify
| learning a new layout.
| mvdtnz wrote:
| > Nobody really knows what the designers were thinking,
| which I do not mean as sarcasm, I mean it straight. History
| lost that information.
|
| My understanding of QWERTY layout is that it was designed
| so that characters frequently used in succession should not
| be able to be typed in rapid succession, so that typewriter
| hammers had less chance of colliding. Or is this an urban
| myth?
| kjkjadksj wrote:
| You have to be taught to use the home row because the
| natural inclination for most people is to peck and hunt
| with their two index fingers. Watch how old people or young
| kids type. That being said staying on the home row is how
| you type fast and make the most of the layout. Everything
| is comfortably reachable for the most part unless you are a
| windows user ime.
| jerf wrote:
| If you learn a keyboard layout where the home row is
| actually the most common keys you use, you will not have
| to be encouraged to use the home row. You just will. I
| know, because I have, and I never "tried" to use the home
| row.
|
| People don't hunt and peck after years of keyboard use
| because of the keyboard; they do it because of the
| keyboard _layout_.
|
| If you want to prove I'm wrong, go learn Dvorak or
| Colemak and show me that once you're comfortable you
| still hunt and peck. You won't be, because it wouldn't
| even make sense. Or, less effort, find a hunt & peck
| Dvorak or Colemak user who is definitely at the
| "comfortable" phase.
| bigtunacan wrote:
        | Hold up young one. The reason for QWERTY's design has
| absolutely not been lost to history yet.
|
| The design was to spread out the hammers of the most
| frequently used letters to reduce the frequency of hammer
| jamming back when people actually used typewriters and not
| computers.
|
        | The problem it attempted to improve upon, and which it was
| pretty effective at, is just a problem that no longer
| exists.
| saagarjha wrote:
| I'm curious how this works because all the common letters
| seem to be next to each other on the left side of the
| keyboard
| zlg_codes wrote:
            | The original intent, I do believe, was not just separating
            | the hammers per se, but also helping the hands alternate, so
| they would naturally not jam as much.
|
| However, I use a Dvorak layout and my hands feel like
| they alternate better on that due to the vowels being all
| on one hand. The letters are also in more sensical
| locations, at least for English writing.
|
| It can get annoying when G and C are next to each other,
| and M and W, but most of the time I type faster on Dvorak
| than I ever did on Qwerty. It helps that I learned during
| a time where I used qwerty at work and Dvorak at home, so
| the mental switch only takes a few seconds now.
| jerf wrote:
          | Also apocryphal:
          | https://en.wikipedia.org/wiki/QWERTY#Contemporaneous_alterna...
|
| And it does a bad job at it, which is further evidence
| that it was not the design consideration. People may not
| have been able to run a quick perl script over a few
| gigabytes of English text, but they would have gotten
          | much closer if that was the desire. I don't buy that it
          | was their goal but they were just too stupid to get it
          | even close to right.
| heleninboodler wrote:
| The reason we use qwerty on a smartphone is extremely
| straightforward: people tend to know where to look for the
      | keys already, so it's easy to adapt to even though it's not
| "efficient". We know it better than we know the positions of
| letters in the alphabet. You can easily see the difference if
| you're ever presented with an onscreen keyboard that's in
| alphabetical order instead of qwerty (TVs do this a lot, for
| some reason, and it's a different physical input method but
| alpha order really does make you have to stop and hunt). It
| slows you down quite a bit.
| swores wrote:
| That's definitely a good reason why, but perhaps if iOS or
| Android were to research what the best layout is for
| typical touch screen typing and release that as a new
| default, people would find it quite quick to learn a second
| layout and soon get just the benefits?
|
| After all, with TVs I've had the same experience as you
| with the annoying alphabetical keyboard, but we type into
        | them maybe a couple of times a year, or maybe once in 5
| years, whereas if we changed our phone keyboard layout we'd
| likely get used to it quite quickly.
|
| Even if not going so far as to push it as a new default for
| all users (I'm willing to accept the possibility that I'm
| speaking for myself as the kind of geeky person who
| wouldn't mind the initial inconvenience of a new kb layout
| if it meant saving time in the long run, and that maybe a
| large majority of people would just hate it too much to be
| willing to give it a chance), they could at least figure
| out what the best layout is (maybe this has been studied
| and decided already, by somebody?) and offer that as an
| option for us geeks.
| mellinoe wrote:
| Even most technically-minded people still use QWERTY on
| full-size computer keyboards despite it being a terrible
| layout for a number of reasons. I really doubt a new,
| nonstandard keyboard would get much if any traction on
| phones.
| aaronax wrote:
| T9 was fine for typing and probably hundreds of millions
| of people used it.
| rurp wrote:
| Path dependency is the reason for this, and is the reason why
| a lot of things are the way they are. An early goal with
| smart phone keyboards was to take a tool that everyone
| already knew how to use, and port it over with as little
| friction as possible. If smart phones happened to be invented
| before external keyboards the layouts probably would have
| been quite different.
| skywhopper wrote:
| I know marketing is marketing, but it's bad form IMO to "demo"
| something in a manner totally detached from its actual manner
| of use. A swype keyboard takes practice to use, but the demos
| of that sort of input typically show it being used in a
| realistic way, even if the demo driver is an "expert".
|
| This is the sort of demo that 1) gives people a misleading idea
| of what the product can actually do; and 2) ultimately
| contributes to the inevitable cynical backlash.
|
| If the product is really great, people can see it in a
| realistic demo of its capabilities.
| mvdtnz wrote:
| Showing a product in its best light is one thing. Demonstrating
| a mode of operation that doesn't exist is entirely another. It
| would be like if a demo of your swipe keyboard included
| telepathic mind control for correcting errors.
| AndrewKemendo wrote:
| I'm not sure I'd agree that what they showed will never be
| possible and in fact my whole point is that I think Google
| can most likely deliver on that in this specific case. Chalk
| it up to my experience in the space, but from what I can see
| it looks like something Google can actually execute on
| (unlike many areas where they fail on product regularly).
|
| I would agree completely that it's not ready for consumers
| the way it was displayed, which is my point.
|
| I do want to add that I believe that the right way to do
| these types of new product rollout is not with these giant
| public announcements.
|
| In fact, I think generally speaking the "right" way to do
| something like this demonstrates only things that are
| possible robustly. However that's not the market that Google
| lives in. They're capitalists trying to make as much money as
      | possible. I'm simply saying that what they're showing is, I
      | think, absolutely technically possible, and I think Google
      | can deliver it even if it's not ready today.
|
| Do I think it's supremely ethical the way that they did it?
| No I don't.
| robbomacrae wrote:
        | The voice interaction part didn't look far off from what
| we are doing with Dynamic Interaction at SoundHound.
| Because of this I assumed (like many it seems) that they
| had caught up.
|
| And it's dangerous to assume they can just "deliver later".
| It's not that simple. If it is why not bake it in right now
| instead of committing fraud?
|
        | This is damaging to companies that walk the walk. People
        | have literally said to me "but what about that Gemini?"
        | and dismissed our work.
| AndrewKemendo wrote:
| I feel that more than you realize
|
          | That was basically what Magic Leap did to the whole AR
          | development market. Everyone deep in it knew they
          | couldn't do it, but they messed up so badly that it
          | basically killed the entire industry.
| mvdtnz wrote:
| I don't care what google could, in theory, deliver on some
| time in the future maybe. That's irrelevant. They are
| demonstrating something that can't be done with the product
| as they are selling it.
| Aurornis wrote:
| > However because it's so easy to fix this is not an issue and
| it doesn't slow me down at all.
|
| But that's a different issue than LLM hallucinations.
|
| With Swype, you already know what the correct output looks
| like. If the output doesn't match what you wanted, you
| immediately understand and fix it.
|
| When you ask an LLM a question, you don't necessarily know the
| right answer. If the output looks confident enough, people take
| it as the truth. Outside of experimenting and testing, people
| aren't using LLMs to ask questions for which they already know
| the correct answer.
| cja wrote:
| I think you mean swipe. Swype was a brilliant third party
| keyboard app for Android which was better at text prediction
| and manual correction than Gboard is today. If however you
| really do still use Swype then please tell me how because I
| miss it.
| AndrewKemendo wrote:
| Ha good point, and yes I agree Swype continues to be the best
| text input technology that I'll never be able to use again. I
      | guess I just committed genericide here, but I meant the
      | general "swiping" process at this point.
| snowwrestler wrote:
| The insight here is that the speed of correction is a crucial
| component of the perceived long-term value of an interface
| technology.
|
| It is the main reason that handwriting recognition did not
| displace keyboards. Once the handwriting is converted to text,
| it's easier to fix errors with a pointer and keyboard. So after
| a few rounds of this most people start thinking: might as well
| just start with the pointer and keyboard and save some time.
|
| So the question is, how easy is it to detect and correct errors
| in generative AI output? And the unfortunate answer is that
| unless you already know the answer you're asking for, it can be
| very difficult to pick out the errors.
| AndrewKemendo wrote:
| I think this is a good rebuttal.
|
| Yeah the feedback loop with consumers has a higher likelihood
| of being detrimental, so even if the iteration rate is high,
| it's potentially high cost at each step.
|
| I think the current trend is to nerf the models or otherwise
| put bumpers on them so people can't hurt themselves. That's
| one approach that is brittle at best and someone with more
| risk tolerance (OpenAI) will exploit that risk gap.
|
      | It's a contradiction at best, and depending on the level
      | of unearned trust from the misleading marketing, it will
      | certainly lead to some really odd externalities.
|
| Think "man follows google maps directions into pond" but for
| vastly more things.
|
| I really hated marketing before but yeah this really proves
| the warning I make in the AI addendum to my scarcity theory
| (in my bio).
| CamperBob2 wrote:
| Any sufficiently-advanced technology is indistinguishable from a
| rigged demo.
| peteradio wrote:
| Fake it til you make it, then keep faking it.
| rollulus wrote:
  | I watched this video, impressed, and thought: what if it's fake?
  | But then I dismissed the thought, because it would come out and
  | the damage wouldn't be worth it. I was wrong.
| imiric wrote:
| The worst part is that there won't be any damage. They'll
| release a blog post with PR apologies, but the publicity they
| got from this stunt will push up their brand in mainstream AI
| conversations regardless.
|
| "There's no such thing as bad publicity."
| steego wrote:
      | "There's no such thing as bad publicity" only applies to
      | people and companies that know how to spin it.
|
| Reading the comments of all these disillusioned developers,
| it's already damaged them because now smart people will be
| extra dubious when Google starts making claims.
|
| They just made it harder for themselves to convince
| developers to even try their APIs, let alone bet on them.
|
| This was stupid.
| h0rv wrote:
| https://nitter.unixfox.eu/parmy/status/1732811357068615969?f...
| Nekorosu wrote:
| Gemini demo looks like ChatGPT with a video feed, except it
  | doesn't exist, unlike ChatGPT. I have ChatGPT on my phone right
| now, and it works (and it can process images, audio, and audio
| feed in). This means Google has shown nothing of substance. In my
| world, it's a classic stock price manipulation move.
| onlyrealcuzzo wrote:
| Gemini Pro is available on Bard now.
|
| Ultra is not yet available.
| replwoacause wrote:
| Yeah and have you tried it? It's as dogshit as the original
| Bard.
| Kim_Bruning wrote:
| Even a year ago, this advert would have been obvious puffery in
| advertising.
|
| But right now, all the bits needed to do this already exist (just
| need to be assembled and -to be fair- given a LOT of polish), so
| it would be somewhat reasonable to think that someone had
| actually Put In The Work already.
| xnx wrote:
| That demo was much further on the "marketing" end of the spectrum
| when compared to some of their other videos from yesterday which
| even included debug views: https://youtu.be/v5tRc_5-8G4?t=43
| karaterobot wrote:
| This is endemic to public product demos. The thing never works as
| it does in the video. I'm not excusing it, I'm saying: don't
| trust public product demos. They are commercials, they exist to
| sell to you, not to document objectively and accurately, and they
| will always lie and mislead within the limits of the law.
| k__ wrote:
| I really thought this was a realtime demo.
|
| Shame on them :(
| suriyaG wrote:
  | The Bloomberg article seems to have been taken down and now
  | returns a 404.
| https://www.bloomberg.com/opinion/articles/2023-12-07/google...
| dilap wrote:
| Just an error in the link, here's the corrected version:
| https://www.bloomberg.com/opinion/articles/2023-12-07/google...
| thrtythreeforty wrote:
| and here's a readable version: https://archive.ph/ABhZi
| dougmwne wrote:
| I was fooled. The model release announcement said it could accept
| video and audio multi-modal input. I understood that there was a
| lot of editing and cutting, but I really believed I was looking
| at an example of video and audio input. I was completely
| impressed since it's quite a leap to go from text and still
| images to "eyes and ears." There's even the segment where
  | instruments are drawn and music was generated. I thought I was
| looking at a model that could generate music based on language
| prompts, as we have seen specialized models do.
|
  | This was all fake. They took a collection of cherry-picked,
  | prompt-engineered examples, then dramatized them for maximum
| shareholder hype. The music example was just outputting a
| description of a song, not the generated music we heard in the
| video.
|
| It's one thing to release a hype video with what-ifs and quite
| another to claim that your new multi-modal model is king of the
| hill then game all the benchmarks and fake all the demos.
|
| Google seems to be in an evil phase. OpenAI and MS must be quite
| pleased with themselves.
| skepticATX wrote:
| Exactly. Personally I'm fine with both:
|
    | 1) Forward looking demos that demonstrate the future of your
| product, where it's clear that you're not there yet but working
| in that direction
|
| or
|
    | 2) Demos that show off current capabilities, but are scripted
| and edited to do so in the best light possible.
|
| Both of those are standard practice and acceptable. What Google
| did was just wrong. They deserve to face backlash for this.
| miraculixx wrote:
| Do you believe everything verbatim that companies tell you in
| advertising?
| gregshap wrote:
| If they show a car driving I believe it's capable of self-
| propulsion and not just rolling downhill.
| dylan604 wrote:
| Hmm, might I interest you in a video of an electric semi-
| truck?
| macNchz wrote:
| A marketing trick that has, in fact, been tried:
          | https://arstechnica.com/cars/2020/09/nikola-admits-prototype...
| daveguy wrote:
| Used to be "marketing tricks" were prosecuted as fraud.
| jes5199 wrote:
| still is. Nikola's CEO, Trevor Milton, was convicted of
| fraud and is awaiting sentencing.
| olliej wrote:
| If I recall correctly, that led to literal criminal fraud
| charges.
|
| And iirc Tesla is also being investigated for fraudulent
| claims for faking the safety of their self driving cars.
| sp332 wrote:
| When a company invents tech that can do this, how would their
| ad be different?
| slim wrote:
| this was plausible
| steego wrote:
| No, but most people tend to make a mental note of which
| companies tend to deliver and which ones work hard to mislead
| them.
|
| You do understand the concept of reputation, right?
| replwoacause wrote:
| Well put. I'm not touching anything Google does any more.
| They're far too dishonest. This failed attempt at a release
| (which turns out was all sizzle and no steak) only underscored
| how far behind OpenAI they actually are. I'd love to have been
| a fly on the wall in the OAI offices when this demo video went
| live.
| renegade-otter wrote:
| This kind of moral fraud - unethical behavior - is tolerated
| for some reason. It's almost like investors want to be fooled.
    | There is no room for due diligence. They squeal like excited
| Taylor Swift fans as they are being lied to.
| rdedev wrote:
| Seems reminiscent of a video where the lead research department
| within Google is an animation studio (wish I could remember
| more about that video)
|
| Doing all these hype videos just for the sake of satisfying
    | shareholders or whatever is just making me lose trust in their
| research division. I don't think they did anything like this
| when they released Bert.
| Davidzheng wrote:
      | I agree completely. When AlphaZero was announced I remember
      | feeling shocked at how they stated this revolutionary
      | breakthrough as if it were a regular thing. AlphaFold and
      | AlphaCode are also impressive, but this one just sounds like
      | it was forced by Sundar and not the usual DeepMind.
| hanspeter wrote:
| I too thought it was able to accept video.
|
| Given the massive data volume in videos, I assumed it processed
| video into pictures by extracting a frame per second or
| something along those lines, while still taking the entire
| video as the initial input.
|
| Turns out, it wasn't even doing that!
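    |
    | For what it's worth, that kind of sampling would be trivial to
    | do on the client side; a rough sketch with OpenCV (the file
    | name is made up):
    |
    |     import cv2  # pip install opencv-python
    |
    |     def one_frame_per_second(path):
    |         cap = cv2.VideoCapture(path)
    |         fps = cap.get(cv2.CAP_PROP_FPS) or 30.0
    |         step = int(round(fps))
    |         index = 0
    |         while True:
    |             ok, frame = cap.read()
    |             if not ok:
    |                 break
    |             if index % step == 0:
    |                 yield frame  # one still roughly every second
    |             index += 1
    |         cap.release()
    |
    |     stills = list(one_frame_per_second("duck_demo.mp4"))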
| iamleppert wrote:
| You can tell whoever put together that demo video gave no f*cks
| whatsoever. This is the quality of work you can expect under an
| uninspiring leader (Sundar) in a culture of constant layoff fear
| and bureaucracy.
|
| Literally everyone I know who works at Google hates their job and
  | is completely checked out.
| CamperBob2 wrote:
| Huh? It was a GREAT demo video!
|
| If it had been real, that is.
| htk wrote:
| The whole Gemini webpage and contents felt weird to me, it's in
| the uncanny valley of trying to look and feel like an Apple
| marketing piece. The hyperbolic language, surgically precise
| ethnic/gender diversity, unnecessary animations and the sales
| pitch from the CEO felt like a small player in the field trying
| to pass as a big one.
| kjkjadksj wrote:
| I'm imagining the project managers are patting themselves on
| the back for checking all the performative boxes, blind to the
| absolute satire of it all.
| cedws wrote:
| I got the same vibes. Ultra and Pro. It feels tacky that it
| declares the "Gemini era" before it's even available. Google
| _really_ want to be seen as level on the playing field.
| crazygringo wrote:
| > _surgically precise ethnic /gender diversity_
|
| What does that mean and why is it bad?
|
| Diversity in marketing is used because, well, your desired
| market is diverse.
|
| I don't know what it means for it to be surgically precise,
| though.
| cheeze wrote:
| Agreed with your comment. This is every marketing department
| on the planet right now, and it's not a bad thing IMO. Can
| feel a bit forced at times, but it's better than the
| alternative.
| kozikow wrote:
| It's funny because now the OpenAI keynote feels like it's
| emulating the Google keynotes from 5 years ago.
|
| Google Keynote feels like it's emulating the Apple keynote from
| 5 years ago.
|
| And the Apple keynote looks like robots just out of an uncanny
| valley pretending to be humans - just like keynotes might look
| in 5 years, but actually made by AI. Apple is always ahead of
| the curve in keynote trends.
| robbomacrae wrote:
| The more I think about this the more it rings true...
| wharvle wrote:
| I hadn't thought about it until just now, but the most recent
| Apple events really are the closest real-person thing I've
      | ever seen to some of the "good" computer-generated
      | photorealistic (kinda...) humans "reading" with
      | text-to-speech.
|
| It's the stillness between "beats" that does it, I think, and
| the very-constrained and repetitive motion.
| sheepscreek wrote:
    | I don't understand why Gemini is even considered "jaw-dropping"
    | to begin with. GPT-4V has set the bar so high that all their
    | demos and presentations paled in comparison. And it's available
    | for anyone to use. People have already built mind-blowing demos
    | with it (like
    | https://arstechnica.com/information-technology/2023/11/ai-po...).
|
| The entire launch felt like a concentrated effort to "appear"
    | competitive with OpenAI. Google was splitting hairs talking about
| low single digit percentage improvement in benchmarks. Against a
| model that has been out for over 6 months.
|
| I have never been so unimpressed with them. Not only has OpenAI
| managed to snag this one from under Google's nose, IMO - they
| seem to be defending their lead quite well. Now that is something
| unmistakably remarkable. Color me impressed!
| kjkjadksj wrote:
      | Some other commenter, a former Googler, alluded a while back to
      | figuring out the big secret and being thrown into a tizzy by
      | the resulting cognitive dissonance once they realized what
      | they'd been buying into. It's never about making a good
      | product. It's about keeping up with the Joneses in the eyes of
      | tech investors. And just look at the movement on the stock
      | today as a result of this probable lemon of a product: nothing
      | else mattered except keeping up appearances. CEOs make historic
      | careers optimizing companies for appearances over function
      | like this.
| modeless wrote:
| That's not the only thing wrong. Gemini makes a false statement
| in the video, serving as a great demonstration of how these
| models still outright lie so frequently, so casually, and so
| convincingly that you won't notice, even if you have a whole team
| of researchers and video editors reviewing the output.
|
| It's the single biggest problem with LLMs and Gemini isn't
| solving it. You simply can't rely on them when correctness is
| important. Even when the model has the knowledge it would need to
| answer correctly, as in this case, it will still lie.
|
| The false statement is after it says the duck floats, it
| continues "It is made of a material that is less dense than
| water." This is false; "rubber" ducks are made of vinyl polymers
| which are more dense than water. It floats because the hollow
| shape contains air, of course.
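  |
  | A back-of-the-envelope check, with made-up but plausible numbers:
  |
  |     pvc_density   = 1.4    # g/cm^3, roughly; denser than water
  |     water_density = 1.0    # g/cm^3
  |     duck_mass     = 50.0   # g, assumed
  |     duck_volume   = 150.0  # cm^3 displaced if submerged, assumed
  |
  |     effective_density = duck_mass / duck_volume  # ~0.33 g/cm^3
  |     print(effective_density < water_density)  # True: duck floats
  |     print(pvc_density < water_density)        # False: material sinks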
| ace2358 wrote:
| I totally agree with you on the confident lies. And it's really
| tough. Technically the duck is made out of air and plastic
| right?
|
| If I pushed the model further on the composition of a rubber
| duck, and it failed to mention its construction, then it'd be
| lying.
|
| However there is this disgusting part of language where a
| statement can be misleading, technically true, not the whole
| truth, missing caveats etc.
|
| Very challenging problem. Obviously Google decided to mislead
| the audience and basically cover up the shortcomings. Terrible
| behaviour.
| mechagodzilla wrote:
| No, the density of the object is less than water, not the
| density of the material. The Duck is made of plastic, and it
| traps air. Similarly, you can make a boat that floats in
| water out of concrete or metal. It is an important
| distinction when trying to understand buoyancy.
| modeless wrote:
| Calling the air inside the duck (which is not sealed inside)
| part of its "material" would be misleading. That's not how
| most people would interpret the statement and I'm confident
| that's not the explanation for why the statement was made.
| onedognight wrote:
| The air doesn't matter. Even with a vacuum inside it would
| float. It's the overall density of "the duck" that matters,
| not the density of the plastic.
| hunter2_ wrote:
| A canoe floats, and that doesn't even command any thought
| regarding whether you can replace trapped air with a
| vacuum. If you had a giant cube half full of water, with
| a boat on the water, the boat would float regardless of
| whether the rest of the cube contained air or vacuum, and
| regardless of whether the boat traps said air (like a
| pontoon) or is totally vented (like a canoe). The overall
| density of the canoe is NOT influenced by its shape or
| any air, though. The canoe is strictly more dense than
| water (it will sink if it capsizes) yet in the correct
| orientation it floats.
|
| What does matter, however, is the overall density of the
| space that was water and became displaced by the canoe.
| That space can be populated with dense water, or with a
| less dense canoe+air (or canoe+vacuum) combination.
| That's what a rubber duck also does: the duck+air (or
| duck+vacuum) combination is less dense than the displaced
| water.
| jmathai wrote:
| This seems to be a common view among some folks. Personally,
| I'm impartial.
|
    | Search, or even asking other expert human beings, is prone to
    | producing incorrect results. I'm unsure where this expectation
    | of 100% absolute correctness comes from. I'm sure there are use
    | cases that demand it, but I assume they're the vast minority and
    | most can tolerate larger than expected inaccuracies.
| stefan_ wrote:
| Let's see, so we exclude law, we exclude medical.. it's
| certainly not a "vast minority" and the failure cases are
| nothing at all like search or human experts.
| jmathai wrote:
| Are you suggesting that failure cases are lower when
| interacting with humans? I don't think that's my experience
| at all.
|
| Maybe I've only ever seen terrible doctors but I _always_
| cross reference what doctors say with reputable sources
| like WebMD (which I understand likely contain errors).
        | Sometimes I'll go straight to WebMD.
|
| This isn't a knock on doctors - they're humans and prone to
| errors. Lawyers, engineers, product managers, teachers too.
| stefan_ wrote:
| You think you ask your legal assistant to find some
| precedents related to your current case and they will
| come back with an A4 page full of made up cases that
| sound vaguely related and convincing but _are not real_?
          | I don't think you understand the failure case at all.
| jmathai wrote:
| That example seems a bit hyperbolic. Do you think lawyers
| who leverage ChatGPT will take the made up cases and
| present them to a judge without doing some additional
| research?
|
| What I'm saying is that the tolerance for mistakes is
| strongly correlated to the value ChatGPT creates. I think
| both will need to be improved but there's probably more
| opportunity in creating higher value.
|
| I don't have a horse in the race.
| jazzyjackson wrote:
| > Do you think lawyers who leverage ChatGPT will take the
| made up cases and present them to a judge without doing
| some additional research?
|
| You don't?
|
              | https://fortune.com/2023/06/23/lawyers-fined-filing-chatgpt-...
| modeless wrote:
| > Do you think lawyers who leverage ChatGPT will take the
| made up cases and present them to a judge without doing
| some additional research?
|
| I generally agree with you, but it's funny that you use
| this as an example when it already happened.
| https://arstechnica.com/tech-policy/2023/06/lawyers-have-
| rea...
| jmathai wrote:
| _facepalm_
| freejazz wrote:
              | What would be the point of a lawyer using ChatGPT if they
              | had to root through every single reference ChatGPT relied
              | upon? I don't have to double-check every reference of a
              | junior attorney, because they actually know what they are
              | doing, and when they don't, it's easy to tell and won't
              | come with fraudulently created decisions/pleadings, etc.
| anon373839 wrote:
| > Do you think lawyers who leverage ChatGPT will take the
| made up cases and present them to a judge without doing
| some additional research
|
| I really don't recommend using ChatGPT (even GPT-4) for
| legal research or analysis. It's simply terrible at it if
| you're examining anything remotely novel. I suspect there
| is a valuable RAG application to be built for searching
| and summarizing case law, but the "reasoning" ability and
| stored knowledge of these models is worse than useless.
| binwiederhier wrote:
| I'm a software engineer, and I more or less stopped asking
| ChatGPT for stuff that isn't mainstream. It just hallucinates
| answers and invents config file options or language
| constructs. Google will maybe not find it, or give you an
| occasional outdated result, but it rarely happens that it
| just finds stuff that's flat out wrong (in technology at
| least).
|
| For mainstream stuff on the other hand ChatGPT is great. And
| I'm sure that Gemini will be even better.
| jmathai wrote:
| > it rarely happens that it just finds stuff that's flat
| out wrong
|
| "Flat out wrong" implies determinism. For answers which are
| deterministic such as "syntax checking" and "correctness of
| code" - this already happens.
|
| ChatGPT, for example, will write and execute code. If the
| code has an error or returns the wrong result it will try a
| different approach. This is in production today (I use the
| paid version).
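        |
        | A toy sketch of that loop - ask_llm and the check below are
        | stand-ins I made up, not OpenAI's actual tooling:
        |
        |     def ask_llm(prompt):
        |         # stub model call; pretend it returns source code
        |         return "def add(a, b):\n    return a + b\n"
        |
        |     def passes_test(source):
        |         scope = {}
        |         try:
        |             exec(source, scope)   # run the generated code
        |             return scope["add"](2, 2) == 4
        |         except Exception:
        |             return False
        |
        |     prompt = "Write a Python function add(a, b)."
        |     for attempt in range(3):
        |         code = ask_llm(prompt)
        |         if passes_test(code):
        |             break
        |         prompt += " That failed; try another approach."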
| bongodongobob wrote:
| Dollars to doughnuts says they are using GPT3.5.
| tomjakubowski wrote:
| I'm currently working with some relatively obscure but
| open source stuff (JupyterLite and Pyodide) and ChatGPT 4
| confidently hallucinates APIs and config options when I
| ask it for help.
|
| With more mainstream libraries it's pretty good though
| yieldcrv wrote:
| I use chatgpt4 for very obscure things
|
| If I ever worried about being quoted then I'll verify the
| information
|
| otherwise I'm conversational, have taken an abstract idea
| into a concrete one and can build on top of it
|
| But I'm quickly migrating over to mistral and if that
| starts going off the rails I get an answer from chatgpt4
| instead
| potatolicious wrote:
| The important thing is that with Web Search as a user you
| can learn to adapt to varying information quality. I have a
| higher trust for Wikipedia.org than I do for SEO-R-US.com,
| and Google gives me these options.
|
| With a chatbot that's largely impossible, or at least
| impractical. I don't know where it's getting anything from
| - maybe it trained on a shitty Reddit post that's 100%
| wrong, but I have no way to tell.
|
| There has been some work (see: Bard, Bing) where the LLM
| attempts to cite its sources, but even then that's of
| limited use. If I get a paragraph of text as an answer, is
| the expectation really that I crawl through each substring
| to determine their individual provenances and
| trustworthiness?
|
| The _shape_ of a product matters. Google as a linker
| introduces the ability to adapt to imperfect information
| quality, whereas a chatbot does not.
|
      | As an exemplar of this point - I _don't_ trust when Google
| simply pulls answers from other sites and shows it in-line
| in the search results. I don't know if I should trust the
| source! At least there I can find out the source from a
| single click - with a chatbot that's largely impossible.
| dylan604 wrote:
| > I'm unsure where this expectation of 100% absolute
| correctness comes from.
|
| It's a computer. That's why. Change the concept slightly:
| would you use a calculator if you had to wonder if the answer
| was correct or maybe it just made it up? Most people feel the
| same way about any computer based anything. I personally feel
| these inaccuracies/hallucinations/whatevs are only allowing
| them to be one rung up from practical jokes. Like I honestly
| feel the devs are fucking with us.
| bostonpete wrote:
| Speech to text is often wrong too. So is autocorrect. And
| object detection. Computers don't have to be 100% correct
| in order to be useful, as long as we don't put too much
| faith in them.
| dylan604 wrote:
| Your caveat is not the norm though, as everyone _is_
| putting a lot of faith in them. So, that's part of the
| problem. I've talked with people who aren't developers,
| but are otherwise smart individuals, and they have
| absolutely not considered that the info is not correct.
| The readers here are a bit too close to the subject, and
| sometimes I think it is easy to forget that the vast
| majority of the population do not truly understand what
| is happening.
| dr_dshiv wrote:
| Nah, I don't think anything has the potential to build
| critical thinking like LLMs en masse. I only worry that
| they will get better. It's when they are 99.9% correct we
| should worry.
| clumpthump wrote:
| Call me old fashioned, but I would absolutely like to see
| autocorrect turned off in many contexts. I much prefer to
| read messages with 30% more transparent errors rather
| than any increase in opaque errors. I can tell what
| someone meant if I see "elephent in the room", but not
| "element in the room" (not an actual example, autocorrect
| would likely get that one right).
| llbeansandrice wrote:
| People put too much faith in conspiracy theories they
| find on YT, TikTok, FB, Twitter, etc. What you're
| claiming is already not the norm. People already put too
| much faith into all kinds of things.
| kamikaz1k wrote:
| Okay, but search is done on a computer, and like the person
| you're replying to said, we accept close enough.
|
| I don't necessarily disagree with your interpretation, but
| there's a revealed preference thing going on.
|
| The number of non-tech ppl I've heard directly reference
| ChatGPT now is absolutely shocking.
| bitvoid wrote:
| > The number of non-tech ppl I've heard directly
| reference ChatGPT now is absolutely shocking.
|
| The problem is that a lot of those people will take
| ChatGPT output at face value. They are wholly unaware
| of its inaccuracies or that it hallucinates. I've
| seen it too many times in the relatively short amount of
| time that ChatGPT has been around.
| bongodongobob wrote:
| So what? People do this with Facebook news too. That's a
| people problem, not an LLM problem.
| janalsncm wrote:
| If we rewind a little bit to the mid to late 2010s,
| filter bubbles, recommendation systems and unreliable
| news being spread on social media was a big problem. It
| was a simpler time, but we never really solved the
| problem. Point is, I don't see the existence of other
| problems as an excuse for LLM hallucination, and writing
| it off as a "people problem" really undersells how hard
| it is to solve people problems.
| dylan604 wrote:
| People on social media are absolutely 100% posting things
| deliberately to fuck with people. They are actively
| seeking to confuse people, cause chaos, divisiveness, and
| other ill intended purposes. Unless you're saying that
| the LLM developers are actively doing the same thing, I
| don't think comparing what people find on the socials vs
| getting back as a response from a chatBot is a logical
| comparison at all
| zozbot234 wrote:
| How is that any different from what these AI chatbots are
| doing? They make stuff up that they predict will be
| rewarded highly by humans who look at it. This is exactly
| what leads to truisms like "rubber duckies are made of a
| material that floats over water" - which _looks_ like it
| should be correct, even though it's wrong. It really is
| no different from Facebook memes that are devised to get
| a rise out of people and be widely shared.
| dylan604 wrote:
| Because we shouldn't be striving to make mediocrity. We
| should be striving to build better. Unless the devs of
| the bots are wanting to have a bot built on trying to
| deceive people, I just don't see the purpose of this. If
| we can "train" a bot and fine tune it, we should be fine
| tuning truth and telling it what absolutely is bullshit.
|
| To avoid the darker topics to keep the conversation on
| the rails, if there were a misinformation campaign that
| was trying to state that the Earth's sky is red, then the
| fine tuning should be able to inform that this is clearly
| fake so when quoting this it should be stated as
| incorrect information that is out there. This kind of
| development should be how we can clean up the fake, but
| nope, we're seemingly quite happy at accepting it. At
| least that's how your question comes off to me.
| zozbot234 wrote:
| Sure, but current AI bots are just following the human
| feedback they get. If the feedback is naive enough to
| score the factoid about rubber duckys as correct, guess
| what, that's the kind of thing these AI's will target.
| You can try to address this by prompting them with
| requests like "do you think this answer is correct and
| ethical? Think through this step by step" ('reinforcement
| learning from AI feedback') but that's _very_ ad hoc and
| uncertain - ultimately, the humans in the loop call the
| shots.
| p1esk wrote:
| There are far more people who post obviously wrong,
| confusing and dangerous things online with total
| conviction. There are people who seriously believe Earth
| is flat, for example.
| lm28469 wrote:
| Literally everything is a "people problem"
|
| You can kill people with a fork, it doesn't mean you
| should legally be allowed to own a nuclear bomb "because
| it's just the same". The problem always come from scale
| and accessibility
| umvi wrote:
| So you're saying we need a Ministry of Truth to protect
| people from themselves? This is the same argument used to
| suppress "harmful" speech on any medium.
| dylan604 wrote:
| I've gotten to the point where I want "advertisement"
| stamped on anything that is, and I'm getting to the point
| I want "fiction" stamped on anything that is. I have no
| problem with fiction existing. It can be quite fun.
| People trying to pass fiction as fact is a problem
| though. Trying to force a "fact" stamp would be
| problematic though, so I'd rather label everything else.
|
| How to enforce it is the real sticky wicket though, so
| it's only something best discussed at places like this or
| while sitting around chatting while consuming
| jen20 wrote:
| "Computer says no" is not a meme for no reason.
| altruios wrote:
| why should all computing be deterministic?
|
| let me show you what this "genius"/"wrong-thinking" person
| has to say about AL (artificial life) and deterministic
| computing.
|
| https://www.cs.unm.edu/~ackley/
|
| https://www.youtube.com/user/DaveAckley
|
| To sum up a bunch of their content: You can make
| intractable problems solvable/crunchable if you allow just
| a little error into the result (which is reduced the longer
| the calculation calculates). And this is acceptable for a
| number of use cases where initial accuracy is less
| important than instant feedback.
|
| It is radically different from the von Neumann model of a
| computer, where a deterministic 'totalitarian finger
| pointer' pointing to some register (and only one register
| at a time) is an inherently limiting factor. In this model,
| each computational resource (a unit of RAM and a processing
| unit) fights for and coordinates reality with its
| neighbors, without any central coordination.
|
| Really interesting stuff. still in its infancy...
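|
| A toy way to see the trade-off (this is just a Monte Carlo
| sketch in Python, not Ackley's actual model): accept a noisy
| answer and the error shrinks the longer the calculation runs.
|
|     import random
|
|     # Estimate pi from random points in the unit square; the answer
|     # is never exact, but the expected error falls roughly as 1/sqrt(n).
|     def estimate_pi(samples):
|         inside = sum(1 for _ in range(samples)
|                      if random.random() ** 2 + random.random() ** 2 <= 1.0)
|         return 4.0 * inside / samples
|
|     for n in (1_000, 100_000, 10_000_000):
|         print(n, estimate_pi(n))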
| modeless wrote:
| Honestly I agree. Humans make errors all the time. Perfection
| is not necessary and requiring perfection blocks deployment
| of systems that represent a substantial improvement over the
| status quo despite their imperfections.
|
| The problem is a matter of degree. These models are
| _substantially_ less reliable than humans and far below the
| threshold of acceptability in most tasks.
|
| Also, it seems to me that AI can and will surpass the
| reliability of humans by a lot. Probably not by simply
| scaling up further or by clever prompting, although those
| will help, but by new architectures and training techniques.
| Gemini represents no progress in that direction as far as I
| can see.
| epalm wrote:
| I know exactly where the expectation comes from. The whole
| world has demanded absolute precision from computers for
| decades.
|
| Of course, I agree that if we want computers to "think on
| their own" or otherwise "be more human" (whatever that means)
| we should expect a downgrade in correctness, because humans
| are wrong all the time.
| jmathai wrote:
| > The whole world has demanded absolute precision from
| computers for decades.
|
| Computer engineers maybe. I think the general population is
| quite tolerant of mistakes as long as the general value is
| high.
|
| People generally assign very high value to things computers
| do. To test this hypothesis all you have to do is ask folks
| to go a few days without their computer or phone.
| creer wrote:
| > The whole world has demanded absolute precision from
| computers
|
| The opposite. Far too tolerant of the excuse "sorry,
| computer mistake." (But yeah, just at the same time as "the
| computer says so".)
| lancesells wrote:
| Is it less reliable than an encyclopedia? Is it less
| reliable than Wikipedia? Those aren't infallible but what's
| the expectation if it's wrong on something relatively
| simple?
|
| With the rush of investment dollars and the push to use
| these in places like healthcare, government, security,
| etc., there should be absolute precision.
| toxik wrote:
| Aside: this is not what impartial means.
| SkyBelow wrote:
| Humans are imperfect, but this comes with some benefits to
| make up for it.
|
| First, we know they are imperfect. People seem to put more
| faith into machines, though I do sometimes see people being
| too trusting of other people.
|
| Second, we have methods for measuring their imperfection.
| Many people develop ways to tell when someone is answering
| with false or unjustified confidence, at least in fields they
| spend significant time in. Talk to a scientist about cutting
| edge science and you'll get a lot of 'the data shows', 'this
| indicates', or 'current theories suggest'.
|
| Third, we have methods to handle false information that
| causes harm. Not always perfect methods, but there are
| systems of remedies available when experts get things wrong,
| and these even include some level of judging reasonable
| errors from unreasonable errors. When a machine gets it
| wrong, who do we blame?
| howenterprisey wrote:
| Absolutely! And fourth, we have ways to make sure the same
| error doesn't happen again; we can edit Wikipedia, or tell
| the person they were wrong (and stop listening to them if
| they keep being wrong).
| snowwrestler wrote:
| If it's no better than asking a random person, then where is
| the hype? I already know lots of people who can give me free,
| maybe incorrect guesses to my questions.
|
| At least we won't have to worry about it obtaining god-like
| powers over our society...
| sorokod wrote:
| Guessing from the last sentence that you are one of those
| "most" who "can tolerate larger than expected inaccuracies".
|
| How much inaccuracy would that be?
| pid-1 wrote:
| Most people I worked with either tell me "I don't know" or "I
| think x, but I'm not sure" when they are not sure about
| something. The issue with LLMs is they don't have this
| concept.
| taurath wrote:
| I find it ironic that computer scientists and technologists
| are frequently uberrationalists to the point of self parody
| but they get hyped about a technology that is often
| confidently wrong.
|
| Just like the hype with AI and the billions of dollars going
| into it. There's something there but it's a big fat unknown
| right now whether any part of the investment will actually
| pay off - everyone needs it to work to justify any amount of
| the growth of the tech industry right now. When everyone
| needs a thing to work, it starts to really lose the
| fundamentals of being an actual product. I'm not saying it's
| not useful, but is it as useful as the valuations and
| investments need it to be? Time will tell.
| latexr wrote:
| If a human expert gave wrong answers as often and as
| confidently as LLMs, most would consider no longer asking
| them. Yet people keep coming back to the same LLM despite the
| wrong answers to ask again in a different way (try that with
| a human).
|
| This insistence on comparing machines to humans to excuse the
| machine is as tiring as it is fallacious.
| kaffeeringe wrote:
| 1. Humans may also never be 100% - but it seems they are more
| often correct. 2. When AI is wrong it's often not only
| slightly off, but completely off the rails. 3. Humans often
| tell you when they are not sure. Even if it's only their
| tone. AI is always 100% convinced it's correct.
| Frost1x wrote:
| >I'm unsure where this expectation of 100% absolute
| correctness comes from. I'm sure there are use cases, but I
| assume it's the vast minority and most can tolerate larger
| than expected inaccuracies.
|
| As others hinted at, there's some bias because it's coming
| from a computer, but I think it's far more nuanced than that.
|
| I've worked with many experts and professionals through my
| career ranging across medicine, various types of engineers,
| scientists, academics, researchers and so on, and the
| pattern that always bothers me is the level of certainty
| presented; the same is often embedded in LLM responses.
|
| While humans don't typically quantify the certainty of their
| statements, the best SMEs I've ever worked with make it very
| clear what level of certainty they have when making
| professional statements. The SMEs who seem to be more often
| wrong than not speak in certainty quite often (some of this
| is due to cultural pressures and expectations surrounding
| being an "expert").
|
| In this case, I would expect a seasoned scientist to say
| something like this in response to the duck question: "many
| rubber ducks exist and are designed to float, this one very
| well might, we'd really need to test it or have far more
| information about the composition of the duck, the design,
| the medium we want it in (Water? Mercury? Helium?)" and so
| on.
| It's not an exact answer but you understand there's
| uncertainty there and we need to better clarify our question
| and the information surrounding that question. The fact is,
| it's really complex to know if it'll float or not from visual
| information alone.
|
| It could have an osmium ball inside that overcomes most of
| the assumed buoyancy the material contains, including the air
| demonstrated to make it squeak. It's not transparent. You
| don't _know_ for sure and the easiest way to alleviate
| uncertainty in this case is simply to test it.
|
| There's _so_ much uncertainty in the world, around what seem
| like the most certain and obvious things. LLMs seem to have
| grabbed some of this bad behavior from human language and
| culture where projecting confidence is often better (for
| humans) than being correct.
| eviks wrote:
| Where did you get the 100% number from? It's not in the
| original comment, it's not in a lot of similar criticisms of
| the models.
| brookst wrote:
| Is it possible for humans to be wrong about something, without
| lying?
| windowshopping wrote:
| I don't agree with the argument that "if a human can fail in
| this way, we should overlook this failing in our tooling as
| well." Because of course that's what LLMs are, tools, like
| any other piece of software.
|
| If a tool is broken, you seek to fix it. You don't just say
| "ah yeah it's a broken tool, but it's better than nothing!"
|
| All these LLM releases are amazing pieces of technology and
| the progress lately is incredible. But don't rag on people
| critiquing it, how else will it get better? Certainly not by
| accepting its failings and overlooking them.
| rkeene2 wrote:
| If a broken tool is useful, do you not use it because it is
| broken ?
|
| Overpowered LLMs like GPT-4 are both broken (according to
| how you are defining it) and useful -- they're just not the
| idealized version of the tool.
| freejazz wrote:
| Maybe not, if it's the case that your use of the broken
| tool would result in the eventual undoing of your work.
| Like, let's say your staple gun is defective and doesn't
| shoot the staples deep enough, but it still shoots. You
| can keep using the gun, but it's not going to actually do
| its job. It seems useful and functional, but it isn't, and
| it's liable to create a much bigger mess.
| freedomben wrote:
| I think you're reading a lot into GP's comment that isn't
| there. I don't see any ragging on people critiquing it. I
| think it's perfectly compatible to think we should
| continually improve on these things while also recognizing
| that things can be useful without being perfect.
| stocknoob wrote:
| "Broken" is word used by pedants. A broken tool doesn't
| work. This works, most of the time.
|
| Is a drug "broken" because it only cures a disease 80% of
| the time?
|
| The framing most critics seem to have is "it must be
| perfect".
|
| It's ok though, their negativity just means they'll miss
| out on using a transformative technology. No skin off the
| rest of us.
| bee_rider wrote:
| I think the comparison to humans is just totally useless.
| It isn't even just that, as a tool, it should be better
| than humans at the thing it does, necessarily. My monitor
| is on an arm, the arm is pretty bad at positioning things
| compared to all the different positions my human arms could
| provide. But it is good enough, and it does it tirelessly.
| A tool is fit for a purpose or not, the relative
| performance compared to humans is basically irrelevant.
|
| I think the folks making these tools tend to oversell their
| capabilities because they want us to imagine the
| applications we can come up with for them. They aren't
| selling the tool, they are selling the ability to make
| tools based on their platform, which means they need to be
| speculative about the types of things their platform might
| enable.
| lxgr wrote:
| Lying implies an intent to deceive, or giving a response
| despite having better knowledge, which I'd argue LLMs can't
| do, at least not yet. It just requires a more robust theory
| of mind than I'd consider them to realistically be capable
| of.
|
| They might have been trained/prompted with misinformation,
| but then it's the people doing the training/prompting who are
| lying, still not the LLM.
| og_kalu wrote:
| Not to say this example was lying but they can lie just
| fine - https://arxiv.org/abs/2311.07590
| lxgr wrote:
| They're lying in the same way that a sign that says "free
| cookies" is lying when there are actually no cookies.
|
| I think this is a different usage of the word, and we're
| pretty used to making the distinction, but it gets
| confusing with LLMs.
| og_kalu wrote:
| You are making an imaginary distinction that doesn't
| exist. It doesn't even make any sense in the context of
| the paper I linked.
|
| The model consistently and purposefully withheld
| knowledge it was directly aware of. This is lying under
| any useful definition of the word. You're veering off
| into meaningless philosophy that has no bearing on
| outcomes and results.
| hunter2_ wrote:
| To the question of whether it could have intent to deceive,
| going to the dictionary, we find that intent essentially
| means a plan (and computer software in general could be
| described as a plan being executed) and deceive essentially
| means saying something false. Furthermore, its plan is to
| talk in ways that humans talk, emulating their
| intelligence, and some intelligent human speech is false.
| Therefore, I do believe it can lie, and will whenever
| statistically speaking a human also typically would.
|
| Perhaps some humans never lie, but should the LLM be
| trained only on that tiny slice of people? It's part of
| life, even non-human life! Evolution works based on things
| lying: natural camouflage, for example. Do octopuses and
| chameleons "lie" when they change color to fake out
| predators? They have intent to deceive!
| vkou wrote:
| Most humans I professionally interact with don't double down
| on their mistakes when presented with evidence to the
| contrary.
|
| The ones that do are people I do my best to avoid interacting
| with.
|
| LLMs act more like the latter, than the former.
| ugh123 wrote:
| I don't see it as a problem with most non-critical use cases
| (critical being things like medical diagnoses, controlling
| heavy machinery or robotics, etc).
|
| LLMs right now are most practical for generating templated text
| and images, which when paired with an experienced worker, can
| make them orders of magnitude more productive.
|
| Oh, DALL-E created graphic images with a person with 6 fingers?
| How long would it have taken a pro graphic artist to come up
| with all the same detail but with perfect fingers? Nothing
| there they couldn't fix in a few minutes and then SHIP.
| zer00eyz wrote:
| >> Nothing there they couldn't fix in a few minutes and then
| SHIP.
|
| If by ship, you mean put directly into the public domain then
| yes.
|
| https://www.goodwinlaw.com/en/insights/publications/2023/08/.
| ..
|
| and for more interesting takes:
| https://www.youtube.com/watch?v=5WXvfeTPujU&
| awongh wrote:
| I'm not an expert but I suspect that this aspect of lack of
| correctness in these models might be fundamental to how they
| work.
|
| I suppose there's two possible solutions: one is a new training
| or inference architecture that somehow understand "facts". I'm
| not an expert so I'm not sure how that would work, but from
| what I understand about how a model generates text, "truth"
| can't really be an element in the training or inference that
| affects the output.
|
| the second would be a technology built on top of the inference
| to check correctness, some sort of complex RAG. Again not sure
| how that would work in a real world way.
|
| I say it might be fundamental to how the model works because as
| someone pointed out below, the meaning of the word "material"
| could be interpreted as the air inside the duck. The model's
| answer was correct in a human sort of way, or to be more
| specific in a way that is consistent with how a model actually
| produces an answer- it outputs in the context of the input. If
| you asked it if PVC is heavier than water it would answer
| correctly.
|
| Because language itself is inherently ambiguous and the model
| doesn't actually understand anything about the world, it might
| turn out that there's no universal way for a model to know
| what's true or not.
|
| I could also see a version of a model that is "locked down" but
| can verify the correctness of its statements, but in a way that
| limits its capabilities.
| ajkjk wrote:
| > this aspect of lack of correctness in these models might be
| fundamental to how they work.
|
| Is there some sense in which this _isn't_ obvious to the
| point of triviality? I keep getting confused because other
| people seem to keep being surprised that LLMs don't have
| correctness as a property. Even the most cursory
| understanding of what they're doing makes clear that it is,
| fundamentally, predicting words from other words. I am also
| capable of predicting words from other words, so I can guess
| how well that works. It doesn't seem to include correctness
| even as a concept.
|
| Right? I am actually genuinely confused by this. How is that
| people think it _could_ be correct in a systematic way?
| carstenhag wrote:
| Because it is assumed that it can think and/or reason. In
| this case, knowing the concepts of density, the density of
| a material, detecting the material from an image, detecting
| what object this image is. And, most importantly, knowing
| that this object is not solid, because if it were solid it
| could not float.
| awongh wrote:
| Yeah. I think there's some ambiguity around the _meaning_
| of reasoning - because it is a kind of reasoning to say a
| duck's material is less dense than water. In a way it's
| reasoned that out, and it might actually say something
| about the way a lot of human reasoning works....
| (especially if you've ever listened to certain people talk
| out loud and say to yourself... huh?)
| janalsncm wrote:
| Just to play devil's advocate: we can train neural networks
| to model some functions exactly, given sufficient
| parameters. For example simple functions like ax^2 + bx +
| c.
|
| The issue is that "correctness" isn't a differentiable
| concept. So there's no gradient to descend. In general,
| there's no way to say that a sentence is more or less
| correct. Some things are just wrong. If I say that human
| blood is orange that's not more incorrect than saying it's
| purple.
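|
| A minimal sketch of that contrast (assumes PyTorch; the toy
| quadratic and network size are arbitrary): MSE is a
| differentiable loss, so gradient descent can drive it toward
| zero, while there's no analogous loss for "this sentence is
| factually correct".
|
|     import torch
|
|     torch.manual_seed(0)
|     x = torch.linspace(-2, 2, 256).unsqueeze(1)
|     y = 3 * x**2 - 2 * x + 1          # target: a simple quadratic
|
|     model = torch.nn.Sequential(
|         torch.nn.Linear(1, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1))
|     opt = torch.optim.Adam(model.parameters(), lr=1e-2)
|
|     for _ in range(2000):
|         loss = torch.nn.functional.mse_loss(model(x), y)  # differentiable
|         opt.zero_grad()
|         loss.backward()
|         opt.step()
|
|     print(f"final MSE: {loss.item():.5f}")  # heads toward zero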
| spadufed wrote:
| > Is there some sense in which this isn't obvious to the
| point of triviality?
|
| This is maybe a pedantic "yes", but is also extremely
| relevant to the outstanding performance we see in tasks
| like programming. The issue is primarily the size of the
| correct output space (that is, the output space we are
| trying to model) and how that relates to the number of
| parameters. Basically, there is a fixed upper bound on the
| amount of complexity that can be encoded by a given number
| of parameters (obvious in principle, but we're starting to
| get some theory about how this works). Simple systems or
| rather systems with simple rules may be below that upper
| bound, and correctness is achievable. For more complex
| systems (relative to parameters) it will still learn an
| approximation, but error is guaranteed.
|
| I am speculating now, but I seriously suspect the size of
| the space of not only one or more human languages but also
| every fact that we would want to encode into one of these
| models is far too big a space for correctness to ever be
| possible without RAG. At least without some massive pooling
| of compute, which long term may not be out of the question
| but likely never intended for individual use.
|
| If you're interested, I highly recommend checking out some
| of the recent work around monosemanticity for what fleshing
| out the relationship between model-size and complexity
| looks like in the near term.
| michaelt wrote:
| I think very few people on this forum believe LLMs are
| _correct in a systematic way_, but a lot of people seem to
| think there's something more than predicting words from
| other words.
|
| Modern machine learning models contain a lot of inscrutable
| inner layers, with far too many billions of parameters for
| any human to comprehend, so we can only speculate about
| what's going on. A lot of people think that, in order to be
| so good at generating text, there _must_ be a bunch of
| understanding of the world in those inner layers.
|
| If a model can write convincingly about a soccer game,
| producing output that's consistent with the rules, the
| normal flow of the game and the passage of time - to a lot
| of people, that implies the inner layers 'understand'
| soccer.
|
| And anyone who noodled around with the text prediction
| models of a few decades ago, like Markov chains, Bayesian
| text processing, sentiment detection and things like that
| can see that LLMs are massively, massively better than the
| output from the traditional ways of predicting the next
| word.
| ilaksh wrote:
| Bing chat uses gpt-4 and cites sources from its retrieval.
| freedomben wrote:
| I think this problem needs to be solved at a higher level, and
| in fact Bard is doing exactly that. The model itself generates
| its output, and then higher-level systems can fact check it.
| I've heard promising things about feeding back answers to the
| model itself to check for consistency and stuff, but that
| should be a higher level function (and seems important to avoid
| infinite recursion or massive complexity stemming from the
| self-check functionality).
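|
| A minimal sketch of what I mean by a higher-level, bounded
| self-check pass (the `ask` callable is a stand-in for whatever
| model call you use; nothing here is a real Gemini or OpenAI
| API, and bounding the rounds is how I'd avoid the recursion
| problem):
|
|     from typing import Callable
|
|     def answer_with_self_check(ask: Callable[[str], str], question: str,
|                                max_rounds: int = 2) -> str:
|         answer = ask(question)
|         for _ in range(max_rounds):
|             # higher-level pass: ask the model to critique its own answer
|             critique = ask(
|                 f"Question: {question}\nProposed answer: {answer}\n"
|                 "List any factual problems with the answer, or reply OK.")
|             if critique.strip().upper().startswith("OK"):
|                 break
|             # retry once with the critique folded back in
|             answer = ask(
|                 f"Question: {question}\nKnown problems: {critique}\n"
|                 "Write a corrected answer.")
|         return answer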
| modeless wrote:
| I'm not a fan of current approaches here. "Chain of thought"
| or other approaches where the model does all its thinking
| using a literal internal monologue in text seem like a dead
| end. Humans do most of their thinking non-verbally and we
| need to figure out how to get these models to think non-
| verbally too. Unfortunately it seems that Gemini represents
| no progress in this direction.
| freedomben wrote:
| > _Humans do most of their thinking non-verbally and we
| need to figure out how to get these models to think non-
| verbally too._
|
| That's a very interesting point, both technically and
| philosophically.
|
| Where Gemini is "multi-modal" from training, how close do
| you think that gets? Do we know enough about neurology to
| identical a native language in which we think? (not
| rhetorical questions, I'm really wondering)
| janalsncm wrote:
| Neural networks are only similar to brains on the
| surface. Their learning process is entirely different and
| their internal architecture is different as well.
|
| We don't use neural networks because they're similar to
| brains. We use them because they are arbitrary function
| approximators and we have an efficient algorithm
| (backprop) coupled with hardware (GPUs) to optimize them
| quickly.
| janalsncm wrote:
| The point of "verbalizing" the chain of thought isn't that
| it's the most effective method. And frankly I don't think
| it matters that humans think non verbally. The goal isn't
| to create a human in a box. Verbalizing the chain of
| thought allows us to audit the thought process, and also
| create further labels for training.
| modeless wrote:
| No, the point of verbalizing the chain of thought is that
| it's all we know how to do right now.
|
| > And frankly I don't think it matters that humans think
| non verbally
|
| You're right, that's not the _reason_ non-verbal is
| better, but it is _evidence_ that non-verbal is probably
| better. I think the reason it 's better is that language
| is extremely lossy and ambiguous, which makes a poor
| medium for reasoning and precise thinking. It would
| clearly be better to think without having to translate to
| language and back all the time.
|
| Imagine you had to solve a complicated multi-step physics
| problem, but after every step of the solution process
| your short term memory was wiped and you had to read your
| entire notes so far as if they were someone else's before
| you could attempt the next step, like the guy from
| Memento. That's what I imagine being an LLM using CoT is
| like.
| Davidzheng wrote:
| I mean a lot of problems are amenable to subdivision into
| parts where the process of each part is not needed for
| the other parts. It's not even clear that humans usually
| hold in memory all of the process of the previous parts,
| especially if it won't be used later.
| dragonwriter wrote:
| > "Chain of thought" or other approaches where the model
| does all its thinking using a literal internal monologue in
| text seem like a dead end. Humans do most of their thinking
| non-verbally and we need to figure out how to get these
| models to think non-verbally too.
|
| Insofar as we can say that models think _at all_ between
| the input and the stream of tokens output, they do it
| nonverbally. Forcing the model to reduce _some of it_ to
| verbal form short of the actual response-of-concern does
| not change that, just as the fact that humans reduce some
| of their thought to verbal form to work through problems
| doesn't change that human thought is mostly nonverbal.
|
| (And if you don't consider what goes on between input and
| output thought, then chain of thought doesn't force all LLM
| thought to be verbal, because only the part that comes out
| in words is "thought" to start with in that case -- you are
| then saying that the basic architecture, not chain of
| thought prompting, forces all thought to be verbal.)
| modeless wrote:
| You're right, the models do think non-verbally. However,
| crucially, they can only do so for a fixed amount of time
| for each output token. What's needed is a way for them to
| think non-verbally continuously, and decide for
| themselves when they've done enough thinking to output
| the next token.
| Davidzheng wrote:
| Is it clear that humans can think nonverbally (including
| internal monologue) continuously? As in, for difficult
| reasoning tasks, do humans benefit a lot from extra time
| if they are not allowed internal monologue. Genuine
| question
| __s wrote:
| It also says the attribute of squeaking means it'll definitely
| float
| bongodongobob wrote:
| That's actually pretty clever because if it squeaks, there is
| air inside. How many squeaking ducks have you come across
| that don't float?
| davesque wrote:
| You could call it clever or you could call it a spurious
| correlation.
| bitshiftfaced wrote:
| There's nothing wrong with what you're saying, but what do you
| suggest? Factuality is an area of active research, and Deepmind
| goes into some detail in their technical paper.
|
| The models are too useful to say, "don't use them at all."
| Hopefully people will heed the warnings of how they can
| hallucinate, but further than that I'm not sure what more you
| can expect.
| modeless wrote:
| The problem is not with the model, but with its portrayal in
| the marketing materials. It's not even the fact that it lied,
| which is actually realistic. The problem is the lie was not
| called out as such. A better demo would have had the user
| note the issue and give the model the opportunity to correct
| itself.
| bitshiftfaced wrote:
| But you yourself said that it was so convincing that the
| people doing the demo didn't recognize it as false, so how
| would they know to call it out as such?
|
| I suppose they could've deliberately found a hallucination
| and showcased it in the demo. In which case, pretty much
| every company's promo material is guilty of not showcasing
| negative aspects of their product. It's nothing new or
| unique to this case.
| modeless wrote:
| They should have looked more carefully, clearly.
| Especially since they were criticized for the exact same
| thing in their last launch.
| twobitshifter wrote:
| I, a non-AGI, just 'hallucinated' yesterday. I hallucinated
| that my plan was to take all of Friday off and started
| wondering why I had scheduled morning meetings. I started
| canceling them in a rush. In fact, all week I had been planning
| to take a half day, but somehow my brain replaced the idea of a
| half day off with a full day off. You could have asked me and I
| would have been completely sure that I was taking all of friday
| off.
| margorczynski wrote:
| LLMs do not lie, nor do they tell the truth. They have no goal
| as they are not agents.
| modeless wrote:
| With apologies to Dijkstra, the question of whether LLMs can
| lie is about as relevant as the question of whether
| submarines can swim.
| rowanG077 wrote:
| The duck is indeed made of a material that is less dense.
| Namely water and air.
|
| If you go down such technical routes, your definition is wrong
| too. It doesn't float because it contains air. If you poke in
| the head of the duck it will sink. Even though at all times it
| contains air.
| recursive wrote:
| The duck is made of water and air? Which duck are we talking
| about here.
| dogprez wrote:
| That's a tricky one though since the question is, is the air
| inside of the rubber duck part of the material that makes it?
| If you removed the air it definitely wouldn't look the same or
| be considered a rubber duck. I gave it to the bot since when
| taking ALL the material that makes it a rubber duck, it is less
| dense than water.
| bee_rider wrote:
| If you hold a rubber duck under water and squeeze out the
| air, it will fill with water and still be a rubber duck. If
| you send a rubber duck into space, it will become almost
| completely empty but still be a rubber duck. Therefore, the
| liquid used to fill the empty space inside it is not part of
| the duck.
|
| I mean apply this logic to a boat, right? Is the entire
| atmosphere part of the boat? Are we all on this boat as well?
| Is it a cruise boat? If so, where is my drink?
| modeless wrote:
| A rubber duck in a vacuum is still a rubber duck and it still
| floats (though water would evaporate too quickly in a vacuum,
| it could float on something else of the same density).
| dogprez wrote:
| A rubber duck with a vacuum inside of it (removing the air
| material) is just a piece of rubber with eyes.
| Assuming OP's point about the rubber not being less dense
| than water, it would sink, no?
| WhitneyLand wrote:
| Agree, then the question becomes how will this issue play out?
|
| Maybe AI correctness will be similar to automobile safety. It
| didn't take long for both to be recognized as fundamental
| issues with new transformative technologies.
|
| In both cases there seems to be no silver bullet. Mitigations
| and precautions will continue to evolve, with varying degrees
| of effectiveness. Public opinion and legislation will play some
| role.
|
| Tragically accidents will happen and there will be a cost to
| pay, which so far has been much higher and more grave for
| transportation.
| crazygringo wrote:
| EDIT: never mind, I missed the exact wording about being "made
| of a material..." which is definitely false then. Thanks for
| the correction below.
|
| Preserving the original comment so the replies make sense:
|
| ---
|
| I think it's a stretch to say that's false.
|
| In a conversational human context, saying it's made of rubber
| _implies_ it's a rubber shell with air inside.
|
| It floats because it's rubber [with air] as opposed to being a
| ceramic figurine or painted metal.
|
| I can imagine most non-physicist humans saying it floats
| because it's rubber.
|
| By analogy, we talk about houses being "made of wood" when
| everybody knows they're made of plenty of other materials too.
| But the context is instead of brick or stone or concrete. It's
| not _false_ to say a house is made of wood.
| furyofantares wrote:
| This is what the reply was:
|
| > Oh, if it's squeaking then it's definitely going to float.
|
| > It is a rubber duck.
|
| > It is made of a material that is less dense than water.
|
| Full points for saying if it's squeaking then it's going to
| float.
|
| Full points for saying it's a rubber duck, with the
| implication that rubber ducks float.
|
| Even with all that context though, I don't see how "it is
| made of a material that is less dense than water" scores any
| points at all.
| yowzadave wrote:
| Yeah, I think arguing the logic behind these responses
| misses the point, since an LLM doesn't use any kind of
| logic--it just responds in a pattern that mimics the way
| people respond. It says "it is made of a material that is
| less dense than water" because that is a thing that is
| similar to what the samples in its training corpus have
| said. It has no way to judge whether it is correct, or even
| what the concept of "correct" is.
|
| When we're grading the "correctness" of these answers,
| we're really just judging the average correctness of
| Google's training data.
|
| Maybe the next step in making LLM's more "correct" is not
| to give them _more_ training data, but to find a way to
| _remove_ the bad training data from the set?
| modeless wrote:
| > In a conversational human context, saying it's made of
| rubber implies it's a rubber shell with air inside.
|
| Disagree. It could easily be solid rubber. Also, it's _not_
| made of rubber, and the model didn't claim it was made of
| rubber either, so it's irrelevant.
|
| > It floats because it's rubber [with air] as opposed to
| being a ceramic figurine or painted metal.
|
| A ceramic figurine or painted metal in the same shape would
| float too. The claim that it floats because of the density of
| the material is false. It floats because the shape is hollow.
|
| > It's not false to say a house is made of wood.
|
| It's false to say a house is made of air simply because its
| shape contains air.
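|
| Rough arithmetic (all numbers assumed rather than measured)
| to make the hollow-shape point concrete: a material denser
| than water still floats once you spread its mass over the
| volume the hollow shape displaces.
|
|     water_density = 1.00   # g/cm^3
|     pvc_density = 1.40     # g/cm^3 (assumed): sinks as a solid chunk
|     duck_volume = 200.0    # cm^3 displaced when fully submerged (assumed)
|     shell_volume = 20.0    # cm^3 of actual material in the shell (assumed)
|
|     duck_mass = shell_volume * pvc_density      # the air inside weighs ~nothing
|     average_density = duck_mass / duck_volume   # ~0.14 g/cm^3
|     print("floats" if average_density < water_density else "sinks")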
| omginternets wrote:
| People seem to want to use LLMs to mine knowledge, when really
| it appears to be a next-gen word-processor.
| eurleif wrote:
| To be fair, one could describe the duck as being made of air
| and vinyl polymer, which in combination are less dense than
| water. That's not how humans would normally describe it, but
| that's kind of arbitrary; consider how aerogel is often
| described as being mostly made of air.
| colonwqbang wrote:
| Is an aircraft carrier made of a material that is less dense
| than water?
| leeoniya wrote:
| only if you average it out over volume :P
| andrewmutz wrote:
| Is an aircraft carrier made of metal and air? Or just
| metal?
| bee_rider wrote:
| Where's the distinction between the air that is part of
| the boat, and the air that is not? If the air is included
| in the boat, should we all be wearing life vests?
| oh_sigh wrote:
| If I take all of the air out of a toy duck, it is still a toy
| duck. If I take all of the vinyl/rubber out of a toy duck, it
| is just the atmosphere remaining
| modeless wrote:
| The _material_ of the duck is not air. It's not sealed. It
| would still be a duck in a vacuum and it would still float on
| a liquid the density of water too.
| PepperdineG wrote:
| >It's the single biggest problem with LLMs and Gemini isn't
| solving it.
|
| I loved it when the lawyers got busted for using a
| hallucinating LLM to write their briefs.
| glitchc wrote:
| Well this seems like a huge nitpick. If a person said that, you
| would afford them some leeway, maybe they meant the whole duck,
| which includes the hollow part in the middle.
|
| As an example, when most people say a balloon's lighter than
| air, they mean an inflated balloon with hot air or helium, but
| you catch their meaning and don't rush to correct them.
| modeless wrote:
| The model specifically said that the _material_ is less dense
| than water. If you said that the _material_ of a balloon is
| less dense than air, very few people would interpret that as
| a correct statement, and it could be misleading to people who
| don't know better.
|
| Also, lighter-than-air balloons are intentionally filled with
| helium and sealed; rubber ducks are not sealed and contain
| air only incidentally. A balloon in a vacuum would still
| contain helium (if strong enough) but would not rise, while a
| rubber duck in a vacuum would not contain air but would still
| easily float on a liquid of similar density to water.
| eviks wrote:
| Given the misleading presentation by real humans in these
| "whole teams" that this tweet corrects, this doesn't illustrate
| any underlying powers by the model
| catchnear4321 wrote:
| language models do not lie. (this pedantic distinction being
| important, because language models.)
| lemmsjid wrote:
| I did some reading and it seems that rubber's relative density
| to water has to do with its manufacturing process. I see a
| couple of different quotes on the specific gravity of so-called
| 'natural rubber', and most claim it's lower than water.
|
| Am I missing something?
|
| I asked both Bard (Gemini at this point I think?) and GPT-4 why
| ducks float, and they both seemed accurate: they talked about
| the density of the material plus the increased buoyancy from
| air pockets and went into depth on the principles behind
| buoyancy. When pressed they went into the fact that "rubber"'s
| density varies by the process and what it was adulterated with,
| and if it was foamed.
|
| I think this was a matter of the video being a brief summary
| rather than a falsehood. But please do point out if I'm wrong
| on the rubber bit, I'm genuinely interested.
|
| I agree that hallucinations are the biggest problems with LLMs,
| I'm just seeing them get less commonplace and clumsy. Though,
| to your point, that can make them harder to detect!
| modeless wrote:
| Someone on Twitter was also skeptical that the material is
| more dense than water. I happened to have a rubber duck handy
| so I cut a sample of material and put it in water. It sinks
| to the bottom.
|
| Of course the ultimate skeptic would say one test doesn't
| prove that all rubber ducks are the same. I invite you to try
| it yourself.
|
| Yes, the models will frequently give accurate answers if you
| ask them this question. That's kind of the point. Despite
| knowing that they know the answer, you still can't trust them
| to be correct.
| bbarnett wrote:
| Devil's advocate. It is made of a material less dense than
| water. Air.
|
| It certainly isn't how I would phrase it, and I wouldn't count
| air as what something is made of, but...
|
| Soda pop is chock-full of air, it's part of it! And I'd say
| carbon dioxide is a part of the recipe, of pop.
|
| So it's a confusing world for a young LLM.
|
| (I realise it may have referenced rubber prior, but it may have
| meant air... again, Devil's advocate)
| neilv wrote:
| I missed the disclaimer. So, when watching it, I started to think
| "Wow, so Google is releasing their best stuff".
|
| But then I soon noticed some things that were too smooth, so
| seemed at best to be cherry-picked interactions occasionally
| leaning on hand-crafted situation handlers. Or, it turns out,
| faked.
|
| Regardless of disclaimers, this video seems misleading to be
| releasing right now, in the context of OpenAI eating Google's
| lunch.
|
| Everyone is expecting Google to try to show they can do better.
| This isn't that. This isn't even a mocked-up future-of-HCI
| interaction concept video, because it's not showing a vision of
| what people want to do --- it's only showing a demo of technical
| capabilities.
|
| It's saying "This is what a contrived tech demo (not application
| vision concept) _could_ look like, but we can't do it yet, so we
| faked it. Hopefully, the viewer will get the message that we're
| competitive with OpenAI."
|
| (This fake demo could just be an isolated oops of a small group,
| not representative of Google's ability to rise to the current
| disruption challenge, I don't know.)
| miraculixx wrote:
| I knew immediately this was just overhyped PR when I noticed
| the author of the blogpost is Sundar.
| milofeynman wrote:
| I looked at is as if it were a good aspirational target for 5
| years from now. It was obvious the whole video was edited
| together not real time.
| Alifatisk wrote:
| The bloomberg article gives 404 for me
| dramm wrote:
| The more Google tries to over-hype stuff the more that keeps
| giving me a greater impression they are well behind OpenAI. Time
| to STFU and focus on working on stuff.
| SheinhardtWigCo wrote:
| Just how many lives does Sundar have? Where is the board?
| miraculixx wrote:
| Counting their bonuses?
| miraculixx wrote:
| rofl
|
| C'mon that was obvious. Be real.
| 1024core wrote:
| For more details about how the video was created, see this blog
| post: https://developers.googleblog.com/2023/12/how-its-made-
| gemin...
| onemoresoop wrote:
| It seems like the fake video did the trick, their stock is up
| 5.5% today.
| eh_why_not wrote:
| There was also the cringey "niiice!", "sweeeet!", "that's
| greaatt", "that's actually pretty good" responses from the
| narrator in a few of the demo videos that gave them the feel of a
| cheap 1980's TV ad.
| carabiner wrote:
| It really reminds me of the Black Mirror episode Smithereens
| with the tech CEO talking with the shooter. Tech people really
| struggle with empathy, not just 1 on 1 but with the rest of the
| outside world, which is, relatively speaking, predominantly lower
| income with no college education. Paraphrased, the Black Mirror ep was
| like:
|
| [Tech CEO read instructions to "show empathy" from his
| assistant via Slack]
|
| CEO: I hear you. It must be very hard for you.
|
| Shooter: Of course you fucking hear me, we're on the phone!
| Talk like a normal person!
| seydor wrote:
| I thought it was implied and obvious that the video was edited.
|
| So what?
| frozenlettuce wrote:
| too little, too late. my impression is that google is not one,
| but two steps behind what MS can offer (they need a larger leap
| if they want to get ahead)
| golly_ned wrote:
| If you've seen the video, it's very apparent it's a product
| video, not a tech demo. They cut out the latencies to make a
| compelling product video.
|
| I wasn't at all under the impression they were showcasing TTS or
| low latencies as product features. I don't find the marketing
| misleading at all, and find these criticisms don't hit the mark.
|
| https://www.youtube.com/watch?v=UIZAiXYceBI
| DominikPeters wrote:
| It's not just cutting. The answers were obtained by taking
| still photos and inputting them into the model together with
| detailed text instructions explaining the context and the task
| to the model, giving some examples first and using careful
| chain-of-thought style prompting. (see e.g.
| https://developers.googleblog.com/2023/12/how-its-made-
| gemin...) My guess is that the video was fully produced _after_
| the Gemini outputs were generated by a different team, instead
| of while or before.
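|
| A purely illustrative reconstruction of the kind of prompt
| that post describes (hand-picked stills plus detailed
| instructions and an example) - this is not the actual prompt
| Google used, and the frame filenames are made up:
|
|     # Illustrative prompt structure only; fields and files are invented.
|     cup_game_prompt = [
|         {"type": "text", "text":
|             "You are tracking a cup game. Frames are taken while the hands "
|             "are still touching the cups. Reason step by step about which "
|             "cup the ball is under before giving a final answer."},
|         {"type": "text", "text":
|             "Example: in frame A the ball sits under the left cup; after "
|             "the left and middle cups swap, the ball is under the middle "
|             "cup."},
|         {"type": "image", "file": "frame_042.png"},   # hand-picked still
|         {"type": "image", "file": "frame_057.png"},   # hand-picked still
|         {"type": "text", "text": "Which cup is the ball under now?"},
|     ]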
| retox wrote:
| AI: artificial incompetence
| jbverschoor wrote:
| Well, google has a history of faking things.. so I'm not
| surprised. I expected that..
|
| All companies are just yelling that they're "in" the AI/LLM
| game.. If they don't, share prices will drop.
| hifreq wrote:
| The red flag for me was that they started that demo video with
| background noise to make it seem like a raw video. A subtle
| manipulation for no reason, it's obviously not a raw video.
|
| The fact that they did not fact check the videos _again_ makes me
| not particularly confident in the quality of Google 's work. The
| bit where the model misinterpreted music notation (the circled
| area does not mean "piano"), and the "less dense than water"
| rubber duck are beyond the pale. The SVG demo where they generate
| a South Park looking tree looks like a parody.
| crazygringo wrote:
| Does it matter at all with regards to its AI capabilities though?
|
| The video has a disclaimer that it was edited for latency.
|
| And good speech-to-text and text-to-speech already exists, so
| building that part is trivial. There's no deception.
|
| So then it seems like somebody is pressing a button to submit
| stills from a video feed, rather than live video. It's still just
| as useful.
|
| My main question then is about the cup game, because that
| absolutely requires video. Does that mean the model takes short
| video inputs as well? I'm assuming so, and that it generates
| audio outputs for the music sections as well. If _those_ things
| are not real, _then_ I think there 's a problem here. The
| Bloomberg article doesn't mention those, though.
| beering wrote:
| Even your skeptical take doesn't fully show how faked this was.
|
| > The video has a disclaimer that it was edited for latency.
|
| There was no disclaimer that the prompts were different from
| what's shown.
|
| > And good speech-to-text and text-to-speech already exists, so
| building that part is trivial. There's no deception.
|
| Look at how many people thought it could react to voice in real
| time - the net result is that a lot of people (maybe most?)
| were deceived. And the text prompts were actually longer and
| more specific than what was said in the video!
|
| > somebody is pressing a button to submit stills from a video
| feed, rather than live video.
|
| Somebody hand-picked images to convey exactly the right amount
| of information to Gemini.
|
| > Does that mean the model takes short video inputs as well?
| I'm assuming so
|
| It was given a hand-picked series of still images with the
| hands still on the cups so that it was easier to understand
| what cup moved where.
|
| Source for the above:
| https://developers.googleblog.com/2023/12/how-its-made-gemin...
| skilled wrote:
| fake benchmarks, fake stitched together videos, disingenuous
| charts, no developer API on launch, announcements stuffed with
| marketing fluff.
|
| As soon as I saw that opening paragraph from Sundar and how it
| was written I knew that Gemini is going to be a steaming pile of
| shit.
|
| They should have watched the GPT-4 announcement from OpenAI
| again. That demo Greg Brockman did with converting a sketch on a
| piece of paper to a CodePen from a Discord channel, with all the
| error correcting and whatnot, is how you launch a product that's
| appealing to users.
|
| TechCrunch, Twitter and some other sites (including HN i guess)
| are already piling on to this and by Monday things will go back
| to how they were and Google will have to go back to the drawing
| board to figure out another way to relaunch Gemini in the future.
| taspeotis wrote:
| Google Gemi-lie
| vjerancrnjak wrote:
| There is a possibility of dataset contamination on the
| competitive programming benchmark. A nice discussion on the page
| where AlphaCode2 was solving the problems
| https://codeforces.com/blog/entry/123035
|
| The problem shown in the video was reused in a recent
| competition (so it could have been available in the dataset).
| mtrovo wrote:
| I guess a much better next step is to compare how GPT4V performs
| when asked similar prompts. Even if mostly staged, this is very
| impressive to me, not so much for the current tech but more for
| how much leverage Google has to win this race in the long run
| because of its hardware presence.
|
| The more these models improve, the more we will want less
| friction and faster interactions; this means that in the long
| term having
| to open an app and ask a question is not gonna fly compared to
| just pointing your phone camera to something, asking a question
| and getting an answer that's tailored to everything Google knows
| about you in real time.
|
| Apple will most likely also roll their own in house solution for
| Siri instead of relying on an external company. This leaves
| OpenAI and the other small companies not just competing for the
| best models but also on how to put them in front of people in the
| first place and how to get access to their personal information.
| bradhe wrote:
| > Even if mostly staged this is very impressive to me, not much
| on the current tech but more on how much leverage Google has to
| win this race on the long run because of its hardware presence.
|
| I think you have too little information to form a reasonable
| opinion on the situation. Google is using editing techniques
| and specific scripting to try to demonstrate they have a
| sufficiently powerful general AI. The magnitude of this claim
| is huge, and the fact that they're faking it should be a
| likewise enormous scandal.
|
| To sum this up "well I guess they're doing better than XYZ"
| discounts the absurd context of all this.
| DonnyV wrote:
| This is so crazy. Google invented transformers, which are the
| basis for all these models. How do they keep fumbling like this over
| and over. Google Docs created in 2006! Microsoft is eating their
| lunch. Google creates the ability to change VM's in place and
| makes a fully automated datacenter. Amazon and Microsoft are
| killing them in the cloud. Google has been working on self
| driving longer than anyone. Tesla is catching up and will most
| likely beat them.
|
| The amount of fumbles is monumental.
| bradhe wrote:
| Microsoft eating Google's lunch on documents is laughable at
| best. Not to mention it confuses the entire timeline of office
| productivity software??
| hot_gril wrote:
| Is paid MS Teams more or less common than paid GSuite?
| It's hard to find stats on this. GSuite is the better product
| IMO, but MS has a stronger b2b reputation, and anecdotally I
| hear more about people using Teams.
| UrineSqueegee wrote:
| I worked at many companies in my time and all of them used
| Teams except for one that used Slack, but all used MS
| products; none used Google's.
| abustamam wrote:
| Does anyone use paid GSuite for anything other than
| docs/drive/Gmail ? In all companies I've worked at, we've
| used GSuite exclusively for those, and used slack/discord
| for chat, and zoom/discord for video/meetings.
|
| I know that MS Teams is a more full-featured product suite,
| but even at companies that used it, we still used Zoom for
| meetings.
| hot_gril wrote:
| GSuite for calendar makes sense too. Chat sucks, and Meet
| would be alright if it weren't so laggy, but those are
| two things you can easily not use.
| bbarnett wrote:
| Teams will likely still be around in 20 years. I doubt
| gsuite will exist in 5... or even 1.
| hot_gril wrote:
| GSuite has existed since 2006, so it's not like Google
| lacks focus on it.
| bbarnett wrote:
| That's ancient by google metrics!!!
| w10-1 wrote:
| Isn't it always easier to learn from others' mistakes?
|
| Google has the problem that it's typically the first to
| encounter a problem; it has the resources to approach it (from
| search), but also the incentive to monetize it (to get away
| from depending entirely on search revenue). And, management.
| rurp wrote:
| I don't know if that really excuses Google in this case
| because it's a productization problem. Google never tried to
| release a ChatGPT competitor until after OpenAI had. OpenAI
| has been wildly successful as the first mover, despite having
| to blaze some new product trails. Even after months of
| watching them and with near-infinite resources, Google is
| still struggling to catch up.
| hosh wrote:
| Outside of outliers like gmail, Google didn't get their
| success with product. The organization is set up for
| engineering to carry the day, funded by search.
|
| An AI product that makes search irrelevant is an
| existential threat, but I don't think Google has the
| product DNA to pull it off. I heard it has been taken over
| by more business / management types, but it is still
| missing product as a core pillar.
| rtsil wrote:
| Considering the number of messaging apps they tried to launch,
| if there's at least one thing that can be concluded, it's that
| they don't find it any easier to learn from their own mistakes.
| hot_gril wrote:
| Engineer-driven company. Not enough top-down direction on the
| products. Too much self-perceived moral high ground. But lately
| they've been changing this.
| Slackwise wrote:
| Uhh, no, not really; quite the opposite in fact.
|
| Under Eric Schmidt they were engineer-driven, during the
| golden era of the 2000s. Nowadays they're MBA driven, which
| is why they had 4 different messaging apps from different
| product managers.
| hot_gril wrote:
| Lack of top-down direction is what allowed that situation.
| Microsoft is MBA-driven and usually has a coherent product
| lineup, including messaging.
|
| Also, "had." Google cleaned things up. They still sometimes
| do stuff just cause, but it's a lot less now. I still feel
| like Meet using laggy VP9 (vs H.264 like everyone else) is
| entirely due to engineer stubbornness.
| robertlagrant wrote:
| I would say that Microsoft's craziness around buying Kin
| and Nokia, and Windows 8, RT edition, etc etc, was far
| more fundamental product misdirection than anything
| Google has ever done.
| hot_gril wrote:
| Microsoft failed to enter the mobile space, yeah. Google
| fumbled with the Nexus stuff, even though they succeeded
| with the Android software. But bigger picture, Microsoft
| was still able to diversify their revenue sources a lot
| while Google failed to do so.
| _the_inflator wrote:
| I say it again and again: sales, sales. Money is earned in
| enterprise domains.
|
| And this business is so totally different to Google in every
| way imaginable.
|
| Senior Managers love customer support, SLAs - Google loves
| automation. Two worlds collide.
| hot_gril wrote:
| Google customer support says "Won't Fix [Skill Issue]"
| ASalazarMX wrote:
| Google Workspace works through resellers, so Google trains
| fewer people and the resellers provide the customer support
| instead. IMO Google's bad reputation comes from their public
| customer support.
| sourcegrift wrote:
| I was at MS in September 2008, and internally they already had
| a very beautiful, well-functioning web Office (named
| differently; I forget the name, but it wasn't SharePoint if I
| recall correctly - I think it had something to do with expense
| reports) that would put Google Docs to shame today. They just
| didn't want to cannibalize their own product.
| rurp wrote:
| While it is crazy, it's not too surprising. Google has become
| as notorious for product ineptitude as they have been for
| technical prowess. Dominating the fundamental research for
| GenAI but face planting on the resulting consumer products is
| right in line with the company that built Stadia, GMail/Inbox,
| and 17 different chat apps.
| ren_engineer wrote:
| >Google Docs created in 2006
|
| The tech was based on an acquired company; Google just abused
| their search monopoly to make it more popular (same thing they
| did with YT). This has been the strategy for every service
| they've ever made. Google really hasn't launched a decent
| in-house product since Gmail, and even that was grown using
| their search monopoly as free advertising.
|
| >Google Docs originated from Writely, a web-based word
| processor created by the software company Upstartle and
| launched in August 2005
| robertlagrant wrote:
| > Google really hasn't launched a decent in-house product
| since Gmail
|
| What about Chrome? And Chromebooks?
| camflan wrote:
| mmm, WebKit?
| holoduke wrote:
| They are an ads company. Focus is never on "core" products.
| lern_too_spel wrote:
| I was with you until the Tesla hot take. I'd bet dollars to
| donuts that Tesla doesn't get to level 4 by the end of the
| decade. Waymo is already there.
| bendbro wrote:
| Space man bad.
| hot_gril wrote:
| I agree, but I also bet Waymo doesn't exist by the end of the
| decade. Not just because it's Google but because it's hard to
| profit from.
| renegade-otter wrote:
| Google doesn't know how to do anything else.
|
| A product requires commitment; it requires grind. That last 10%
| is the most critical part, and Google persistently refuses to
| push products across the finish line, instead giving up on them
| and adding to the infamous Google Product Graveyard.
|
| Honestly, what is the point? They could just maintain the core
| search/ads and not pay billions of dollars for tens of
| thousands of expensive engineers who have to go through a
| bullshit interview process and achieve nothing.
| davesque wrote:
| The hype really is drowning out the simple fact that basically no
| one really knows what these models are doing. Why does it matter
| so much that we include auto-correlation of embedding vectors as
| the "attention" mechanism in these models? And that we do this
| sufficiently many times across all the layers? And that we
| blindly smoosh values together with addition and call it a "skip"
| connection? Yes, you can tell me a bunch of stuff about gradients
| and residual information, but tell me why any of this stuff is or
| isn't a good model of causality.
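|
| For anyone who hasn't seen it spelled out, here is roughly what
| that "attention" plus "skip" connection amounts to, as a toy
| NumPy sketch (shapes and names are illustrative, not any
| particular model's architecture):
|
|   import numpy as np
|
|   def softmax(x, axis=-1):
|       e = np.exp(x - x.max(axis=axis, keepdims=True))
|       return e / e.sum(axis=axis, keepdims=True)
|
|   def attention_block(X, Wq, Wk, Wv):
|       # X: (seq_len, d_model) token embeddings
|       Q, K, V = X @ Wq, X @ Wk, X @ Wv
|       # the "auto-correlation" of the (projected) embeddings
|       scores = Q @ K.T / np.sqrt(K.shape[-1])
|       out = softmax(scores, axis=-1) @ V
|       # the "skip" connection: blindly add the input back on
|       return X + out
|
|   rng = np.random.default_rng(0)
|   d = 8
|   X = rng.normal(size=(4, d))                  # 4 tokens
|   Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
|   print(attention_block(X, Wq, Wk, Wv).shape)  # -> (4, 8)
|
| Stack dozens of these layers and it works remarkably well; why
| it should is exactly the open question.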
| stormfather wrote:
| So what? Voice to text is a solved problem. And in cases where
| realtime is important, just throw more compute at it. I'm missing
| the damning gotcha moment here.
| davesque wrote:
| A big red flag for me was that Sundar was prompting the model to
| report lots of facts that can be either true or false. We all saw
| the benchmark figures that they published and the results mostly
| showed marginal improvements. In other words, the issue of
| hallucination has not been solved. But the demo seemed to imply
| that it had. My conclusion was that they had mostly cherry picked
| instances in which the model happened to report correct or
| consistent information.
|
| They oversold its capabilities, but it does still seem that
| multi-modal models are going to be a requirement for AI to
| converge on a consistent idea of what kinds of phenomena are
| truly likely to be observed across modalities. So it's a good
| step forward. Now if they can just show us convincingly that a
| given architecture is actually modeling causality.
| LesZedCB wrote:
| I think this was demonstrated in that Mark Rober promo video
| [1] where he asked why the paper airplane stalled by blatantly
| leading the witness.
|
| "do you believe that a pocket of hot air would lead to lower
| air pressure causing my plane to stall?"
|
| he could barely even phrase the question correctly because it
| was so awkward. just embarrassing.
|
| [1] https://www.youtube.com/watch?v=mHZSrtl4zX0&t=277s
| calf wrote:
| Ever since the "stochastic parrots" and "super-autocomplete"
| criticisms of LLMs, the question has been whether hallucinations
| are solvable in principle at all. And if hallucinations are
| solvable, it would be of such basic and fundamental scientific
| importance that I think it would be another mini-breakthrough
| in AI.
| plaidfuji wrote:
| These LLMs do not have a concept of factual correctness and are
| not trained/optimized as such. I find it laughable that people
| expect these things to act like quiz bots - this misunderstands
| the nature of a generative LLM entirely.
|
| It simply spits out whatever output sequence it feels is most
| likely to occur after your input sequence. How it defines "most
| likely" is the subject of much research, but to optimize for
| factual correctness is a completely different endeavor. In
| certain cases (like coding problems) it can sound smart enough
| because for certain prompts, the approximate consensus of all
| available text on the internet is pretty much true and is
| unpolluted by garbage content from laypeople. It is also good
| at generating generic fluffy "content" although the value of
| this feature escapes me.
|
| In the end, the quality of the information it gets back to you
| is no better than the quality of a thorough Google search... it
| will just get you a more concise and well-formatted answer
| faster.
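|
| Concretely, "most likely to occur after your input sequence"
| just means repeated next-token prediction. A minimal
| greedy-decoding sketch with the open GPT-2 checkpoint via the
| transformers library (illustrative only; not what any of the
| hosted products actually run):
|
|   import torch
|   from transformers import AutoModelForCausalLM, AutoTokenizer
|
|   tok = AutoTokenizer.from_pretrained("gpt2")
|   model = AutoModelForCausalLM.from_pretrained("gpt2").eval()
|
|   # Encode the prompt, then keep appending the single most
|   # likely next token.
|   enc = tok("The capital of France is", return_tensors="pt")
|   ids = enc.input_ids
|   with torch.no_grad():
|       for _ in range(5):
|           logits = model(ids).logits        # (1, seq_len, vocab)
|           next_id = logits[0, -1].argmax()  # greedy choice
|           ids = torch.cat([ids, next_id.view(1, 1)], dim=-1)
|   print(tok.decode(ids[0]))
|
| Nothing in that loop knows or cares whether the continuation is
| factually true; it only ranks tokens by likelihood.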
| eurekin wrote:
| The first question I always ask myself in such cases: how much
| of the input data has simple "I don't know" lines? Not knowing
| something is clearly a concept that has to be learned in order
| to be expressed in the output.
| bradhe wrote:
| Fucking. Shocking.
|
| Anyone with half a brain could see through this "demo." It was
| vastly too uncanny to be real, to the point that it was poorly
| set up. Google should be ashamed.
| FartyMcFarter wrote:
| Unpaywalled Bloomberg article linked in the tweet:
|
| https://archive.is/4H1fB
| mirkodrummer wrote:
| I didn't believe Google's presentation offhand because I don't
| care anymore, especially because it comes from them. I just use
| tools and adapt. Copilot helps me automate boring tasks; it
| can't help much with new stuff, so I actually discovered I
| often do "interesting" work. I use GPT 3.5/4 for everything but
| work; it's been a blessing, the best suggestion engine for
| movies, books, and music with just a prompt and without the
| need for tons of data about my watch history (looking at you,
| YouTube). In these strange times I'm actually learning a lot
| more; productivity is more or less the same as before LLMs, but
| annoying tasks are relieved a bit. All of that without the
| hype. Sometimes I laugh at Google; it must be a real shit show
| inside that mega corporation, but I kinda understand the need
| for marketing editing, since having a first-class ticket on the
| AI train is so important for them; they seem to see it as an
| existential threat. At least it seems so, since they decided to
| take the risk of lying.
| zdrummond wrote:
| This is just a tweet that makes a claim without backing, and
| links to an article that was pulled.
|
| Can we change the URL to the real article if it still exists?
| dilawar wrote:
| Bloomberg link in Xeet is 404 for me (Bangalore).
| gsuuon wrote:
| Wow - my first thought was I wonder what framerate they're
| sending video at. The whole demo seems significantly less
| impressive in that case.
| ElijahLynn wrote:
| Link to the Bloomberg article from the Tweet is 404 now.
___________________________________________________________________
(page generated 2023-12-07 23:00 UTC)