[HN Gopher] Gemini "duck" demo was not done in realtime or with ...
___________________________________________________________________
Gemini "duck" demo was not done in realtime or with voice
Author : apsec112
Score : 623 points
Date : 2023-12-07 18:03 UTC (4 hours ago)
(HTM) web link (twitter.com)
(TXT) w3m dump (twitter.com)
| throwitaway222 wrote:
| link to duck video: https://www.youtube.com/watch?v=UIZAiXYceBI
| recursive wrote:
| Thanks. This is the first I'm hearing of a duck demo, and
| couldn't figure out what it was.
| Garrrrrr wrote:
| Timestamp for the duck demo:
| https://youtu.be/UIZAiXYceBI?si=pNT74PXjyDataF1T&t=246
| Inward wrote:
  | Yes, that was obvious; as soon as I saw it wasn't live I clicked
  | off. You can train any LLM to perform a certain task (or tasks)
  | well, and Google engineers are not that dense. This was obvious
  | marketing PR. OpenAI has made Google basically obsolete for me:
  | 90% of my queries can be answered without wading through LLM
  | generated text for a simple answer.
| nirvael wrote:
| >without wading through LLM generated text
|
| ...OpenAI solved this by generating LLM text for you to wade
| through?
| rose_ann_ wrote:
      | No. It solved it by (most of the time) giving the OP and me
| the answer to our queries, without us needing to wade through
| spammy SERP links.
| kweingar wrote:
| If LLMs can replace 90% of your queries, then you have very
| different search patterns from me. When I search on Kagi,
| much of the time I'm looking for the website of a business,
| a public figure's social media page, a restaurant's hours
| of operation, a software library's official documentation,
| etc.
|
| LLMs have been very useful, but regular search is still a
| big part of everyday life for me.
| GolfPopper wrote:
| How do you tell a plausible wrong answer from a real one?
| rose_ann_ wrote:
| By testing the code it returns (I mostly use it as a
| coding assistant) to see if it works. 95% of the time it
| does.
|
| For technical questions, ChatGPT has almost completely
| replaced Google & Stack Overflow for me.
| 13415 wrote:
| In my experience, testing code in a way that ensures that
| it works is often harder and takes more time than writing
| it.
| data-ottawa wrote:
| GPT4 search is a very good experience.
|
| Though because you don't see the answers it doesn't show you,
| it's hard to really validate the quality, so I'm still wary,
| but when I look for specific stuff it tends to find it.
| kweingar wrote:
| The video itself and the video description give a disclaimer to
| this effect. Agreed that some will walk away with an incorrect
| view of how Gemini functions, though.
|
| Hopefully realtime interaction will be part of an app soon.
| Doesn't seem like there would be too many technical hurdles
| there.
| billconan wrote:
| performance and cost are hurdles?
| kweingar wrote:
| It can be realtime while still having more latency than
| depicted in the video (and the video clearly stated that
| Gemini does not respond that quickly).
|
| A local model could send relevant still images from the
| camera feed to Gemini, along with the text transcript of the
| user's speech. Then Gemini's output could be read aloud with
| text-to-speech. Seems doable within the present cost and
| performance constraints.
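      |
      | A rough sketch of that loop, with every function stubbed out
      | (these are hypothetical placeholders, not a real Gemini or
      | speech API):
      |
      |     def capture_still(camera):
      |         # stub: grab one frame from the camera as JPEG bytes
      |         return b"..."
      |
      |     def transcribe(mic):
      |         # stub: speech-to-text of the user's last utterance
      |         return "what do you see?"
      |
      |     def ask_model(image_bytes, text):
      |         # stub: one multimodal request; the network round
      |         # trip is where most of the latency would come from
      |         return "I see a rubber duck on the table."
      |
      |     def speak(text):
      |         # stub: text-to-speech playback
      |         print(text)
      |
      |     def assistant_loop(camera, mic):
      |         while True:
      |             prompt = transcribe(mic)
      |             frame = capture_still(camera)
      |             reply = ask_model(frame, prompt)
      |             speak(reply)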
| anigbrowl wrote:
| People don't really pay attention to disclaimers. Google made a
| choice knowing people would remember the hype, not the
| disclaimer.
| lainga wrote:
| :%s/Google/the team :%s/people/the promotion board
|
| Conway's law applied to the corporate-public interface :)
| 3pt14159 wrote:
| I remember watching it and I was pretty impressed, but as I
| was walking around thinking to myself I came to the
| conclusion that there was something fishy about the demo. I
| didn't know exactly what they fudged, but it was far too
      | polished to explain how well their current AI demos perform.
|
| I'm not saying there have been no improvements in AI. There
      | are, and this includes Google. But the reason why ChatGPT has
| really taken over the world is that the demo is in your own
| hands and it does quite well there.
| peteradio wrote:
| If there weren't serious technical hurdles they wouldn't have
| faked it.
| jefftk wrote:
| The disclaimer in the description is "For the purposes of this
| demo, latency has been reduced and Gemini outputs have been
| shortened for brevity."
|
| That's different from "Gemini was shown selected still images
| and not video".
| tobr wrote:
| What I found impressive about it was the voice, the fast
| real-time response to video, and the succinct responses. So
| apparently all of that was fake. You got me, Google.
| TillE wrote:
| The entirety of the disclaimer is "sequences shortened
| throughout", in tiny text at the bottom for two seconds.
|
| They do disclose most of the details elsewhere, but the video
| itself is produced and edited in such a way that it's extremely
| misleading. They really want you to think that it's responding
| in complex ways to simple voice prompts and a video feed, and
| it's just not.
| dogprez wrote:
      | Yea, of all the edits in the video, the editing for timing is
      | the least of my concerns. My gripe is that the prompting was
      | different, and to learn that you have to watch the video on
      | YouTube itself, expand the description, and click a link to a
      | different blog article. Linking a
| "making of" video where they show this and interview some of
| the minds behind Gemini would have been better PR.
| Jagerbizzle wrote:
| They were just parroting this video on CNBC without any
| disclaimers, so the viewers who don't happen to also read
| hacker news will likely form a different opinion than those of
| us who do.
| skepticATX wrote:
| No. The disclaimer was not nearly enough.
|
| The video fooled many people, including myself. This was not
| your typical super optimized and scripted demo.
|
| This was blatant false advertising. Showing capabilities that
| do not exist. It's shameful behavior from Google, to be
| perfectly honest.
| titzer wrote:
| Yeah, and ads on Google search have the teeniest, tiniest
| little "ad" chip on them, a long progression of making ads more
| in-your-face and less well-distinguished.
|
| In my estimation, given the context around AI-generated content
| and general fakery, this video was deceptive. The only
| impressive thing about the video (to me) was how snappy and
| fluid it seemed to be, presumably processing video in real
| time. None of that was real. It's borderline fraudulent.
| kaoD wrote:
| How is this not false advertising?
| barbazoo wrote:
| Or worse, fraud to make their stock go up
|
| edit: s/stuck/stock
| drcode wrote:
| I suppose it's not false advertising, since they don't even
| claim to have a product released yet that can do this, since
    | Gemini Ultra won't be available until an unspecified time next
| year
| stephbu wrote:
| You're right, it's astroturfing a placeholder in the market
| in the absence of product. The difference is probably just
| the target audience - feels like this one is more aimed at
| share-holders and internal politics.
| empath-nirvana wrote:
| possibly securities fraud though. Their stock popped a few
| percent on the back of that faked demo.
| imiric wrote:
| It's still false advertising.
|
| This is common in all industries. Take gaming, for example.
| Game publishers love this kind of publicity, as it creates
| hype, which leads to sales. There have been numerous examples
| of this over the years: Watch Dogs, No Man's Sky, Cyberpunk
| 2077, etc. There's a period of controversy once consumers
| realize they've been duped, the company releases some fake
| apology and promises or doubles down, but they still walk out
| of it richer, and ready to do it again next time.
|
| It's absolutely insidious, and should be heavily fined and
| regulated.
| Tao3300 wrote:
| It's a software demo. If you ever gave an honest demo, you gave
| a bad demo. If you ever saw a good and honest demo, you were
| fooled.
| dragontamer wrote:
| As a programmer, I'd say that all the demos of my code were
| honest and representative of what my code was doing.
|
| But I recognize we're all different programmers in different
| circumstances. But at a minimum, I'd like to be honest with
| my work. My bosses seem to agree with me and I've never been
| pressured into hosting a fake demo or lie about the features.
|
| In most cases, demos are needed because there's that dogfood
      | problem. It's just not possible for me to know how my
      | (prospective) customers will use my code. So I need to show
      | off what has been coded, my progress, and my intentions for
      | the feature set. In response, the (prospective) customer may
      | walk away, they may have some comments that increase the
      | odds of adoption, or they think it's cool and amazing and take
| it on the spot. We can go back and forth with regards to
| feature changes or what is possible, but that's how things
| should work.
|
| ------------
|
| I've done a few "I could do it like this" demos, where
| everyone in the room knew that I didn't finish the code yet
      | and it's just me projecting into the future of how code would
| work and/or how it'd be used. But everyone knew the code
| wasn't done yet (despite that, I've always delivered on what
| I've promised).
|
| There is a degree of professional ethics I'd expect from my
| peers. Hosting honest demos is one of them, especially with
| technical audience members.
| saagarjha wrote:
      | I prefer to let my software be good enough to speak for
      | itself without resorting to fraud, thank you very much.
| qwertox wrote:
| https://www.youtube.com/watch?v=OPUq31JZFsA
| drcongo wrote:
| Remember when they faked that Google Assistant booking a
| restaurant thing too.
| miraculixx wrote:
| Mhm
| umeshunni wrote:
| How was that fake?
| valine wrote:
| It's not live, but it's in the realm of outputs I would expect
| from a GPT trained on video embeddings.
|
| Implying they've solved single token latency, however, is very
| distasteful.
| zozbot234 wrote:
| OP says that Gemini had still images as input, not video - and
| the dev blog post shows it was instructed to reply to each
| input in relevant terms. Needless to say, that's quite
| different from what's implied in the demo, and at least
| theoretically is already within GPT's abilities.
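    |
    | If the dev blog is accurate, the "video" part reduces to
    | something like the sketch below - written against the Python
    | SDK as I remember it, so the exact names may differ, and the
    | prompt text and frame files are made up, not Google's:
    |
    |     import PIL.Image
    |     import google.generativeai as genai
    |
    |     genai.configure(api_key="...")
    |     model = genai.GenerativeModel("gemini-pro-vision")
    |
    |     # a handful of hand-picked stills, not a video stream
    |     frames = [PIL.Image.open(f"frame_{i}.jpg") for i in range(3)]
    |     prompt = ("The ball is under the cup I point at in the "
    |               "first image, then the cups are shuffled. Based "
    |               "on the last image, where is the ball now?")
    |
    |     response = model.generate_content([prompt, *frames])
    |     print(response.text)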
| valine wrote:
| How do you think the cup demo works? Lots of still images?
| watusername wrote:
| A few hand-picked images (search for "cup shuffling"):
        | https://developers.googleblog.com/2023/12/how-its-made-gemin...
| valine wrote:
| Holy crap that demo is misleading. Thanks for the link.
| Animats wrote:
| The Twitter-linked Bloomberg page is now down.[1] Alternative
| page: [2] New page says it was partly faked. Can't find old page
| in archives.
|
| [1]
| https://www.bloomberg.com/opinion/articles/2023-12-07/google...
|
| [2]
| https://www.bloomberg.com/opinion/articles/2023-12-07/google...
| sowbug wrote:
| I am similarly enraged when TV show characters respond to text
| messages faster than humans can type. It destroys the realism
| of my favorite rom-coms.
| imacomputertoo wrote:
| it was obviously marketing material, but if this tweet is right,
| then it was just blatant false advertising.
| kjkjadksj wrote:
| Google always does fake advertising. "Unlimited" google drive
| accounts for example. They just have such a beastly legal team
| no one is going to challenge them on anything like that.
| Dylan16807 wrote:
| What was fake about unlimited google drive? There were some
| people using petabytes.
|
| The eventual removal of that tier and anything even close
| speaks to Google's general issues with cancelling services,
| but that doesn't mean it was less real while it existed.
| peteradio wrote:
| Lol, could have done without the cocky narration. "I think we're
| done here."
| cedws wrote:
| The whole launch is cocky. Bleh. Stick to the engineering.
| cryptoz wrote:
| I'll admit I was fooled. I didn't read the description of the
| video. The most impressive thing they showed was the real-time
| responses to watching a video. Everything else was about
| expected.
|
| Very misleading and sad Google would so obviously fake a demo
| like this. Mentioning in the description that it's edited is not
| really in the realm of doing enough to make clear the fakery.
| LesZedCB wrote:
| i too was excited and duped about the real-time implications.
| though i'm not surprised at all to find out it's false.
|
    | mea culpa i should have looked at the bottom of the description
| box on youtube where it probably says "this demonstration is
| based on an actual interaction with an LLM"
| wvenable wrote:
| I'm surprised it was false. It was made to look realistic and
| I wouldn't expect Google to fake this kind of thing.
|
| All they've done is completely destroy my trust in anything
| they present.
| zozbot234 wrote:
| Good, that video was mostly annoying and creepy. The AI responses
| as shown in the linked Google dev blogpost are a lot more
| reasonable and helpful. BTW I agree that the way the original
| video was made seems quite misleading in retrospect. But that's
| also par for the course for AI "demos", it's an enduring
| tradition in that field and part of its history. You really have
| to look at production systems and ignore "demos" and pointless
| proofs of concept.
| peteradio wrote:
| What the Quack? I found it tasty as pate.
| danielbln wrote:
| The GPT-4 demo early this year when it was released was a lot
    | less... fake, and in fact very much indicative of its feature
| set. The same is true for what OpenAI showed during their dev
| days, so at the very least those demos don't have too much
| fakery going on, as far as I could tell.
| Frost1x wrote:
| >You really have to look at production systems and ignore
| "demos" and pointless proofs of concept.
|
    | While I agree, I wouldn't call proofs of concept and demos
| pointless. They often illustrate a goal or target functionality
| you're working towards. In some cases it's really just a matter
| of allotting some time and resources to go from a concept to a
| product, no real engineering is needed, it all exists, but
| there's capital needed to get there.
|
    | Meanwhile some proofs of concept skip steps and show higher
| level function that needs some serious breakthrough work to get
| to, maybe multiple steps of that. Even this is useful because
| it illustrates a vision that may be possible so people can
| understand and internalize things you're trying to do or the
| real potential impact of something. That wasn't done here, it
| was embedded in a side note. That information needs to be
| before the demo to some degree without throwing a wet blanket
| on everything and needs to be in the same medium as the demo
| itself so it's very clear what you're seeing.
|
| I have no problem with any of that. I have a lot of problems
| when people don't make it explicitly clear beforehand that it's
| a demo and explain earnestly what's needed. Is it really
| something that exists today in working systems someone just
| needs to invest money and wire it up without new research
| needed? Or is it missing some breakthroughs, how many/what are
| they, how long have these things been pursued, how many people
| are working on them... what does recent progress look like and
| so on (in a nice summarized fashion).
|
| Any demo/poc should come up front with an earnest general
| feasibility assessment. When a breakthrough or two are needed
| then that should skyrocket. If it's just a lot of expensive
| engineering then that's also a challenge but tractable.
|
| I've given a lot of scientific tech demonstrations over the
| years and the businesses behind me obviously want me to be as
| vague as possible to pull money in. I of course have some of
| those same incentives (I need to eat and pay my mortgage like
    | everyone else). Nonetheless, the draw of science to me has
| always been pulling the veil from deception and mystery and I'm
| a firm believer in being as upfront as possible. If you don't
| lead with disclaimers, imaginations run wild into what can be
| done today. Adding disclaimers helps imaginations run wild
| about what can be done tomorrow, which I think is great.
| borissk wrote:
| Google did the same with Pixel 8 Pro advertising - they showed
| stuff like photo and video editing, that people couldn't
| replicate on their phones.
| pizzafeelsright wrote:
| I suppose this is a great example of how trust in authentic
| videos, audio, images, company marketing must be questioned and,
| until verified, assumed to be 'generated'.
|
| I am curious, if the voice, email, chat, and shortly video can
| all be entirely generated in real or near real time, how can we
  | be sure that a remote employee is not actually a fully or
  | partially generated entity?
|
| Shared secrets are great when verifying but when the bodies are
| fully remote - what is the solution?
|
| I am traveling at the moment. How can my family validate that it
  | is ME claiming lost luggage and sending a Venmo request?
| takoid wrote:
| >I am traveling at the moment. How can my family validate that
    | it is ME claiming lost luggage and sending a Venmo request?
|
| PGP
| tadfisher wrote:
| Now you have two problems.
|
| (I say this in jest, as a PGP user)
| adewinter wrote:
| Make up a code phrase/word for emergencies, share it with your
| family, then use it for these types of situations.
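    |
    | If you want something sturdier than a single phrase (which is
    | burned the first time it's spoken over a monitored channel),
    | the textbook version is challenge-response over a shared
    | secret. A minimal Python stdlib sketch, admittedly far more
    | ceremony than most families will tolerate:
    |
    |     import hashlib, hmac, secrets
    |
    |     SHARED_SECRET = b"phrase agreed on in person"
    |
    |     def make_challenge():
    |         # the person being asked for money sends a fresh nonce
    |         return secrets.token_hex(8)
    |
    |     def respond(challenge):
    |         # the traveler answers with an HMAC of the nonce; the
    |         # secret itself never crosses the channel
    |         return hmac.new(SHARED_SECRET, challenge.encode(),
    |                         hashlib.sha256).hexdigest()
    |
    |     def verify(challenge, answer):
    |         return hmac.compare_digest(respond(challenge), answer)
    |
    |     c = make_challenge()
    |     print(verify(c, respond(c)))  # True only with the secret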
| mikepurvis wrote:
| Fair, but that also assumes the recipients ("family") are in
| a mindset of constantly thinking about the threat model in
| this type of situation and will actually insist on hearing
| the passphrase.
| pizzafeelsright wrote:
| This will only work once.
| vasco wrote:
| Ask for information that only the actual person would know.
| pizzafeelsright wrote:
| That will only work once if the channels are monitored.
| vasco wrote:
| You only know one piece of information about your family? I
| feel like I could reference many childhood facts or random
| things that happened years ago in social situations.
| raincole wrote:
| If you can't verify whether your employee is AI, then you fire
| them and replace them with AI.
| vasco wrote:
      | The question is: an attacker tells you they lost access, can
      | you please reset some credential - and your security process
      | is getting on a video call because you're a fully remote
      | company, let's say.
| kjkjadksj wrote:
| At this point, probably a handwritten letter. Back to the 20th
| century we go.
| robbomacrae wrote:
| I think it's also why we as a community should speak out when
    | we catch them doing this, as they are discrediting tech
| demos. It won't be enough because a lie will be around the
| world before the truth gets out the starting gates but we can't
| just let this go unchecked.
| GaggiX wrote:
  | I imagine the model also has some video embeddings, like for the
  | example where it needed to find where the ball was hiding.
| AndrewKemendo wrote:
| I have used Swype texting since the t9 days.
|
| If I demoed swype texting as it functions in my day to day life
| to someone used to a querty keyboard they would never adopt it
|
| The rate at which it makes wrong assumptions about the word, or I
| have to fix it is probably 10% to 20% of the time
|
| However because it's so easy to fix this is not an issue and it
| doesn't slow me down at all. So within the context of the
| different types of text Systems out there, I t's the best thing
| going for me personally, but it takes some time to learn how to
| use it.
|
| This is every product.
|
| If you demonstrated to people how something will actually work
| after 100 hours of habituation and compensation for edge cases,
| nobody would ever adopt anything.
|
| I'm not sure how to solve this because both are bad.
|
| (Edit: I'm keeping all my typos as meta-comment on this given
| that I'm posting via swype on my phone :))
| mulmen wrote:
| Does swype make editing easier somehow? iOS spellcheck has
| negative value. I turned it off years ago and it reduced errors
| but there are still typos to fix.
|
| Unfortunately iOS text editing is also completely worthless. It
| forces strange selections and inserts edited text in awkward
| ways.
|
| I'm a QWERTY texter but text entry on iOS is a complete
| disaster that has only gotten worse over time.
| mikepurvis wrote:
| I'm an iOS user and prefer the swipe input implementation in
| GBoard over the one in the native keyboard. I'm not sure what
| the differences are, but GBoard just seems to overall make
| fewer mistakes and do a better job correcting itself from
| context.
| wlesieutre wrote:
| Have you tried the native keyboard since iOS 17? It's quite
| a lot better than older versions.
| nozzlegear wrote:
| As I was reading Andrew's comment to myself, I was trying
| to figure out when and why I stopped using swype typing on
| my phone. Then it hit me - I stopped after I switched from
| Android to iOS a few years ago. Something about the iOS
| implementation just doesn't feel right.
| rochak wrote:
| Apple's version is shit. Period. That's why.
| pb7 wrote:
| Hard disagree. I could type your whole comment without any
| typos completely blindly (except maybe "QWERTY" because
| uppercaps don't get autocorrected).
| newaccount74 wrote:
| Apple autocorrect has a tendency to replace technical terms
| with similar words, eg. rvm turns into rum or ram or
| something.
|
| It's even worse on the watch somehow. I take care to hit
| every key exactly, the correct word is there, I hit space,
| boom replaced with a completely different word. On the
| watch it seems to replace almost every word with bullshit,
| not just technical terms.
| rootusrootus wrote:
| > seems to replace almost every word with bullshit
|
| Sort of related, it also doesn't let you cuss. It will
| insist on replacing fuck with pretty much anything else.
| I had to add fuck to the custom replacement dictionary so
| it would let me be. What language I choose to use is mine
| and mine alone, I don't want Nanny to clean it up.
| peteradio wrote:
    | What is the latency of Swype? < 10ms? Not at all comparable to
| the video.
| kjkjadksj wrote:
    | It's honestly pretty mind-boggling that we'd even use QWERTY on
    | a smartphone. The entire point of the layout is to keep your
    | fingers on the home row. Meanwhile people text with one or
    | two thumbs 100% of the time.
| hiccuphippo wrote:
| I use 8vim[0] from time to time, it's a good idea but needs a
| dictionary/autocompletion. You can get ok speeds after an
| hour of usage.
|
| [0] https://f-droid.org/en/packages/inc.flide.vi8/
| jerf wrote:
| "The entire point of the layout is to keep your fingers on
| the home row."
|
| No, that is how you're _told to type_. You have to be told to
| type that way precisely because QWERTY is _not_ designed to
| keep your fingers on the home row. If you type in a layout
      | that is designed to do that, you don't need to be _told_ to
| keep your fingers on the home row, because you naturally
| will.
|
| Nobody really knows what the designers were thinking, which I
| do not mean as sarcasm, I mean it straight. History lost that
| information. But whatever they were thinking that is clearly
| not it because it is plainly obvious just by looking at it
| how bad it is at that. Nobody trying to design a layout for
| "keeping your fingers on the home row" would leave
| hjkl(semicolon) under the resting position of the dominant
| hand for ~90% of the people.
|
| This, perhaps in one of technical history's great ironies,
| makes it a fairly good keyboard for swype-like technologies!
      | A keyboard layout like Dvorak that has "aoeui" all right next
      | to each other and "dhtns" on the other hand would _constantly_
      | have trouble figuring out which one you meant between
| "hat" and "ten" to name just one example. "uio" on qwerty
| could probably stand a bit more separation, but "a" and "e"
| are generally far enough apart that at least for me they
| don't end up confused, and pushing the most common consonants
| towards the outer part of the keyboard rather than clustering
| them next to each other in the center (on the home row) helps
| them be distinguishable too. "fghjkl" is almost a probability
| dead zone, and the "asd" on the left are generally reasonably
| distinct even if you kinda miss one of them badly.
|
| I don't know what an optimal swype keyboard would be, and
| there's probably still a good 10% gain to be made if someone
| tried to make one, but it wouldn't be enough to justify
| learning a new layout.
| mvdtnz wrote:
| > Nobody really knows what the designers were thinking,
| which I do not mean as sarcasm, I mean it straight. History
| lost that information.
|
| My understanding of QWERTY layout is that it was designed
| so that characters frequently used in succession should not
| be able to be typed in rapid succession, so that typewriter
| hammers had less chance of colliding. Or is this an urban
| myth?
| kjkjadksj wrote:
| You have to be taught to use the home row because the
| natural inclination for most people is to peck and hunt
| with their two index fingers. Watch how old people or young
| kids type. That being said staying on the home row is how
| you type fast and make the most of the layout. Everything
| is comfortably reachable for the most part unless you are a
| windows user ime.
| jerf wrote:
| If you learn a keyboard layout where the home row is
| actually the most common keys you use, you will not have
| to be encouraged to use the home row. You just will. I
| know, because I have, and I never "tried" to use the home
| row.
|
| People don't hunt and peck after years of keyboard use
| because of the keyboard; they do it because of the
| keyboard _layout_.
|
| If you want to prove I'm wrong, go learn Dvorak or
| Colemak and show me that once you're comfortable you
| still hunt and peck. You won't be, because it wouldn't
| even make sense. Or, less effort, find a hunt & peck
| Dvorak or Colemak user who is definitely at the
| "comfortable" phase.
| bigtunacan wrote:
        | Hold up young one. The reason for QWERTY's design has
| absolutely not been lost to history yet.
|
| The design was to spread out the hammers of the most
| frequently used letters to reduce the frequency of hammer
| jamming back when people actually used typewriters and not
| computers.
|
        | The problem it attempted to improve upon, and which it was
| pretty effective at, is just a problem that no longer
| exists.
| saagarjha wrote:
| I'm curious how this works because all the common letters
| seem to be next to each other on the left side of the
| keyboard
| zlg_codes wrote:
            | The original intent, I do believe, was not just separating
            | the hammers per se, but also helping the hands alternate, so
| they would naturally not jam as much.
|
| However, I use a Dvorak layout and my hands feel like
| they alternate better on that due to the vowels being all
| on one hand. The letters are also in more sensical
| locations, at least for English writing.
|
| It can get annoying when G and C are next to each other,
| and M and W, but most of the time I type faster on Dvorak
| than I ever did on Qwerty. It helps that I learned during
| a time where I used qwerty at work and Dvorak at home, so
| the mental switch only takes a few seconds now.
| jerf wrote:
          | Also apocryphal:
          | https://en.wikipedia.org/wiki/QWERTY#Contemporaneous_alterna...
|
| And it does a bad job at it, which is further evidence
| that it was not the design consideration. People may not
| have been able to run a quick perl script over a few
| gigabytes of English text, but they would have gotten
          | much closer if that was the desire. I don't buy that it
          | was their goal but they were just too stupid to get it
          | even close to right.
| heleninboodler wrote:
| The reason we use qwerty on a smartphone is extremely
| straightforward: people tend to know where to look for the
      | keys already, so it's easy to adapt to even though it's not
| "efficient". We know it better than we know the positions of
| letters in the alphabet. You can easily see the difference if
| you're ever presented with an onscreen keyboard that's in
| alphabetical order instead of qwerty (TVs do this a lot, for
| some reason, and it's a different physical input method but
| alpha order really does make you have to stop and hunt). It
| slows you down quite a bit.
| swores wrote:
| That's definitely a good reason why, but perhaps if iOS or
| Android were to research what the best layout is for
| typical touch screen typing and release that as a new
| default, people would find it quite quick to learn a second
| layout and soon get just the benefits?
|
| After all, with TVs I've had the same experience as you
| with the annoying alphabetical keyboard, but we type into
        | them maybe a couple of times a year, or maybe once in 5
| years, whereas if we changed our phone keyboard layout we'd
| likely get used to it quite quickly.
|
| Even if not going so far as to push it as a new default for
| all users (I'm willing to accept the possibility that I'm
| speaking for myself as the kind of geeky person who
| wouldn't mind the initial inconvenience of a new kb layout
| if it meant saving time in the long run, and that maybe a
| large majority of people would just hate it too much to be
| willing to give it a chance), they could at least figure
| out what the best layout is (maybe this has been studied
| and decided already, by somebody?) and offer that as an
| option for us geeks.
| mellinoe wrote:
| Even most technically-minded people still use QWERTY on
| full-size computer keyboards despite it being a terrible
| layout for a number of reasons. I really doubt a new,
| nonstandard keyboard would get much if any traction on
| phones.
| aaronax wrote:
| T9 was fine for typing and probably hundreds of millions
| of people used it.
| rurp wrote:
| Path dependency is the reason for this, and is the reason why
| a lot of things are the way they are. An early goal with
| smart phone keyboards was to take a tool that everyone
| already knew how to use, and port it over with as little
| friction as possible. If smart phones happened to be invented
| before external keyboards the layouts probably would have
| been quite different.
| skywhopper wrote:
| I know marketing is marketing, but it's bad form IMO to "demo"
| something in a manner totally detached from its actual manner
| of use. A swype keyboard takes practice to use, but the demos
| of that sort of input typically show it being used in a
| realistic way, even if the demo driver is an "expert".
|
| This is the sort of demo that 1) gives people a misleading idea
| of what the product can actually do; and 2) ultimately
| contributes to the inevitable cynical backlash.
|
| If the product is really great, people can see it in a
| realistic demo of its capabilities.
| mvdtnz wrote:
| Showing a product in its best light is one thing. Demonstrating
| a mode of operation that doesn't exist is entirely another. It
| would be like if a demo of your swipe keyboard included
| telepathic mind control for correcting errors.
| AndrewKemendo wrote:
| I'm not sure I'd agree that what they showed will never be
| possible and in fact my whole point is that I think Google
| can most likely deliver on that in this specific case. Chalk
| it up to my experience in the space, but from what I can see
| it looks like something Google can actually execute on
| (unlike many areas where they fail on product regularly).
|
| I would agree completely that it's not ready for consumers
| the way it was displayed, which is my point.
|
| I do want to add that I believe that the right way to do
| these types of new product rollout is not with these giant
| public announcements.
|
| In fact, I think generally speaking the "right" way to do
| something like this demonstrates only things that are
| possible robustly. However that's not the market that Google
| lives in. They're capitalists trying to make as much money as
      | possible. I'm simply saying that what they're showing is, I
      | think, absolutely technically possible, and I think Google
      | can deliver it even if it's not ready today.
|
| Do I think it's supremely ethical the way that they did it?
| No I don't.
| robbomacrae wrote:
        | The voice interaction part didn't look far off from what
| we are doing with Dynamic Interaction at SoundHound.
| Because of this I assumed (like many it seems) that they
| had caught up.
|
| And it's dangerous to assume they can just "deliver later".
| It's not that simple. If it is why not bake it in right now
| instead of committing fraud?
|
        | This is damaging to companies that walk the walk. People
        | have literally said to me "but what about that Gemini?"
        | and dismissed our work.
| AndrewKemendo wrote:
| I feel that more than you realize
|
          | That was basically what Magic Leap did to the whole AR
          | development market. Everyone deep in it knew they
          | couldn't do it, but they messed up so badly that it
          | basically killed the entire industry.
| mvdtnz wrote:
| I don't care what google could, in theory, deliver on some
| time in the future maybe. That's irrelevant. They are
| demonstrating something that can't be done with the product
| as they are selling it.
| Aurornis wrote:
| > However because it's so easy to fix this is not an issue and
| it doesn't slow me down at all.
|
| But that's a different issue than LLM hallucinations.
|
| With Swype, you already know what the correct output looks
| like. If the output doesn't match what you wanted, you
| immediately understand and fix it.
|
| When you ask an LLM a question, you don't necessarily know the
| right answer. If the output looks confident enough, people take
| it as the truth. Outside of experimenting and testing, people
| aren't using LLMs to ask questions for which they already know
| the correct answer.
| cja wrote:
| I think you mean swipe. Swype was a brilliant third party
| keyboard app for Android which was better at text prediction
| and manual correction than Gboard is today. If however you
| really do still use Swype then please tell me how because I
| miss it.
| AndrewKemendo wrote:
| Ha good point, and yes I agree Swype continues to be the best
| text input technology that I'll never be able to use again. I
      | guess I just committed genericide here, but I meant the
      | general "swiping" process at this point.
| snowwrestler wrote:
| The insight here is that the speed of correction is a crucial
| component of the perceived long-term value of an interface
| technology.
|
| It is the main reason that handwriting recognition did not
| displace keyboards. Once the handwriting is converted to text,
| it's easier to fix errors with a pointer and keyboard. So after
| a few rounds of this most people start thinking: might as well
| just start with the pointer and keyboard and save some time.
|
| So the question is, how easy is it to detect and correct errors
| in generative AI output? And the unfortunate answer is that
| unless you already know the answer you're asking for, it can be
| very difficult to pick out the errors.
| AndrewKemendo wrote:
| I think this is a good rebuttal.
|
| Yeah the feedback loop with consumers has a higher likelihood
| of being detrimental, so even if the iteration rate is high,
| it's potentially high cost at each step.
|
| I think the current trend is to nerf the models or otherwise
| put bumpers on them so people can't hurt themselves. That's
| one approach that is brittle at best and someone with more
| risk tolerance (OpenAI) will exploit that risk gap.
|
      | It's a contradiction at best, and depending on the level
      | of unearned trust from the misleading marketing, it will
      | certainly lead to some really odd externalities.
|
| Think "man follows google maps directions into pond" but for
| vastly more things.
|
| I really hated marketing before but yeah this really proves
| the warning I make in the AI addendum to my scarcity theory
| (in my bio).
| CamperBob2 wrote:
| Any sufficiently-advanced technology is indistinguishable from a
| rigged demo.
| peteradio wrote:
| Fake it til you make it, then keep faking it.
| rollulus wrote:
  | I watched this video, impressed, and thought: what if it's fake?
  | But then I dismissed the thought, because it would come out and
  | the damage wouldn't be worth it. I was wrong.
| imiric wrote:
| The worst part is that there won't be any damage. They'll
| release a blog post with PR apologies, but the publicity they
| got from this stunt will push up their brand in mainstream AI
| conversations regardless.
|
| "There's no such thing as bad publicity."
| steego wrote:
      | "There's no such thing as bad publicity" only applies to
      | people and companies that know how to spin it.
|
| Reading the comments of all these disillusioned developers,
| it's already damaged them because now smart people will be
| extra dubious when Google starts making claims.
|
| They just made it harder for themselves to convince
| developers to even try their APIs, let alone bet on them.
|
| This was stupid.
| h0rv wrote:
| https://nitter.unixfox.eu/parmy/status/1732811357068615969?f...
| Nekorosu wrote:
| Gemini demo looks like ChatGPT with a video feed, except it
  | doesn't exist, unlike ChatGPT. I have ChatGPT on my phone right
| now, and it works (and it can process images, audio, and audio
| feed in). This means Google has shown nothing of substance. In my
| world, it's a classic stock price manipulation move.
| onlyrealcuzzo wrote:
| Gemini Pro is available on Bard now.
|
| Ultra is not yet available.
| replwoacause wrote:
| Yeah and have you tried it? It's as dogshit as the original
| Bard.
| Kim_Bruning wrote:
| Even a year ago, this advert would have been obvious puffery in
| advertising.
|
| But right now, all the bits needed to do this already exist (just
| need to be assembled and -to be fair- given a LOT of polish), so
| it would be somewhat reasonable to think that someone had
| actually Put In The Work already.
| xnx wrote:
| That demo was much further on the "marketing" end of the spectrum
| when compared to some of their other videos from yesterday which
| even included debug views: https://youtu.be/v5tRc_5-8G4?t=43
| karaterobot wrote:
| This is endemic to public product demos. The thing never works as
| it does in the video. I'm not excusing it, I'm saying: don't
| trust public product demos. They are commercials, they exist to
| sell to you, not to document objectively and accurately, and they
| will always lie and mislead within the limits of the law.
| k__ wrote:
| I really thought this was a realtime demo.
|
| Shame on them :(
| suriyaG wrote:
  | The Bloomberg article seems to have been taken down and now
  | returns a 404.
| https://www.bloomberg.com/opinion/articles/2023-12-07/google...
| dilap wrote:
| Just an error in the link, here's the corrected version:
| https://www.bloomberg.com/opinion/articles/2023-12-07/google...
| thrtythreeforty wrote:
| and here's a readable version: https://archive.ph/ABhZi
| dougmwne wrote:
| I was fooled. The model release announcement said it could accept
| video and audio multi-modal input. I understood that there was a
| lot of editing and cutting, but I really believed I was looking
| at an example of video and audio input. I was completely
| impressed since it's quite a leap to go from text and still
| images to "eyes and ears." There's even the segment where
  | instruments are drawn and music was generated. I thought I was
| looking at a model that could generate music based on language
| prompts, as we have seen specialized models do.
|
  | This was all fake. They took a collection of cherry-picked,
  | prompt-engineered examples, then dramatized them for maximum
| shareholder hype. The music example was just outputting a
| description of a song, not the generated music we heard in the
| video.
|
| It's one thing to release a hype video with what-ifs and quite
| another to claim that your new multi-modal model is king of the
| hill then game all the benchmarks and fake all the demos.
|
| Google seems to be in an evil phase. OpenAI and MS must be quite
| pleased with themselves.
| skepticATX wrote:
| Exactly. Personally I'm fine with both:
|
    | 1) Forward looking demos that demonstrate the future of your
| product, where it's clear that you're not there yet but working
| in that direction
|
| or
|
    | 2) Demos that show off current capabilities, but are scripted
| and edited to do so in the best light possible.
|
| Both of those are standard practice and acceptable. What Google
| did was just wrong. They deserve to face backlash for this.
| miraculixx wrote:
| Do you believe everything verbatim that companies tell you in
| advertising?
| gregshap wrote:
| If they show a car driving I believe it's capable of self-
| propulsion and not just rolling downhill.
| dylan604 wrote:
| Hmm, might I interest you in a video of an electric semi-
| truck?
| macNchz wrote:
| A marketing trick that has, in fact, been tried:
          | https://arstechnica.com/cars/2020/09/nikola-admits-prototype...
| daveguy wrote:
| Used to be "marketing tricks" were prosecuted as fraud.
| jes5199 wrote:
| still is. Nikola's CEO, Trevor Milton, was convicted of
| fraud and is awaiting sentencing.
| olliej wrote:
| If I recall correctly, that led to literal criminal fraud
| charges.
|
| And iirc Tesla is also being investigated for fraudulent
| claims for faking the safety of their self driving cars.
| sp332 wrote:
| When a company invents tech that can do this, how would their
| ad be different?
| slim wrote:
| this was plausible
| steego wrote:
| No, but most people tend to make a mental note of which
| companies tend to deliver and which ones work hard to mislead
| them.
|
| You do understand the concept of reputation, right?
| replwoacause wrote:
| Well put. I'm not touching anything Google does any more.
| They're far too dishonest. This failed attempt at a release
| (which turns out was all sizzle and no steak) only underscored
| how far behind OpenAI they actually are. I'd love to have been
| a fly on the wall in the OAI offices when this demo video went
| live.
| renegade-otter wrote:
| This kind of moral fraud - unethical behavior - is tolerated
| for some reason. It's almost like investors want to be fooled.
    | There is no room for due diligence. They squeal like excited
| Taylor Swift fans as they are being lied to.
| rdedev wrote:
| Seems reminiscent of a video where the lead research department
| within Google is an animation studio (wish I could remember
| more about that video)
|
| Doing all these hype videos just for the sake of satisfying
    | shareholders or whatever is just making me lose trust in their
| research division. I don't think they did anything like this
| when they released Bert.
| Davidzheng wrote:
      | I agree completely. When AlphaZero was announced I remember
      | feeling shocked at how they stated this revolutionary
      | breakthrough as if it were a regular thing. AlphaFold and
      | AlphaCode are also impressive, but this one just sounds like
      | it was forced by Sundar and not the usual DeepMind.
| hanspeter wrote:
| I too thought it was able to accept video.
|
| Given the massive data volume in videos, I assumed it processed
| video into pictures by extracting a frame per second or
| something along those lines, while still taking the entire
| video as the initial input.
|
| Turns out, it wasn't even doing that!
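    |
    | For what it's worth, that kind of sampling would be trivial to
    | do on the client side; a rough sketch with OpenCV (the file
    | name is made up):
    |
    |     import cv2  # pip install opencv-python
    |
    |     def one_frame_per_second(path):
    |         cap = cv2.VideoCapture(path)
    |         fps = cap.get(cv2.CAP_PROP_FPS) or 30.0
    |         step = int(round(fps))
    |         index = 0
    |         while True:
    |             ok, frame = cap.read()
    |             if not ok:
    |                 break
    |             if index % step == 0:
    |                 yield frame  # one still roughly every second
    |             index += 1
    |         cap.release()
    |
    |     stills = list(one_frame_per_second("duck_demo.mp4"))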
| iamleppert wrote:
| You can tell whoever put together that demo video gave no f*cks
| whatsoever. This is the quality of work you can expect under an
| uninspiring leader (Sundar) in a culture of constant layoff fear
| and bureaucracy.
|
| Literally everyone I know who works at Google hates their job and
  | is completely checked out.
| CamperBob2 wrote:
| Huh? It was a GREAT demo video!
|
| If it had been real, that is.
| htk wrote:
| The whole Gemini webpage and contents felt weird to me, it's in
| the uncanny valley of trying to look and feel like an Apple
| marketing piece. The hyperbolic language, surgically precise
| ethnic/gender diversity, unnecessary animations and the sales
| pitch from the CEO felt like a small player in the field trying
| to pass as a big one.
| kjkjadksj wrote:
| I'm imagining the project managers are patting themselves on
| the back for checking all the performative boxes, blind to the
| absolute satire of it all.
| cedws wrote:
| I got the same vibes. Ultra and Pro. It feels tacky that it
| declares the "Gemini era" before it's even available. Google
| _really_ want to be seen as level on the playing field.
| crazygringo wrote:
| > _surgically precise ethnic /gender diversity_
|
| What does that mean and why is it bad?
|
| Diversity in marketing is used because, well, your desired
| market is diverse.
|
| I don't know what it means for it to be surgically precise,
| though.
| cheeze wrote:
| Agreed with your comment. This is every marketing department
| on the planet right now, and it's not a bad thing IMO. Can
| feel a bit forced at times, but it's better than the
| alternative.
| kozikow wrote:
| It's funny because now the OpenAI keynote feels like it's
| emulating the Google keynotes from 5 years ago.
|
| Google Keynote feels like it's emulating the Apple keynote from
| 5 years ago.
|
| And the Apple keynote looks like robots just out of an uncanny
| valley pretending to be humans - just like keynotes might look
| in 5 years, but actually made by AI. Apple is always ahead of
| the curve in keynote trends.
| robbomacrae wrote:
| The more I think about this the more it rings true...
| wharvle wrote:
| I hadn't thought about it until just now, but the most recent
| Apple events really are the closest real-person thing I've
      | ever seen to some of the "good" computer-generated
      | photorealistic (kinda...) humans "reading" with
      | text-to-speech.
|
| It's the stillness between "beats" that does it, I think, and
| the very-constrained and repetitive motion.
| sheepscreek wrote:
    | I don't understand why Gemini is even considered "jaw-dropping"
    | to begin with. GPT-4V has set the bar so high that all their
    | demos and presentations paled in comparison. And it's available
    | for anyone to use. People have already built mind-blowing demos
    | with it (like
    | https://arstechnica.com/information-technology/2023/11/ai-po...).
|
| The entire launch felt like a concentrated effort to "appear"
    | competitive with OpenAI. Google was splitting hairs talking about
| low single digit percentage improvement in benchmarks. Against a
| model that has been out for over 6 months.
|
| I have never been so unimpressed with them. Not only has OpenAI
| managed to snag this one from under Google's nose, IMO - they
| seem to be defending their lead quite well. Now that is something
| unmistakably remarkable. Color me impressed!
| kjkjadksj wrote:
      | Some other commenter, a former Googler, alluded a while back to
      | figuring out the big secret and being thrown into a tizzy by
      | the resulting cognitive dissonance once they realized what
      | they'd been buying into. It's never about making a good
      | product. It's about keeping up with the Joneses in the eyes of
      | tech investors. And just look at the movement on the stock
      | today as a result of this probable lemon of a product: nothing
      | else mattered except keeping up appearances. CEOs make historic
      | careers optimizing companies for appearances over function
      | like this.
| modeless wrote:
| That's not the only thing wrong. Gemini makes a false statement
| in the video, serving as a great demonstration of how these
| models still outright lie so frequently, so casually, and so
| convincingly that you won't notice, even if you have a whole team
| of researchers and video editors reviewing the output.
|
| It's the single biggest problem with LLMs and Gemini isn't
| solving it. You simply can't rely on them when correctness is
| important. Even when the model has the knowledge it would need to
| answer correctly, as in this case, it will still lie.
|
| The false statement is after it says the duck floats, it
| continues "It is made of a material that is less dense than
| water." This is false; "rubber" ducks are made of vinyl polymers
| which are more dense than water. It floats because the hollow
| shape contains air, of course.
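  |
  | A back-of-the-envelope check, with made-up but plausible numbers:
  |
  |     pvc_density   = 1.4    # g/cm^3, roughly; denser than water
  |     water_density = 1.0    # g/cm^3
  |     duck_mass     = 50.0   # g, assumed
  |     duck_volume   = 150.0  # cm^3 displaced if submerged, assumed
  |
  |     effective_density = duck_mass / duck_volume  # ~0.33 g/cm^3
  |     print(effective_density < water_density)  # True: duck floats
  |     print(pvc_density < water_density)        # False: material sinks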
| ace2358 wrote:
| I totally agree with you on the confident lies. And it's really
| tough. Technically the duck is made out of air and plastic
| right?
|
| If I pushed the model further on the composition of a rubber
| duck, and it failed to mention its construction, then it'd be
| lying.
|
| However there is this disgusting part of language where a
| statement can be misleading, technically true, not the whole
| truth, missing caveats etc.
|
| Very challenging problem. Obviously Google decided to mislead
| the audience and basically cover up the shortcomings. Terrible
| behaviour.
| mechagodzilla wrote:
| No, the density of the object is less than water, not the
| density of the material. The Duck is made of plastic, and it
| traps air. Similarly, you can make a boat that floats in
| water out of concrete or metal. It is an important
| distinction when trying to understand buoyancy.
| modeless wrote:
| Calling the air inside the duck (which is not sealed inside)
| part of its "material" would be misleading. That's not how
| most people would interpret the statement and I'm confident
| that's not the explanation for why the statement was made.
| onedognight wrote:
| The air doesn't matter. Even with a vacuum inside it would
| float. It's the overall density of "the duck" that matters,
| not the density of the plastic.
| hunter2_ wrote:
| A canoe floats, and that doesn't even command any thought
| regarding whether you can replace trapped air with a
| vacuum. If you had a giant cube half full of water, with
| a boat on the water, the boat would float regardless of
| whether the rest of the cube contained air or vacuum, and
| regardless of whether the boat traps said air (like a
| pontoon) or is totally vented (like a canoe). The overall
| density of the canoe is NOT influenced by its shape or
| any air, though. The canoe is strictly more dense than
| water (it will sink if it capsizes) yet in the correct
| orientation it floats.
|
| What does matter, however, is the overall density of the
| space that was water and became displaced by the canoe.
| That space can be populated with dense water, or with a
| less dense canoe+air (or canoe+vacuum) combination.
| That's what a rubber duck also does: the duck+air (or
| duck+vacuum) combination is less dense than the displaced
| water.
| jmathai wrote:
| This seems to be a common view among some folks. Personally,
| I'm impartial.
|
    | Search, or even asking other expert human beings, is prone to
    | producing incorrect results. I'm unsure where this expectation
    | of 100% absolute correctness comes from. I'm sure there are use
    | cases that demand it, but I assume they're the vast minority and
    | most can tolerate larger than expected inaccuracies.
| stefan_ wrote:
| Let's see, so we exclude law, we exclude medical.. it's
| certainly not a "vast minority" and the failure cases are
| nothing at all like search or human experts.
| jmathai wrote:
| Are you suggesting that failure cases are lower when
| interacting with humans? I don't think that's my experience
| at all.
|
| Maybe I've only ever seen terrible doctors but I _always_
| cross reference what doctors say with reputable sources
| like WebMD (which I understand likely contain errors).
        | Sometimes I'll go straight to WebMD.
|
| This isn't a knock on doctors - they're humans and prone to
| errors. Lawyers, engineers, product managers, teachers too.
| stefan_ wrote:
| You think you ask your legal assistant to find some
| precedents related to your current case and they will
| come back with an A4 page full of made up cases that
| sound vaguely related and convincing but _are not real_?
          | I don't think you understand the failure case at all.
| jmathai wrote:
| That example seems a bit hyperbolic. Do you think lawyers
| who leverage ChatGPT will take the made up cases and
| present them to a judge without doing some additional
| research?
|
| What I'm saying is that the tolerance for mistakes is
| strongly correlated to the value ChatGPT creates. I think
| both will need to be improved but there's probably more
| opportunity in creating higher value.
|
| I don't have a horse in the race.
| jazzyjackson wrote:
| > Do you think lawyers who leverage ChatGPT will take the
| made up cases and present them to a judge without doing
| some additional research?
|
| You don't?
|
              | https://fortune.com/2023/06/23/lawyers-fined-filing-chatgpt-...
| modeless wrote:
| > Do you think lawyers who leverage ChatGPT will take the
| made up cases and present them to a judge without doing
| some additional research?
|
| I generally agree with you, but it's funny that you use
| this as an example when it already happened.
| https://arstechnica.com/tech-policy/2023/06/lawyers-have-
| rea...
| jmathai wrote:
| _facepalm_
| freejazz wrote:
              | What would be the point of a lawyer using ChatGPT if they
              | had to root through every single reference ChatGPT relied
              | upon? I don't have to double-check every reference of a
              | junior attorney, because they actually know what they are
              | doing, and when they don't, it's easy to tell and won't
              | come with fraudulently created decisions/pleadings, etc.
| anon373839 wrote:
| > Do you think lawyers who leverage ChatGPT will take the
| made up cases and present them to a judge without doing
| some additional research
|
| I really don't recommend using ChatGPT (even GPT-4) for
| legal research or analysis. It's simply terrible at it if
| you're examining anything remotely novel. I suspect there
| is a valuable RAG application to be built for searching
| and summarizing case law, but the "reasoning" ability and
| stored knowledge of these models is worse than useless.
| binwiederhier wrote:
| I'm a software engineer, and I more or less stopped asking
| ChatGPT for stuff that isn't mainstream. It just hallucinates
| answers and invents config file options or language
| constructs. Google will maybe not find it, or give you an
| occasional outdated result, but it rarely happens that it
| just finds stuff that's flat out wrong (in technology at
| least).
|
| For mainstream stuff on the other hand ChatGPT is great. And
| I'm sure that Gemini will be even better.
| jmathai wrote:
| > it rarely happens that it just finds stuff that's flat
| out wrong
|
| "Flat out wrong" implies determinism. For answers which are
| deterministic such as "syntax checking" and "correctness of
| code" - this already happens.
|
| ChatGPT, for example, will write and execute code. If the
| code has an error or returns the wrong result it will try a
| different approach. This is in production today (I use the
| paid version).
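        |
        | A toy sketch of that loop - ask_llm and the check below are
        | stand-ins I made up, not OpenAI's actual tooling:
        |
        |     def ask_llm(prompt):
        |         # stub model call; pretend it returns source code
        |         return "def add(a, b):\n    return a + b\n"
        |
        |     def passes_test(source):
        |         scope = {}
        |         try:
        |             exec(source, scope)   # run the generated code
        |             return scope["add"](2, 2) == 4
        |         except Exception:
        |             return False
        |
        |     prompt = "Write a Python function add(a, b)."
        |     for attempt in range(3):
        |         code = ask_llm(prompt)
        |         if passes_test(code):
        |             break
        |         prompt += " That failed; try another approach."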
| bongodongobob wrote:
| Dollars to doughnuts says they are using GPT3.5.
| tomjakubowski wrote:
| I'm currently working with some relatively obscure but
| open source stuff (JupyterLite and Pyodide) and ChatGPT 4
| confidently hallucinates APIs and config options when I
| ask it for help.
|
| With more mainstream libraries it's pretty good though
| yieldcrv wrote:
| I use chatgpt4 for very obscure things
|
| If I ever worried about being quoted then I'll verify the
| information
|
| otherwise I'm conversational, have taken an abstract idea
| into a concrete one and can build on top of it
|
| But I'm quickly migrating over to mistral and if that
| starts going off the rails I get an answer from chatgpt4
| instead
| potatolicious wrote:
| The important thing is that with Web Search as a user you
| can learn to adapt to varying information quality. I have a
| higher trust for Wikipedia.org than I do for SEO-R-US.com,
| and Google gives me these options.
|
| With a chatbot that's largely impossible, or at least
| impractical. I don't know where it's getting anything from
| - maybe it trained on a shitty Reddit post that's 100%
| wrong, but I have no way to tell.
|
| There has been some work (see: Bard, Bing) where the LLM
| attempts to cite its sources, but even then that's of
| limited use. If I get a paragraph of text as an answer, is
| the expectation really that I crawl through each substring
| to determine their individual provenances and
| trustworthiness?
|
| The _shape_ of a product matters. Google as a linker
| introduces the ability to adapt to imperfect information
| quality, whereas a chatbot does not.
|
      | As an exemplar of this point - I _don't_ trust when Google
| simply pulls answers from other sites and shows it in-line
| in the search results. I don't know if I should trust the
| source! At least there I can find out the source from a
| single click - with a chatbot that's largely impossible.
| dylan604 wrote:
| > I'm unsure where this expectation of 100% absolute
| correctness comes from.
|
| It's a computer. That's why. Change the concept slightly:
| would you use a calculator if you had to wonder if the answer
| was correct or maybe it just made it up? Most people feel the
| same way about any computer based anything. I personally feel
| these inaccuracies/hallucinations/whatevs are only allowing
| them to be one rung up from practical jokes. Like I honestly
| feel the devs are fucking with us.
| bostonpete wrote:
| Speech to text is often wrong too. So is autocorrect. And
| object detection. Computers don't have to be 100% correct
| in order to be useful, as long as we don't put too much
| faith in them.
| dylan604 wrote:
| Your caveat is not the norm though, as everyone _is_
| putting a lot of faith in them. So, that's part of the
| problem. I've talked with people who aren't developers,
| but are otherwise smart individuals, and they have
| absolutely not considered that the info is not correct.
| The readers here are a bit too close to the subject, and
| sometimes I think it is easy to forget that the vast
| majority of the population do not truly understand what
| is happening.
| dr_dshiv wrote:
| Nah, I don't think anything has the potential to build
| critical thinking like LLMs en masse. I only worry that
| they will get better. It's when they are 99.9% correct we
| should worry.
| clumpthump wrote:
| Call me old fashioned, but I would absolutely like to see
| autocorrect turned off in many contexts. I much prefer to
| read messages with 30% more transparent errors rather
| than any increase in opaque errors. I can tell what
| someone meant if I see "elephent in the room", but not
| "element in the room" (not an actual example, autocorrect
| would likely get that one right).
| llbeansandrice wrote:
| People put too much faith in conspiracy theories they
| find on YT, TikTok, FB, Twitter, etc. What you're
| claiming is already not the norm. People already put too
| much faith into all kinds of things.
| kamikaz1k wrote:
| Okay, but search is done on a computer, and like the person
| you're replying to said, we accept close enough.
|
| I don't necessarily disagree with your interpretation, but
| there's a revealed preference thing going on.
|
| The number of non-tech ppl I've heard directly reference
| ChatGPT now is absolutely shocking.
| bitvoid wrote:
| > The number of non-tech ppl I've heard directly
| reference ChatGPT now is absolutely shocking.
|
| The problem is that a lot of those people will take
| ChatGPT output at face value. They are wholly unaware
| of its inaccuracies or that it hallucinates. I've
| seen it too many times in the relatively short amount of
| time that ChatGPT has been around.
| bongodongobob wrote:
| So what? People do this with Facebook news too. That's a
| people problem, not an LLM problem.
| janalsncm wrote:
| If we rewind a little bit to the mid to late 2010s,
| filter bubbles, recommendation systems and unreliable
| news being spread on social media was a big problem. It
| was a simpler time, but we never really solved the
| problem. Point is, I don't see the existence of other
| problems as an excuse for LLM hallucination, and writing
| it off as a "people problem" really undersells how hard
| it is to solve people problems.
| dylan604 wrote:
| People on social media are absolutely 100% posting things
| deliberately to fuck with people. They are actively
| seeking to confuse people, cause chaos, divisiveness, and
| other ill intended purposes. Unless you're saying that
| the LLM developers are actively doing the same thing, I
| don't think comparing what people find on the socials vs
| getting back as a response from a chatBot is a logical
| comparison at all
| zozbot234 wrote:
| How is that any different from what these AI chatbots are
| doing? They make stuff up that they predict will be
| rewarded highly by humans who look at it. This is exactly
| what leads to truisms like "rubber duckies are made of a
| material that floats over water" - which _looks_ like it
| should be correct, even though it's wrong. It really is
| no different from Facebook memes that are devised to get
| a rise out of people and be widely shared.
| dylan604 wrote:
| Because we shouldn't be striving to make mediocrity. We
| should be striving to build better. Unless the devs of
| the bots are wanting to have a bot built on trying to
| deceive people, I just don't see the purpose of this. If
| we can "train" a bot and fine tune it, we should be fine
| tuning truth and telling it what absolutely is bullshit.
|
| To avoid the darker topics to keep the conversation on
| the rails, if there were a misinformation campaign that
| was trying to state that the Earth's sky is red, then the
| fine tuning should be able to inform that this is clearly
| fake so when quoting this it should be stated as
| incorrect information that is out there. This kind of
| development should be how we can clean up the fake, but
| nope, we're seemingly quite happy at accepting it. At
| least that's how your question comes off to me.
| zozbot234 wrote:
| Sure, but current AI bots are just following the human
| feedback they get. If the feedback is naive enough to
| score the factoid about rubber duckys as correct, guess
| what, that's the kind of thing these AI's will target.
| You can try to address this by prompting them with
| requests like "do you think this answer is correct and
| ethical? Think through this step by step" ('reinforcement
| learning from AI feedback') but that's _very_ ad hoc and
| uncertain - ultimately, the humans in the loop call the
| shots.
| p1esk wrote:
| There are far more people who post obviously wrong,
| confusing and dangerous things online with total
| conviction. There are people who seriously believe Earth
| is flat, for example.
| lm28469 wrote:
| Literally everything is a "people problem"
|
| You can kill people with a fork, it doesn't mean you
| should legally be allowed to own a nuclear bomb "because
| it's just the same". The problem always come from scale
| and accessibility
| umvi wrote:
| So you're saying we need a Ministry of Truth to protect
| people from themselves? This is the same argument used to
| suppress "harmful" speech on any medium.
| dylan604 wrote:
| I've gotten to the point where I want "advertisement"
| stamped on anything that is, and I'm getting to the point
| I want "fiction" stamped on anything that is. I have no
| problem with fiction existing. It can be quite fun.
| People trying to pass fiction as fact is a problem
| though. Trying to force a "fact" stamp would be
| problematic though, so I'd rather label everything else.
|
| How to enforce it is the real sticky wicket though, so
| it's only something best discussed at places like this or
| while sitting around chatting while consuming
| jen20 wrote:
| "Computer says no" is not a meme for no reason.
| altruios wrote:
| why should all computing be deterministic?
|
| let me show you what this "genius"/"wrong-thinking" person
| has to say about AL (artificial life) and deterministic
| computing.
|
| https://www.cs.unm.edu/~ackley/
|
| https://www.youtube.com/user/DaveAckley
|
| To sum up a bunch of their content: You can make
| intractable problems solvable/crunchable if you allow just
| a little error into the result (which is reduced the longer
| the calculation calculates). And this is acceptable for a
| number of use cases where initial accuracy is less
| important than instant feedback.
|
| It is radically different from the von Neumann model of a
| computer, where a deterministic 'totalitarian finger
| pointer' pointing to some register (and only one register
| at a time) is an inherently limiting factor. In this model,
| each computational resource (a unit of RAM and a processing
| unit) fights for and coordinates reality with its
| neighbors, without any central coordination.
|
| Really interesting stuff. still in its infancy...
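|
| A toy way to see the trade-off (this is just a Monte Carlo
| sketch in Python, not Ackley's actual model): accept a noisy
| answer and the error shrinks the longer the calculation runs.
|
|     import random
|
|     # Estimate pi from random points in the unit square; the answer
|     # is never exact, but the expected error falls roughly as 1/sqrt(n).
|     def estimate_pi(samples):
|         inside = sum(1 for _ in range(samples)
|                      if random.random() ** 2 + random.random() ** 2 <= 1.0)
|         return 4.0 * inside / samples
|
|     for n in (1_000, 100_000, 10_000_000):
|         print(n, estimate_pi(n))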
| modeless wrote:
| Honestly I agree. Humans make errors all the time. Perfection
| is not necessary and requiring perfection blocks deployment
| of systems that represent a substantial improvement over the
| status quo despite their imperfections.
|
| The problem is a matter of degree. These models are
| _substantially_ less reliable than humans and far below the
| threshold of acceptability in most tasks.
|
| Also, it seems to me that AI can and will surpass the
| reliability of humans by a lot. Probably not by simply
| scaling up further or by clever prompting, although those
| will help, but by new architectures and training techniques.
| Gemini represents no progress in that direction as far as I
| can see.
| epalm wrote:
| I know exactly where the expectation comes from. The whole
| world has demanded absolute precision from computers for
| decades.
|
| Of course, I agree that if we want computers to "think on
| their own" or otherwise "be more human" (whatever that means)
| we should expect a downgrade in correctness, because humans
| are wrong all the time.
| jmathai wrote:
| > The whole world has demanded absolute precision from
| computers for decades.
|
| Computer engineers maybe. I think the general population is
| quite tolerant of mistakes as long as the general value is
| high.
|
| People generally assign very high value to things computers
| do. To test this hypothesis all you have to do is ask folks
| to go a few days without their computer or phone.
| creer wrote:
| > The whole world has demanded absolute precision from
| computers
|
| The opposite. Far too tolerant of the excuse "sorry,
| computer mistake." (But yeah, just at the same time as "the
| computer says so".)
| lancesells wrote:
| Is it less reliable than an encyclopedia? Is it less
| reliable than Wikipedia? Those aren't infallible but what's
| the expectation if it's wrong on something relatively
| simple?
|
| With the rush of investment dollars and the push to use
| these in places like healthcare, government, security,
| etc., there should be absolute precision.
| toxik wrote:
| Aside: this is not what impartial means.
| SkyBelow wrote:
| Humans are imperfect, but this comes with some benefits to
| make up for it.
|
| First, we know they are imperfect. People seem to put more
| faith into machines, though I do sometimes see people being
| too trusting of other people.
|
| Second, we have methods for measuring their imperfection.
| Many people develop ways to tell when someone is answering
| with false or unjustified confidence, at least in fields they
| spend significant time in. Talk to a scientist about cutting
| edge science and you'll get a lot of 'the data shows', 'this
| indicates', or 'current theories suggest'.
|
| Third, we have methods to handle false information that
| causes harm. Not always perfect methods, but there are
| systems of remedies available when experts get things wrong,
| and these even include some level of judging reasonable
| errors from unreasonable errors. When a machine gets it
| wrong, who do we blame?
| howenterprisey wrote:
| Absolutely! And fourth, we have ways to make sure the same
| error doesn't happen again; we can edit Wikipedia, or tell
| the person they were wrong (and stop listening to them if
| they keep being wrong).
| snowwrestler wrote:
| If it's no better than asking a random person, then where is
| the hype? I already know lots of people who can give me free,
| maybe incorrect guesses to my questions.
|
| At least we won't have to worry about it obtaining god-like
| powers over our society...
| sorokod wrote:
| Guessing from the last sentence that you are one of those
| "most" who "can tolerate larger than expected inaccuracies".
|
| How much inaccuracy would that be?
| pid-1 wrote:
| Most people I worked with either tell me "I don't know" or "I
| think x, but I'm not sure" when they are not sure about
| something. The issue with LLMs is they don't have this
| concept.
| taurath wrote:
| I find it ironic that computer scientists and technologists
| are frequently uberrationalists to the point of self parody
| but they get hyped about a technology that is often
| confidently wrong.
|
| Just like the hype with AI and the billions of dollars going
| into it. There's something there but it's a big fat unknown
| right now whether any part of the investment will actually
| pay off - everyone needs it to work to justify any amount of
| the growth of the tech industry right now. When everyone
| needs a thing to work, it starts to really lose the
| fundamentals of being an actual product. I'm not saying it's
| not useful, but is it as useful as the valuations and
| investments need it to be? Time will tell.
| latexr wrote:
| If a human expert gave wrong answers as often and as
| confidently as LLMs, most would consider no longer asking
| them. Yet people keep coming back to the same LLM despite the
| wrong answers to ask again in a different way (try that with
| a human).
|
| This insistence on comparing machines to humans to excuse the
| machine is as tiring as it is fallacious.
| kaffeeringe wrote:
| 1. Humans may also never be 100% - but it seems they are more
| often correct. 2. When AI is wrong it's often not only
| slightly off, but completely off the rails. 3. Humans often
| tell you when they are not sure. Even if it's only their
| tone. AI is always 100% convinced it's correct.
| Frost1x wrote:
| >I'm unsure where this expectation of 100% absolute
| correctness comes from. I'm sure there are use cases, but I
| assume it's the vast minority and most can tolerate larger
| than expected inaccuracies.
|
| As others hinted at, there's some bias because it's coming
| from a computer, but I think it's far more nuanced than that.
|
| I've worked with many experts and professionals through my
| career ranging across medicine, various types of engineers,
| scientists, academics, researchers and so on, and the
| pattern that always bothers me is the level of certainty
| presented; the same is often embedded in LLM responses.
|
| While humans don't typically quantify the certainty of their
| statements, the best SMEs I've ever worked with make it very
| clear what level of certainty they have when making
| professional statements. The SMEs who seem to be more often
| wrong than not speak in certainty quite often (some of this
| is due to cultural pressures and expectations surrounding
| being an "expert").
|
| In this case, I would expect a seasoned scientist to say
| something like this in response to the duck question: "many
| rubber ducks exist and are designed to float, this one very
| well might, we'd really need to test it or have far more
| information about the composition of the duck, the design,
| the medium we want it in (Water? Mercury? Helium?)" and so
| on.
| It's not an exact answer but you understand there's
| uncertainty there and we need to better clarify our question
| and the information surrounding that question. The fact is,
| it's really complex to know if it'll float or not from visual
| information alone.
|
| It could have an osmium ball inside that overcomes most of
| the assumed buoyancy the material contains, including the air
| demonstrated to make it squeak. It's not transparent. You
| don't _know_ for sure and the easiest way to alleviate
| uncertainty in this case is simply to test it.
|
| There's _so_ much uncertainty in the world, around what seem
| like the most certain and obvious things. LLMs seem to have
| grabbed some of this bad behavior from human language and
| culture where projecting confidence is often better (for
| humans) than being correct.
| eviks wrote:
| Where did you get the 100% number from? It's not in the
| original comment, it's not in a lot of similar criticisms of
| the models.
| brookst wrote:
| Is it possible for humans to be wrong about something, without
| lying?
| windowshopping wrote:
| I don't agree with the argument that "if a human can fail in
| this way, we should overlook this failing in our tooling as
| well." Because of course that's what LLMs are, tools, like
| any other piece of software.
|
| If a tool is broken, you seek to fix it. You don't just say
| "ah yeah it's a broken tool, but it's better than nothing!"
|
| All these LLM releases are amazing pieces of technology and
| the progress lately is incredible. But don't rag on people
| critiquing it, how else will it get better? Certainly not by
| accepting its failings and overlooking them.
| rkeene2 wrote:
| If a broken tool is useful, do you not use it because it is
| broken ?
|
| Overpowered LLMs like GPT-4 are both broken (according to
| how you are defining it) and useful -- they're just not the
| idealized version of the tool.
| freejazz wrote:
| Maybe not, if it's the case that your use of the broken
| tool would result in the eventual undoing of your work.
| Like, let's say your staple gun is defective and doesn't
| shoot the staples deep enough, but it still shoots. You
| can keep using the gun, but it's not going to actually do
| its job. It seems useful and functional, but it isn't, and
| it's liable to create a much bigger mess.
| freedomben wrote:
| I think you're reading a lot into GP's comment that isn't
| there. I don't see any ragging on people critiquing it. I
| think it's perfectly compatible to think we should
| continually improve on these things while also recognizing
| that things can be useful without being perfect.
| stocknoob wrote:
| "Broken" is word used by pedants. A broken tool doesn't
| work. This works, most of the time.
|
| Is a drug "broken" because it only cures a disease 80% of
| the time?
|
| The framing most critics seem to have is "it must be
| perfect".
|
| It's ok though, their negativity just means they'll miss
| out on using a transformative technology. No skin off the
| rest of us.
| bee_rider wrote:
| I think the comparison to humans is just totally useless.
| It isn't even just that, as a tool, it should be better
| than humans at the thing it does, necessarily. My monitor
| is on an arm, the arm is pretty bad at positioning things
| compared to all the different positions my human arms could
| provide. But it is good enough, and it does it tirelessly.
| A tool is fit for a purpose or not, the relative
| performance compared to humans is basically irrelevant.
|
| I think the folks making these tools tend to oversell their
| capabilities because they want us to imagine the
| applications we can come up with for them. They aren't
| selling the tool, they are selling the ability to make
| tools based on their platform, which means they need to be
| speculative about the types of things their platform might
| enable.
| lxgr wrote:
| Lying implies an intent to deceive, or giving a response
| despite having better knowledge, which I'd argue LLMs can't
| do, at least not yet. It just requires a more robust theory
| of mind than I'd consider them to realistically be capable
| of.
|
| They might have been trained/prompted with misinformation,
| but then it's the people doing the training/prompting who are
| lying, still not the LLM.
| og_kalu wrote:
| Not to say this example was lying but they can lie just
| fine - https://arxiv.org/abs/2311.07590
| lxgr wrote:
| They're lying in the same way that a sign that says "free
| cookies" is lying when there are actually no cookies.
|
| I think this is a different usage of the word, and we're
| pretty used to making the distinction, but it gets
| confusing with LLMs.
| og_kalu wrote:
| You are making an imaginary distinction that doesn't
| exist. It doesn't even make any sense in the context of
| the paper I linked.
|
| The model consistently and purposefully withheld
| knowledge it was directly aware of. This is lying under
| any useful definition of the word. You're veering off
| into meaningless philosophy that has no bearing on
| outcomes and results.
| hunter2_ wrote:
| To the question of whether it could have intent to deceive,
| going to the dictionary, we find that intent essentially
| means a plan (and computer software in general could be
| described as a plan being executed) and deceive essentially
| means saying something false. Furthermore, its plan is to
| talk in ways that humans talk, emulating their
| intelligence, and some intelligent human speech is false.
| Therefore, I do believe it can lie, and will whenever
| statistically speaking a human also typically would.
|
| Perhaps some humans never lie, but should the LLM be
| trained only on that tiny slice of people? It's part of
| life, even non-human life! Evolution works based on things
| lying: natural camouflage, for example. Do octopuses and
| chameleons "lie" when they change color to fake out
| predators? They have intent to deceive!
| vkou wrote:
| Most humans I professionally interact with don't double down
| on their mistakes when presented with evidence to the
| contrary.
|
| The ones that do are people I do my best to avoid interacting
| with.
|
| LLMs act more like the latter, than the former.
| ugh123 wrote:
| I don't see it as a problem with most non-critical use cases
| (critical being things like medical diagnoses, controlling
| heavy machinery or robotics, etc).
|
| LLMs right now are most practical for generating templated text
| and images, which when paired with an experienced worker, can
| make them orders of magnitude more productive.
|
| Oh, DALL-E created graphic images with a person with 6 fingers?
| How long would it have taken a pro graphic artist to come up
| with all the same detail but with perfect fingers? Nothing
| there they couldn't fix in a few minutes and then SHIP.
| zer00eyz wrote:
| >> Nothing there they couldn't fix in a few minutes and then
| SHIP.
|
| If by ship, you mean put directly into the public domain then
| yes.
|
| https://www.goodwinlaw.com/en/insights/publications/2023/08/.
| ..
|
| and for more interesting takes:
| https://www.youtube.com/watch?v=5WXvfeTPujU&
| awongh wrote:
| I'm not an expert but I suspect that this aspect of lack of
| correctness in these models might be fundamental to how they
| work.
|
| I suppose there's two possible solutions: one is a new training
| or inference architecture that somehow understand "facts". I'm
| not an expert so I'm not sure how that would work, but from
| what I understand about how a model generates text, "truth"
| can't really be an element in the training or inference that
| affects the output.
|
| the second would be a technology built on top of the inference
| to check correctness, some sort of complex RAG. Again not sure
| how that would work in a real world way.
|
| I say it might be fundamental to how the model works because as
| someone pointed out below, the meaning of the word "material"
| could be interpreted as the air inside the duck. The model's
| answer was correct in a human sort of way, or to be more
| specific in a way that is consistent with how a model actually
| produces an answer- it outputs in the context of the input. If
| you asked it if PVC is heavier than water it would answer
| correctly.
|
| Because language itself is inherently ambiguous and the model
| doesn't actually understand anything about the world, it might
| turn out that there's no universal way for a model to know
| what's true or not.
|
| I could also see a version of a model that is "locked down" but
| can verify the correctness of its statements, but in a way that
| limits its capabilities.
| ajkjk wrote:
| > this aspect of lack of correctness in these models might be
| fundamental to how they work.
|
| Is there some sense in which this _isn't_ obvious to the
| point of triviality? I keep getting confused because other
| people seem to keep being surprised that LLMs don't have
| correctness as a property. Even the most cursory
| understanding of what they're doing makes clear that it is,
| fundamentally, predicting words from other words. I am also
| capable of predicting words from other words, so I can guess
| how well that works. It doesn't seem to include correctness
| even as a concept.
|
| Right? I am actually genuinely confused by this. How is that
| people think it _could_ be correct in a systematic way?
| carstenhag wrote:
| Because it is assumed that it can think and/or reason. In
| this case, knowing the concepts of density, the density of
| a material, detecting the material from an image, detecting
| what object this image is. And, most importantly, knowing
| that this object is not solid, because if it were solid it
| could not float.
| awongh wrote:
| Yeah. I think there's some ambiguity around the _meaning_
| of reasoning - because it is a kind of reasoning to say a
| duck's material is less dense than water. In a way it's
| reasoned that out, and it might actually say something
| about the way a lot of human reasoning works....
| (especially if you've ever listened to certain people talk
| out loud and say to yourself... huh?)
| janalsncm wrote:
| Just to play devil's advocate: we can train neural networks
| to model some functions exactly, given sufficient
| parameters. For example simple functions like ax^2 + bx +
| c.
|
| The issue is that "correctness" isn't a differentiable
| concept. So there's no gradient to descend. In general,
| there's no way to say that a sentence is more or less
| correct. Some things are just wrong. If I say that human
| blood is orange that's not more incorrect than saying it's
| purple.
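|
| A minimal sketch of that contrast (assumes PyTorch; the toy
| quadratic and network size are arbitrary): MSE is a
| differentiable loss, so gradient descent can drive it toward
| zero, while there's no analogous loss for "this sentence is
| factually correct".
|
|     import torch
|
|     torch.manual_seed(0)
|     x = torch.linspace(-2, 2, 256).unsqueeze(1)
|     y = 3 * x**2 - 2 * x + 1          # target: a simple quadratic
|
|     model = torch.nn.Sequential(
|         torch.nn.Linear(1, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1))
|     opt = torch.optim.Adam(model.parameters(), lr=1e-2)
|
|     for _ in range(2000):
|         loss = torch.nn.functional.mse_loss(model(x), y)  # differentiable
|         opt.zero_grad()
|         loss.backward()
|         opt.step()
|
|     print(f"final MSE: {loss.item():.5f}")  # heads toward zero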
| spadufed wrote:
| > Is there some sense in which this isn't obvious to the
| point of triviality?
|
| This is maybe a pedantic "yes", but is also extremely
| relevant to the outstanding performance we see in tasks
| like programming. The issue is primarily the size of the
| correct output space (that is, the output space we are
| trying to model) and how that relates to the number of
| parameters. Basically, there is a fixed upper bound on the
| amount of complexity that can be encoded by a given number
| of parameters (obvious in principle, but we're starting to
| get some theory about how this works). Simple systems or
| rather systems with simple rules may be below that upper
| bound, and correctness is achievable. For more complex
| systems (relative to parameters) it will still learn an
| approximation, but error is guaranteed.
|
| I am speculating now, but I seriously suspect the size of
| the space of not only one or more human languages but also
| every fact that we would want to encode into one of these
| models is far too big a space for correctness to ever be
| possible without RAG. At least without some massive pooling
| of compute, which long term may not be out of the question
| but likely never intended for individual use.
|
| If you're interested, I highly recommend checking out some
| of the recent work around monosemanticity for what fleshing
| out the relationship between model-size and complexity
| looks like in the near term.
| michaelt wrote:
| I think very few people on this forum believe LLMs are
| _correct in a systematic way_, but a lot of people seem to
| think there's something more than predicting words from
| other words.
|
| Modern machine learning models contain a lot of inscrutable
| inner layers, with far too many billions of parameters for
| any human to comprehend, so we can only speculate about
| what's going on. A lot of people think that, in order to be
| so good at generating text, there _must_ be a bunch of
| understanding of the world in those inner layers.
|
| If a model can write convincingly about a soccer game,
| producing output that's consistent with the rules, the
| normal flow of the game and the passage of time - to a lot
| of people, that implies the inner layers 'understand'
| soccer.
|
| And anyone who noodled around with the text prediction
| models of a few decades ago, like Markov chains, Bayesian
| text processing, sentiment detection and things like that
| can see that LLMs are massively, massively better than the
| output from the traditional ways of predicting the next
| word.
| ilaksh wrote:
| Bing chat uses gpt-4 and cites sources from its retrieval.
| freedomben wrote:
| I think this problem needs to be solved at a higher level, and
| in fact Bard is doing exactly that. The model itself generates
| its output, and then higher-level systems can fact check it.
| I've heard promising things about feeding back answers to the
| model itself to check for consistency and stuff, but that
| should be a higher level function (and seems important to avoid
| infinite recursion or massive complexity stemming from the
| self-check functionality).
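|
| A minimal sketch of what I mean by a higher-level, bounded
| self-check pass (the `ask` callable is a stand-in for whatever
| model call you use; nothing here is a real Gemini or OpenAI
| API, and bounding the rounds is how I'd avoid the recursion
| problem):
|
|     from typing import Callable
|
|     def answer_with_self_check(ask: Callable[[str], str], question: str,
|                                max_rounds: int = 2) -> str:
|         answer = ask(question)
|         for _ in range(max_rounds):
|             # higher-level pass: ask the model to critique its own answer
|             critique = ask(
|                 f"Question: {question}\nProposed answer: {answer}\n"
|                 "List any factual problems with the answer, or reply OK.")
|             if critique.strip().upper().startswith("OK"):
|                 break
|             # retry once with the critique folded back in
|             answer = ask(
|                 f"Question: {question}\nKnown problems: {critique}\n"
|                 "Write a corrected answer.")
|         return answer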
| modeless wrote:
| I'm not a fan of current approaches here. "Chain of thought"
| or other approaches where the model does all its thinking
| using a literal internal monologue in text seem like a dead
| end. Humans do most of their thinking non-verbally and we
| need to figure out how to get these models to think non-
| verbally too. Unfortunately it seems that Gemini represents
| no progress in this direction.
| freedomben wrote:
| > _Humans do most of their thinking non-verbally and we
| need to figure out how to get these models to think non-
| verbally too._
|
| That's a very interesting point, both technically and
| philosophically.
|
| Where Gemini is "multi-modal" from training, how close do
| you think that gets? Do we know enough about neurology to
| identical a native language in which we think? (not
| rhetorical questions, I'm really wondering)
| janalsncm wrote:
| Neural networks are only similar to brains on the
| surface. Their learning process is entirely different and
| their internal architecture is different as well.
|
| We don't use neural networks because they're similar to
| brains. We use them because they are arbitrary function
| approximators and we have an efficient algorithm
| (backprop) coupled with hardware (GPUs) to optimize them
| quickly.
| janalsncm wrote:
| The point of "verbalizing" the chain of thought isn't that
| it's the most effective method. And frankly I don't think
| it matters that humans think non verbally. The goal isn't
| to create a human in a box. Verbalizing the chain of
| thought allows us to audit the thought process, and also
| create further labels for training.
| modeless wrote:
| No, the point of verbalizing the chain of thought is that
| it's all we know how to do right now.
|
| > And frankly I don't think it matters that humans think
| non verbally
|
| You're right, that's not the _reason_ non-verbal is
| better, but it is _evidence_ that non-verbal is probably
| better. I think the reason it 's better is that language
| is extremely lossy and ambiguous, which makes a poor
| medium for reasoning and precise thinking. It would
| clearly be better to think without having to translate to
| language and back all the time.
|
| Imagine you had to solve a complicated multi-step physics
| problem, but after every step of the solution process
| your short term memory was wiped and you had to read your
| entire notes so far as if they were someone else's before
| you could attempt the next step, like the guy from
| Memento. That's what I imagine being an LLM using CoT is
| like.
| Davidzheng wrote:
| I mean a lot of problems are amenable to subdivision into
| parts where the process of each part is not needed for
| the other parts. It's not even clear that humans usually
| hold in memory all of the process of the previous parts,
| especially if it won't be used later.
| dragonwriter wrote:
| > "Chain of thought" or other approaches where the model
| does all its thinking using a literal internal monologue in
| text seem like a dead end. Humans do most of their thinking
| non-verbally and we need to figure out how to get these
| models to think non-verbally too.
|
| Insofar as we can say that models think _at all_ between
| the input and the stream of tokens output, they do it
| nonverbally. Forcing the model to reduce _some of it_ to
| verbal form short of the actual response-of-concern does
| not change that, just as the fact that humans reduce some
| of their thought to verbal form to work through problems
| doesn't change that human thought is mostly nonverbal.
|
| (And if you don't consider what goes on between input and
| output thought, then chain of thought doesn't force all LLM
| thought to be verbal, because only the part that comes out
| in words is "thought" to start with in that case -- you are
| then saying that the basic architecture, not chain of
| thought prompting, forces all thought to be verbal.)
| modeless wrote:
| You're right, the models do think non-verbally. However,
| crucially, they can only do so for a fixed amount of time
| for each output token. What's needed is a way for them to
| think non-verbally continuously, and decide for
| themselves when they've done enough thinking to output
| the next token.
| Davidzheng wrote:
| Is it clear that humans can think nonverbally (including
| internal monologue) continuously? As in, for difficult
| reasoning tasks, do humans benefit a lot from extra time
| if they are not allowed internal monologue. Genuine
| question
| __s wrote:
| It also says the attribute of squeaking means it'll definitely
| float
| bongodongobob wrote:
| That's actually pretty clever because if it squeaks, there is
| air inside. How many squeaking ducks have you come across
| that don't float?
| davesque wrote:
| You could call it clever or you could call it a spurious
| correlation.
| bitshiftfaced wrote:
| There's nothing wrong with what you're saying, but what do you
| suggest? Factuality is an area of active research, and Deepmind
| goes into some detail in their technical paper.
|
| The models are too useful to say, "don't use them at all."
| Hopefully people will heed the warnings of how they can
| hallucinate, but further than that I'm not sure what more you
| can expect.
| modeless wrote:
| The problem is not with the model, but with its portrayal in
| the marketing materials. It's not even the fact that it lied,
| which is actually realistic. The problem is the lie was not
| called out as such. A better demo would have had the user
| note the issue and give the model the opportunity to correct
| itself.
| bitshiftfaced wrote:
| But you yourself said that it was so convincing that the
| people doing the demo didn't recognize it as false, so how
| would they know to call it out as such?
|
| I suppose they could've deliberately found a hallucination
| and showcased it in the demo. In which case, pretty much
| every company's promo material is guilty of not showcasing
| negative aspects of their product. It's nothing new or
| unique to this case.
| modeless wrote:
| They should have looked more carefully, clearly.
| Especially since they were criticized for the exact same
| thing in their last launch.
| twobitshifter wrote:
| I, a non-AGI, just 'hallucinated' yesterday. I hallucinated
| that my plan was to take all of Friday off and started
| wondering why I had scheduled morning meetings. I started
| canceling them in a rush. In fact, all week I had been planning
| to take a half day, but somehow my brain replaced the idea of a
| half day off with a full day off. You could have asked me and I
| would have been completely sure that I was taking all of friday
| off.
| margorczynski wrote:
| LLMs do not lie, nor do they tell the truth. They have no goal
| as they are not agents.
| modeless wrote:
| With apologies to Dijkstra, the question of whether LLMs can
| lie is about as relevant as the question of whether
| submarines can swim.
| rowanG077 wrote:
| The duck is indeed made of a material that is less dense.
| Namely water and air.
|
| If you go down such technical routes, your definition is wrong
| too. It doesn't float because it contains air. If you poke in
| the head of the duck it will sink. Even though at all times it
| contains air.
| recursive wrote:
| The duck is made of water and air? Which duck are we talking
| about here.
| dogprez wrote:
| That's a tricky one though since the question is, is the air
| inside of the rubber duck part of the material that makes it?
| If you removed the air it definitely wouldn't look the same or
| be considered a rubber duck. I gave it to the bot since when
| taking ALL the material that makes it a rubber duck, it is less
| dense than water.
| bee_rider wrote:
| If you hold a rubber duck under water and squeeze out the
| air, it will fill with water and still be a rubber duck. If
| you send a rubber duck into space, it will become almost
| completely empty but still be a rubber duck. Therefore, the
| liquid used to fill the empty space inside it is not part of
| the duck.
|
| I mean apply this logic to a boat, right? Is the entire
| atmosphere part of the boat? Are we all on this boat as well?
| Is it a cruise boat? If so, where is my drink?
| modeless wrote:
| A rubber duck in a vacuum is still a rubber duck and it still
| floats (though water would evaporate too quickly in a vacuum,
| it could float on something else of the same density).
| dogprez wrote:
| A rubber duck with a vacuum inside of it (removing the air
| material) is just a piece of rubber with eyes.
| Assuming OP's point about the rubber not being less dense
| than water, it would sink, no?
| WhitneyLand wrote:
| Agree, then the question becomes how will this issue play out?
|
| Maybe AI correctness will be similar to automobile safety. It
| didn't take long for both to be recognized as fundamental
| issues with new transformative technologies.
|
| In both cases there seems to be no silver bullet. Mitigations
| and precautions will continue to evolve, with varying degrees
| of effectiveness. Public opinion and legislation will play some
| role.
|
| Tragically accidents will happen and there will be a cost to
| pay, which so far has been much higher and more grave for
| transportation.
| crazygringo wrote:
| EDIT: never mind, I missed the exact wording about being "made
| of a material..." which is definitely false then. Thanks for
| the correction below.
|
| Preserving the original comment so the replies make sense:
|
| ---
|
| I think it's a stretch to say that's false.
|
| In a conversational human context, saying it's made of rubber
| _implies_ it's a rubber shell with air inside.
|
| It floats because it's rubber [with air] as opposed to being a
| ceramic figurine or painted metal.
|
| I can imagine most non-physicist humans saying it floats
| because it's rubber.
|
| By analogy, we talk about houses being "made of wood" when
| everybody knows they're made of plenty of other materials too.
| But the context is instead of brick or stone or concrete. It's
| not _false_ to say a house is made of wood.
| furyofantares wrote:
| This is what the reply was:
|
| > Oh, if it's squeaking then it's definitely going to float.
|
| > It is a rubber duck.
|
| > It is made of a material that is less dense than water.
|
| Full points for saying if it's squeaking then it's going to
| float.
|
| Full points for saying it's a rubber duck, with the
| implication that rubber ducks float.
|
| Even with all that context though, I don't see how "it is
| made of a material that is less dense than water" scores any
| points at all.
| yowzadave wrote:
| Yeah, I think arguing the logic behind these responses
| misses the point, since an LLM doesn't use any kind of
| logic--it just responds in a pattern that mimics the way
| people respond. It says "it is made of a material that is
| less dense than water" because that is a thing that is
| similar to what the samples in its training corpus have
| said. It has no way to judge whether it is correct, or even
| what the concept of "correct" is.
|
| When we're grading the "correctness" of these answers,
| we're really just judging the average correctness of
| Google's training data.
|
| Maybe the next step in making LLM's more "correct" is not
| to give them _more_ training data, but to find a way to
| _remove_ the bad training data from the set?
| modeless wrote:
| > In a conversational human context, saying it's made of
| rubber implies it's a rubber shell with air inside.
|
| Disagree. It could easily be solid rubber. Also, it's _not_
| made of rubber, and the model didn't claim it was made of
| rubber either, so it's irrelevant.
|
| > It floats because it's rubber [with air] as opposed to
| being a ceramic figurine or painted metal.
|
| A ceramic figurine or painted metal in the same shape would
| float too. The claim that it floats because of the density of
| the material is false. It floats because the shape is hollow.
|
| > It's not false to say a house is made of wood.
|
| It's false to say a house is made of air simply because its
| shape contains air.
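|
| Rough arithmetic (all numbers assumed rather than measured)
| to make the hollow-shape point concrete: a material denser
| than water still floats once you spread its mass over the
| volume the hollow shape displaces.
|
|     water_density = 1.00   # g/cm^3
|     pvc_density = 1.40     # g/cm^3 (assumed): sinks as a solid chunk
|     duck_volume = 200.0    # cm^3 displaced when fully submerged (assumed)
|     shell_volume = 20.0    # cm^3 of actual material in the shell (assumed)
|
|     duck_mass = shell_volume * pvc_density      # the air inside weighs ~nothing
|     average_density = duck_mass / duck_volume   # ~0.14 g/cm^3
|     print("floats" if average_density < water_density else "sinks")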
| omginternets wrote:
| People seem to want to use LLMs to mine knowledge, when really
| it appears to be a next-gen word-processor.
| eurleif wrote:
| To be fair, one could describe the duck as being made of air
| and vinyl polymer, which in combination are less dense than
| water. That's not how humans would normally describe it, but
| that's kind of arbitrary; consider how aerogel is often
| described as being mostly made of air.
| colonwqbang wrote:
| Is an aircraft carrier made of a material that is less dense
| than water?
| leeoniya wrote:
| only if you average it out over volume :P
| andrewmutz wrote:
| Is an aircraft carrier made of metal and air? Or just
| metal?
| bee_rider wrote:
| Where's the distinction between the air that is part of
| the boat, and the air that is not? If the air is included
| in the boat, should we all be wearing life vests?
| oh_sigh wrote:
| If I take all of the air out of a toy duck, it is still a toy
| duck. If I take all of the vinyl/rubber out of a toy duck, it
| is just the atmosphere remaining
| modeless wrote:
| The _material_ of the duck is not air. It's not sealed. It
| would still be a duck in a vacuum and it would still float on
| a liquid the density of water too.
| PepperdineG wrote:
| >It's the single biggest problem with LLMs and Gemini isn't
| solving it.
|
| I loved it when the lawyers got busted for using a
| hallucinating LLM to write their briefs.
| glitchc wrote:
| Well this seems like a huge nitpick. If a person said that, you
| would afford them some leeway, maybe they meant the whole duck,
| which includes the hollow part in the middle.
|
| As an example, when most people say a balloon's lighter than
| air, they mean an inflated balloon with hot air or helium, but
| you catch their meaning and don't rush to correct them.
| modeless wrote:
| The model specifically said that the _material_ is less dense
| than water. If you said that the _material_ of a balloon is
| less dense than air, very few people would interpret that as
| a correct statement, and it could be misleading to people who
| don't know better.
|
| Also, lighter-than-air balloons are intentionally filled with
| helium and sealed; rubber ducks are not sealed and contain
| air only incidentally. A balloon in a vacuum would still
| contain helium (if strong enough) but would not rise, while a
| rubber duck in a vacuum would not contain air but would still
| easily float on a liquid of similar density to water.
| eviks wrote:
| Given the misleading presentation by real humans in these
| "whole teams" that this tweet corrects, this doesn't illustrate
| any underlying powers by the model
| catchnear4321 wrote:
| language models do not lie. (this pedantic distinction being
| important, because language models.)
| lemmsjid wrote:
| I did some reading and it seems that rubber's relative density
| to water has to do with its manufacturing process. I see a
| couple of different quotes on the specific gravity of so-called
| 'natural rubber', and most claim it's lower than water.
|
| Am I missing something?
|
| I asked both Bard (Gemini at this point I think?) and GPT-4 why
| ducks float, and they both seemed accurate: they talked about
| the density of the material plus the increased buoyancy from
| air pockets and went into depth on the principles behind
| buoyancy. When pressed they went into the fact that "rubber"'s
| density varies by the process and what it was adulterated with,
| and if it was foamed.
|
| I think this was a matter of the video being a brief summary
| rather than a falsehood. But please do point out if I'm wrong
| on the rubber bit, I'm genuinely interested.
|
| I agree that hallucinations are the biggest problems with LLMs,
| I'm just seeing them get less commonplace and clumsy. Though,
| to your point, that can make them harder to detect!
| modeless wrote:
| Someone on Twitter was also skeptical that the material is
| more dense than water. I happened to have a rubber duck handy
| so I cut a sample of material and put it in water. It sinks
| to the bottom.
|
| Of course the ultimate skeptic would say one test doesn't
| prove that all rubber ducks are the same. I invite you to try
| it yourself.
|
| Yes, the models will frequently give accurate answers if you
| ask them this question. That's kind of the point. Despite
| knowing that they know the answer, you still can't trust them
| to be correct.
| bbarnett wrote:
| Devil's advocate. It is made of a material less dense than
| water. Air.
|
| It certainly isn't how I would phrase it, and I wouldn't count
| air as what something is made of, but...
|
| Soda pop is chock-full of air, it's part of it! And I'd say
| carbon dioxide is a part of the recipe, of pop.
|
| So it's a confusing world for a young LLM.
|
| (I realise it may have referenced rubber prior, but it may have
| meant air... again, Devil's advocate)
| neilv wrote:
| I missed the disclaimer. So, when watching it, I started to think
| "Wow, so Google is releasing their best stuff".
|
| But then I soon noticed some things that were too smooth, so
| seemed at best to be cherry-picked interactions occasionally
| leaning on hand-crafted situation handlers. Or, it turns out,
| faked.
|
| Regardless of disclaimers, this video seems misleading to be
| releasing right now, in the context of OpenAI eating Google's
| lunch.
|
| Everyone is expecting Google to try to show they can do better.
| This isn't that. This isn't even a mocked-up future-of-HCI
| interaction concept video, because it's not showing a vision of
| what people want to do --- it's only showing a demo of technical
| capabilities.
|
| It's saying "This is what a contrived tech demo (not application
| vision concept) _could_ look like, but we can't do it yet, so we
| faked it. Hopefully, the viewer will get the message that we're
| competitive with OpenAI."
|
| (This fake demo could just be an isolated oops of a small group,
| not representative of Google's ability to rise to the current
| disruption challenge, I don't know.)
| miraculixx wrote:
| I knew immediately this was just overhyped PR when I noticed
| the author of the blogpost is Sundar.
| milofeynman wrote:
| I looked at is as if it were a good aspirational target for 5
| years from now. It was obvious the whole video was edited
| together not real time.
| Alifatisk wrote:
| The bloomberg article gives 404 for me
| dramm wrote:
| The more Google tries to over-hype stuff the more that keeps
| giving me a greater impression they are well behind OpenAI. Time
| to STFU and focus on working on stuff.
| SheinhardtWigCo wrote:
| Just how many lives does Sundar have? Where is the board?
| miraculixx wrote:
| Counting their bonuses?
| miraculixx wrote:
| rofl
|
| C'mon that was obvious. Be real.
| 1024core wrote:
| For more details about how the video was created, see this blog
| post: https://developers.googleblog.com/2023/12/how-its-made-
| gemin...
| onemoresoop wrote:
| It seems like the fake video did the trick, their stock is up
| 5.5% today.
| eh_why_not wrote:
| There was also the cringey "niiice!", "sweeeet!", "that's
| greaatt", "that's actually pretty good" responses from the
| narrator in a few of the demo videos that gave them the feel of a
| cheap 1980's TV ad.
| carabiner wrote:
| It really reminds me of the Black Mirror episode Smithereens
| with the tech CEO talking with the shooter. Tech people really
| struggle with empathy, not just 1 on 1 but with the rest of the
| outside world, which is, relatively speaking, predominantly lower
| income with no college education. Paraphrased, the Black Mirror ep was
| like:
|
| [Tech CEO read instructions to "show empathy" from his
| assistant via Slack]
|
| CEO: I hear you. It must be very hard for you.
|
| Shooter: Of course you fucking hear me, we're on the phone!
| Talk like a normal person!
| seydor wrote:
| I thought it was implied and obvious that the video was edited.
|
| So what?
| frozenlettuce wrote:
| too little, too late. my impression is that google is not one,
| but two steps behind what MS can offer (they need a larger leap
| if they want to get ahead)
| golly_ned wrote:
| If you've seen the video, it's very apparent it's a product
| video, not a tech demo. They cut out the latencies to make a
| compelling product video.
|
| I wasn't at all under the impression they were showcasing TTS or
| low latencies as product features. I don't find the marketing
| misleading at all, and find these criticisms don't hit the mark.
|
| https://www.youtube.com/watch?v=UIZAiXYceBI
| DominikPeters wrote:
| It's not just cutting. The answers were obtained by taking
| still photos and inputting them into the model together with
| detailed text instructions explaining the context and the task
| to the model, giving some examples first and using careful
| chain-of-thought style prompting. (see e.g.
| https://developers.googleblog.com/2023/12/how-its-made-
| gemin...) My guess is that the video was fully produced _after_
| the Gemini outputs were generated by a different team, instead
| of while or before.
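|
| A purely illustrative reconstruction of the kind of prompt
| that post describes (hand-picked stills plus detailed
| instructions and an example) - this is not the actual prompt
| Google used, and the frame filenames are made up:
|
|     # Illustrative prompt structure only; fields and files are invented.
|     cup_game_prompt = [
|         {"type": "text", "text":
|             "You are tracking a cup game. Frames are taken while the hands "
|             "are still touching the cups. Reason step by step about which "
|             "cup the ball is under before giving a final answer."},
|         {"type": "text", "text":
|             "Example: in frame A the ball sits under the left cup; after "
|             "the left and middle cups swap, the ball is under the middle "
|             "cup."},
|         {"type": "image", "file": "frame_042.png"},   # hand-picked still
|         {"type": "image", "file": "frame_057.png"},   # hand-picked still
|         {"type": "text", "text": "Which cup is the ball under now?"},
|     ]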
| retox wrote:
| AI: artificial incompetence
| jbverschoor wrote:
| Well, google has a history of faking things.. so I'm not
| surprised. I expected that..
|
| All companies are just yelling that they're "in" the AI/LLM
| game.. If they don't, share prices will drop.
| hifreq wrote:
| The red flag for me was that they started that demo video with
| background noise to make it seem like a raw video. A subtle
| manipulation for no reason, it's obviously not a raw video.
|
| The fact that they did not fact check the videos _again_ makes me
| not particularly confident in the quality of Google 's work. The
| bit where the model misinterpreted music notation (the circled
| area does not mean "piano"), and the "less dense than water"
| rubber duck are beyond the pale. The SVG demo where they generate
| a South Park looking tree looks like a parody.
| crazygringo wrote:
| Does it matter at all with regards to its AI capabilities though?
|
| The video has a disclaimer that it was edited for latency.
|
| And good speech-to-text and text-to-speech already exists, so
| building that part is trivial. There's no deception.
|
| So then it seems like somebody is pressing a button to submit
| stills from a video feed, rather than live video. It's still just
| as useful.
|
| My main question then is about the cup game, because that
| absolutely requires video. Does that mean the model takes short
| video inputs as well? I'm assuming so, and that it generates
| audio outputs for the music sections as well. If _those_ things
| are not real, _then_ I think there 's a problem here. The
| Bloomberg article doesn't mention those, though.
| beering wrote:
| Even your skeptical take doesn't fully show how faked this was.
|
| > The video has a disclaimer that it was edited for latency.
|
| There was no disclaimer that the prompts were different from
| what's shown.
|
| > And good speech-to-text and text-to-speech already exists, so
| building that part is trivial. There's no deception.
|
| Look at how many people thought it could react to voice in real
| time - the net result is that a lot of people (maybe most?)
| were deceived. And the text prompts were actually longer and
| more specific than what was said in the video!
|
| > somebody is pressing a button to submit stills from a video
| feed, rather than live video.
|
| Somebody hand-picked images to convey exactly the right amount
| of information to Gemini.
|
| > Does that mean the model takes short video inputs as well?
| I'm assuming so
|
| It was given a hand-picked series of still images with the
| hands still on the cups so that it was easier to understand
| what cup moved where.
|
| Source for the above:
| https://developers.googleblog.com/2023/12/how-its-made-gemin...
| skilled wrote:
| fake benchmarks, fake stitched together videos, disingenuous
| charts, no developer API on launch, announcements stuffed with
| marketing fluff.
|
| As soon as I saw that opening paragraph from Sundar and how it
| was written I knew that Gemini is going to be a steaming pile of
| shit.
|
| They should have watched the GPT-4 announcement from OpenAI
| again. That demo Greg Brockman did with converting a sketch on a
| piece of paper to a CodePen from a Discord channel, with all the
| error correcting and whatnot, is how you launch a product that's
| appealing to users.
|
| TechCrunch, Twitter and some other sites (including HN i guess)
| are already piling on to this and by Monday things will go back
| to how they were and Google will have to go back to the drawing
| board to figure out another way to relaunch Gemini in the future.
| taspeotis wrote:
| Google Gemi-lie
| vjerancrnjak wrote:
| There is a possibility of dataset contamination on the
| competitive programming benchmark. A nice discussion on the page
| where AlphaCode2 was solving the problems
| https://codeforces.com/blog/entry/123035
|
| The problem shown in the video was reused in a recent
| competition (so it could have been available in the dataset).
| mtrovo wrote:
| I guess a much better next step is to compare how GPT4V performs
| when asked similar prompts. Even if mostly staged, this is very
| impressive to me, not so much for the current tech but more for
| how much leverage Google has to win this race in the long run
| because of its hardware presence.
|
| The more these models improve, the more we will want less
| friction and faster interactions; this means that in the long
| term having
| to open an app and ask a question is not gonna fly compared to
| just pointing your phone camera to something, asking a question
| and getting an answer that's tailored to everything Google knows
| about you in real time.
|
| Apple will most likely also roll their own in house solution for
| Siri instead of relying on an external company. This leaves
| OpenAI and the other small companies not just competing for the
| best models but also on how to put them in front of people in the
| first place and how to get access to their personal information.
| bradhe wrote:
| > Even if mostly staged this is very impressive to me, not much
| on the current tech but more on how much leverage Google has to
| win this race on the long run because of its hardware presence.
|
| I think you have too little information to form a reasonable
| opinion on the situation. Google is using editing techniques
| and specific scripting to try to demonstrate they have a
| sufficiently powerful general AI. The magnitude of this claim
| is huge, and the fact that they're faking it should be a
| likewise enormous scandal.
|
| To sum this up "well I guess they're doing better than XYZ"
| discounts the absurd context of all this.
| DonnyV wrote:
| This is so crazy. Google invented transformers, which are the
| basis for all these models. How do they keep fumbling like this over
| and over. Google Docs created in 2006! Microsoft is eating their
| lunch. Google creates the ability to change VM's in place and
| makes a fully automated datacenter. Amazon and Microsoft are
| killing them in the cloud. Google has been working on self
| driving longer than anyone. Tesla is catching up and will most
| likely beat them.
|
| The amount of fumbles is monumental.
| bradhe wrote:
| Microsoft eating Google's lunch on documents is laughable at
| best. Not to mention it confuses the entire timeline of office
| productivity software??
| hot_gril wrote:
| Is paid MS Teams more or less common than paid GSuite?
| It's hard to find stats on this. GSuite is the better product
| IMO, but MS has a stronger b2b reputation, and anecdotally I
| hear more about people using Teams.
| UrineSqueegee wrote:
| I worked at many companies in my time and all of them used
| Teams except for one that used Slack, but all used MS
| products; none used Google's.
| abustamam wrote:
| Does anyone use paid GSuite for anything other than
| docs/drive/Gmail ? In all companies I've worked at, we've
| used GSuite exclusively for those, and used slack/discord
| for chat, and zoom/discord for video/meetings.
|
| I know that MS Teams is a more full-featured product suite,
| but even at companies that used it, we still used Zoom for
| meetings.
| hot_gril wrote:
| GSuite for calendar makes sense too. Chat sucks, and Meet
| would be alright if it weren't so laggy, but those are
| two things you can easily not use.
| bbarnett wrote:
| Teams will likely still be around in 20 years. I doubt
| gsuite will exist in 5... or even 1.
| hot_gril wrote:
| GSuite has existed since 2006, so it's not like Google
| lacks focus on it.
| bbarnett wrote:
| That's ancient by google metrics!!!
| w10-1 wrote:
| Isn't it always easier to learn from others' mistakes?
|
| Google has the problem that it's typically the first to
| encounter a problem; it has the resources to approach it (from
| search), but also the incentive to monetize it (to get away
| from depending entirely on search revenue). And, management.
| rurp wrote:
| I don't know if that really excuses Google in this case
| because it's a productization problem. Google never tried to
| release a ChatGPT competitor until after OpenAI had. OpenAI
| has been wildly successful as the first mover, despite having
| to blaze some new product trails. Even after months of
| watching them and with near-infinite resources, Google is
| still struggling to catch up.
| hosh wrote:
| Outside of outliers like gmail, Google didn't get their
| success with product. The organization is set up for
| engineering to carry the day, funded by search.
|
| An AI product that makes search irrelevant is an
| existential threat, but I don't think Google has the
| product DNA to pull it off. I heard it has been taken over
| by more business / management types, but it is still
| missing product as a core pillar.
| rtsil wrote:
| Considering the number of messaging apps they tried to launch,
| if there's at least one thing that can be concluded, it's that
| they don't find it any easier to learn from their own mistakes.
| hot_gril wrote:
| Engineer-driven company. Not enough top-down direction on the
| products. Too much self-perceived moral high ground. But lately
| they've been changing this.
| Slackwise wrote:
| Uhh, no, not really; quite the opposite in fact.
|
| Under Eric Schmidt they were engineer-driven, during the
| golden era of the 2000s. Nowadays they're MBA driven, which
| is why they had 4 different messaging apps from different
| product managers.
| hot_gril wrote:
| Lack of top-down direction is what allowed that situation.
| Microsoft is MBA-driven and usually has a coherent product
| lineup, including messaging.
|
| Also, "had." Google cleaned things up. They still sometimes
| do stuff just cause, but it's a lot less now. I still feel
| like Meet using laggy VP9 (vs H.264 like everyone else) is
| entirely due to engineer stubbornness.
| robertlagrant wrote:
| I would say that Microsoft's craziness around buying Kin
| and Nokia, and Windows 8, RT edition, etc etc, was far
| more fundamental product misdirection than anything
| Google has ever done.
| hot_gril wrote:
| Microsoft failed to enter the mobile space, yeah. Google
| fumbled with the Nexus stuff, even though they succeeded
| with the Android software. But bigger picture, Microsoft
| was still able to diversify their revenue sources a lot
| while Google failed to do so.
| _the_inflator wrote:
| I say it again and again: sales, sales. Money is earned in
| enterprise domains.
|
| And this business is so totally different to Google in every
| way imaginable.
|
| Senior Managers love customer support, SLAs - Google loves
| automation. Two worlds collide.
| hot_gril wrote:
| Google customer support says "Won't Fix [Skill Issue]"
| ASalazarMX wrote:
| Google Workspace works through resellers, so Google trains
| fewer people and the resellers provide the customer support
| instead. IMO Google's bad reputation comes from their public
| customer support.
| sourcegrift wrote:
| I was at MS in September 2008, and internally they already had
| a very beautiful, well-functioning web Office (named
| differently; I forget the name, but it wasn't SharePoint if I
| recall correctly - I think it had something to do with expense
| reports) that would put Google Docs to shame today. They just
| didn't want to cannibalize their own product.
| rurp wrote:
| While it is crazy, it's not too surprising. Google has become
| as notorious for product ineptitude as they have been for
| technical prowess. Dominating the fundamental research for
| GenAI but face planting on the resulting consumer products is
| right in line with the company that built Stadia, GMail/Inbox,
| and 17 different chat apps.
| ren_engineer wrote:
| >Google Docs created in 2006
|
| The tech was based on an acquired company; Google just abused
| their search monopoly to make it more popular (same thing they
| did with YT). This has been the strategy for every service
| they've ever made. Google really hasn't launched a decent
| in-house product since Gmail, and even that was grown using
| their search monopoly as free advertising.
|
| >Google Docs originated from Writely, a web-based word
| processor created by the software company Upstartle and
| launched in August 2005
| robertlagrant wrote:
| > Google really hasn't launched a decent in-house product
| since Gmail
|
| What about Chrome? And Chromebooks?
| camflan wrote:
| mmm, WebKit?
| holoduke wrote:
| They are an ads company. Focus is never on "core" products.
| lern_too_spel wrote:
| I was with you until the Tesla hot take. I'd bet dollars to
| donuts that Tesla doesn't get to level 4 by the end of the
| decade. Waymo is already there.
| bendbro wrote:
| Space man bad.
| hot_gril wrote:
| I agree, but I also bet Waymo doesn't exist by the end of the
| decade. Not just because it's Google but because it's hard to
| profit from.
| renegade-otter wrote:
| Google doesn't know how to do anything else.
|
| A product requires commitment; it requires grind. That last 10%
| is the most critical part, and Google persistently refuses to
| push products across the finish line, instead giving up on them
| and adding to the infamous Google Product Graveyard.
|
| Honestly, what is the point? They could just maintain the core
| search/ads and not pay billions of dollars for tens of
| thousands of expensive engineers who have to go through a
| bullshit interview process and achieve nothing.
| davesque wrote:
| The hype really is drowning out the simple fact that basically no
| one really knows what these models are doing. Why does it matter
| so much that we include auto-correlation of embedding vectors as
| the "attention" mechanism in these models? And that we do this
| sufficiently many times across all the layers? And that we
| blindly smoosh values together with addition and call it a "skip"
| connection? Yes, you can tell me a bunch of stuff about gradients
| and residual information, but tell me why any of this stuff is or
| isn't a good model of causality.
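|
| For anyone who hasn't seen it spelled out, here is roughly what
| that "attention" plus "skip" connection amounts to, as a toy
| NumPy sketch (shapes and names are illustrative, not any
| particular model's architecture):
|
|   import numpy as np
|
|   def softmax(x, axis=-1):
|       e = np.exp(x - x.max(axis=axis, keepdims=True))
|       return e / e.sum(axis=axis, keepdims=True)
|
|   def attention_block(X, Wq, Wk, Wv):
|       # X: (seq_len, d_model) token embeddings
|       Q, K, V = X @ Wq, X @ Wk, X @ Wv
|       # the "auto-correlation" of the (projected) embeddings
|       scores = Q @ K.T / np.sqrt(K.shape[-1])
|       out = softmax(scores, axis=-1) @ V
|       # the "skip" connection: blindly add the input back on
|       return X + out
|
|   rng = np.random.default_rng(0)
|   d = 8
|   X = rng.normal(size=(4, d))                  # 4 tokens
|   Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
|   print(attention_block(X, Wq, Wk, Wv).shape)  # -> (4, 8)
|
| Stack dozens of these layers and it works remarkably well; why
| it should is exactly the open question.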
| stormfather wrote:
| So what? Voice to text is a solved problem. And in cases where
| realtime is important, just throw more compute at it. I'm missing
| the damning gotcha moment here.
| davesque wrote:
| A big red flag for me was that Sundar was prompting the model to
| report lots of facts that can be either true or false. We all saw
| the benchmark figures that they published and the results mostly
| showed marginal improvements. In other words, the issue of
| hallucination has not been solved. But the demo seemed to imply
| that it had. My conclusion was that they had mostly cherry picked
| instances in which the model happened to report correct or
| consistent information.
|
| They oversold its capabilities, but it does still seem that
| multi-modal models are going to be a requirement for AI to
| converge on a consistent idea of what kinds of phenomena are
| truly likely to be observed across modalities. So it's a good
| step forward. Now if they can just show us convincingly that a
| given architecture is actually modeling causality.
| LesZedCB wrote:
| I think this was demonstrated in that Mark Rober promo video
| [1] where he asked why the paper airplane stalled by blatantly
| leading the witness.
|
| "do you believe that a pocket of hot air would lead to lower
| air pressure causing my plane to stall?"
|
| he could barely even phrase the question correctly because it
| was so awkward. just embarrassing.
|
| [1] https://www.youtube.com/watch?v=mHZSrtl4zX0&t=277s
| calf wrote:
| Ever since the "stochastic parrots" and "super-autocomplete"
| criticisms of LLMs, the question has been whether hallucinations
| are solvable in principle at all. And if hallucinations are
| solvable, it would be of such basic and fundamental scientific
| importance that I think it would be another mini-breakthrough
| in AI.
| plaidfuji wrote:
| These LLMs do not have a concept of factual correctness and are
| not trained/optimized as such. I find it laughable that people
| expect these things to act like quiz bots - this misunderstands
| the nature of a generative LLM entirely.
|
| It simply spits out whatever output sequence it feels is most
| likely to occur after your input sequence. How it defines "most
| likely" is the subject of much research, but to optimize for
| factual correctness is a completely different endeavor. In
| certain cases (like coding problems) it can sound smart enough
| because for certain prompts, the approximate consensus of all
| available text on the internet is pretty much true and is
| unpolluted by garbage content from laypeople. It is also good
| at generating generic fluffy "content" although the value of
| this feature escapes me.
|
| In the end, the quality of the information it gets back to you
| is no better than the quality of a thorough Google search... it
| will just get you a more concise and well-formatted answer
| faster.
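|
| Concretely, "most likely to occur after your input sequence"
| just means repeated next-token prediction. A minimal
| greedy-decoding sketch with the open GPT-2 checkpoint via the
| transformers library (illustrative only; not what any of the
| hosted products actually run):
|
|   import torch
|   from transformers import AutoModelForCausalLM, AutoTokenizer
|
|   tok = AutoTokenizer.from_pretrained("gpt2")
|   model = AutoModelForCausalLM.from_pretrained("gpt2").eval()
|
|   # Encode the prompt, then keep appending the single most
|   # likely next token.
|   enc = tok("The capital of France is", return_tensors="pt")
|   ids = enc.input_ids
|   with torch.no_grad():
|       for _ in range(5):
|           logits = model(ids).logits        # (1, seq_len, vocab)
|           next_id = logits[0, -1].argmax()  # greedy choice
|           ids = torch.cat([ids, next_id.view(1, 1)], dim=-1)
|   print(tok.decode(ids[0]))
|
| Nothing in that loop knows or cares whether the continuation is
| factually true; it only ranks tokens by likelihood.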
| eurekin wrote:
| The first question I always ask myself in such cases: how much
| of the input data has simple "I don't know" lines? Not knowing
| something is clearly a concept that has to be learned in order
| to be expressed in the output.
| bradhe wrote:
| Fucking. Shocking.
|
| Anyone with half a brain could see through this "demo." It was
| vastly too uncanny to be real, to the point that it was poorly
| set up. Google should be ashamed.
| FartyMcFarter wrote:
| Unpaywalled Bloomberg article linked in the tweet:
|
| https://archive.is/4H1fB
| mirkodrummer wrote:
| I didn't believe Google's presentation offhand because I don't
| care anymore, especially because it comes from them. I just use
| tools and adapt. Copilot helps me automate boring tasks; it
| can't help much with new stuff, so I actually discovered I
| often do "interesting" work. I use GPT 3.5/4 for everything but
| work; it's been a blessing, the best suggestion engine for
| movies, books, and music with just a prompt and without the
| need for tons of data about my watch history (looking at you,
| YouTube). In these strange times I'm actually learning a lot
| more; productivity is more or less the same as before LLMs, but
| annoying tasks are relieved a bit. All of that without the
| hype. Sometimes I laugh at Google; it must be a real shit show
| inside that mega corporation, but I kinda understand the need
| for marketing editing, since having a first-class ticket on the
| AI train is so important for them; they seem to see it as an
| existential threat. At least it seems so, since they decided to
| take the risk of lying.
| zdrummond wrote:
| This is just a tweet that makes a claim without backing, and
| links to an article that was pulled.
|
| Can we change the URL to the real article if it still exists?
| dilawar wrote:
| Bloomberg link in Xeet is 404 for me (Bangalore).
| gsuuon wrote:
| Wow - my first thought was I wonder what framerate they're
| sending video at. The whole demo seems significantly less
| impressive in that case.
| ElijahLynn wrote:
| Link to the Bloomberg article from the Tweet is 404 now.
___________________________________________________________________
(page generated 2023-12-07 23:00 UTC)