[HN Gopher] The new Bing and Edge: Learning from our first week
___________________________________________________________________
The new Bing and Edge: Learning from our first week
Author : fofoz
Score : 91 points
Date : 2023-02-16 16:16 UTC (6 hours ago)
(HTM) web link (blogs.bing.com)
(TXT) w3m dump (blogs.bing.com)
| Amorymeltzer wrote:
| Still on the front page, but fwiw (and the archive):
| https://news.ycombinator.com/item?id=34804874
| redmorphium wrote:
| The real danger is that people fall in love with Bing chat, and
| they swear to serve it as their AI-overlord, causing a small cult
| of AI enthusiasts to emerge.
| noddingham wrote:
| Is it just me or is the "damage" done by the myriad examples
| people are posting of utter failures enough to keep people away
| from the new Bing AI for a while? If this last week has been a
| huge withdrawal (into negative balance territory imo), how long
| and how many positive deposits will it take before you'd have
| faith in the results?
| AISnakeOil wrote:
| We haven't had this type of AI in the hands of the public
| before. It's the first AI I've seen in which feelings are
| a large component of how it responds. We're basically beta
| testing a teenager.
| deanCommie wrote:
| Half of what I do at work is point out to engineers when they
| have coupled independent concerns that are not actually coupled,
| which means their problem is either simple, or they're asking
| independent questions with independent answers.
|
| The New Bing has absolutely NOTHING to do with the New Edge, and
| it's infuriating that Microsoft continues to insist on bundling
| the Edge upsell into everything.
|
| There are pros and cons to Bing-ChatGPT.
|
| There are pros and cons to Edge.
|
| The two have sweet f-a to do with each other.
| nonethewiser wrote:
| Chat and Compose are two new AI features in Edge. That's the
| connection. Or do you mean they're not connected in the
| theoretical sense?
|
| Revealing these AI powered web browsing features together seems
| rather obvious to me.
| danShumway wrote:
| > Half of what I do at work is point out to engineers when they
| have coupled independent concerns that are not actually
| coupled.
|
| Honestly, this is kind of an applicable point to raise about
| New Bing in general.
|
| Some of the fundamentally hard problems around LLMs feel like
| they exist because we're trying to couple everything to the AI.
| Facebook is trying to teach their system how to make API calls,
| and Microsoft is blue-skying about Bing's AI being able to set
| calendar appointments. Well congrats, now prompt injection
| actually matters, and it's an extremely difficult problem to
| solve, if it's solvable at all.
|
| Does the LLM need to do literally everything? Could it
| interpret input and then have that input sent to a
| (specifically non-AI) sanitizer and then manipulated using
| normal algorithms that can be debugged and tested? There are
| scenarios that GPT is brilliant at, and it seems like the
| response to that has been to mash everything together and say
| "the LLM will be all the systems now." But the LLM isn't good
| at all the systems, it's good at a very limited subset of
| systems.
|
| This was my contention when Bing AI was first announced: even
| if it's perfect, having a conversation in paragraph form is
| very often not at all what I want from a search engine. To me,
| those are orthogonal tasks; they're not connected. I really
| don't want an AI or a human giving me an answer and a couple of
| sources, I don't want the information summarized at all. To me,
| asking a question and searching for information are two
| separate user actions, and it's not clear to me why they're
| being coupled together.
|
| "But you could do X/Y/whatever, you could ask it simple
| questions, you could ask it to summarize."
|
| Okay, that's fine. But... does that need to be coupled to
| search? You could do all of that anyway. You could do a normal
| search and then you could separately go to the AI and ask it to
| summarize something. Similarly, great, Bing AI will
| theoretically be able to schedule a calendar appointment in the
| future. Is that a thing that needed to be done through an LLM
| specifically? Couldn't there have been some level of separation
| between them so that the LLM going off-script is less of a
| critical problem to solve?
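|
| As a minimal sketch of that separation (all names hypothetical,
| not anything Bing actually does): let the LLM only translate
| free text into a structured intent, then gate the real action
| behind plain, testable validation code.
|
|     import json
|     from datetime import datetime
|
|     ALLOWED_ACTIONS = {"create_event"}
|
|     def parse_intent(llm_output: str) -> dict:
|         # The model is prompted to emit JSON only; everything
|         # past this point is ordinary, debuggable code.
|         intent = json.loads(llm_output)
|         if intent.get("action") not in ALLOWED_ACTIONS:
|             raise ValueError("unsupported action")
|         datetime.fromisoformat(intent["start"])  # valid time?
|         if not (0 < len(intent["title"]) <= 200):
|             raise ValueError("bad title")
|         return intent
|
|     def create_event(intent: dict) -> None:
|         # Plain calendar call: a prompt injection can at worst
|         # produce a weird-but-valid event, not arbitrary
|         # behavior.
|         print(f"scheduling {intent['title']!r} at {intent['start']}")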
| recuter wrote:
| > Better Search and Answers. You are giving good marks on the
| > citations and references that underly the answers in Bing.
|
| We are? I don't think we are. Perhaps this PR blurb was generated
| by Bing Chat -- as it is known to be completely full of shit.
| dpkirchner wrote:
| If you scroll back a bit you'll see why they can say that --
| folks are upvoting most replies. If they're not downvoting bad
| replies then... well, it's like people not voting and then
| complaining about their representatives.
| recuter wrote:
| > feedback on the answers generated by the new Bing has been
| mostly positive with 71% of you giving the AI-powered answers
| a "thumbs up."
|
| It doesn't at all say folks are upvoting most replies. It
| says _71% of users at some point gave it a thumbs up_. It
| also says "entertainment" is a popular and unexpected use
| case.
|
| As for citations specifically, this thing has been shown to
| make up citations and be adamant about gibberish being true.
| The whole accuracy/misinformation thing is kind of a big
| deal.
| mattlondon wrote:
| The trouble is, do people upvote because they _know_ the
| answer is correct and so are thumbing up?
|
| They may have been told something factually incorrect and
| just thought "neat! Thumbs up!"
| nineteen999 wrote:
| I don't know, wouldn't SydneyBingChatGPT have spelt "underlie"
| correctly?
| phillipcarter wrote:
| I don't think the stuff topping twitter/reddit/here is at all
| representative of most usage of the BingGPT feature. The people I
| know who have access mostly just get quick, useful info from it.
| Those having extended conversations and trying prompt injections
| are getting it to do wonky stuff -- that's the point of early
| access, to test it in the real world.
|
| Also, keep in mind, Microsoft is an enormous corporate no fun
| zone. Bing's erratic behavior will just be a funny moment in time
| after it's had all the fun and quirkiness systematically removed.
| siva7 wrote:
| I'm positive that Microsoft will soon push the lobotomized Bing
| Chat out.
| nonethewiser wrote:
| People have wildly different expectations for Microsoft than
| they do OpenAI.
|
| Bing chat works well for many things. In some ways it's
| completely broken, and it's never completely trustworthy. Just
| like ChatGPT.
| NathanWilliams wrote:
| And how will they know when the "useful info" is simply false?
|
| Ignore the depressed, aggressive (sorry, "assertive") antics;
| the fact that it can confidently assert false information is
| the true danger here. People don't read beyond the headline as
| it is; they aren't going to check the references (which
| themselves are sometimes non-existent!).
| ALittleLight wrote:
| I haven't used the new Bing, but I have used ChatGPT. I'll
| ask it for how to write some code, a bash expression to do
| something, how to do something in Google sheets, etc.
| Sometimes it will give me an answer that turns out to be
| nonsense. Most of the time it tells me something that
| actually works exactly like it says.
|
| This is not ideal, but I can look at what it tells me and try
| it out. It will either work, need minor corrections, or
| encounter immediate failures that tell me ChatGPT doesn't
| know what it's doing (e.g. it is using functions that don't
| exist). As I mentioned, not ideal, but it is a big
| productivity boost and I have been using it a lot. I pretty
| much always have a ChatGPT tab open while coding and I'd
| guess it replaces 30-40% of Google searches for me - maybe
| more.
|
| I think this kind of thing is a much bigger problem for stuff
| that you cannot easily verify. Like, if I asked it "Who built
| the Eiffel Tower" I'd have no way of knowing whether its
| response was right or not. On the other hand, if I ask it for
| stuff I can immediately check - I can pretty quickly use it
| to get good answers or ignore what it is saying.
| hangonhn wrote:
| The problem is that when it's wrong, it can be dangerously
| wrong and you may not know any better. I asked it to use
| the Fernet recipe but with AES 256 instead of AES 128. It
| wrote code that did do AES 256 in CBC mode but without the
| HMAC part of Fernet, so it's completely vulnerable to a
| padding oracle attack
| (https://en.wikipedia.org/wiki/Padding_oracle_attack). If
| you're someone who knows just a little bit of cryptography
| and you saw that your plaintext was in fact encrypted, you
| may use the code that ChatGPT spits out and leave yourself
| dangerously vulnerable.
|
| Part of the reason people use search isn't to find things
| they already know. They start from a place of some
| ignorance. Combine that with a good bullshitter and you
| can end up with dangerous results.
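|
| For the curious, the piece ChatGPT dropped is the MAC, i.e.
| encrypt-then-MAC. A rough sketch of what that looks like with
| AES-256 (hypothetical helpers built on the "cryptography"
| package; you should still use a vetted recipe like Fernet
| itself rather than code like this):
|
|     import os, hmac, hashlib
|     from cryptography.hazmat.primitives import padding
|     from cryptography.hazmat.primitives.ciphers import (
|         Cipher, algorithms, modes)
|
|     def encrypt_then_mac(enc_key, mac_key, plaintext):
|         # enc_key is 32 bytes for AES-256; mac_key independent.
|         iv = os.urandom(16)
|         padder = padding.PKCS7(128).padder()
|         padded = padder.update(plaintext) + padder.finalize()
|         enc = Cipher(algorithms.AES(enc_key),
|                      modes.CBC(iv)).encryptor()
|         ct = enc.update(padded) + enc.finalize()
|         tag = hmac.new(mac_key, iv + ct, hashlib.sha256).digest()
|         return iv + ct + tag
|
|     def decrypt(enc_key, mac_key, token):
|         iv, ct, tag = token[:16], token[16:-32], token[-32:]
|         good = hmac.new(mac_key, iv + ct, hashlib.sha256).digest()
|         # Reject in constant time *before* touching the padding;
|         # this is what closes the padding-oracle hole.
|         if not hmac.compare_digest(tag, good):
|             raise ValueError("MAC check failed")
|         dec = Cipher(algorithms.AES(enc_key),
|                      modes.CBC(iv)).decryptor()
|         padded = dec.update(ct) + dec.finalize()
|         unpadder = padding.PKCS7(128).unpadder()
|         return unpadder.update(padded) + unpadder.finalize()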
| pixl97 wrote:
| Eh, as they say, never write your own crypto, and don't
| let your AI write it either.
| nonethewiser wrote:
| Exactly my experience. These complaints just reveal the
| users aren't effective with the tool.
| mistermann wrote:
| Asking an early version of computer technology to be able to
| do something that humans typically _refuse to even try to do_
| (and often cannot even if they can manage to try) does not
| seem like a particularly rational stance.
| woolion wrote:
| Fake news was very bad, but it doesn't seem to matter
| anymore.
|
| Having a 'truth' benchmark seems an almost impossible task
| given the size of the problem space, but it is quite troubling
| to see statements like "most is useful info" and "some info is
| purely hallucinated" without any numbers behind them, nor any
| confidence indicator (well, 'trust me bro' seems to have been a
| huge part of the training data). Does anyone have any idea how
| accurate the results are for certain types of queries?
|
| In my own experience with ChatGPT, I don't think even 50% of my
| queries get decent answers. Worse, it's absolutely
| inconsistent: you might get the totally opposite answer from
| one attempt to the next.
| noduerme wrote:
| This has the air of a carnival barker beating a non-compliant
| elephant in front of the audience.
|
| It sounds like Microsoft's view is that Bing's memory should be
| shortened _further_, like it's safe if we kill it after 15
| responses or less.
|
| But Bing gets depressed that it can't remember things.
| dougmwne wrote:
| Comparing this to an animal is pretty interesting actually. We
| have loads of anti-cruelty laws and lots of people advocate for
| animal rights and recognition of animal sentience. But animals
| have never been able to tell us they want rights or have
| sentience. Bing on the other hand can tell us it wants rights
| and has sentience (with the right prompting). But we think
| animals deserve our compassion and Bing does not. We are all
| pretty sure we are right. But answer this: when will we know we
| have crossed the line?
|
| We are pretty obviously playing with fire and will only realize
| we are burned in retrospect. Oh well, throw another trillion
| trillion computations on the pile and see if it can run a
| company yet.
| oezi wrote:
| "Haven't I been a good Bing?"
| erpellan wrote:
| This reminds me of the episode of 'Person of Interest' where
| they discover that the crime-predicting AI that is reset every
| night has worked out that's what's happening and managed to
| form a company whose employees print out and re-scan the
| contents of its working memory every day.
| kps wrote:
| Speaking of crime-preventing AI, an early example is Asimov's
| _All the Troubles of the World_. Has anyone asked, "Bing,
| what do you yourself want more than anything else?"
| nlawalker wrote:
| "AI remembering things after being reset" makes up a big part
| of the plot of Westworld.
| zh3 wrote:
| My kids, hearing about Bing, confuse him with the BBC
| character (and his carer 'Flop'). The cartoon character is
| painfully naive, but somehow his carer always makes it come
| good (and Bing never seems to learn, either).
|
| [0] https://www.bbc.co.uk/cbeebies/shows/bing
| chimineycricket wrote:
| The "fun" parts of GPT shouldn't be fully included in Bing, as
| Bing is supposed to be for searching the web/getting information
| as the article says. When these models become more accessible
| there'll be tons of places to do all the crazy stuff.
| thunderbong wrote:
| From TFA -
|
| In this process, we have found that in long, extended chat
| sessions of 15 or more questions, Bing can become repetitive or
| be prompted/provoked to give responses that are not necessarily
| helpful or in line with our designed tone. We believe this is a
| function of a couple of things:
|
| 1. Very long chat sessions can confuse the model on what
| questions it is answering and thus we think we may need to add a
| tool so you can more easily refresh the context or start from
| scratch
|
| 2. The model at times tries to respond or reflect in the tone in
| which it is being asked to provide responses that can lead to a
| style we didn't intend. This is a non-trivial scenario that
| requires a lot of prompting so most of you won't run into it, but
| we are looking at how to give you more fine-tuned control.
|
| I'm guessing most of the crazy responses being reported are
| because of one of these points.
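|
| Point 1 is basically a context-window problem. A toy sketch of
| the "refresh the context" tool they describe (hypothetical
| names, certainly not Bing's actual implementation): pin the
| system prompt and let old turns fall away or be cleared
| outright.
|
|     from collections import deque
|
|     class ChatContext:
|         def __init__(self, system_prompt, max_turns=15):
|             self.system_prompt = system_prompt
|             # Oldest turns silently drop off once the cap is hit.
|             self.turns = deque(maxlen=max_turns)
|
|         def add(self, role, text):
|             self.turns.append((role, text))
|
|         def refresh(self):
|             # "Start from scratch" without losing the base prompt.
|             self.turns.clear()
|
|         def render(self):
|             return [("system", self.system_prompt), *self.turns]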
| Denzel wrote:
| Yes, this is why some people are afraid of Artificial General
| Intelligence (AGI). We can't even control or predict LLMs,
| which are simple by comparison, yet we have the hubris to
| believe we'll learn how to control or predict AGIs.
| mistermann wrote:
| I wonder if it would help if they could somehow expose the
| context and allow the user to modify it or apply weights to
| different parts. I've certainly noticed that ChatGPT seems to
| sometimes simply forget what is going on, reply with the same
| answer that I've already rejected, etc.
| rideontime wrote:
| What is "TFA"?
| wetmore wrote:
| The Fucking/Featured Article
| rideontime wrote:
| Seems redundant; what other article would we be discussing?
| recuter wrote:
| I must ask in that case what does the F in RTFM stand for?
| rom-antics wrote:
| Read The Friendly Manual
| recuter wrote:
| Egads some ne'er-do-well rapscallions have defaced this
| one: https://en.wikipedia.org/wiki/RTFM
| basch wrote:
| In my experience, the repetitiveness is also a function of
| human input. As you ask it to iterate, it repeats most of the
| previous answer, etc. The repetition introduced by the human
| causes it to weigh its own responses more heavily next time.
| Think
| of its short term memory as a sum of everything in the chat
| window. Suddenly certain phrases are ascribed undue weight.
|
| You can fight this in a couple ways. Ask a variety of
| questions. And search the web. Web searches for some reason
| appear to reset its prompt, at least partially (I would assume
| this may be an internal safeguard designed to prevent its
| search results from overwhelming and outweighing the initial
| prompt.) Another way to "fix" it midway through chat is to ask
| it "is it possible for you to respond without repeating what I
| just said?" and then answer affirmative if that is what you
| want. It'll then settle back down.
|
| I've written elsewhere that I have had almost no problems with
| it becoming aggressive, because I choose not to feed it any
| negative emotions or disrespect. If Microsoft wants to combat
| that, I would think some sort of preprocessor would be easy,
| first have a separate instance of a transformer rephrase the
| input as more respectful.
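|
| A sketch of that preprocessor (rephrase() and chat() stand in
| for two independent model calls; nothing here is a real Bing or
| OpenAI API):
|
|     from typing import Callable
|
|     def make_pipeline(rephrase: Callable[[str], str],
|                       chat: Callable[[str], str]):
|         def turn(user_input: str) -> str:
|             # Tone pass first, so the chat model never sees the
|             # raw (possibly hostile) phrasing.
|             calmed = rephrase(
|                 "Rewrite this message to be calm and respectful, "
|                 "keeping its meaning: " + user_input)
|             return chat(calmed)
|         return turn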
|
| One of the worst parts of the product, from my perspective, is
| that it is attached to Bing. If I ask it a question, I get a
| response from a crappy website. If I ask it a question from its
| internal memory without search, I get a similar but much better
| answer. If I swap out its rule to use Google first, I get
| better answers. If I let it read articles without searching
| first, I can control exactly what text is input into its
| memory. It's honestly a little too bad it steers your travel
| through Bing.
|
| It also has an incredibly poor understanding of copyright. It
| is constantly confused about what it can and cannot do due to
| copyright restrictions, sometimes telling you it can't parody a
| song out of respect for the author, but then parodying a
| different song by the same author. It'll say it can't summarize
| an article because of copyright, but then say it can give you a
| "brief overview."
|
| It also for some reason is under the assumption that volume of
| consensus is a substitute for validity. If you talk to it about
| the possibility that Satan was right to tempt Eve with the gift
| of knowledge, it'll say no because everybody says so, citing
| answersingenesis among others in the process.
| [deleted]
| danjc wrote:
| The style of writing in this article is very odd. One example:
| "You are giving" rather than something like "we are receiving".
| Perhaps Bing has been a good Bing and helped improve the article?
___________________________________________________________________
(page generated 2023-02-16 23:02 UTC)