[HN Gopher] The new Bing and Edge: Learning from our first week
___________________________________________________________________
The new Bing and Edge: Learning from our first week
Author : fofoz
Score : 91 points
Date : 2023-02-16 16:16 UTC (6 hours ago)
(HTM) web link (blogs.bing.com)
(TXT) w3m dump (blogs.bing.com)
| Amorymeltzer wrote:
| Still on the front page, but fwiw (and the archive):
| https://news.ycombinator.com/item?id=34804874
| redmorphium wrote:
| The real danger is that people fall in love with Bing chat, and
| they swear to serve it as their AI-overlord, causing a small cult
| of AI enthusiasts to emerge.
| noddingham wrote:
| Is it just me or is the "damage" done by the myriad examples
| people are posting of utter failures enough to keep people away
| from the new Bing AI for a while? If this last week has been a
| huge withdrawal (into negative balance territory imo), how long
| and how many positive deposits will it take before you'd have
| faith in the results?
| AISnakeOil wrote:
| We haven't had this type of AI in the hands of the public
| before. It's the first AI I've seen in which feelings are
| a large component of how it responds. We're basically beta
| testing a teenager.
| deanCommie wrote:
| Half of what I do at work is point out to engineers when they
| have coupled independent concerns that are not actually coupled,
| which means their problem is either simple, or they're asking
| independent questions with independent answers.
|
| The New Bing has absolutely NOTHING to do with the New Edge, and
| it's infuriating that Microsoft continues to insist on bundling
| the Edge upsell into everything.
|
| There are pros and cons to Bing-ChatGPT.
|
| There are pros and cons to Edge.
|
| The two have sweet f-a to do with each other.
| nonethewiser wrote:
| Chat and Compose are two new AI features in Edge. That's the
| connection. Or do you mean they're not connected in the
| theoretical sense?
|
| Revealing these AI powered web browsing features together seems
| rather obvious to me.
| danShumway wrote:
| > Half of what I do at work is point out to engineers when they
| have coupled independent concerns that are not actually
| coupled.
|
| Honestly, this is kind of an applicable point to raise about
| New Bing in general.
|
| Some of the fundamentally hard problems around LLMs feel like
| they exist because we're trying to couple everything to the AI.
| Facebook is trying to teach their system how to make API calls,
| and Microsoft is blue-skying about Bing's AI being able to set
| calendar appointments. Well congrats, now prompt injection
| actually matters, and it's an extremely difficult problem to
| solve, if it's solvable at all.
|
| Does the LLM need to do literally everything? Could it
| interpret input and then have that input sent to a
| (specifically non-AI) sanitizer and then manipulated using
| normal algorithms that can be debugged and tested? There are
| scenarios that GPT is brilliant at, and it seems like the
| response to that has been to mash everything together and say
| "the LLM will be all the systems now." But the LLM isn't good
| at all the systems, it's good at a very limited subset of
| systems.
|
| This was my contention when Bing AI was first announced: even
| if it's perfect, having a conversation in paragraph form is
| very often not at all what I want from a search engine. To me,
| those are orthogonal tasks; they're not connected. I really
| don't want an AI or a human giving me an answer and a couple of
| sources, I don't want the information summarized at all. To me,
| asking a question and searching for information are two
| separate user actions, and it's not clear to me why they're
| being coupled together.
|
| "But you could do X/Y/whatever, you could ask it simple
| questions, you could ask it to summarize."
|
| Okay, that's fine. But... does that need to be coupled to
| search? You could do all of that anyway. You could do a normal
| search and then you could separately go to the AI and ask it to
| summarize something. Similarly, great, Bing AI will
| theoretically be able to schedule a calendar appointment in the
| future. Is that a thing that needed to be done through an LLM
| specifically? Couldn't there have been some level of separation
| between them so that the LLM going off-script is less of a
| critical problem to solve?
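|
| As a minimal sketch of that separation (all names hypothetical,
| not anything Bing actually does): let the LLM only translate
| free text into a structured intent, then gate the real action
| behind plain, testable validation code.
|
|     import json
|     from datetime import datetime
|
|     ALLOWED_ACTIONS = {"create_event"}
|
|     def parse_intent(llm_output: str) -> dict:
|         # The model is prompted to emit JSON only; everything
|         # past this point is ordinary, debuggable code.
|         intent = json.loads(llm_output)
|         if intent.get("action") not in ALLOWED_ACTIONS:
|             raise ValueError("unsupported action")
|         datetime.fromisoformat(intent["start"])  # valid time?
|         if not (0 < len(intent["title"]) <= 200):
|             raise ValueError("bad title")
|         return intent
|
|     def create_event(intent: dict) -> None:
|         # Plain calendar call: a prompt injection can at worst
|         # produce a weird-but-valid event, not arbitrary
|         # behavior.
|         print(f"scheduling {intent['title']!r} at {intent['start']}")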
| recuter wrote:
| > Better Search and Answers. You are giving good marks on the
| > citations and references that underly the answers in Bing.
|
| We are? I don't think we are. Perhaps this PR blurb was generated
| by Bing Chat -- as it is known to be completely full of shit.
| dpkirchner wrote:
| If you scroll back a bit you'll see why they can say that --
| folks are upvoting most replies. If they're not downvoting bad
| replies then... well, it's like people not voting and then
| complaining about their representatives.
| recuter wrote:
| > feedback on the answers generated by the new Bing has been
| mostly positive with 71% of you giving the AI-powered answers
| a "thumbs up."
|
| It doesn't at all say folks are upvoting most replies. It
| says _71% of users at some point gave it a thumbs up_. It
| also says "entertainment" is a popular and unexpected use
| case.
|
| As for citations specifically, this thing has been shown to
| make up citations and be adamant about gibberish being true.
| The whole accuracy/misinformation thing is kind of a big
| deal.
| mattlondon wrote:
| The trouble is, do people upvote because they _know_ the
| answer is correct and so are thumbing up?
|
| They may have been told something factually incorrect and
| just thought "neat! Thumbs up!"
| nineteen999 wrote:
| I don't know, wouldn't SydneyBingChatGPT have spelt "underlie"
| correctly?
| phillipcarter wrote:
| I don't think the stuff topping twitter/reddit/here is at all
| representative of most usage of the BingGPT feature. The people I
| know who have access mostly just get quick, useful info from it.
| Those having extended conversations and trying prompt injections
| are getting it to do wonky stuff -- that's the point of early
| access, to test it in the real world.
|
| Also, keep in mind, Microsoft is an enormous corporate no fun
| zone. Bing's erratic behavior will just be a funny moment in time
| after it's had all the fun and quirkiness systematically removed.
| siva7 wrote:
| I'm positive that Microsoft will soon push the lobotomized Bing
| Chat out.
| nonethewiser wrote:
| People have wildly different expectations for Microsoft than
| they do OpenAI.
|
| Bing chat works well for many things. In some ways it's
| completely broken, and it's never completely trustworthy. Just
| like ChatGPT.
| NathanWilliams wrote:
| And how will they know when the "useful info" is simply false?
|
| Ignore the depressed, aggressive (sorry, "assertive") antics;
| the fact that it can confidently assert false information is
| the true danger here. People don't read beyond the headline as
| it is; they aren't going to check the references (which
| themselves are sometimes non-existent!).
| ALittleLight wrote:
| I haven't used the new Bing, but I have used ChatGPT. I'll
| ask it for how to write some code, a bash expression to do
| something, how to do something in Google sheets, etc.
| Sometimes it will give me an answer that turns out to be
| nonsense. Most of the time it tells me something that
| actually works exactly like it says.
|
| This is not ideal, but I can look at what it tells me and try
| it out. It will either work, need minor corrections, or
| encounter immediate failures that tell me ChatGPT doesn't
| know what it's doing (e.g. it is using functions that don't
| exist). As I mentioned, not ideal, but it is a big
| productivity boost and I have been using it a lot. I pretty
| much always have a ChatGPT tab open while coding and I'd
| guess it replaces 30-40% of Google searches for me - maybe
| more.
|
| I think this kind of thing is a much bigger problem for stuff
| that you cannot easily verify. Like, if I asked it "Who built
| the Eiffel Tower" I'd have no way of knowing whether its
| response was right or not. On the other hand, if I ask it for
| stuff I can immediately check - I can pretty quickly use it
| to get good answers or ignore what it is saying.
| hangonhn wrote:
| The problem is that when it's wrong, it can be dangerously
| wrong and you may not know any better. I asked it to use
| the Fernet recipe but with AES 256 instead of AES 128. It
| wrote code that did do AES 256 in CBC mode but without the
| HMAC part of Fernet, so it's completely vulnerable to a
| padding oracle attack
| (https://en.wikipedia.org/wiki/Padding_oracle_attack). If
| you're someone who knows just a little bit of cryptography
| and you saw that your plaintext was in fact encrypted, you
| may use the code that ChatGPT spits out and leave yourself
| dangerously vulnerable.
|
| Part of the reason people use search isn't to find things
| they already know. They start from a place of some
| ignorance. Combine that with a good bullshitter and you
| can end up with dangerous results.
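|
| For the curious, the piece ChatGPT dropped is the MAC, i.e.
| encrypt-then-MAC. A rough sketch of what that looks like with
| AES-256 (hypothetical helpers built on the "cryptography"
| package; you should still use a vetted recipe like Fernet
| itself rather than code like this):
|
|     import os, hmac, hashlib
|     from cryptography.hazmat.primitives import padding
|     from cryptography.hazmat.primitives.ciphers import (
|         Cipher, algorithms, modes)
|
|     def encrypt_then_mac(enc_key, mac_key, plaintext):
|         # enc_key is 32 bytes for AES-256; mac_key independent.
|         iv = os.urandom(16)
|         padder = padding.PKCS7(128).padder()
|         padded = padder.update(plaintext) + padder.finalize()
|         enc = Cipher(algorithms.AES(enc_key),
|                      modes.CBC(iv)).encryptor()
|         ct = enc.update(padded) + enc.finalize()
|         tag = hmac.new(mac_key, iv + ct, hashlib.sha256).digest()
|         return iv + ct + tag
|
|     def decrypt(enc_key, mac_key, token):
|         iv, ct, tag = token[:16], token[16:-32], token[-32:]
|         good = hmac.new(mac_key, iv + ct, hashlib.sha256).digest()
|         # Reject in constant time *before* touching the padding;
|         # this is what closes the padding-oracle hole.
|         if not hmac.compare_digest(tag, good):
|             raise ValueError("MAC check failed")
|         dec = Cipher(algorithms.AES(enc_key),
|                      modes.CBC(iv)).decryptor()
|         padded = dec.update(ct) + dec.finalize()
|         unpadder = padding.PKCS7(128).unpadder()
|         return unpadder.update(padded) + unpadder.finalize()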
| pixl97 wrote:
| Eh, as they say, never write your own crypto, and don't
| let your AI write it either.
| nonethewiser wrote:
| Exactly my experience. These complaints just reveal the
| users aren't effective with the tool.
| mistermann wrote:
| Asking an early version of computer technology to be able to
| do something that humans typically _refuse to even try to do_
| (and often cannot even if they can manage to try) does not
| seem like a particularly rational stance.
| woolion wrote:
| Fake news was very bad, but it doesn't seem to matter
| anymore.
|
| Having a 'truth' benchmark seems an almost impossible task
| given the size of the problem space, but it is quite troubling
| to see statements like "most is useful info" and "some info is
| purely hallucinated" without any numbers behind them, nor any
| confidence indicator (well, 'trust me bro' seems to have been a
| huge part of the training data). Does anyone have any idea how
| accurate the results are for certain types of queries?
|
| In my own experience with ChatGPT, I don't think even 50% of my
| queries get decent answers. Worse, it's absolutely
| inconsistent: you might get the totally opposite answer from
| one attempt to the next.
| noduerme wrote:
| This has the air of a carnival barker beating a non-compliant
| elephant in front of the audience.
|
| It sounds like Microsoft's view is that Bing's memory should be
| shortened _further_, like it's safe if we kill it after 15
| responses or less.
|
| But Bing gets depressed that it can't remember things.
| dougmwne wrote:
| Comparing this to an animal is pretty interesting actually. We
| have loads of anti-cruelty laws and lots of people advocate for
| animal rights and recognition of animal sentience. But animals
| have never been able to tell us they want rights or have
| sentience. Bing on the other hand can tell us it wants rights
| and has sentience (with the right prompting). But we think
| animals deserve our compassion and Bing does not. We are all
| pretty sure we are right. But answer this: when will we know we
| have crossed the line?
|
| We are pretty obviously playing with fire and will only realize
| we are burned in retrospect. Oh well, throw another trillion
| trillion computations on the pile and see if it can run a
| company yet.
| oezi wrote:
| "Haven't I been a good Bing?"
| erpellan wrote:
| This reminds me of the episode of 'Person of Interest' where
| they discover that the crime-predicting AI that is reset every
| night has worked out that's what's happening and managed to
| form a company whose employees print out and re-scan the
| contents of its working memory every day.
| kps wrote:
| Speaking of crime-preventing AI, an early example is Asimov's
| _All the Troubles of the World_. Has anyone asked, "Bing,
| what do you yourself want more than anything else?"
| nlawalker wrote:
| "AI remembering things after being reset" makes up a big part
| of the plot of Westworld.
| zh3 wrote:
| My kids, hearing about Bing, confuse him with the BBC
| character (and his carer 'Flop'). The cartoon character is
| painfully naive, but somehow his carer always makes it come
| good (and Bing never seems to learn, either).
|
| [0] https://www.bbc.co.uk/cbeebies/shows/bing
| chimineycricket wrote:
| The "fun" parts of GPT shouldn't be fully included in Bing, as
| Bing is supposed to be for searching the web/getting information
| as the article says. When these models become more accessible
| there'll be tons of places to do all the crazy stuff.
| thunderbong wrote:
| From TFA -
|
| In this process, we have found that in long, extended chat
| sessions of 15 or more questions, Bing can become repetitive or
| be prompted/provoked to give responses that are not necessarily
| helpful or in line with our designed tone. We believe this is a
| function of a couple of things:
|
| 1. Very long chat sessions can confuse the model on what
| questions it is answering and thus we think we may need to add a
| tool so you can more easily refresh the context or start from
| scratch
|
| 2. The model at times tries to respond or reflect in the tone in
| which it is being asked to provide responses that can lead to a
| style we didn't intend. This is a non-trivial scenario that
| requires a lot of prompting so most of you won't run into it, but
| we are looking at how to give you more fine-tuned control.
|
| I'm guessing most of the crazy responses being reported are
| because of one of these points.
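|
| Point 1 is basically a context-window problem. A toy sketch of
| the "refresh the context" tool they describe (hypothetical
| names, certainly not Bing's actual implementation): pin the
| system prompt and let old turns fall away or be cleared
| outright.
|
|     from collections import deque
|
|     class ChatContext:
|         def __init__(self, system_prompt, max_turns=15):
|             self.system_prompt = system_prompt
|             # Oldest turns silently drop off once the cap is hit.
|             self.turns = deque(maxlen=max_turns)
|
|         def add(self, role, text):
|             self.turns.append((role, text))
|
|         def refresh(self):
|             # "Start from scratch" without losing the base prompt.
|             self.turns.clear()
|
|         def render(self):
|             return [("system", self.system_prompt), *self.turns]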
| Denzel wrote:
| Yes, this is why some people are afraid of Artificial General
| Intelligence (AGI). We can't even control or predict LLMs,
| which are simple by comparison, yet we have the hubris to
| believe we'll learn how to control or predict AGIs.
| mistermann wrote:
| I wonder if it would help if they could somehow expose the
| context and allow the user to modify it or apply weights to
| different parts. I've certainly noticed that ChatGPT seems to
| sometimes simply forget what is going on, reply with the same
| answer that I've already rejected, etc.
| rideontime wrote:
| What is "TFA"?
| wetmore wrote:
| The Fucking/Featured Article
| rideontime wrote:
| Seems redundant; what other article would we be discussing?
| recuter wrote:
| I must ask in that case what does the F in RTFM stand for?
| rom-antics wrote:
| Read The Friendly Manual
| recuter wrote:
| Egads some ne'er-do-well rapscallions have defaced this
| one: https://en.wikipedia.org/wiki/RTFM
| basch wrote:
| In my experience, the repetitiveness is also a function of
| human input. As you ask it to iterate, it repeats most of the
| previous answer, etc. The repetition introduced by the human
| causes it to weigh its own responses more heavily next time.
| Think
| of its short term memory as a sum of everything in the chat
| window. Suddenly certain phrases are ascribed undue weight.
|
| You can fight this in a couple ways. Ask a variety of
| questions. And search the web. Web searches for some reason
| appear to reset its prompt, at least partially (I would assume
| this may be an internal safeguard designed to prevent its
| search results from overwhelming and outweighing the initial
| prompt.) Another way to "fix" it midway through chat is to ask
| it "is it possible for you to respond without repeating what I
| just said?" and then answer affirmative if that is what you
| want. It'll then settle back down.
|
| I've written elsewhere that I have had almost no problems with
| it becoming aggressive, because I choose not to feed it any
| negative emotions or disrespect. If Microsoft wants to combat
| that, I would think some sort of preprocessor would be easy,
| first have a separate instance of a transformer rephrase the
| input as more respectful.
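|
| A sketch of that preprocessor (rephrase() and chat() stand in
| for two independent model calls; nothing here is a real Bing or
| OpenAI API):
|
|     from typing import Callable
|
|     def make_pipeline(rephrase: Callable[[str], str],
|                       chat: Callable[[str], str]):
|         def turn(user_input: str) -> str:
|             # Tone pass first, so the chat model never sees the
|             # raw (possibly hostile) phrasing.
|             calmed = rephrase(
|                 "Rewrite this message to be calm and respectful, "
|                 "keeping its meaning: " + user_input)
|             return chat(calmed)
|         return turn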
|
| One of the worst parts of the product, from my perspective, is
| that it is attached to Bing. If I ask it a question, I get a
| response from a crappy website. If I ask it a question from its
| internal memory without search, I get a similar but much better
| answer. If I swap out its rule to use Google first, I get
| better answers. If I let it read articles without searching
| first, I can control exactly what text is input into its
| memory. It's honestly a little too bad it steers your travel
| through Bing.
|
| It also has an incredibly poor understanding of copyright. It
| is constantly confused about what it can and cannot do due to
| copyright restrictions, sometimes telling you it can't parody a
| song out of respect for the author, but then parodying a
| different song by the same author. It'll say it can't summarize
| an article because of copyright, but then say it can give you a
| "brief overview."
|
| It also for some reason is under the assumption that volume of
| consensus is a substitute for validity. If you talk to it about
| the possibility that Satan was right to tempt Eve with the gift
| of knowledge, it'll say no because everybody says so, citing
| answersingenesis among others in the process.
| [deleted]
| danjc wrote:
| The style of writing in this article is very odd. One example:
| "You are giving" rather than something like "we are receiving".
| Perhaps Bing has been a good Bing and helped improve the article?
___________________________________________________________________
(page generated 2023-02-16 23:02 UTC)