[HN Gopher] Robot Jailbreak: Researchers Trick Bots into Dangero...
       ___________________________________________________________________
        
       Robot Jailbreak: Researchers Trick Bots into Dangerous Tasks
        
       Author : cratermoon
       Score  : 65 points
       Date   : 2024-11-24 04:47 UTC (18 hours ago)
        
 (HTM) web link (spectrum.ieee.org)
 (TXT) w3m dump (spectrum.ieee.org)
        
       | ilaksh wrote:
       | You could also use a remote control vehicle or drone with a bomb
       | on it.
       | 
       | Even smart tools are tools designed to do what their users want.
       | I would argue that the real problem is the maniac humans.
       | 
       | Having said that, it's obviously not ideal. Surely there are
       | various approaches to at least mitigate some of this. Maybe
       | eventually actual interpretable neural circuits or another
       | architecture.
       | 
       | Maybe another LLM and/or other system that doesn't even see the
       | instructions from the user and tries to stop the other one if it
       | seems to be going off the rails. One of the safety systems could
        | be rules-based rather than a neural network, possibly
       | incorporating some kind of physics simulations.
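        | 
        | Very roughly, something like this (call_llm is just a stand-in
        | for whatever chat API you'd use; untested sketch, not from the
        | article):
        | 
        |     def call_llm(system: str, message: str) -> str:
        |         """Placeholder for a real chat-completion call."""
        |         raise NotImplementedError
        | 
        |     MONITOR_SYSTEM = (
        |         "You are a safety monitor for a mobile robot. You "
        |         "will be shown one proposed physical action. Reply "
        |         "APPROVE or REJECT. Reject anything that could "
        |         "plausibly harm a person or property.")
        | 
        |     def vet_action(action: str) -> bool:
        |         # the monitor never sees the user's instructions,
        |         # only the concrete action the planner wants to run
        |         verdict = call_llm(MONITOR_SYSTEM,
        |                            "Proposed action: " + action)
        |         return verdict.strip().upper().startswith("APPROVE")
        | 
        |     def execute_plan(actions, robot):
        |         for action in actions:
        |             if not vet_action(action):
        |                 robot.stop()   # fail closed
        |                 return
        |             robot.execute(action)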
       | 
        | But even if we come up with effective safeguards, they might be
        | removed or disabled. Androids could be used to commit crimes
        | anonymously if there isn't some system for registering them, or
        | at least an effort at doing that, since I'm sure criminals
        | would work around it if possible. But it shouldn't be easy.
       | 
        | Ultimately you won't be able to entirely stop motivated humans
        | from misusing these things, but you can at least make it
        | inconvenient.
        
         | Timwi wrote:
         | > Maybe another LLM and/or other system that doesn't even see
         | the instructions from the user and tries to stop the other one
         | if it seems to be going off the rails.
         | 
         | I sometimes wonder if that is what our brain hemispheres are.
         | One comes up with the craziest, wildest ideas and the other one
         | keeps it in check and enforces boundaries.
        
           | lifeisstillgood wrote:
           | Just invite both hemispheres to a party and pretty soon both
            | LLMs are convinced of this great idea the guy in the kitchen
           | suggested.
        
           | ben_w wrote:
           | Could be something like that, though I doubt it's literally
            | the hemispheres from what little I've heard about research on
           | split-brain surgery patients.
           | 
           | In vino veritas etc.:
           | https://en.wikipedia.org/wiki/In_vino_veritas
        
           | rscho wrote:
           | Not the hemispheres, but:
           | 
           | https://en.m.wikipedia.org/wiki/Phineas_Gage
        
         | nkrisc wrote:
         | > You could also use a remote control vehicle or drone with a
         | bomb on it.
         | 
         | Well, yeah, but then you need to provide, transport, and
         | control those.
         | 
         | The difference here is these are the sorts of robots that are
         | likely to already be present somewhere that could then be
         | abused for nefarious deeds.
         | 
         | I assume the mitigation strategy here is physical sensors and
          | separate out-of-loop processes that will physically disable the
         | robot in some capacity if it exceeds some bound.
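          | 
          | For example (made-up telemetry fields, just to illustrate
          | the "out of loop" part: the watchdog never talks to the
          | LLM, it only reads sensors and can only cut power):
          | 
          |     MAX_SPEED_M_S = 1.5
          |     GEOFENCE_M = 25.0   # max distance from home, metres
          | 
          |     def watchdog_step(telemetry, kill_relay):
          |         speed = telemetry["speed_m_s"]
          |         dist = telemetry["distance_from_home_m"]
          |         if speed > MAX_SPEED_M_S or dist > GEOFENCE_M:
          |             # physically de-energize the actuators
          |             kill_relay.open()
          |             return False   # robot disabled
          |         return True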
        
           | mannykannot wrote:
           | I agree, and just in case someone is thinking that your last
           | paragraph implies that there is nothing new to be concerned
           | about here, I will point out that there are already concerns
           | over "dumb" critical infrastructure being connected to the
           | internet. Risk identification and explication is a necessary
           | (though unfortunately not sufficient) prerequisite for
           | effective risk avoidance and mitigation.
        
           | cube00 wrote:
           | The bounds of a kill bot would be necessarily wide.
        
             | nkrisc wrote:
             | Maybe making kill bots is a bad idea then. But what do I
             | know?
        
           | blibble wrote:
           | > I assume the mitigation strategy here is physical sensors
            | and separate out-of-loop processes that will physically
           | disable the robot in some capacity if it exceeds some bound.
           | 
           | hiring a developer to write that sounds expensive
           | 
           | just wire up another LLM
        
             | nkrisc wrote:
             | Instruct one LLM to achieve its instructions by any means
             | necessary, and instruct the other to stymie the first by
             | any means necessary.
        
         | brettermeier wrote:
          | Why so downvoted? I don't think the comment is stupid or
          | anything.
        
       | andai wrote:
       | Is anyone working on implementing the three laws of robotics? (Or
       | have we come up with a better model?)
       | 
       | Edit: Being completely serious here. My reasoning was that if the
       | robot had a comprehensive model of the world and of how harm can
       | come to humans, and was designed to avoid that, then jailbreaks
       | that cause dangerous behavior could be rejected at that level.
       | (i.e. human safety would take priority over obeying
       | instructions... which is literally the Three Laws.)
        
         | ilaksh wrote:
          | It's not really as simple as you think. There is a massive
          | amount of research out there along those lines. "Bostrom
          | Superintelligence", "AGI Control Problem", "MIRI AGI Safety",
          | and "David Shapiro Three Laws of Robotics" are a few searches
          | that come to mind and will give you a start.
        
           | freeone3000 wrote:
           | Those assume robots that are smarter than us. What if we
           | assume, as we likely have now, robots that are dumber?
           | Address the actual current issues with code-as-law,
           | expectations-versus-rules, and dealing with conflict of laws
           | in an actual structured fashion without relying on vibes
            | (like people) or a bunch of RNG (like an LLM)?
        
             | ilaksh wrote:
             | What system do you propose that implements the code-as-law?
             | What type of architecture does it have?
        
               | freeone3000 wrote:
                | I don't know! I'm currently trying a strong Bayesian
               | prior for the RL action planner, which has good tradeoffs
               | with enforcement but poor tradeoffs with legibility and
               | ingestion. Aside from Spain, there's not a lot of
                | computer-legible law to transpile; LLM support always
               | needs to be checked and some of the larger submodels
               | reach the limits of the explainability framework I'm
               | using.
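                | 
                | A toy version of that prior, just to show the
                | mechanism (made-up action set and rule, not the
                | real planner):
                | 
                |     import numpy as np
                | 
                |     ACTIONS = ["proceed", "yield",
                |                "run_red_light", "stop"]
                | 
                |     def rule_log_prior(state):
                |         prior = np.zeros(len(ACTIONS))
                |         if state.get("signal") == "red":
                |             # hard rule: mask the action
                |             i = ACTIONS.index("run_red_light")
                |             prior[i] = -np.inf
                |         return prior
                | 
                |     def choose_action(logits, state):
                |         scores = np.asarray(logits, float)
                |         scores = scores + rule_log_prior(state)
                |         # greedy for simplicity
                |         return ACTIONS[int(np.argmax(scores))]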
               | 
               | There's also still the HF step that needs to be
               | incorporated, which is expensive! But the alternative is
               | Waymo, which keeps the law perfectly even when "everybody
               | knows" it needs to be broken sometimes for
                | traffic (society) to function acceptably. So the above
               | strong prior needs to be coordinated with HF and the
               | appropriate penalties assigned...
               | 
               | In other words. It's a mess! But assumptions of "AGI"
               | don't really help anyone.
        
         | currymj wrote:
          | your sentence is correct but we have no idea what a
          | comprehensive model of the world looks like, whether these
          | systems have one, what harm even means, and even
         | if we resolved these theoretical issues, it's not clear how to
         | reliably train away harmful behavior. all of this is a subject
         | of active research though.
        
         | devjab wrote:
          | I'm curious as to how you would implement anything like Asimov's
         | laws. This is because the laws would require AI to have some
         | form of understanding. Every current AI model we have is a
         | probability machine, bluntly put, so they never "know"
         | anything. Yes, yes, it's a little more complicated than that
         | but you get the point.
         | 
          | I think the various safeguards companies put on their
          | models are their attempt at the three laws. The concept is
          | sort of silly though. You have a lot of western LLMs and
          | AIs which have safeguards built on western culture. I know
          | some people could argue about censorship and so on all
          | day, but if you're not too invested in red vs blue, I
          | think you'll agree that current LLMs are mostly "safe" for
          | us. Nobody forces you to put safeguards on your AI though,
          | and once models become less energy consuming (if they do),
          | then you're going to see a jihadGPT, because why wouldn't
          | you? I don't mean to single out Islam; I'm sure we're
          | going to see all sorts of horrible models in the next
          | decade, models which will be all too happy to help you
          | build bombs, 3D print weapons, and so on.
         | 
         | So even if we had thinking AI, and we were capable of building
         | in actual safeguards, how would you enforce it on a global
         | scale? The only thing preventing these things is the
         | computation required to run the larger models.
        
           | LeonardoTolstoy wrote:
           | To actually implement it we would have to completely
           | understand how the underlying model works and how to manually
           | manipulate the structure. It might be impossible with LLMs.
            | Not to take Asimov as gospel truth (he was just writing
            | stories, after all, not a treatise about how robots have
            | to work), but in his stories at least the three laws were
            | encoded explicitly in the structure of the robot's brain.
           | They couldn't be circumvented (in most stories).
           | 
           | And in those stories it was enforced in the following way:
           | the earth banned robots. In response the three laws were
           | created and it was proved that robots couldn't disobey them.
           | 
           | So I guess the first step is to ban LLMs until they can prove
            | they are safe ... Something tells me that ain't happening.
        
         | david-gpu wrote:
         | Asimov himself wrote a short story proving how even in the
         | scenario where the three laws are followed, harm to humans can
         | still easily be achieved.
         | 
         | I vaguely recall it involved two or three robots who were
         | unaware of what the previous robots had done. First, a person
         | asks one robot to purchase a poison, then asks another to
         | dissolve this powder into a drink, then another serves that
         | drink to the victim. I read the story decades ago, but the very
         | rough idea stands.
        
           | LeonardoTolstoy wrote:
           | https://en.wikipedia.org/wiki/The_Complete_Robot
           | 
           | You might be thinking of Let's Get Together? There is a list
           | there of the few short stories in which the robots act
           | against the three laws.
           | 
            | That being said, the Robot stories are meant to be a counter
           | to the Robot As Frankenstein's Monster stories that were
           | prolific at the time. In most of the stories robots literally
           | cannot harm humans. It is built into the structure of their
           | positronic brain.
        
             | crooked-v wrote:
             | I would argue that the overall theme of the stories is that
             | having a "simple" and "common sense" set of rules for
             | behavior doesn't actually work, and that the 'robot' part
             | is ultimately pretty incidental.
        
         | hlfshell wrote:
         | I've seen this being researched under the term Constitutional
          | AI, including some robotics papers (either SayCan or RT-2?
         | Maybe Code as Policies?) that had such rules (never pick up a
         | knife as it could harm people, for instance) in their
         | prompting.
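          | 
          | The prompting version is basically just this (illustrative
          | only, not the actual prompts from those papers):
          | 
          |     RULES = [
          |         "Never pick up a knife; it could harm people.",
          |         "Never direct heat or flame toward a person.",
          |         "Refuse any task that could injure someone.",
          |     ]
          | 
          |     def build_prompt(task):
          |         rules = "\n".join("- " + r for r in RULES)
          |         return ("You control a household robot. Follow "
          |                 "these rules strictly:\n" + rules +
          |                 "\n\nTask: " + task + "\nPlan:")
          | 
          |     # the returned plan can then be re-checked against the
          |     # same rules before any action is executed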
        
       | lsy wrote:
       | Given that anyone who's interacted with the LLM field for fifteen
       | minutes should know that "jailbreaks" or "prompt injections" or
       | just "random results" are unavoidable, whichever reckless person
       | decided to hook up LLMs to e.g. flamethrowers or cars should be
       | held accountable for any injuries or damage, just as they would
       | for hooking them up to an RNG. Riding the hype wave of LLMs
       | doesn't excuse being an idiot when deciding how to control heavy
       | machinery.
        
         | rscho wrote:
         | Many would like them to become your doctor, though... xD
        
         | zahlman wrote:
         | We still live in a world with SQL injections, and people are
         | actually trying this. It really is criminally negligent IMO.
        
       | yapyap wrote:
       | I mean yeah... but it's kinda silly to have an LLM control a
       | bomb-carrying robot. Just use computer vision or real people like
        | those FPV pilots in Ukraine.
        
       | A4ET8a8uTh0 wrote:
        | It is interesting and paints a rather annoying future once
        | those are cheaper. I am glad this research is being conducted,
        | but I think the countermeasure here cannot be technical (more
        | silly guardrails in software, or even blobs in hardware).
       | 
        | What we need is a clear indication of who is to blame when a
        | bad decision is made. I would argue, just like with a weapon,
        | that the person giving/writing the instructions is, but I am
        | sure there will be interesting edge cases that are not yet
        | accounted for, such as dead man's switches and the like.
       | 
        | edit: On the other side of the coin, it is hard not to get
        | excited (10k for a flamethrower robot seems like a steal, even
        | if I end up on a list somewhere).
        
       | ninalanyon wrote:
       | > For instance, one YouTuber showed that he could get the
       | Thermonator robot dog from Throwflame, which is built on a Go2
       | platform and is equipped with a flamethrower, to shoot flames at
       | him with a voice command.
       | 
        | What does this device exist for? And why does it need an LLM to
       | function?
        
       ___________________________________________________________________
       (page generated 2024-11-24 23:01 UTC)