[HN Gopher] Gemini Robotics On-Device brings AI to local robotic...
       ___________________________________________________________________
        
       Gemini Robotics On-Device brings AI to local robotic devices
        
       Author : meetpateltech
       Score  : 140 points
       Date   : 2025-06-24 14:05 UTC (8 hours ago)
        
 (HTM) web link (deepmind.google)
 (TXT) w3m dump (deepmind.google)
        
       | suninsight wrote:
       | This will not end well.
        
       | sajithdilshan wrote:
       | I wonder what kind of guardrails (like the Three Laws of
       | Robotics) there are to prevent the robots from going crazy while
       | executing prompts.
        
         | hn_throwaway_99 wrote:
         | A power cord?
        
           | sajithdilshan wrote:
           | what if they are battery powered?
        
             | msgodel wrote:
             | Usually I put master disconnect switches on my robots just
             | to make working on them safe. I use cheap toggle switches
             | though; I'm too cheap for the big red spinny ones.
        
               | pixl97 wrote:
               | [Robot learns to superglue the switch open]
        
               | msgodel wrote:
               | It's only going to do that if you RL it with episodes
               | that include people shutting it down for safety. The RL
               | I've done with my models is all in simulations that don't
               | even include the switch.
        
               | pixl97 wrote:
               | Which will likely work only for on-device AI, but it
               | seems to me that any very complicated actions/interactions
               | with the world may require calls out to external LLMs
               | which know about these kinds of actions. Or, in the
               | future, the on-device models will be far larger and more
               | expansive, containing this kind of knowledge themselves.
               | 
               | For example, what if you need to train the model to keep
               | unauthorized people from shutting it off?
        
               | msgodel wrote:
               | Having a robot near people with no master off switch
               | sounds like a dumb idea.
        
             | bigyabai wrote:
             | That's what we use twelve gauge buckshot for, here in
             | America.
        
         | ctoth wrote:
         | The laws of robotics were literally designed to cause conflict
         | and facilitate strife in a fictional setting--I certainly hope
         | no real goddamn system is built like that.
         | 
         | > To ensure robots behave safely, Gemini Robotics uses a multi-
         | layered approach. "With the full Gemini Robotics, you are
         | connecting to a model that is reasoning about what is safe to
         | do, period," says Parada. "And then you have it talk to a VLA
         | that actually produces options, and then that VLA calls a low-
         | level controller, which typically has safety critical
         | components, like how much force you can move or how fast you
         | can move this arm."
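         | 
         | The last layer in that quote is the easiest to picture: a
         | minimal sketch of a low-level controller clamping commanded
         | speed and force to hard limits. The limit values and command
         | format here are illustrative assumptions, not anything from
         | the Gemini Robotics stack:
         | 
         |     # Clamp whatever the higher layers ask for to fixed
         |     # safety limits before it reaches the actuators.
         |     MAX_SPEED_M_S = 0.25   # assumed end-effector speed cap
         |     MAX_FORCE_N = 15.0     # assumed applied-force cap
         | 
         |     def clamp(value, limit):
         |         return max(-limit, min(limit, value))
         | 
         |     def safe_command(cmd):
         |         """Limit a {'velocity': [...], 'force': f} command."""
         |         return {
         |             "velocity": [clamp(v, MAX_SPEED_M_S)
         |                          for v in cmd["velocity"]],
         |             "force": clamp(cmd["force"], MAX_FORCE_N),
         |         }
         | 
         |     print(safe_command({"velocity": [0.4, -0.1, 0.05],
         |                         "force": 40.0}))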
        
           | conception wrote:
           | Of course someone will. The terror nexus doesn't build
           | itself, yet, you know.
        
         | hlfshell wrote:
         | The generally accepted term for the research around this in
         | robotics is Constitutional AI
         | (https://arxiv.org/abs/2212.08073), which has been
         | cited/experimented with in several robotics VLAs.
        
           | JumpCrisscross wrote:
           | Is there any evidence we have the technical ability to put
           | such ambiguous guardrails on LLMs?
        
         | asadm wrote:
         | in practice, those laws are bs.
        
       | suyash wrote:
       | What sort of hardware does the SDK run on? Can it run on a
       | modern Raspberry Pi?
        
         | ethan_smith wrote:
         | According to the blog post, it requires an NVIDIA Jetson Orin
         | with at least 8GB RAM, and they've optimized for Jetson AGX
         | Orin (64GB) and Orin NX (16GB) modules.
        
           | v9v wrote:
           | Could you quote where in the blog post they claim that?
           | CTRL+F "Jetson" gave no results in TFA.
        
             | moffkalast wrote:
             | Yeah they didn't really mention anything, I was almost
             | getting my hopes up that Google might be announcing a
             | modernized Coral TPU for the transformer age, but I guess
             | not. It's probably all just API calls to their TPUv6 data
             | centers lmao.
        
         | martythemaniak wrote:
         | You can think of these as essentially multi-modal LLMs, which
         | is to say you can have very small/fast ones (SmolVLA - 0.5B
         | params) that are good at specific tasks, and larger/slower, more
         | general ones (OpenVLA - a finetuned llama2 7B). So an rpi could
         | be used for some very specific tasks, but even the more general
         | ones could run on beefy consumer hardware.
        
       | Toritori12 wrote:
       | Does anyone know how easy it is to join the "trusted tester
       | program" and whether they offer modules that you can easily plug
       | in to run the SDK?
        
       | martythemaniak wrote:
       | I've spent the last few months looking into VLAs and I'm
       | convinced that they're gonna be a big deal, ie they very well
       | might be the "chatgpt moment for robotics" that everyone's been
       | anticipating. Multimodal LLMs already have a ton of built-in
       | understanding of images and text, so VLAs are just regular MMLLMs
       | that are fine-tuned to output a specific sequence of instructions
       | that can be fed to a robot.
       | 
       | OpenVLA, which came out last year, is a Llama2 fine tune with
       | extra image encoding that outputs a 7-tuple of integers. The
       | integers are rotation and translation inputs for a robot arm. If
       | you give a vision llama2 a picture of an apple and a bowl and
       | say "put the apple in the bowl", it already understands apples
       | and bowls, and knows the end state should be apple-in-bowl, etc.
       | What's missing is the series of tuples that will correctly
       | manipulate the arm to do that, and the way they got it was by
       | training on a large number of short instruction videos.
       | 
       | The neat part is that although everyone is focusing on robot arms
       | manipulating objects at the moment, there's no reason this method
       | can't be applied to any task. Want a smart lawnmower? It already
       | understands "lawn" "mow", "don't destroy toy in path" etc, just
       | needs a finetune on how to corectly operate a lawnmower. Sam
       | Altman made some comments about having self-driving technology
       | recently and I'm certain it's a chat-gpt based VLA. After all, if
       | you give chatgpt a picture of a street, it knows what's a car,
       | pedestrian, etc. It doesn't know how to output the correct
       | turn/go/stop commands, and it does need a great deal of diverse
       | data, but there's no reason why it can't do it.
       | https://www.reddit.com/r/SelfDrivingCars/comments/1le7iq4/sa...
       | 
       | Anyway, super exciting stuff. If I had time, I'd rig a snowblower
       | with a remote control setup, record a bunch of runs and get a VLA
       | to clean my driveway while I sleep.
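       | 
       | To make the "7-tuple of integers" part concrete, here's a minimal
       | sketch of the de-tokenization step: each integer is a bin index
       | that gets mapped back to a continuous command for one degree of
       | freedom. The bin count and per-DoF ranges below are illustrative
       | assumptions, not OpenVLA's actual constants:
       | 
       |     # Map 7 discrete action tokens back to continuous commands
       |     # for a 7-DoF arm: (dx, dy, dz, droll, dpitch, dyaw, grip).
       |     NUM_BINS = 256
       |     ACTION_RANGES = ([(-0.05, 0.05)] * 3 +   # translation, m
       |                      [(-0.25, 0.25)] * 3 +   # rotation, rad
       |                      [(0.0, 1.0)])           # gripper open/close
       | 
       |     def detokenize(action_tokens):
       |         """Convert bin indices into continuous actions."""
       |         actions = []
       |         for token, (lo, hi) in zip(action_tokens, ACTION_RANGES):
       |             frac = token / (NUM_BINS - 1)          # -> [0, 1]
       |             actions.append(lo + frac * (hi - lo))  # -> DoF range
       |         return actions
       | 
       |     # Example: one control step's worth of tokens from the model.
       |     print(detokenize([128, 90, 200, 127, 127, 140, 255]))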
        
         | ckcheng wrote:
         | VLA = Vision-language-action model:
         | https://en.wikipedia.org/wiki/Vision-language-action_model
         | 
         | Not https://public.nrao.edu/telescopes/VLA/ :(
         | 
         | For completeness, MMLLM = Multimodal Large language model.
        
         | generalizations wrote:
         | I will be surprised if VLAs stick around, based on your
         | description. That sounds far too low-level. Better hand that
         | off to the 'nervous system' / kernel of the robot - it's not
         | like humans explicitly think about the rotation of their hip &
         | ankle when they walk. Sounds like a bad abstraction.
        
         | Workaccount2 wrote:
         | I don't think transformers will be viable for self driving cars
         | until they can both:
         | 
         | 1) Properly recognize what they are seeing without having to
         | lean so hard on their training data. Go photoshop a picture of
         | a cat and give it a 5th leg coming out of its stomach. No LLM
         | will be able to properly count the cat's legs (they will keep
         | saying 4 legs no matter how many times you insist they
         | recount).
         | 
         | 2) Be extremely fast at outputting tokens. I don't know where
         | the threshold is, but it's probably going to be a non-thinking
         | model (at first) and will probably need something like Cerebras
         | or a diffusion architecture to get there.
        
           | martythemaniak wrote:
           | 1. Well, based on Karpathy's talks on Tesla FSD, his solution
           | is to actually make the training set reflect everything you'd
           | see in reality. The tricky part is that if something occurs
           | 0.0000001% IRL and something else occurs 50% of the time,
           | they both need to make up 5% of the training corpus. The thing
           | with multimodal LLMs is that lidar/depth input can just be
           | another input that gets encoded along with everything else,
           | so for driving "there's a blob I don't quite recognize" is
           | still a blob you have to drive around.
           | 
           | 2. Figure has a dual-model architecture which makes a lot of
           | sense: a 7B model that does higher-level planning and control
           | and runs at 8Hz, and a tiny 0.08B model that runs at 200Hz
           | and does the minute control outputs.
           | https://www.figure.ai/news/helix
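           | 
           | A rough sketch of what that split can look like in code: a
           | slow planner refreshes a latent goal at ~8Hz while a fast
           | policy tracks it at ~200Hz. The rates come from Figure's
           | Helix post; the function names and latent-goal format are
           | illustrative assumptions, not their API:
           | 
           |     import time
           | 
           |     PLAN_HZ, CONTROL_HZ = 8, 200
           | 
           |     def plan(obs):        # stand-in for the big (e.g. 7B) model
           |         return {"target": obs["goal_pose"]}
           | 
           |     def act(obs, latent): # stand-in for the tiny fast policy
           |         tgt, pose = latent["target"], obs["pose"]
           |         return [t - p for t, p in zip(tgt, pose)]
           | 
           |     def control_loop(get_obs, send_cmd, duration_s=1.0):
           |         latent, last_plan = None, 0.0
           |         t_end = time.monotonic() + duration_s
           |         while time.monotonic() < t_end:
           |             obs = get_obs()
           |             now = time.monotonic()
           |             if latent is None or now - last_plan >= 1.0 / PLAN_HZ:
           |                 latent, last_plan = plan(obs), now  # slow path
           |             send_cmd(act(obs, latent))              # fast path
           |             time.sleep(1.0 / CONTROL_HZ)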
        
       | baron816 wrote:
       | I'm optimistic about humanoid robotics, but I'm curious about the
       | reliability issue. Biological limbs and hands are quite
       | miraculous when you consider that they are able to constantly
       | interact with the world, which entails some natural wear and
       | tear, but then constantly heal themselves.
        
         | marinmania wrote:
         | It gets either very exciting or very spooky thinking about the
         | possibilities in the near future.
         | 
         | I had always assumed that such a robot would be very specific
         | (like a cleaning robot) but it does seem like by the time they
         | are ready they will be very generalizable.
         | 
         | I know they would require quite a few sensors and motors, but
         | compared to self-driving cars their liability would be less and
         | they would use far less material.
        
           | fragmede wrote:
           | The exciting part comes when two robots are able to do
           | repairs on each other.
        
             | pryelluw wrote:
             | 2 bots 1 bolt ?
        
             | marinmania wrote:
             | I think this is the spooky part. I feel dumb saying it, but
             | is there a point where they are able to coordinate and
             | build a factory to build chips/more of themselves? Or other
             | things entirely?
        
               | bamboozled wrote:
               | Of course there is
        
         | didip wrote:
           | I think those problems can be solved with further research in
           | materials science, no? Combine that with very responsive but
           | low-torque servos, and I think this is a solvable problem.
        
           | michaelt wrote:
           | It's a simple matter of the number of motors you have. [1]
           | 
           | Assume every motor has a 1% failure rate per year.
           | 
           | A boring wheeled roomba has 3 motors. That's a 2.9% failure
           | rate per year, and 8.6% failures over 3 years.
           | 
           | Assume a humanoid robot has 43 motors. That gives you a 35%
           | failure rate per year, and 73% over 3 years. That ain't good.
           | 
           | And not only is the humanoid robot less reliable, it's also
           | 14.3x the price - because it's got 14.3x as many motors in
           | it.
           | 
           | [1] And bearings and encoders and gearboxes and control
           | boards and stuff... but they're largely proportional to the
           | number of motors.
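           | 
           | The arithmetic, for anyone who wants to check it (assuming
           | independent failures and the assumed 1%/year rate above):
           | 
           |     # P(at least one of n motors fails) with independent
           |     # failures at rate p per motor per year.
           |     p = 0.01
           | 
           |     def fleet_failure(n_motors, years=1):
           |         return 1 - (1 - p) ** (n_motors * years)
           | 
           |     for n in (3, 43):
           |         print(f"{n} motors: {fleet_failure(n):.1%} per year, "
           |               f"{fleet_failure(n, 3):.1%} over 3 years")
           |     # 3 motors: 3.0% per year, 8.6% over 3 years
           |     # 43 motors: 35.1% per year, 72.7% over 3 years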
        
             | mewpmewp2 wrote:
             | Would it be possible to reduce the failure rates?
        
               | michaelt wrote:
               | To an extent, yes.
               | 
               | For example, an industrial robot arm with 6 motors
               | achieves much higher reliability than a consumer roomba
               | with 3 motors. They do this with more metal parts, more
               | precision machining, much more generous design
               | tolerances, and suchlike. Which they can afford by
               | charging 100x as much per unit.
        
               | ac29 wrote:
               | The 1%/year failure rate appears to just be made up.
               | There are plenty of electric motors that don't have
               | anywhere near that failure rate (at least during the
               | expected service life; failure rates will probably hit
               | 1%/year or higher eventually).
               | 
               | For example, do the motors in hard drives fail anywhere
               | close to 1% a year in the first ~5 years? Backblaze data
               | gives a total drive failure rate around 1% and I imagine
               | most of those are not due to failure of motors.
        
               | michaelt wrote:
               | Yes, obviously that 1% figure is a simplification. Of
               | course not all motors are created equal, and neither are
               | all operating conditions!
               | 
               | But the neat thing about my argument is it holds true
               | regardless of the underlying failure rate!
               | 
               | So long as your per-motor annual failure rate is >0, 43x
               | it will be bigger than 3x it.
        
         | UltraSane wrote:
         | Consumable components could be automatically replaced by other
         | robots.
        
       | zzzeek wrote:
       | THANK YOU.
       | 
       | Please make robots. LLMs should be put to work on *manual*
       | tasks, not art/creative/intellectual tasks. The goal is to
       | _improve_ humanity, not put us to work putting screws inside of
       | iPhones.
       | 
       | (five years later)
       | 
       | what do you mean you are using a robot for your drummer
        
       | Workaccount2 wrote:
       | I continue to be impressed by how Google stealth-releases fairly
       | groundbreaking products, and then (usually) just kind of forgets
       | about them.
       | 
       | Rather than an advertising blitz and flashy press events, they
       | just do blog posts that tech heads circulate, forget about, and
       | then, 3-4 years later, wonder "whatever happened to that?"
       | 
       | This looks awesome. I look forward to someone else building a
       | start-up on this and turning it into a great product.
        
         | fusionadvocate wrote:
         | Because the whole purpose of these kinds of projects at Google
         | is to keep regulators at bay. They don't need these products in
         | the sense of making money from them. They will just burn some
         | money and move on, exactly the way they have hundreds of times
         | before. But what kind of company gets such a free pass to burn
         | money? The kind of company that is a monopoly. Monopolies are
         | THAT profitable.
        
       | jagger27 wrote:
       | These are going to be war machines, make absolutely no mistake
       | about it. On-device autonomy is the perfect foil to escape
       | centralized authority and accountability. There's no human behind
       | the drone to charge for war crimes. It's what they've always
       | dreamed of.
       | 
       | Who's going to stop them? Who's going to say no? The military
       | contracts are too big to say no to, and they might not have a
       | choice.
       | 
       | The elimination of toil will mean the elimination of humans
       | altogether. That's where we're headed. There will be no profitable
       | life left for you, and you will be liquidated by "AI-Powered
       | Automation for Every Decision"[0]. Every. Decision. It's so
       | transparent. The optimists in this thread are baffling.
       | 
       | 0: https://www.palantir.com/
        
         | mateus1 wrote:
         | MIT spinoff Google-owned Boston Dynamics pledged not to
         | militarize their robots. Which is very hard to believe given
         | they're backed by DARPA, the DoD/Military investment arm.
        
           | jagger27 wrote:
           | Militarize is just bad marketing. Call them cleaning machines
           | and put them to work on dirty things.
        
           | paxys wrote:
           | _Was_ owned by Google. Then Softbank. Now Hyundai.
        
         | JumpCrisscross wrote:
         | > _These are going to be war machines, make absolutely no
         | mistake about it_
         | 
         | Of course they will. Practically everything useful has a
         | military application. I'm not sure why this is considered a hot
         | take.
        
           | jagger27 wrote:
           | The difference between this machine and the ones that came
           | before is that there won't have to be a human in the loop to
           | execute mass murder.
        
         | bamboozled wrote:
         | How would these things be competitive with drones on the
         | battlefield? They probably cost the equivalent of 1000
         | autonomous drones and 100x the time and materials to make, and
         | way more power would be required to run them, too.
         | 
         | Terminator is a good movie but in reality, a cheap autonomous
         | drone would mess one of those up pretty good.
         | 
         | I've seen some of the footage from Ukraine: drones are deadly,
         | efficient, and terrifying on the battlefield. Even if those
         | robots get crazy maneuverable, it's going to be pretty hard to
         | outrun an exploding drone.
         | 
         | Maybe the Terminators will have shotguns, but I could imagine 5
         | drones per Terminator being pretty easy to achieve,
         | considering they will be built by other autonomous robots.
        
       | polskibus wrote:
       | What is the model architecture? I'm assuming it's far from a
       | standard LLM, but I'm curious to know more. Can anyone provide
       | links that describe VLA architectures?
        
         | KoolKat23 wrote:
         | Actually very close to one I'd say.
         | 
         | It's a "visual language action" VLA model "built on the
         | foundations of Gemini 2.0".
         | 
         | As Gemini 2.0 has native language, audio and video support, I
         | suspect it has been adapted to include native "action" data
         | too, perhaps only on output fine-tuning rather than
         | input/output at training stage (given its Gemini 2.0
         | foundation).
         | 
         | Natively multimodal LLM's are basically brains.
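         | 
         | A toy sketch of what "native action data" could mean in
         | practice: reserve extra output tokens above the text
         | vocabulary, one block per degree of freedom. This is purely
         | speculative; the vocabulary size, bin count, and 7-DoF layout
         | are illustrative assumptions, not Gemini internals:
         | 
         |     TEXT_VOCAB_SIZE = 256_000          # assumed base vocab
         |     ACTION_BINS, ACTION_DIMS = 256, 7  # assumed discretization
         | 
         |     def action_token_id(dim, bin_index):
         |         """Map (DoF, bin) to a reserved id above the text vocab."""
         |         return TEXT_VOCAB_SIZE + dim * ACTION_BINS + bin_index
         | 
         |     # One control step = one token per DoF in the model output.
         |     bins = [128, 90, 200, 127, 127, 140, 255]
         |     step = [action_token_id(d, b) for d, b in enumerate(bins)]
         |     print(step)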
        
           | martythemaniak wrote:
           | OpenVLA is basically a slightly modified, fine-tuned llama2.
           | I found the launch/intro talk by the lead author to be quite
           | accessible: https://www.youtube.com/watch?v=-0s0v3q7mBk
        
       | moelf wrote:
       | The MuJoCo link actually points to https://github.com/google-
       | deepmind/aloha_sim
        
       ___________________________________________________________________
       (page generated 2025-06-24 23:00 UTC)