hngopher.com

       [HN Gopher] Show HN: Only 1 LLM can fly a drone
       ___________________________________________________________________
        
       Show HN: Only 1 LLM can fly a drone
        
       Author : beigebrucewayne
       Score  : 158 points
       Date   : 2026-01-26 11:00 UTC (23 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | bigfishrunning wrote:
       | Why would you want an LLM to fly a drone? Seems like the wrong
       | tool for the job -- it's like saying "Only one power drill can
       | pound roofing nails". Maybe that's true, but just get a hammer
        
         | pavlov wrote:
         | Yeah, it feels a bit like asking "which typewriter model is the
         | best for swimming".
        
         | peterpost2 wrote:
         | Did you read his post?
         | 
         | He answers your question
        
           | macintux wrote:
           | > Please don't comment on whether someone read an article.
           | "Did you even read the article? It mentions that" can be
           | shortened to "The article mentions that".
           | 
           | https://news.ycombinator.com/newsguidelines.html
        
           | philipwhiuk wrote:
           | I disagree. The nearest justification is:
           | 
           | > to see what happens
        
             | ceejayoz wrote:
             | Isn't that the epitome of the hacker spirit?
             | 
             | "Why?" "Because I can!"
        
         | munchler wrote:
         | Because we're interested in AGI (emphasis on _general_ ) and
         | LLMs are the closest thing to AGI that we have right now.
        
         | notepad0x90 wrote:
         | There are almost endless reasons why. It's like asking why
         | would you want a self-driving car. Having a drone to transport
         | things would be amazing, or to patrol an area. LLMs can be
         | helpful with object identification, reacting to different
         | events, and taking commands from users.
         | 
         | The first thought I had was those security guard robots that
         | are popping up all over the place. If they were drones instead,
         | and an LLM talked to people asking them to do/not-do things, that
         | would be an improvement.
         | 
         | Or a waiter drone, that takes your order in a restaurant,
         | flies to the kitchen, picks up a sealed and secured food
         | container, flies it back to the table, opens it, and leaves. It
         | will monitor for gestures and voice commands to respond to
         | diners and get their feedback, abuse, take the food back if it
         | isn't satisfactory, etc.
         | 
         | This is the type of stuff we used to see in futuristic movies.
         | It's almost possible now. Glad to see this kind of tinkering.
        
           | lewispollard wrote:
           | The point is that you don't need an LLM to pilot the thing,
           | even if you want to integrate an LLM interface to take a
           | request in natural language.
        
             | notepad0x90 wrote:
             | We don't need a lot of things, but new tech should also
             | address what people want, not just needs. I don't know how
             | to pilot drones, nor do I care to learn how to, but I want
             | to do things with drones, does that qualify as a need? Tech
             | is there to do things for us we're too lazy to do.
        
               | volkercraig wrote:
               | I don't think you understand what an "LLM" is. They're
               | text generators. We've had autopilot since the 1930s that
               | relies on measurable things... like PID loops, direct
               | sensor input. You don't need the "language model" part to
               | run an autopilot, that's just silly.
        
               | pixl97 wrote:
               | You seem to be talking past him and ignoring what he is
               | actually saying.
               | 
               | LLMs are a higher level construct than PID loops. With
               | things like autopilot I can give the controller a command
               | like 'Go from A to B', and chain constructs like this to
               | accomplish a task.
               | 
               | With an LLM I can give the drone/LLM system a complex
               | command that I'd never be able to encode to a controller
               | alone. "Fly a grid over my neighborhood, document the
               | location of and take pictures of every flower garden".
               | 
               | And if an LLM is just a 'text generator' then it's a
               | pretty damned spectacular one as it can take free-form
               | input and turn it into a set of useful commands.
        
               | volkercraig wrote:
               | They are text generators, and yes they are pretty good,
               | but that really is all they are, they don't actually
               | learn, they don't actually think. Every "intelligence"
               | feature by every major AI company relies on semantic
               | trickery and managing context windows. It even says it
               | right on the tin; Large LANGUAGE Model.
               | 
               | Let me put it this way: What OP built is an airplane in
               | which a pilot doesn't have a control stick, but they have
               | a keyboard, and they type commands into the airplane to
               | run it. It's a silly unnecessary step to involve
               | language.
               | 
               | Now what you're describing is a language problem, which
               | is orchestration, and that is more suited to an LLM.
        
               | lukan wrote:
               | "they don't actually learn"
               | 
               | Give the LLM agent write access to a text file to take
               | notes and it can actually learn. Not really reliable,
               | but some seem to get useful results. They ain't just text
               | generators anymore.
               | 
               | (but I agree that it does not seem the smartest way to
               | control a plane with a keyboard)
        
               | volkercraig wrote:
               | If that's your definition of learning, my casio FX has an
               | "ans" feature that "learns" from earlier calculations!!
        
               | lukan wrote:
               | Can that "ans" variable influence the general way your
               | casio does future calculations?
               | 
               | I don't think so. But with an AI agent it can.
               | 
               | Sure, they still don't have real understanding, but
               | calling this technology mere text generators in 2026
               | seems a bit out of the loop.
        
               | infecto wrote:
               | My confusion maybe? Is this simulator just flying point a
               | to b? Seems like it's handling collisions while trying to
               | locate the targets and identify them. That seems quite a
               | bit more complex than what you are describing has been
               | solved since the 1930s.
        
               | notepad0x90 wrote:
               | LLMs can do chat-completion, they don't do only chat
               | completion. There are LLMs for image generation, voice
               | generation, video generation and possibly more. The
               | camera of a drone inputs images for the LLM, then it
               | determines what action to take based on that. Similar to if
               | you asked ChatGPT "there is a tree in this picture, if
               | you were operating a drone, what action would you take to
               | avoid collision", except the "there is a tree" part is
               | done by the LLM's image recognition, and the sys prompt is
               | "recognize objects and avoid collision", of course I'm
               | simplifying it a lot but it is essentially generating
               | navigational directions under a visual context using
               | image recognition.
        
               | nrrbtrbbrb wrote:
               | > There are LLMs for image generation,
               | 
               | That part isn't handled by an LLM
               | 
               | > voice generation,
               | 
               | That part isn't handled by an LLM
               | 
               | > video generation
               | 
               | That part isn't handled by an LLM
        
               | famouswaffles wrote:
               | Yes it can be, and often is. Advanced voice mode in
               | ChatGPT and the voice mode in Gemini are LLMs. So is the
               | image gen in both ChatGPT and Gemini (Nano Banana).
        
               | cheema33 wrote:
               | "You don't need the "language model" part to run an
               | autopilot, that's just silly."
               | 
               | I think most of us understood that reproducing what
               | existing autopilot can do was not the goal. My
               | inexpensive DJI quadcopter has impressive abilities in
               | this area as well. But, I cannot give it a mission in
               | natural language and expect it to execute it. Not even
               | close.
        
               | laffOr wrote:
               | There are two different things:
               | 
               | 1. a drone that you can talk to and that flies on its own
               | 
               | 2. a drone where the flying is controlled by an LLM
               | 
               | (2) is a specific instance of the larger concept of (1).
               | 
               | You make an argument that 1 should be addressed, which no
               | one is denying in this thread - people are arguing that
               | (2) is a bad way to do (1).
        
               | notepad0x90 wrote:
               | You're considering "talking to" a separate thing, I
               | consider it the same as reading street signs or using
               | object recognition. My voice or text input is just one
               | type of input. Can other ML solutions or algorithms
               | detect a tree (same as me telling it there is a tree, yaw
               | to the right), yes, can LLMs detect a tree and determine
               | what course of action to take? also true. Which is
               | better? I don't know, but I won't be quick to dismiss
               | anyone attempting to use LLMs.
        
             | infecto wrote:
             | That's a pretty boring point for what looks like a fun
             | project. Happy to see this project and know I am not the
             | only one thinking about these kinds of applications.
        
             | coder543 wrote:
             | An LLM that can't understand the environment properly can't
             | properly reason about which command to give in response to
             | a user's request. Even if the LLM is a very inefficient way
             | to pilot the thing, _being able to pilot_ means the LLM has
             | the reasoning abilities required to also translate a user's
             | request into commands that make sense for the more
             | efficient, lower-level piloting subsystem.
        
           | laffOr wrote:
           | You could have a program, not LLM-based but could be ANN, for
           | flying and an LLM for overseeing; the LLM could give
           | instructions to the pilot program as (x,y,z)
           | directions. I mean currently autopilots are typically not
           | LLMs, right?
           | 
           | You describe why it would be useful to have an LLM in a drone
           | to interact with it but do not explain why it is the very
           | same LLM that should be doing the flying.
        
             | notepad0x90 wrote:
             | I'm not OP, I don't know what specific roles the LLM should
             | be using, but LLMs are great with object recognition, and
             | using both text (street signs, notices, etc.) and visual
             | cues to predict the correct response. The actual motor
             | control I'm sure needs no LLMs, but the decision making
             | could use any number of solutions, I agree that an LLM-only
             | solution sounds bad, but I didn't do the testing and
             | comparison to be confident in that assessment.
        
           | iso1631 wrote:
           | You want a self driving car
           | 
           | You don't want an LLM to drive a car
           | 
           | There is more to "AI" than LLMs
        
             | coder543 wrote:
             | Waymo is certainly interested in using LLMs/VLMs for this
             | purpose.
             | 
             | https://waymo.com/research/emma/
             | 
             | https://waymo.com/blog/2024/10/introducing-emma
             | 
             | https://waymo.com/blog/2025/12/demonstrably-safe-ai-for-
             | auto...
        
             | notepad0x90 wrote:
             | I don't mind someone trying LLMs to see if they can do
             | better than existing ML solutions.
        
           | fwip wrote:
           | Both of those proposed uses are bad things that are worse
           | than what they would replace.
        
         | dan-bailey wrote:
         | When your only tool is a hammer, every problem begins to
         | resemble a nail.
        
         | infecto wrote:
         | What's the right tool then?
         | 
         | This looks like a pretty fun project and in my rough estimation
         | a fun hacker project.
        
           | bigfishrunning wrote:
           | The right tool would likely be some conventional autopilot
           | software; if you want AI cred you could train a Neural
           | Network which maps some kind of path to the control features
           | of the drone. LLMs are language models -- good for language,
           | but not good for spatial reasoning or navigation or many of
           | the other things you need to pilot a drone.
        
             | infecto wrote:
             | So you are suggesting building a full featured package that
             | is nontrivial compared to this fun excitement?
             | 
             | Vision models do a pretty decent job with spatial
             | reasoning. It's not there yet but you're dismissing some
             | interesting work going on.
        
         | bob1029 wrote:
         | The system prompt for the drone is hilarious to me. These
         | models are horrible at spatial reasoning tasks:
         | 
         | https://github.com/kxzk/snapbench/blob/main/llm_drone/src/ma...
         | 
         | I've been working with integrating GPT-5.2 in Unity. It's
         | fantastic at scripting but completely worthless at managing
         | transforms for scene objects. Even with elaborate planning
         | phases it's going to make a complete jackass of itself in world
         | space every time.
         | 
         | LLMs are also wildly unsuitable for real-time control problems.
         | They never will be. A PID controller or dedicated pathfinding
         | tool being driven by the LLM will provide a radically superior
         | result.
        
           | storystarling wrote:
           | Agreed. I've found the only reliable architecture for this is
           | treating the LLM purely as a high-level planner rather than a
           | controller.
           | 
           | We use a state machine (LangGraph) to manage the intent and
           | decision tree, but delegate the actual transform math to
           | deterministic code. You really want the model deciding the
           | strategy and a standard solver handling the vectors,
           | otherwise you're just burning tokens to crash into walls.
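
A minimal sketch of the planner/solver split described above. It does
not use LangGraph; the planner is a hypothetical stub standing in for an
LLM call, and the intent names, `Pose`, and `max_step` are assumptions
made for illustration. The point is only that the model chooses among a
small set of intents while plain code computes the vectors.

```python
import math
from dataclasses import dataclass


@dataclass
class Pose:
    x: float
    y: float
    z: float


# Hypothetical intent vocabulary: the only thing the planner may emit.
INTENTS = {"GOTO", "HOLD", "LAND"}


def fake_planner(observation: str) -> dict:
    """Stand-in for an LLM call; returns an intent, never raw vectors."""
    if "target" in observation:
        return {"intent": "GOTO", "target": (4.0, 0.0, 3.0)}
    return {"intent": "HOLD"}


def solve_step(pose: Pose, target: tuple, max_step: float = 1.0) -> tuple:
    """Deterministic solver: straight-line move, clamped to max_step."""
    dx, dy, dz = target[0] - pose.x, target[1] - pose.y, target[2] - pose.z
    dist = math.sqrt(dx * dx + dy * dy + dz * dz)
    if dist <= max_step:
        return (dx, dy, dz)
    scale = max_step / dist
    return (dx * scale, dy * scale, dz * scale)


def tick(pose: Pose, observation: str) -> tuple:
    """One control tick: the planner picks the intent, code does the math."""
    plan = fake_planner(observation)
    if plan["intent"] not in INTENTS:
        raise ValueError(f"unknown intent: {plan['intent']}")
    if plan["intent"] == "GOTO":
        return solve_step(pose, plan["target"])
    if plan["intent"] == "LAND":
        return (0.0, 0.0, -min(pose.z, 1.0))
    return (0.0, 0.0, 0.0)  # HOLD
```

Rejecting any intent outside the fixed vocabulary is one way to keep a
malformed model response from ever reaching the motors.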
        
         | ralusek wrote:
         | Why would you want an LLM to identify plants and animals? Well,
         | they're often better than bespoke image classification models
         | at doing just that. Why would you want a language model to help
         | diagnose a medical condition?
         | 
         | It would not surprise me at all if self-driving models are
         | adopting a lot of the model architecture from LLMs/generative
         | AI, and actually invoke actual LLMs in moments where they
         | would've needed human intervention.
         | 
         | Imagine if there's a decision engine at the core of a self
         | driving model, and it gets a classification result of what to
         | do next. Suddenly it gets 3 options back with 33.33% weight
         | attached to each of them and a very low confidence interval of
         | which is the best choice. Maybe that's the kind of scenario
         | that used to trigger self-driving to refuse to choose and defer
         | to human intervention. If that can then first defer judgement
         | to an LLM which could say "that's just a goat crossing the
         | road, INVOKE: HONK_HORN," you could imagine how that might be
         | useful. LLMs are clearly proving to be universal reasoning
         | agents, and it's getting tiring to hear people continuously try
         | to reduce them to "next word predictors."
        
         | avaer wrote:
         | Using an LLM is the SOTA way to turn plain text instructions
         | into embodied world behavior.
         | 
         | Charitably, I guess you can question why you would ever want to
         | use text to command a machine in the world (simulated or not).
         | 
         | But I don't see how it's the wrong tool given the goal.
        
           | irl_zebra wrote:
           | SOTA typically refers to achieving the best performance, not
           | using the trendiest thing regardless of performance. There is
           | some subtlety here. At some point an LLM might give the best
           | performance in this task, but that day is not today, so an
           | LLM is not SOTA, just trendy. It's kinda like rewriting
           | something in Rust and calling it SOTA because that's the
           | trend right now. Hope that makes sense.
        
             | infecto wrote:
             | I don't think trendy is really the right word and maybe
             | it's not state of the art but a lot of us in the industry
             | are seeing emerging capabilities that might make it SOTA.
             | Hope that makes sense.
        
               | irl_zebra wrote:
               | LLMs are indeed the definition of trendy (I've found
               | using Google Trends to dive in is a good entry point to
               | get a broad sense of whether something is "trendy")!
               | Basically the right way to think about it is that
               | something can be promising, and demonstrate emerging
               | capabilities, but those things don't make something
               | SOTA, nor do they make it trendy. They can be related
               | though (I expect everything SOTA was once promising and
               | emerging, but not everything promising or emerging became
               | SOTA). It's a subtlety that isn't super easy to grasp,
               | but (and here is one area I think an LLM can show
               | promise) an LLM like ChatGPT can help unpick the
               | distinctions here. Still, it's slightly nuanced and I
               | understand the confusion.
        
               | infecto wrote:
               | I think the point may have flown over your head. I am
               | suggesting you are being dismissive with a distinct lack
               | of thought on your reply. Like I said I don't think state
               | of the art is the right way to describe it but I think
               | trendy is equally wrong from the other side of the
               | spectrum. Models that can deal with vision have some
               | really interesting use cases and ones that can be
               | valuable, in a lot of ways I would say state of the art
               | could describe it but I know to folks that are hopelessly
               | negative, it's a hard reach so I was trying to balance it
               | for you. Hope that makes sense.
        
             | famouswaffles wrote:
             | >Using an LLM is the SOTA way to turn plain text
             | instructions into embodied world behavior.
             | 
             | >SOTA typically refers to achieving the best performance
             | 
             | Multimodal Transformers _are_ the best way to turn plain
             | text instructions to embodied world behavior. Nothing to do
             | with being 'trendy'. A Vision Language Action model would
             | probably have done much better but really the only
             | difference between that and the models trialed above is
             | training data. Same technology.
        
         | smw1218 wrote:
         | It's a great feature to tell my drone to do a task in English.
         | Like "a child is lost in the woods around here. Fly a search
         | pattern to find her" or "film a cool panorama of this property.
         | Be sure to get shots of the water feature by the pool." While
         | LLMs are bad at flying, better navigation models likely can't
         | be prompted in natural language yet.
        
           | volkercraig wrote:
           | What you're describing is still ultimately the "view" layer
           | of a larger autopilot system, that's not what OP is doing.
           | He's getting the text generator to drive the drone. An LLM
           | can handle parsing input, but the wayfinding and driving
           | would (in the real world) be delegated to modern autopilot.
        
         | Mashimo wrote:
         | > Why would you want an LLM to fly a drone?
         | 
         | We are on HACKER news. Using tools outside the scope is the
         | ethos of a hacker.
        
       | antisthenes wrote:
       | LLMs flying weaponized drones is exactly how it starts.
        
         | popcornricecake wrote:
         | One day they'll fly to a drone factory, eliminate all the
         | personnel, then start gently shooting at the machinery to
         | create more weaponized drones and then it's all over before you
         | know it!
        
         | SoftTalker wrote:
         | It's pretty entertaining seeing the plot lines and fictitious
         | history in _The Terminator_ movies actually happening in real
         | time.
        
         | goda90 wrote:
         | https://www.youtube.com/watch?v=O-2tpwW0kmU
        
       | accrual wrote:
       | I think it's fascinating work even if LLMs aren't the ideal tool
       | for this job right now.
       | 
       | There were some experiments with embodied LLMs on the front page
       | recently (e.g. basic robot body + task) and SOTA models struggled
       | with that too. And of course they would - what training data is
       | there for embodying a random device with arbitrary controls and
       | feedback? They have to lean on the "general" aspects of their
       | intelligence which is still improving.
       | 
       | With dedicated embodiment training and an even tighter/faster
       | feedback loop, I don't see why an LLM couldn't successfully pilot
       | a drone. I'm sure some will still fall off the rails, but software
       | guardrails could help by preventing certain maneuvers.
        
       | fsiefken wrote:
       | I am curious how these models would perform and how much energy
       | they'd take to semi-realtime detect objects: SmolVLM2-500M -
       | Moondream 0.5B/2B/2.5B - Qwen3-VL (3B)
       | https://huggingface.co/collections/Qwen/qwen3-vl
       | 
       | I am sure this is already worked on in Russia, Ukraine and The
       | Netherlands. A lot can go wrong with autonomous flying. One could
       | load the VLM on a high end android phone on the drone and have
       | dual control.
        
         | SpyCoder77 wrote:
         | A better way would be a VLA as opposed to a VLM. VLAs are meant
         | to take action, whereas VLMs are for general use.
         | https://cognitivedrone.github.io/
        
       | avaer wrote:
       | Gemini 3 is the only model I've found that can reason spatially.
       | The results here are accurate to my experiments with putting LLM
       | NPCs in simulated worlds.
       | 
       | I was surprised that most VLLMs cannot reliably tell if a
       | character is facing left or right, they will confidently lie no
       | matter what you do (even Gemini 3 cannot do it reliably). I guess
       | it's just not in the training data.
       | 
       | That said Qwen3VL models are smaller/faster and better "spatially
       | grounded" in pixel space, because pixel coordinates are encoded
       | in the tokens. So you can use them for detecting things in the
       | scene, and where they are (which you can project to 3d space if
       | you are running a sim). But they are not good reasoning models so
       | don't ask them to think.
       | 
       | That means the best pipeline I've found at the moment is to tack
       | a dumb detection prepass on before your action reasoning. This
       | basically turns 3d sims into 1d text sims operating on labels --
       | which is something that LLMs _are_ good at.
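
A rough sketch of the "detection prepass, then reason over labels"
pipeline described above. The `Detection` fields, the 0.4/0.6 and 0.1
thresholds, and the near/far heuristic based on box area are assumptions
for illustration; a real detector (Qwen3-VL or otherwise) would supply
the boxes, and image y is assumed to grow downward.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    label: str
    cx: float    # box center x, normalized to [0, 1]
    cy: float    # box center y, normalized to [0, 1]
    area: float  # box area as a fraction of the frame


def describe(det: Detection) -> str:
    """Turn one box into a coarse text label the reasoning model can use."""
    horiz = "left" if det.cx < 0.4 else "right" if det.cx > 0.6 else "center"
    vert = "above" if det.cy < 0.4 else "below" if det.cy > 0.6 else "level"
    size = "near" if det.area > 0.1 else "far"  # crude proxy for distance
    return f"{det.label}: {horiz}, {vert}, {size}"


def scene_to_text(dets: list) -> str:
    """Flatten a frame's detections into lines of text, largest box first."""
    if not dets:
        return "nothing detected"
    ordered = sorted(dets, key=lambda d: d.area, reverse=True)
    return "\n".join(describe(d) for d in ordered)
```

The resulting text would then be placed in the prompt of the reasoning
model in place of the raw image.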
        
         | Krutonium wrote:
         | Neuro-sama, the V-Tuber/AI actually does a decent job of it.
         | Vedal seems to have cooked and figured out how to make an LLM
         | move reasonably well in VRChat.
         | 
         | Not perfectly, there's a lot of abuse of gravity or the lack
         | thereof, but yeah. Neuro has also piloted a Robot Dog in the
         | past.
        
         | storystarling wrote:
         | I suspect the latency on Gemini 3 makes it non-viable for a
         | real-time control loop though. Even if the reasoning works, the
         | input token costs would destroy the unit economics pretty
         | quickly. I'd be worried about relying on that kind of API
         | overhead for the critical path.
        
           | 101008 wrote:
           | > the input token costs would destroy the unit economics
           | pretty quickly.
           | 
           | They say this is going to happen to every task after they stop
           | subsidizing token costs.
        
             | zinodaur wrote:
             | Not for coding though - I'd buy 4 H200's and stick them in
             | my basement if I had to
        
               | nish__ wrote:
               | To do what?
        
               | weird-eye-issue wrote:
               | CODING
        
       | volkercraig wrote:
       | I don't understand. Surely training an LSTM with sensor input is
       | a more practical and reasonable way than trying to get a text
       | generator to speak commands to a drone.
        
         | encrux wrote:
         | Very much depends on what you want to do.
         | 
         | The fact that a language model can "reason" (in the LLM-slang
         | meaning of the term) about 3D space is an interesting property.
         | 
         | If you give a text description of a scene and ask a robot to
         | perform a peg in hole task, modern models are able to solve
         | them fairly easily based on movement primitives. I implemented
         | this on a UR robot arm back in 2023.
         | 
         | The next logical step is, instead of having the model output
         | text (code representing movement primitives), outputting tokens
         | in action space. This is what models like pi0 are doing.
        
           | volkercraig wrote:
           | I mean semantically language evolved as an interpretation for
           | the material world, so assuming that you can describe a
           | problem in language, and considering that there exists a
           | solution to said problem that is describable in language,
           | then I'm sure a big enough LLM could do it... but you can
           | also calculate highly detailed orbital maps with epicycles if
           | you just keep adding more... you just don't because it's a
           | waste of time and there's a simpler way.
           | 
           | The latter part is interesting. I'm not sure how the
           | performance of one of those would be once they are working
           | well, but my naive gut feeling is that splitting the language
           | part and the driving part into two delegates is cleaner,
           | safer, faster and more predictable.
        
             | convolvatron wrote:
             | note that the control systems you were talking about before
             | (i.e. PID) would probably take hold pretty directly in a
             | tiny network, and exactly because of that limitation, be
             | far less likely to contain 'hallucinations'. object
             | avoidance and path planning are likely similar.
             | 
             | since this is a limited and continuous domain, it's a far
             | better one for neural training than natural language. I
             | guess this notion that a language model should be used for
             | 3d motion control is a real indicator about the level of
             | thought going into some of these applications.
        
       | eichin wrote:
       | At least he's not feeding real drones to the coyotes... oh,
       | there's a link in the readme https://github.com/kxzk/tello-bench
        
       | modeless wrote:
       | This is what VLA models are for. They would work much better.
       | Would need a bit of fine tuning but probably not much. Lots of
       | literature out there on using VLAs to control drones.
        
         | SpyCoder77 wrote:
         | Did some research, found a model that is exactly that.
         | https://cognitivedrone.github.io/
        
           | culi wrote:
           | The Black Mirror speedrun continues
        
             | goda90 wrote:
             | Slaughterbots: https://www.youtube.com/watch?v=O-2tpwW0kmU
        
           | beigebrucewayne wrote:
           | Thanks will check this out!
        
       | andai wrote:
       | Gemini Flash beats Gemini Pro? How does that work?
       | 
       | Gemini Pro, like the other models, didn't even find a single
       | creature.
        
       | seniortaco wrote:
       | "drone"
        
       | broast wrote:
       | On the discussion of the right or wrong tool, I find it possible
       | that the ability to reason towards a goal is more valuable in the
       | long run than an intrinsic ability to achieve the same result. Or
       | maybe a mix of both is the ideal.
        
       | me551ah wrote:
       | In a real world test you would have a tool call for the LLM which
       | is a bit high level like GoTo(object) and the tool calls another
       | program which identifies the objects in frame and uses standard
       | programs to go to that.
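
The high-level tool call described above might look something like this.
It is a sketch under assumptions: the JSON shape, the `KNOWN_OBJECTS`
table (standing in for a separate object detector), and the `go_to`
helper are all hypothetical, and the actual navigation is left as a
comment.

```python
import json

# Hypothetical object positions, as a separate detector might report them.
KNOWN_OBJECTS = {"red_cube": (3.0, 1.0, 0.0), "tree": (-2.0, 5.0, 0.0)}


def go_to(obj: str) -> str:
    """Hypothetical tool: look up the object and hand off to a navigator."""
    if obj not in KNOWN_OBJECTS:
        return f"error: '{obj}' not in frame"
    x, y, z = KNOWN_OBJECTS[obj]
    # A real system would call a conventional path planner here.
    return f"navigating to {obj} at ({x}, {y}, {z})"


TOOLS = {"GoTo": go_to}


def dispatch(model_output: str) -> str:
    """Parse a JSON tool call emitted by the model and run the matching tool."""
    try:
        call = json.loads(model_output)
        name, args = call["tool"], call["args"]
    except (json.JSONDecodeError, KeyError, TypeError):
        return "error: malformed tool call"
    if not isinstance(name, str) or name not in TOOLS:
        return "error: unknown tool"
    if not isinstance(args, dict):
        return "error: bad arguments"
    try:
        return TOOLS[name](**args)
    except TypeError:
        return "error: bad arguments"
```

Returning error strings instead of raising lets the model see the
failure on its next turn and try a different object or tool.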
        
       | SpyCoder77 wrote:
       | https://cognitivedrone.github.io/
        
       | mbreese wrote:
       | I can't really take this too seriously. This seems to me to be a
       | case of asking "can an LLM do X?" Instead, the question I'd like
       | to see is: "I want to do X, is an LLM the right tool?"
       | 
       | But that said, I think the author missed something. LLMs aren't
       | great at this type of reasoning/state task, but they are good at
       | writing programs. Instead of asking the LLM to search with a
       | drone, it would be very interesting to know how they performed if
       | you asked them to _write a program_ to search with a drone.
       | 
       | This is more aligned with the strengths of LLMs, so I could see
       | this as having more success.
        
       | zahlman wrote:
       | > I gave 7 frontier LLMs a simple task: pilot a drone through a
       | 3D voxel world and find 3 creatures.
       | 
       | > Only one could do it.
       | 
       | If I understood the chart correctly, even the successful one only
       | found 1/6 of the creatures across multiple runs.
        
         | uoaei wrote:
         | No science detected.
         | 
         | Without comparison to some null hypothesis (a random policy),
         | this article is hogwash.
        
           | zahlman wrote:
           | Given that all the other agents failed to find any creatures,
           | it's hard to imagine that a random policy would except by
           | extreme coincidence.
        
             | TOMDM wrote:
             | It is possible to be consistently wrong in a way that
             | randomness is not.
             | 
             | For some problems, randomness outperforms incompetent
             | reasoning
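
For reference, a random-policy baseline of the kind asked for above
could be sketched like this. It is not the benchmark's environment: the
cubic grid, the rule that entering a target's cell counts as finding it,
and all parameter names are assumptions for illustration.

```python
import random

# The six axis-aligned unit moves available to the drone.
MOVES = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def random_policy_run(size: int, targets: set, steps: int, seed: int) -> int:
    """Random walk on a size^3 grid; returns how many targets were visited."""
    rng = random.Random(seed)
    pos = (size // 2, size // 2, size // 2)
    found = set()
    for _ in range(steps):
        move = rng.choice(MOVES)
        # Clamp each coordinate so the walk stays inside the grid.
        pos = tuple(min(size - 1, max(0, p + d)) for p, d in zip(pos, move))
        if pos in targets:
            found.add(pos)
    return len(found)


def baseline(size: int, targets: set, steps: int, runs: int) -> float:
    """Mean number of targets found over several seeded runs."""
    total = sum(random_policy_run(size, targets, steps, s) for s in range(runs))
    return total / runs
```

Comparing each model's score against `baseline(...)` with the same step
budget would show whether it does better or worse than chance.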
        
       | SoftTalker wrote:
       | LLMs are trained on text. Why would we expect them to understand
       | a visual and tactile 3D world?
        
         | azinman2 wrote:
         | Because they're also multimodal vLLMs.
        
       | kylehotchkiss wrote:
       | This sounds like a good way to get your drone shot down by a
       | Concerned Citizen or the military.
        
       | dimatura wrote:
       | This is neat! It's a bit amusing in that I worked on a somewhat
       | similar project for my phd thesis almost 10 years ago, although
       | in that case we got it working on a real drone (heavily
       | customized, based on DJI Matrice) in the field, with only onboard
       | compute. Back then it was just a fairly lightweight CNN for the
       | perception, not that we could've gotten much more out of the
       | Jetson TX2.
        
       | arikrahman wrote:
       | Interesting. In some benchmarks I even see flash outperforming
       | thinking in general reasoning.
        
       ___________________________________________________________________
       (page generated 2026-01-27 10:01 UTC)