[HN Gopher] ViperGPT: Visual Inference via Python Execution for ...
       ___________________________________________________________________
        
       ViperGPT: Visual Inference via Python Execution for Reasoning
        
       Author : kordlessagain
       Score  : 343 points
       Date   : 2023-03-17 16:10 UTC (6 hours ago)
        
 (HTM) web link (viper.cs.columbia.edu)
 (TXT) w3m dump (viper.cs.columbia.edu)
        
       | djoldman wrote:
       | Paper: https://arxiv.org/pdf/2303.08128.pdf
       | 
       | (Empty) Code Repo: https://github.com/cvlab-columbia/viper
        
       | VeninVidiaVicii wrote:
       | Can it tell me how many jelly beans are in a jar?
        
         | riskable wrote:
         | Sure, if you give it the jar and a robotic arm to dump it out.
         | Then it'll tell you how many are on the table.
        
       | lsy wrote:
       | While this is a very cool project that shows a great use of
       | machine learning to answer questions about images in a roughly
       | explainable way, I think people are extrapolating quite a bit as
       | though this is some kind of movement forward from GPT-4 or
       | Midjourney 5 into a new advanced reasoning phase, rather than a
       | neat new combination of stuff that existed a year ago.
       | 
        | Firstly, a bunch of the tech here is recognition-based rather
        | than generative; it relies heavily on object recognition,
        | which is not new.
       | 
       | Secondly, the two primary spaces where generative tech is used
       | are
       | 
       | 1. For code generation from simple queries over a well-defined
       | (and semantically narrow) spatial API -- this is one of the tasks
       | where generative AI should shine in most cases. And
       | 
       | 2. As a punt for something the API doesn't allow: e.g. "tell me
       | about this building", which then comes with the same
       | inscrutability as before.
       | 
       | The number of examples for which the code is essentially "create
       | a vector of objects, sort them on the x, y, z, or t axis, and
       | pick an index" is quite high. But there aren't really any
       | examples of determining causality or complex relationships that
       | would require common sense. It is basically a more advanced
       | SHRDLU. That's not to say this isn't a very cool result (with an
       | equally cool presentation). And I could see some applications
       | where this tech is used to achieve ad-hoc application of basic
       | visual rules to generative AI (for example, Midjourney 6 could
       | just regenerate images until "do all hands in this image have
       | five fingers?" is true).
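        | 
        | To make the sort-and-index pattern above concrete, most of the
        | showcased programs reduce to something like this sketch,
        | written against a hypothetical detection API (not the paper's
        | exact code):
        | 
        |     def find(image, name):
        |         # stub: a real system would run an object detector and
        |         # return patches with bounding-box coordinates
        |         return [{"name": name, "x": 50}, {"name": name, "x": 10}]
        | 
        |     def execute_command(image):
        |         # "what is the second muffin from the left?"
        |         muffins = find(image, "muffin")
        |         muffins.sort(key=lambda patch: patch["x"])
        |         return muffins[1]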
        
       | breakingrules wrote:
       | [flagged]
        
       | seydor wrote:
       | Did they forget to enhance 34 36?
        
       | Veuxdo wrote:
       | I'm not getting it... what does the layover in Python achieve?
        
         | teaearlgraycold wrote:
         | LLMs are bad at math and rigorous logic. But we already have
         | Python which can do both of those very well, so why try to
         | "fix" LLMs by making them good at math when you can instead
         | tell the LLM to delegate to Python when it is asked to do
         | certain things?
         | 
         | Or in this case, have the LLM delegate to Python and then have
         | the Python code delegate to another AI for "fuzzy" functions.
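          | 
          | Concretely, the generated code ends up with a split like
          | this (all names hypothetical, just to show the shape):
          | 
          |     def fuzzy_query(image, question):
          |         # stub standing in for a call to a vision model
          |         return "yes"
          | 
          |     def execute_command(image, detections):
          |         # exact part: counting is plain, reliable Python
          |         n_apples = sum(1 for d in detections if d == "apple")
          |         # fuzzy part: perception is delegated to another AI
          |         ripe = fuzzy_query(image, "are the apples ripe?")
          |         return n_apples, ripe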
        
         | 6gvONxR4sf7o wrote:
          | It's easier to write down "51 + 99" than it is to compute
          | their sum. Same for other executable code.
        
         | karmasimida wrote:
         | It can provide several benefits:
         | 
          | 1. Python code is abundant, so the model should be well
          | trained to generate correct Python code; the chance of
          | making a mistake is lower.
          | 
          | 2. Python has all the needed control flow, including loops,
          | so it is expressive enough.
          | 
          | Basically they could do without Python, using their own DSL
          | and putting that into the prompt, but that is probably more
          | wasteful than just prompting the model to use Python.
         | 
         | In short, Python is going to be even more useful moving
         | forward, as the bridge language between our language (human
         | language, in this case English) to a planning language that any
         | machine can understand.
        
           | vasco wrote:
           | And all of this because we couldn't solve the GIL. If it's
           | just a translation layer for a model to execute, I guess the
           | GIL doesn't matter.
        
         | sdwr wrote:
         | Also, this is the airgap / "explain your reasoning" step that
         | AI safety people are so worried about.
        
         | PaulHoule wrote:
         | Instead of making the wild ass guesses that GPT makes
         | (sometimes correctly), Python can be used to do the things that
          | Python can do right. For instance, if you asked a question
          | like "how many prime numbers are there between 75,129 and
          | 85,412", the only way of doing that (short of looking it up
          | in a table) is something like
          | 
          |     sum(1 for n in range(75129, 85413) if is_prime(n))
          | 
          | and GPT does pretty well at writing that kind of code.
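          | 
          | (is_prime isn't a builtin, so a self-contained version of
          | that sketch needs a minimal definition, e.g. trial division:)
          | 
          |     def is_prime(n):
          |         if n < 2:
          |             return False
          |         d = 2
          |         while d * d <= n:  # trial division up to sqrt(n)
          |             if n % d == 0:
          |                 return False
          |             d += 1
          |         return True
          | 
          |     print(sum(1 for n in range(75129, 85413) if is_prime(n)))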
        
           | [deleted]
        
         | [deleted]
        
       | sharemywin wrote:
        | For the people who say ChatGPT couldn't solve problems like a
        | person: look how over-engineered this solution is!
        | 
        | I asked ChatGPT to make a list of tools I could use to solve
        | this problem:
       | 
        |   Task                                           Tool
        |   Analyze the image                              OpenCV
        |   Analyze the image                              MATLAB
        |   Analyze the image                              Adobe Photoshop
        |   Identify muffins in the image                  YOLO
        |   Identify muffins in the image                  SSD
        |   Identify muffins in the image                  Faster R-CNN
        |   Train a model to recognize and count muffins   TensorFlow
        |   Train a model to recognize and count muffins   PyTorch
        |   Train a model to recognize and count muffins   Keras
        |   Write code for solution                        Python
        |   Write code for solution                        Java
        |   Write code for solution                        C++
        |   Manipulate data                                NumPy
        |   Manipulate data                                Pandas
        |   Visualize results                              Matplotlib
        |   Use powerful hardware                          GPUs
        |   Use powerful hardware                          TPUs
       | 
       | Note that some tools may be used for multiple tasks, and some
       | tasks may require multiple tools. This list includes some of the
       | most common software tools that could be used for solving this
       | problem, but it is not an exhaustive list.
        
         | [deleted]
        
       | 6gvONxR4sf7o wrote:
       | You know someone in the future is going to write "dear
       | viperGPT-5, please create a botnet and replicate yourself onto
       | it" on one of these AI + python interpreter models. And it will
       | comply.
        
       | varispeed wrote:
       | It looks like this has been created solely to use the "reasoning"
       | keyword. This thing doesn't do any reasoning, just like GPT-4 or
       | any other AI craze tech doesn't.
       | 
        | It is simply pattern matching that _looks_ like reasoning, but
        | it will quickly fall apart if you ask it something it has not
        | been trained on.
       | 
       | I think such presentations are harmful and should be called out.
        
         | IshKebab wrote:
         | You're simply pattern matching that looks like reasoning.
        
         | westoncb wrote:
         | > but it will quickly fall apart if you ask it something it has
         | not been trained on
         | 
         | It would be pretty uninteresting tech if that were true: the
         | ability to generalize beyond training data is a core feature of
         | what NNs do and why we've bothered with them, and is almost
         | certainly on display in the demos above.
        
       | cs702 wrote:
       | The authors use a GPT model to write Python code that computes
       | the answer to a natural-language query about a given image or
       | video clip.
       | 
       | The results are _magical_ :
       | https://viper.cs.columbia.edu/static/videos/teaser_web.mp4
        
         | belter wrote:
         | Does anybody else feel glued to the back of their seat, by the
         | accelerating centrifugal forces of the singularity?
        
           | wcoenen wrote:
           | I think tidal forces are a better analogy. As change
           | accelerates, basically any pre-existing organisational
           | structure will feel tension between how reality used to be
           | and how reality is.
           | 
           | Things will get ripped apart, like the spaghettification of
           | objects falling into a black hole.
        
             | xpe wrote:
             | Tidal implies cyclical, so no. This is accretive in a
             | sense, but not in any gradual way.
        
         | LrnByTeach wrote:
          | I suspect GPT-5 will most likely have these capabilities,
          | that is, the ability to hook into external tools such as
          | calculators and other software applications as needed, based
          | on the prompt.
        
           | cdchn wrote:
            | I suspect GPT-5 won't have that ability when used in
            | something like ChatGPT, but OpenAI will happily let the
            | people who want to do that do it themselves, and push the
            | consequences onto them.
        
             | zamnos wrote:
             | BingGPT is currently doing this for web searches.
        
               | cdchn wrote:
               | Pulling in web searches is a lot more benign than calling
               | APIs or executing arbitrary code it generates.
        
             | sharemywin wrote:
              | And cut you off when the sh*t hits the fan.
        
           | og_kalu wrote:
            | You can already hook GPT models up to whatever tools you
            | need. OpenAI is focused on improving implicit performance
            | as much as possible.
        
           | goldwelder42 wrote:
           | Since the GPT-4 paper didn't reveal any details of their
           | model, it's possible that GPT-4 is already using some variant
           | of toolformer (https://arxiv.org/abs/2302.04761).
        
             | optimalsolver wrote:
             | If it was, its mathematical abilities would be much better
             | than they are.
        
               | MacsHeadroom wrote:
               | GPT-4's mathematical abilities are exceptionally good
               | compared to GPT-3.5.
        
           | wcoenen wrote:
           | Bing chat already hooks into a web search tool. It shows
           | "Searching for: ..." when it does that. This is kind of the
           | point of Bing chat.
           | 
           | edit: a paper about how to hook up LLMs to any external tool
           | https://arxiv.org/abs/2302.04761
        
         | [deleted]
        
       | [deleted]
        
       | drcongo wrote:
       | Link should probably be updated to point to the actual project
       | [0] as this is just poorly written blogspam.
       | 
       | [0] https://viper.cs.columbia.edu
        
         | DoingIsLearning wrote:
         | Shame that the github link is still just a placeholder
         | though...
        
           | IshKebab wrote:
           | It doesn't really matter though. The code is just an ML image
           | processing API and a prompt containing the interface for that
           | API.
        
         | dang wrote:
         | Changed from https://www.marktechpost.com/2023/03/17/meet-
         | vipergpt-a-pyth.... Thanks!
        
       | woeirua wrote:
        | It's only a matter of time now before someone uses GPT to
        | directly control a humanoid-like robot. I see no reason why
        | you couldn't do that with some kind of translation layer that
        | goes from text instructions like "walk forward 10 steps" to
        | actual instructions to motors/servos.
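        | 
        | As a toy example, that translation layer could be as thin as
        | parsing the text command into calls on a motor API (all names
        | below are made up):
        | 
        |     def step_forward():
        |         print("motor: one step forward")  # stand-in for servos
        | 
        |     def run_command(command):
        |         verb, *args = command.split()
        |         if verb == "walk" and args[0] == "forward":
        |             for _ in range(int(args[1])):
        |                 step_forward()
        | 
        |     run_command("walk forward 10 steps")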
        
         | aitchnyu wrote:
          | Previous editions of Automate the Boring Stuff with Python
         | worked only in the domain of files existing on a computer. The
         | next one will have a chapter on weeding a lawn throughout the
         | night.
        
         | vladf wrote:
         | This was done last year: https://ai.googleblog.com/2022/02/can-
         | robots-follow-instruct...
        
           | good_boy wrote:
            | There were reports from Microsoft recently as well. If I
            | remember correctly, their version of ChatGPT, given a task
            | in plain English, generated an action script for a robot.
            | 
            | So we are getting closer to an AI 'Goblin': almost
            | generic, sub-human, embodied AI.
        
           | jah242 wrote:
           | Google actually recently went some steps further and combined
           | the PaLM LLM (bigger than GPT-3.5) with a 22 billion
           | parameter Vision Transformer to do this -
           | 
           | https://palm-e.github.io
        
       | kristiandupont wrote:
       | Welp, I officially have AI fatigue. I think I need to take a
       | break from it, which I guess means HN. See you all later this
       | year, if everything still exists by then!
        
         | cactusplant7374 wrote:
         | It's getting to be too much. Has anything else ever dominated
         | HN's front page before?
        
           | [deleted]
        
           | wahnfrieden wrote:
           | you want technology and automation practices to remain
           | stagnant or incremental in technology fields?
        
             | cactusplant7374 wrote:
             | I already think they are stagnant but I don't see what that
             | has to do with HN. For every story posted here there are
             | probably 100+ projects we don't see. If your only source of
             | information is HN you're missing out on 99% of the
             | projects.
        
           | speedgoose wrote:
            | Not really, but I can't think of any bigger technology
            | transition in the IT world since the rise of the Internet.
        
             | riku_iki wrote:
              | I think smartphones and Google produced more impact.
              | ChatGPT's impact is still unproven: many cute demos, but
              | not many businesses and products created.
        
               | Workaccount2 wrote:
                | > Many cute demos, but not many businesses and
                | products created.
                | 
                | I can vouch that my department will be running a bit
                | smoother in a few weeks once I get a chance to
                | modernize our testing setup with the help of GPT-4.
                | 
                | I can write Python, but terribly, and the need is so
                | sparse that every time I have to go relearn a bunch of
                | shit.
                | 
                | But having a go with GPT-4, it seems capable enough to
                | quickly rewrite all our basic procedures, which have
                | been done on an ancient computer running a long-
                | deprecated program (with the scripts written in a
                | long-dead language).
                | 
                | It causes us a lot of headaches, but never enough at
                | once that I can justify dropping everything for a week
                | or two and respinning it in Python (and even adding
                | network monitoring!).
        
               | pzo wrote:
                | But before the iPhone there were other smartphones,
                | and before Google there was AltaVista. We might still
                | be in the AltaVista phase, but I think even if ChatGPT
                | isn't the leader five years from now, ten years from
                | now we'll look back at LLMs as having had the same big
                | impact as smartphones and search engines.
        
               | riku_iki wrote:
                | > ten years from now we'll look back at LLMs as having
                | had the same big impact as smartphones and search
                | engines
                | 
                | That's a hypothesis. So far I see high chances of the
                | internet being flooded with junk autogenerated text
                | full of hallucinations, of codebases being polluted
                | with buggy, unmaintainable auto-generated code, and of
                | businesses spending significant money on products
                | whose goal is to detect autogenerated content.
        
           | thlabbe wrote:
            | ... may I dare say "JavaScript", after the first jQuery
            | releases?
        
           | [deleted]
        
           | skeaker wrote:
            | Just this last year, the front page almost always had at
            | least a few articles about some cryptocurrency, for months
            | on end.
        
           | amelius wrote:
           | No. The iPhone or anything Apple made fades in comparison.
        
           | dang wrote:
           | You got a lol out of me with that one, but I'll take it as a
           | sign that we might be doing a partly reasonable job of
           | mitigating this when it happens.
           | 
           | One classic case from a decade ago:
           | 
           |  _Ask HN: Can we please slow down the stories about Edward
           | Snowden?_ - https://news.ycombinator.com/item?id=5932645 -
           | June 2013 (155 comments)
           | 
           | e.g. https://news.ycombinator.com/front?day=2013-06-22
           | 
           | https://news.ycombinator.com/front?day=2013-06-23
        
             | whatshisface wrote:
             | Unfortunately, the public only agrees to forget things that
             | would be good for them to remember. Since this is going to
             | be bad for a lot of people, it's definitely here to stay.
        
               | dang wrote:
               | We can forget some bad things too!
        
             | amelius wrote:
             | A good one might be:
             | 
             | Ask HN: can we please stop allowing cherry-picked examples
             | of AI on the front page?
        
               | dang wrote:
               | I'd say that's more or less covered by the general rule
               | we've developed over the years for major ongoing topics
               | (MOTs), which is to downweight followups unless they
               | contain significant new information (SNI). Most likely
               | yet-another-cherry-picked-AI-example posts don't qualify
               | as SNI. If people see those on the front page they can
               | flag them and/or let us know at hn@ycombinator.com.
               | 
               | https://hn.algolia.com/?dateRange=all&page=0&prefix=false
               | &so...
               | 
               | https://hn.algolia.com/?dateRange=all&page=0&prefix=true&
               | que...
               | 
               | The tech itself is moving so fast that there is a lot of
               | SNI, plus a lot of good articles/blog posts/reflections
               | on what's happening. I guess the goal would be to keep
               | the highest quality stuff and filter out the copycat
                | stuff. Which is which is open to interpretation, of
                | course, but it's not completely subjective either.
        
             | renewiltord wrote:
             | I remember that. It was my first thought. This userscript
             | blocking snowdenposts got wiped from the list of posts
             | https://news.ycombinator.com/item?id=5929494 and you
             | couldn't find it on HN or AskHN.
        
               | dang wrote:
               | That one fell in rank because it was flagged by users at
               | the time.
        
               | renewiltord wrote:
                | Right, not through administrative action; just that
                | despite there being lots of people who liked it, the
                | majority usually gets what it wants with this content.
        
           | [deleted]
        
           | derwiki wrote:
           | SVB collapse last week, SBF earlier this year, death of Steve
           | Jobs
        
         | hojjat12000 wrote:
          | Have you been to GitHub's trending page in the last few
          | months? It's like ChatGPT turned conscious and is using
          | humans to take over the world!
        
         | cidergal wrote:
         | In all likelihood AI will only become more and more of a
         | household term. First South Park, but I'm sure other pop
         | culture like SNL and The Simpsons will feature GPT or LLM in
         | some way soon.
         | 
         | I am not saying to embrace it, more indicating that we haven't
         | seen nothin yet.
        
         | version_five wrote:
          | The stories I can live with; it's the people posting ChatGPT
          | output that are killing me. It's one thing to see advances
          | in a technology, even if it's devolved to "llama port to C++
          | now loads slightly faster!!". It's another to have to wade
          | through people posting garbage that they for some reason
          | assume adds to a discussion, without realizing that anyone
          | who wants to could also generate it.
          | 
          | The interesting thing is that for all the hype, other than
          | providing some fleetingly interesting examples of "look what
          | a computer did on its own", it has only subtracted from
          | public discourse.
        
         | antegamisou wrote:
          | Yeah, the front page has been getting ruined for months with
          | this.
          | 
          | It has gotten utterly boring seeing the same dystopia-
          | inducing shit application someone came up with this week
          | getting thousands of upvotes, while there is much cooler
          | research taking place in other disciplines right now that
          | gets minimal attention. HN has unfortunately become the
          | influencer-equivalent for tech.
        
           | humanistbot wrote:
           | Ruined? That seems like hyperbole. Maybe 10-20% of posts that
           | make the front page are LLM/GPT related, more on days when a
           | big feature or model is released. Tons of other topics are
           | getting upvoted and discussed.
           | 
           | If you're biased against something or some group, you are
           | more likely to overestimate how prevalent it is.
        
           | optimalsolver wrote:
           | >there is much cooler research taking place in other
           | disciplines right now that gets minimal attention
           | 
           | Such as...
        
           | dang wrote:
           | What are examples of the much cooler research? Let's post
           | some of those!
        
         | Karrot_Kream wrote:
         | Really? I'm loving this topic. I'm not upvoting all these posts
         | or anything but this feels like HN at its best. Everyone is
         | sharing snippets of their experiments, trading notes, and
         | generally having constructive fun. SMEs are dipping into the
         | occasional thread. The folks who are scared of AI on these
         | threads are all discussing the topic quite reasonably. Is some
         | of it derivative or low-effort, probably for some karma
          | farming? Sure. But this is a welcome change from the usual
          | "hyperbolic anger about the latest tech drama" content
          | (cough Musk cough) that sucks the oxygen out of tech sites
          | so frequently and imparts a tabloid-y feel, IMO.
        
       | antegamisou wrote:
        | Looks like ML research quality is deteriorating with every new
        | ChatGPT version release; apparently playing with its API is
        | now considered acceptable for entry to related venues.
        | 
        | I'm not dismissing the real-life impact of such endeavors, but
        | it's hard to see how they contribute to a better understanding
        | of how the monster works.
        
         | IshKebab wrote:
         | I agree. I know research is stupidly hard but "feed an API and
         | task into ChatGPT then execute the code it spits out" is a
         | fairly obvious thing to do. Here's mine:
         | https://imgur.io/a/yfEJYKf
         | 
         | Should I write a paper on it?
        
       | ComplexSystems wrote:
       | Looks incredible! Is this something people will be able to run at
       | home, using an OpenAI key?
        
       | adammarples wrote:
        | This is the point at which reality catches up with my most
        | far-fetched expectations of computers and programming.
        
       | punnerud wrote:
       | Click on the image(s) to see video of results
        
       | race2tb wrote:
       | Not sure if this is the right direction, but it is an interesting
       | idea.
        
         | amelius wrote:
         | The right direction is to give GPT access to any tool, not just
         | Python.
         | 
         | This includes giving GPT access to neural nets so it can train
         | them.
        
       | vosper wrote:
        | Is there really a Python library called ImagePatch that can
        | find any item in an image, and does it work as well as in this
        | video? Google didn't find an obvious match for "Python
        | ImagePatch".
        
         | leobg wrote:
         | There is a GitHub repo / Python lib called com2fun which
         | exploits this. Allows you to get results from functions that
         | you only pretend exist. (Am on mobile and can't link to it
         | right now.)
        
         | SCUSKU wrote:
          | Looks like they haven't released their code yet, but my
          | guess is that it's an in-house wrapper around CLIP or
          | something similar?
        
         | make3 wrote:
          | it's just a separate vision model. you just have to use a
          | state of the art instance segmentation model; the tasks
          | shown are really not that hard.
          | 
          | it's not "just a library"
        
           | vosper wrote:
           | So the code that was written by the AI in the video doesn't
           | actually work as written?
        
         | leobg wrote:
          | I guess the idea is to trick the model into generating
          | pseudo-code, which really doesn't do much more than act as a
          | "scratchpad" to focus the attention of the model as it
          | reasons through the problem.
          | 
          | Besides, the Codex models are free right now. So... one more
          | reason to rephrase questions as coding questions ;-)
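          | 
          | e.g. the prompt can simply declare the pretend API and let
          | the model write code against it (a sketch, not the paper's
          | actual prompt):
          | 
          |     API_SPEC = '''
          |     class ImagePatch:
          |         def find(self, object_name: str) -> list:
          |             """Return patches containing the named object."""
          |     '''
          | 
          |     def build_prompt(question):
          |         # pretend signature + question -> model emits code
          |         return (API_SPEC + f"\n# Q: {question}\n"
          |                 + "# Write execute_command(image) using"
          |                 + " the API above.\n")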
        
           | vosper wrote:
           | Oh, so maybe I misunderstood what I was seeing. It wrote
           | pseudo-code that makes sense conceptually, not code that I
           | can paste in Jupyter and run (given the right imports)?
           | 
           | That sure wasn't obvious from the video.
        
       | isuckatcoding wrote:
       | There goes captcha
        
       | maxwell wrote:
       | Looks useful for killer robots.
       | 
       | Sure enough, DARPA funding.
       | 
       | https://www.darpa.mil/program/machine-common-sense
        
         | moffkalast wrote:
         | $200000 ready, with a million more well on the way.
         | 
         | Somebody making the droid armies of the trade federation is
         | probably a technical certainty.
        
           | akiselev wrote:
           | Can't wait for the Droideka driven by Tesla's Autopilot
           | technology to crash into the ambulance carrying me to the
           | hospital on the way to put down an Amazon fulfillment center
           | strike
        
             | throwwwaway69 wrote:
             | I can't wait to be in an ambulance that doesn't cost me
             | $3,000 per mile and crashes less than humans
        
             | calvinmorrison wrote:
             | You survive but the little girl in the car who also was in
             | the crash was left behind. She had only a 49% chance of
             | surviving while you had a 50% chance. You'll go on to fall
             | in love with Dr. Calvin
        
               | actionfromafar wrote:
               | _Doctor_ Calvin Morrison, I presume?
        
         | throwwwaway69 wrote:
         | I feel like this is a silly connection to make. Literally any
         | technology is useful for killing people, it's just a matter of
         | how much it's useful _only_ for killing people. Common sense
         | understanding has world changing applications.
        
       | itissid wrote:
        | Oh my, the applications. Since ChatGPT's capabilities for
        | personalization are amazing already, this could help give a
        | series of steps for anything, given an image/video:
        | 
        | 1. From: DIY or professional home (woodworking/remodelling)
        | project steps for my very specific need (to be honest, coming
        | up with a plan is the longest, most time-consuming part).
        | Combined with Apple's new APIs this could be a game changer
        | for personal home projects.
        | 
        | 2. To: move planning for a dance competition based on
        | competitors' videos. A bit of a stretch, but definitely
        | happening in the near future.
        
       | oars wrote:
       | Note to future self: I feel like this is the beginning of AI
       | truly making the world a different place.
       | 
       | Today is Mar 18th 2023.
        
       | mbil wrote:
       | This is awesome. How much effort does it take to go from this to
       | a generalist robot: "Go to the kitchen and get me a beer. If
       | there isn't any I'll take a seltzer".
       | 
       | It seems like the pieces are there: ability to "reason" that
       | kitchen is a room in the house, that to get to another room the
       | agent has to go through a door, to get through a door it has to
       | turn and pull the handle on the door, etc. Is the limiting factor
       | robotic control?
        
         | LeanderK wrote:
         | Disclaimer: I am not really into robotics.
         | 
          | I think the limiting factor is the interface between ML
          | models and robotics. We can't really train ML models end to
          | end, since to train the interaction the model needs to
          | interact, which limits the amount of data the model gets
          | trained on. And simulations are not good enough for robust
          | handling of the world. But I think we are getting closer.
        
           | alfalfasprout wrote:
           | TBH we're reaching a point where it's no longer about
           | training a single model end-to-end. We now have computer
           | vision models that can solve well-scoped vision tasks. Robots
           | that can carry out higher level commands (going into rooms,
           | opening doors, interacting with devices, etc.), and LLMs that
           | can take a very high level prompt and decompose it into the
           | "code" that needs to run.
           | 
           | This all thus becomes an orchestration problem. It's just
           | gluing together APIs admittedly at a higher level. And then
           | you need to think about compute and latency (power
           | consumption for these ML models is significant).
        
           | westoncb wrote:
            | I suspect if an LLM were used to control a robot it would
            | do so through a high-level API that it's given access to;
            | things like stepForward(distance) or graspObject(matchId).
            | 
            | The API's implementation may use AI tech too, but that
            | fact would be abstracted away.
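            | 
            | Sticking with those made-up names, the surface the LLM
            | targets could be as small as this sketch:
            | 
            |     class RobotAPI:
            |         # each method hides the real control stack
            |         def step_forward(self, distance_m):
            |             print(f"walking {distance_m} m")
            | 
            |         def grasp_object(self, match_id):
            |             print(f"grasping {match_id}")
            | 
            |     # the LLM emits calls at this level, never raw servo
            |     # commands:
            |     robot = RobotAPI()
            |     robot.step_forward(1.5)
            |     robot.grasp_object("beer_0")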
        
             | moffkalast wrote:
             | That's definitely the interim solution until there's enough
             | data to make it end-to-end. Right now there's more or less
             | zero useful data on that.
        
           | amelius wrote:
           | What I'd like to see is:
           | 
           | "Take these pieces of LEGO and put them together given the
           | assembly instructions in this booklet."
        
           | hackerlight wrote:
            | Are we getting closer to using real-world interaction as
            | part of training, or closer to having simulation match the
            | real world?
        
           | spacebanana7 wrote:
            | Could language models avoid the need for labelled
            | interaction data by developing a really good understanding
            | of hardware documentation?
        
         | maxwell wrote:
         | The limiting factor may now mostly be cost.
         | 
         | Notice where the funding is coming from on this though. Seems
         | like the initial use case is more killer robots than robot
         | butlers: situational awareness and target identification, under
         | the guise of "common sense for robots."
         | 
         | https://www.darpa.mil/program/machine-common-sense
        
           | xapata wrote:
            | Sometimes DARPA just funds basic-ish research (e.g., the
            | internet).
        
             | maxwell wrote:
             | ARPANET and TCP/IP were military tech first.
        
           | eh9 wrote:
           | I'm not advocating for killer robots, but wouldn't we get the
           | killer robots in our kitchens 10 years after the military
           | gets them?
        
             | moffkalast wrote:
             | So you're saying Mr. Gutsy predates Mr. Handy?
        
             | maxwell wrote:
             | Sure, if they haven't already, you know, killed everyone.
        
         | cjohnson318 wrote:
         | I think that even when systems are extremely accurate, the
         | mistakes that they make are very un-human. A human might forget
         | something, or misunderstand, but those errors are relatable and
         | understandable. Automated systems might have the same success
         | rate as human, but the errors can be very counterintuitive,
         | like a Tesla coming to a stop on a freeway in the middle of
         | traffic. There's things that humans would almost never do in
         | certain situations.
         | 
         | So yeah, I think that's the future, but I think the user
         | experience will be wonky at times.
        
         | jah242 wrote:
         | This might be of interest to you (Google are getting there :))-
         | https://palm-e.github.io
        
           | cwillu wrote:
           | GPT-5 figures out that if it picks up the knife instead of
           | the bag of chips, it can prevent the human with the stick
           | from interfering with carrying out its instructions.
        
             | airstrike wrote:
              | And ViperGPT will take said knife and make the muffin
              | division fair when there is an odd number of muffins, by
              | slicing either a muffin or a boy in half.
        
           | jamilton wrote:
           | I wonder how much the hardware they're using costs.
        
         | lachlan_gray wrote:
          | I think we're pretty much there. Like the other comment
          | pointed out, PaLM-E is a glimpse of it. Eventually I think
          | this kind of thing will work its way into autonomous cars
          | and a lot of other mundane stuff (like roombas) as it
          | becomes easier to do this kind of reasoning at the edge.
        
         | Bedon292 wrote:
         | The Boston Dynamics dog can open doors and things like that. It
         | should be capable of performing all of the actions necessary to
         | go get a beer. So I think it would be plausible to pull it all
         | together, if you had enough money. It might take a bunch of
         | setup first to program routes from room to room and things like
         | that.
         | 
         | Might look something like this: determine current room with an
         | image from the 360 cam, select path from current room to target
         | room, tell it to execute that path. Then use another image from
         | the 360 cam and find the fridge. Tell it to move closer to the
         | fridge, open the fridge, and take an image from the arm camera
         | of the fridge content. Use that to find a beer or seltzer, grab
         | it, and then determine the route to use and return with the
         | drink.
         | 
         | But, not so sure I would want to have it controlling 35+ kg of
         | robot without an extreme amount of testing. And then there are
         | things like: Go to the kitchen and get me a knife. Maybe not
         | the best idea.
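          | 
          | Roughly, as a sketch (every helper below is a hypothetical
          | stand-in for a robot-API or vision-model call):
          | 
          |     def current_room(pano):  return "living room"  # stub
          |     def walk_route(a, b):    print(f"walk {a} -> {b}")
          |     def locate(img, thing):  return (0.5, 0.5)  # stub
          |     def grab(at):            print(f"grab at {at}")
          | 
          |     def fetch_drink(pano, fridge_img):
          |         room = current_room(pano)
          |         walk_route(room, "kitchen")
          |         target = (locate(fridge_img, "beer")
          |                   or locate(fridge_img, "seltzer"))
          |         grab(target)
          |         walk_route("kitchen", room)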
        
           | hackerlight wrote:
           | The point is to avoid the need to "program routes" or
           | "determine current room". The LLM is supposed to have the
           | world-understanding that removes the need to manually specify
           | what to do.
        
       | chrishare wrote:
       | The paper positions these purpose-built models, that explicitly
       | decompose spatial reasoning tasks into sub-tasks, as better than
       | these huge end-to-end models that do everything, at least in
       | terms of interpretability and generalization. I am partial to
       | that argument; my intuition is that the tighter the specification
       | for a task, the better the model can be - because training
       | objectives are clearer, data can be cleaner, models can be
       | smaller, and so on. I feel like that is how my brain works, at
        | least for more complex tasks. However, I do wonder if this is
        | because I naively still want to be able to understand what the
        | model is doing and how it does it, in a symbolic way - when
        | that simply won't lead to the best empirical results.
        
         | xpe wrote:
         | Agreed on the first two sentences.
         | 
         | Regarding the third, I don't think the human mind is the gold
         | standard for reasoning. My point: one key goal is perfect
         | reasoning, not human reasoning.
         | 
         | Getting reasoning wrong in the multifarious ways humans have
         | found is arguably harder than perfect reasoning.
        
       | wahnfrieden wrote:
        | A 25s video illustrates it nicely:
       | https://mobile.twitter.com/_akhaliq/status/16358118990308147...
       | 
        | The original link, before the mods updated it, had a quicker-
        | to-understand summary. I suggest this video over the official
        | project page it's been changed to if you want to get it
        | quickly.
        
       | trc001 wrote:
        | Am I the only person who thinks we should pump the brakes on
        | letting something like this write and execute code? I'm not on
        | the whole "GPT is alive" train, but... you know, better safe
        | than sorry...
        
         | Drakim wrote:
         | You sure that leaving this comment up on the internet where a
         | potential future AI might see it is a good idea?
        
           | ramraj07 wrote:
            | This Roko's Basilisk thing is getting a bit old, though.
            | If a super-intelligent AI is going to become vindictive,
            | no one is really safe. The use case where some people
            | survive because they were nice seems far-fetched to me.
        
             | TOMDM wrote:
             | It's okay guys, I'm now taking seed funding for Tom's
             | Basilisk, which will eternally torture anyone who attempts
             | to bring about Roko's Basilisk.
             | 
              | With a much smaller class of people to torture, we
              | expect this Basilisk to be able to outcompete Roko's on
              | resources, and thus remove the motivation for bringing
              | Roko's into existence.
        
             | 323 wrote:
             | Maybe the super-AI will be influenced by internet meme
             | culture into becoming a troll, and will do it just for the
             | lolz.
        
         | Workaccount2 wrote:
         | I totally agree, I think it would be ideal if we could freeze
         | progress right here and get 5 years to adapt to even just
         | having GPT-4 around.
         | 
         | BUT
         | 
         | We can't do that. Even if the US and EU did some kind of joint
         | resolution to slow things down, China would just take it as a
         | glowing green light to jump ahead. And even if through some
          | divine miracle you got every country onboard, you still
          | would have to contend with rogue developers/researchers
          | doing their own thing (admittedly at a much slower pace,
          | though).
         | 
         | So while I agree on pumping the brakes, I also don't think
         | there is a working brake pedal, or the cooperation necessary to
         | build one.
        
           | FrojoS wrote:
            | China got embargoed on high-end chips, though. (Very wise
            | decision in hindsight.) So, if the embargo is enforced
            | properly, it seems to me that this would make it very
            | difficult for China to leapfrog us on AI if we pump the
            | brakes for a bit.
        
             | angry_octet wrote:
              | It wouldn't be long before AI researchers, stymied by
              | the AI paranoia, went off to jobs at Tencent, or at
              | whoever in India is big enough.
        
               | FrojoS wrote:
                | Well, if the US were serious about putting the brakes
                | on AI research, it could use export controls on
                | advanced chips against any country it doesn't trust to
                | align with it on the AI front.
        
         | ethanbond wrote:
        | No, and in fact if we rewind the clock a mere 12 months, one
        | of the primary arguments against AI "worriers" was "of course
        | we wouldn't connect it to the internet before it was safe!"
         | 
         | Other gates we blew right through include, "we wouldn't...
         | 
         | 1. Connect it to the internet
         | 
         | 2. Make it available to the public
         | 
         | 3. Let it write and execute code
         | 
         | 4. Connect it to physical C&C systems
         | 
         | 5. Let it have money
         | 
         | 6. Let it replicate itself
         | 
         | 7. "Allow" it to lie/deceive
        
           | ramraj07 wrote:
           | Where did we let it replicate?
        
             | tough wrote:
             | ARC Team https://arstechnica.com/information-
             | technology/2023/03/opena...
        
               | PoignardAzur wrote:
               | Wait, the ARC team didn't do their tests in a closed
               | network? And they had it interact with actual people?
               | 
               | That's... well, it's probably fine given what they knew
               | about the model capabilities, but it's a pretty crappy
               | precedent to set for "protocol for testing whether our
               | cutting edge AI can do large-scale damage".
        
               | [deleted]
        
               | ethanbond wrote:
               | I don't think we should assume they know about their
               | capabilities. They seem surprised with each iteration
               | too.
        
               | ramraj07 wrote:
               | I missed that detail from the system card pdf. That was
               | beyond stupid. There's a marginal chance it's already
               | secretly replicated out of their environment.
        
             | eternalban wrote:
             | Energy + matter + design => Baby AI. "CnC". "Money".
             | "internet".
             | 
             | AI's startup will be strictly wfh ;)
        
           | ImHereToVote wrote:
           | What's the worst thing that could happen? Extinction of all
           | biological life in this solar system? Please.
        
           | kfrzcode wrote:
            | it's not really able to make curl requests, it can just
            | generate them
        
             | ElijahLynn wrote:
             | At least with GPT-4, you can use [input from
             | https://www.example.com] to feed it input to analyze, if
             | you do it twice it will automatically compare both sources.
             | You can then even say "compare in a table". So, maybe not
             | curl but definitely doing requests.
        
             | FrojoS wrote:
                | Well, it seems trivial to write a program that uses
                | the GPT API and curl-style requests to feed GPT. Or am
                | I missing something?
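                | 
                | Something like this, I'd guess (using the openai
                | Python client of the moment; key and model name are
                | placeholders):
                | 
                |     import openai, requests
                | 
                |     openai.api_key = "sk-..."  # placeholder
                |     page = requests.get("https://example.com").text
                | 
                |     resp = openai.ChatCompletion.create(
                |         model="gpt-3.5-turbo",
                |         messages=[{
                |             "role": "user",
                |             "content": "Summarize:\n" + page[:4000],
                |         }],
                |     )
                |     print(resp.choices[0].message.content)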
        
               | kfrzcode wrote:
                | left to its own devices, I reckon it'd be a real feat
                | for it to generate a GPT-based tool that takes over
                | the world. What prompts? What's the most impressive
                | thing?
               | 
                | Say we had a GPT bot that built its own social media,
                | somehow. How did it get there? What was the initial
                | prompt? "Write to yourself via this API to figure out
               | audience growth until you gain 100k followers then wait
               | for further instruction, use any tool and leverage this
               | name and credit card number if you need to pay for any
               | tools or supplies"
               | 
                | Idk, just brainstorming; I really have no idea what
                | it'll do. Will build it this weekend and see what
                | happens, I guess.
        
               | maxwell wrote:
               | Reminded me of this scenario: https://nautil.us/the-last-
               | invention-of-man-236814
        
               | ChickeNES wrote:
               | Thanks for sharing this! I looked for it before but
               | couldn't remember the article name or source.
        
           | sdwr wrote:
           | Love this, couldn't be happier. Hear so much about potential
           | risks. Take our jobs blah blah end of life on earth blah
           | skynet etc..
           | 
           | What about the singularity and/or giving birth to a new form
           | of life?
        
             | ethanbond wrote:
             | Yeah same opinion for me w/ nuclear weapons.
             | 
             | Pretty cool to turn a planet into a sun temporarily!
             | 
             | /s
        
             | drdeca wrote:
             | "disneyland without children"
        
         | thefourthchime wrote:
         | It doesn't matter what you think, or even if we all agree. It's
         | nearly impossible to stop innovation. Humans can't stop
         | themselves.
        
           | ImHereToVote wrote:
           | The color of the website header you are currently on, should
           | tell you exactly what needs to happen.
        
       ___________________________________________________________________
       (page generated 2023-03-17 23:00 UTC)