[HN Gopher] ViperGPT: Visual Inference via Python Execution for ...
___________________________________________________________________
ViperGPT: Visual Inference via Python Execution for Reasoning
Author : kordlessagain
Score : 343 points
Date : 2023-03-17 16:10 UTC (6 hours ago)
(HTM) web link (viper.cs.columbia.edu)
(TXT) w3m dump (viper.cs.columbia.edu)
| djoldman wrote:
| Paper: https://arxiv.org/pdf/2303.08128.pdf
|
| (Empty) Code Repo: https://github.com/cvlab-columbia/viper
| VeninVidiaVicii wrote:
| Can it tell me how many jelly beans are in a jar?
| riskable wrote:
| Sure, if you give it the jar and a robotic arm to dump it out.
| Then it'll tell you how many are on the table.
| lsy wrote:
| While this is a very cool project that shows a great use of
| machine learning to answer questions about images in a roughly
| explainable way, I think people are extrapolating quite a bit as
| though this is some kind of movement forward from GPT-4 or
| Midjourney 5 into a new advanced reasoning phase, rather than a
| neat new combination of stuff that existed a year ago.
|
| Firstly, a bunch of the tech here is recognition-based rather
| than generative; it is relying heavily on object recognition
| which is not new.
|
| Secondly, the two primary spaces where generative tech is used
| are
|
| 1. For code generation from simple queries over a well-defined
| (and semantically narrow) spatial API -- this is one of the tasks
| where generative AI should shine in most cases. And
|
| 2. As a punt for something the API doesn't allow: e.g. "tell me
| about this building", which then comes with the same
| inscrutability as before.
|
| The number of examples for which the code is essentially "create
| a vector of objects, sort them on the x, y, z, or t axis, and
| pick an index" is quite high. But there aren't really any
| examples of determining causality or complex relationships that
| would require common sense. It is basically a more advanced
| SHRDLU. That's not to say this isn't a very cool result (with an
| equally cool presentation). And I could see some applications
| where this tech is used to achieve ad-hoc application of basic
| visual rules to generative AI (for example, Midjourney 6 could
| just regenerate images until "do all hands in this image have
| five fingers?" is true).
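The "sort on an axis, pick an index" pattern the comment describes can be sketched in a few lines of Python. The `Patch` class below is a hypothetical stand-in for detected-object bounding boxes; the project's code is unreleased, so none of these names are its real API:

```python
from dataclasses import dataclass

# Hypothetical stand-in for a detected object and its bounding-box position.
@dataclass
class Patch:
    name: str
    left: int  # x coordinate of the bounding box's left edge, in pixels

def second_from_left(patches: list[Patch]) -> Patch:
    # The recurring pattern: collect objects, sort on one axis, pick an index.
    ordered = sorted(patches, key=lambda p: p.left)
    return ordered[1]

muffins = [Patch("muffin", 320), Patch("muffin", 40), Patch("muffin", 180)]
print(second_from_left(muffins).left)  # -> 180
```

A query like "what is the second muffin from the left?" reduces to exactly this sort-and-index step once a detector has produced the patches.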
| breakingrules wrote:
| [flagged]
| seydor wrote:
| Did they forget to enhance 34 36?
| Veuxdo wrote:
| I'm not getting it... what does the Python layer achieve?
| teaearlgraycold wrote:
| LLMs are bad at math and rigorous logic. But we already have
| Python which can do both of those very well, so why try to
| "fix" LLMs by making them good at math when you can instead
| tell the LLM to delegate to Python when it is asked to do
| certain things?
|
| Or in this case, have the LLM delegate to Python and then have
| the Python code delegate to another AI for "fuzzy" functions.
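The delegation idea can be sketched as follows; `run_llm_generated` is a hypothetical stand-in for a host that executes whatever expression the model emits (a real system would sandbox this far more carefully):

```python
# Minimal sketch of the delegation idea (names are hypothetical): the LLM
# emits a Python expression instead of guessing the answer, and the host
# evaluates it exactly.
def run_llm_generated(expression: str) -> float:
    # In a real system the expression would come from the model; evaluating
    # in an empty namespace here is only a crude stand-in for a sandbox.
    return eval(expression, {"__builtins__": {}}, {})

# "What is 1234 * 5678?" -> the model writes code rather than recalling digits.
print(run_llm_generated("1234 * 5678"))  # -> 7006652
```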
| 6gvONxR4sf7o wrote:
| It's easier to write down "51 + 99" than it is to compute their
| sum. Same for other executable code.
| karmasimida wrote:
| It can provide several benefits:
|
| 1. Python code is abundant, so the model should be well trained
| to generate correct Python code; the chance of making a mistake
| is lower.
|
| 2. Python has all the needed control flow, including loops, so
| it is expressive enough.
|
| Basically they could do without Python, using their own DSL and
| putting that into the prompt, but that is probably more
| wasteful than just prompting the model to use Python.
|
| In short, Python is going to be even more useful moving
| forward, as the bridge between our language (human language, in
| this case English) and a planning language that any machine can
| understand.
| vasco wrote:
| And all of this because we couldn't solve the GIL. If it's
| just a translation layer for a model to execute, I guess the
| GIL doesn't matter.
| sdwr wrote:
| Also, this is the airgap / "explain your reasoning" step that
| AI safety people are so worried about.
| PaulHoule wrote:
| Instead of making the wild ass guesses that GPT makes
| (sometimes correctly), Python can be used to do the things that
| Python can do right. For instance if you asked a question like
| "how many prime numbers are there between 75,129 and 85,412"
| the only way of doing that (short of looking it up in a table)
| is something like          sum(1 for n in
| range(75129, 85412) if is_prime(n))
|
| and GPT does pretty well at writing that kind of code.
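A complete, runnable version of that sketch might look like this; `is_prime` is a plain trial-division helper added here for illustration (the snippet above assumes one exists), and the small range in the demo is chosen so the expected count is easy to verify by hand:

```python
def is_prime(n: int) -> bool:
    # Trial division: sufficient for ranges of this size.
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    f = 3
    while f * f <= n:
        if n % f == 0:
            return False
        f += 2
    return True

def count_primes(lo: int, hi: int) -> int:
    # Count primes in the inclusive range [lo, hi].
    return sum(1 for n in range(lo, hi + 1) if is_prime(n))

print(count_primes(10, 20))  # 11, 13, 17, 19 -> 4
```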
| [deleted]
| [deleted]
| sharemywin wrote:
| For the people who say ChatGPT couldn't solve problems like a
| person: look how over-engineered this solution is!
|
| I asked ChatGPT to make a list of tools I could use to solve
| this problem:
|
| Task -> Tool(s):
|
| Analyze the image -> OpenCV, MATLAB, Adobe Photoshop
|
| Identify muffins in the image -> YOLO, SSD, Faster R-CNN
|
| Train a model to recognize and count muffins -> TensorFlow,
| PyTorch, Keras
|
| Write code for solution -> Python, Java, C++
|
| Manipulate data -> NumPy, Pandas
|
| Visualize results -> Matplotlib
|
| Use powerful hardware -> GPUs, TPUs
| Note that some tools may be used for multiple tasks, and some
| tasks may require multiple tools. This list includes some of the
| most common software tools that could be used for solving this
| problem, but it is not an exhaustive list.
| [deleted]
| 6gvONxR4sf7o wrote:
| You know someone in the future is going to write "dear
| viperGPT-5, please create a botnet and replicate yourself onto
| it" on one of these AI + python interpreter models. And it will
| comply.
| varispeed wrote:
| It looks like this has been created solely to use the "reasoning"
| keyword. This thing doesn't do any reasoning, just like GPT-4 or
| any other AI craze tech doesn't.
|
| It is simply pattern matching that _looks_ like reasoning, but
| it will quickly fall apart if you ask it something it has not
| been trained on.
|
| I think such presentations are harmful and should be called out.
| IshKebab wrote:
| You're simply pattern matching that looks like reasoning.
| westoncb wrote:
| > but it will quickly fall apart if you ask it something it has
| not been trained on
|
| It would be pretty uninteresting tech if that were true: the
| ability to generalize beyond training data is a core feature of
| what NNs do and why we've bothered with them, and is almost
| certainly on display in the demos above.
| cs702 wrote:
| The authors use a GPT model to write Python code that computes
| the answer to a natural-language query about a given image or
| video clip.
|
| The results are _magical_ :
| https://viper.cs.columbia.edu/static/videos/teaser_web.mp4
| belter wrote:
| Does anybody else feel glued to the back of their seat, by the
| accelerating centrifugal forces of the singularity?
| wcoenen wrote:
| I think tidal forces are a better analogy. As change
| accelerates, basically any pre-existing organisational
| structure will feel tension between how reality used to be
| and how reality is.
|
| Things will get ripped apart, like the spaghettification of
| objects falling into a black hole.
| xpe wrote:
| Tidal implies cyclical, so no. This is accretive in a
| sense, but not in any gradual way.
| LrnByTeach wrote:
| I suspect GPT-5 will most likely have these capabilities, that
| is, the ability to hook into external tools such as calculators
| and other software applications as needed, based on the prompt.
| cdchn wrote:
| I suspect GPT-5 won't have that ability when used in something
| like ChatGPT, but OpenAI will happily let the people who want
| to do that do it themselves, and push the consequences onto
| them.
| zamnos wrote:
| BingGPT is currently doing this for web searches.
| cdchn wrote:
| Pulling in web searches is a lot more benign than calling
| APIs or executing arbitrary code it generates.
| sharemywin wrote:
| And cut you off when the sh*t hits the fan.
| og_kalu wrote:
| You can already hook GPT models to whatever tools you need.
| OpenAI is focused on improving implicit performance as much as
| possible.
| goldwelder42 wrote:
| Since the GPT-4 paper didn't reveal any details of their
| model, it's possible that GPT-4 is already using some variant
| of toolformer (https://arxiv.org/abs/2302.04761).
| optimalsolver wrote:
| If it was, its mathematical abilities would be much better
| than they are.
| MacsHeadroom wrote:
| GPT-4's mathematical abilities are exceptionally good
| compared to GPT-3.5.
| wcoenen wrote:
| Bing chat already hooks into a web search tool. It shows
| "Searching for: ..." when it does that. This is kind of the
| point of Bing chat.
|
| edit: a paper about how to hook up LLMs to any external tool
| https://arxiv.org/abs/2302.04761
| [deleted]
| [deleted]
| drcongo wrote:
| Link should probably be updated to point to the actual project
| [0] as this is just poorly written blogspam.
|
| [0] https://viper.cs.columbia.edu
| DoingIsLearning wrote:
| Shame that the github link is still just a placeholder
| though...
| IshKebab wrote:
| It doesn't really matter though. The code is just an ML image
| processing API and a prompt containing the interface for that
| API.
| dang wrote:
| Changed from https://www.marktechpost.com/2023/03/17/meet-
| vipergpt-a-pyth.... Thanks!
| woeirua wrote:
| It's only a matter of time now before someone uses GPT to
| directly control a humanoid like robot. I see no reason why you
| couldn't do that with some kind of translation layer that goes
| from text instructions like: "walk forward 10 steps" to actual
| instructions to motors/servos.
| aitchnyu wrote:
| Previous editions of Automate the Boring Stuff with Python
| worked only in the domain of files existing on a computer. The
| next one will have a chapter on weeding a lawn throughout the
| night.
| vladf wrote:
| This was done last year: https://ai.googleblog.com/2022/02/can-
| robots-follow-instruct...
| good_boy wrote:
| There were reports from Microsoft recently as well. If I
| remember correctly, their version of ChatGPT, given a task in
| plain English, generated an action script for a robot.
|
| So, we are getting closer to an AI 'Goblin': an almost generic,
| sub-human, embodied AI.
| jah242 wrote:
| Google actually recently went some steps further and combined
| the PaLM LLM (bigger than GPT-3.5) with a 22 billion
| parameter Vision Transformer to do this -
|
| https://palm-e.github.io
| kristiandupont wrote:
| Welp, I officially have AI fatigue. I think I need to take a
| break from it, which I guess means HN. See you all later this
| year, if everything still exists by then!
| cactusplant7374 wrote:
| It's getting to be too much. Has anything else ever dominated
| HN's front page before?
| [deleted]
| wahnfrieden wrote:
| you want technology and automation practices to remain
| stagnant or incremental in technology fields?
| cactusplant7374 wrote:
| I already think they are stagnant but I don't see what that
| has to do with HN. For every story posted here there are
| probably 100+ projects we don't see. If your only source of
| information is HN you're missing out on 99% of the
| projects.
| speedgoose wrote:
| Not really, but I can't think of a bigger technology transition
| in the IT world since the rise of the Internet.
| riku_iki wrote:
| I think smartphones and Google produced more impact. ChatGPT's
| impact is still unproven: many cute demos, but not many
| businesses and products created.
| Workaccount2 wrote:
| >Many cute demos but no much businesses and products
| created.
|
| I can vouch that my department will be running a bit
| smoother in a few weeks once I get a chance to modernize
| our testing setup with the help of gpt4.
|
| I can write Python, but terribly, and the need is so sparse
| that every time I have to go relearn a bunch of shit.
|
| But having a go with GPT-4, it seems capable enough to quickly
| rewrite all our basic procedures, which have been done on an
| ancient computer running a long-deprecated program (with the
| scripts written in a long-dead language).
|
| It causes us a lot of headache, but never enough at once that I
| can justify dropping everything for a week or two and
| re-spinning it all in Python (and even adding network
| monitoring!).
| pzo wrote:
| But before the iPhone there were other smartphones, and before
| Google there was AltaVista. We might still be in the AltaVista
| phase, but I think even if ChatGPT isn't the leader 5 years
| from now, in 10 years we will look back at LLMs as having had
| the same big impact as smartphones and search engines.
| riku_iki wrote:
| > 10 years from now will look back at LLM having the same
| big impact as smartphones and search engines
|
| That's a hypothesis. So far I see high chances of the internet
| being flooded with junk autogenerated text full of
| hallucinations, of code bases being polluted with buggy,
| unmaintainable auto-generated code, and of businesses spending
| significant money on products whose goal is to detect
| autogenerated content.
| thlabbe wrote:
| ... may I dare say "JavaScript", after the first jQuery
| releases?
| [deleted]
| skeaker wrote:
| For months just this last year, the front page almost always
| had at least a few articles about some cryptocurrency.
| amelius wrote:
| No. The iPhone or anything Apple made fades in comparison.
| dang wrote:
| You got a lol out of me with that one, but I'll take it as a
| sign that we might be doing a partly reasonable job of
| mitigating this when it happens.
|
| One classic case from a decade ago:
|
| _Ask HN: Can we please slow down the stories about Edward
| Snowden?_ - https://news.ycombinator.com/item?id=5932645 -
| June 2013 (155 comments)
|
| e.g. https://news.ycombinator.com/front?day=2013-06-22
|
| https://news.ycombinator.com/front?day=2013-06-23
| whatshisface wrote:
| Unfortunately, the public only agrees to forget things that
| would be good for them to remember. Since this is going to
| be bad for a lot of people, it's definitely here to stay.
| dang wrote:
| We can forget some bad things too!
| amelius wrote:
| A good one might be:
|
| Ask HN: can we please stop allowing cherry-picked examples
| of AI on the front page?
| dang wrote:
| I'd say that's more or less covered by the general rule
| we've developed over the years for major ongoing topics
| (MOTs), which is to downweight followups unless they
| contain significant new information (SNI). Most likely
| yet-another-cherry-picked-AI-example posts don't qualify
| as SNI. If people see those on the front page they can
| flag them and/or let us know at hn@ycombinator.com.
|
| https://hn.algolia.com/?dateRange=all&page=0&prefix=false
| &so...
|
| https://hn.algolia.com/?dateRange=all&page=0&prefix=true&
| que...
|
| The tech itself is moving so fast that there is a lot of
| SNI, plus a lot of good articles/blog posts/reflections
| on what's happening. I guess the goal would be to keep
| the highest quality stuff and filter out the copycat stuff.
| Which is which is open to interpretation, of course, but it's
| not completely subjective either.
| renewiltord wrote:
| I remember that. It was my first thought. This userscript
| blocking snowdenposts got wiped from the list of posts
| https://news.ycombinator.com/item?id=5929494 and you
| couldn't find it on HN or AskHN.
| dang wrote:
| That one fell in rank because it was flagged by users at
| the time.
| renewiltord wrote:
| Right, not over administrative action, just that despite
| there being lots of people who liked it, the majority
| usually wants this content.
| [deleted]
| derwiki wrote:
| SVB collapse last week, SBF earlier this year, death of Steve
| Jobs
| hojjat12000 wrote:
| Have you been to Github's trending page in the last few months?
| It's like ChatGPT turned conscious and is using humans to take
| over the world!
| cidergal wrote:
| In all likelihood AI will only become more and more of a
| household term. First South Park, but I'm sure other pop
| culture like SNL and The Simpsons will feature GPT or LLM in
| some way soon.
|
| I am not saying to embrace it, more indicating that we haven't
| seen nothin yet.
| version_five wrote:
| The stories I can live with, it's the people posting chatgpt
| output that are killing me. It's one thing to see advances in a
| technology, even if it's devolved to "llama port to C++ now
| loads slightly faster!!". It's another to have to wade through
| people posting garbage that they for some reason assume adds to
| a discussion and for some reason don't realize that anyone who
| wants to could also generate it.
|
| The interesting thing is that for all the hype, other than
| providing some fleetingly interesting examples of "look what a
| computer did on its own", it has only subtracted from public
| discourse.
| antegamisou wrote:
| Yeah, the front page has been getting ruined for months with
| this.
|
| It has gotten utterly boring seeing the same dystopia-inducing
| shit application someone came up with this week getting
| thousands of upvotes, when there is much cooler research taking
| place in other disciplines right now that gets minimal
| attention. HN has unfortunately become the influencer
| equivalent for tech.
| humanistbot wrote:
| Ruined? That seems like hyperbole. Maybe 10-20% of posts that
| make the front page are LLM/GPT related, more on days when a
| big feature or model is released. Tons of other topics are
| getting upvoted and discussed.
|
| If you're biased against something or some group, you are
| more likely to overestimate how prevalent it is.
| optimalsolver wrote:
| >there is much cooler research taking place in other
| disciplines right now that gets minimal attention
|
| Such as...
| dang wrote:
| What are examples of the much cooler research? Let's post
| some of those!
| Karrot_Kream wrote:
| Really? I'm loving this topic. I'm not upvoting all these posts
| or anything but this feels like HN at its best. Everyone is
| sharing snippets of their experiments, trading notes, and
| generally having constructive fun. SMEs are dipping into the
| occasional thread. The folks who are scared of AI on these
| threads are all discussing the topic quite reasonably. Is some
| of it derivative or low-effort, probably for some karma
| farming? Sure. But, this is a welcome change from the usual
| "hyperbolic anger about latest tech drama" content (cough Musk
| cough) that starves the oxygen on tech sites so frequently and
| imparts a tabloid-y feel, IMO.
| antegamisou wrote:
| Looks like ML research quality is deteriorating with every new
| ChatGPT version release; apparently playing with its API is now
| considered acceptable for entry to related venues.
|
| I'm not undermining the real-life impact of such endeavors, but
| it's hard to see how this contributes to a better understanding
| of how the monster works.
| IshKebab wrote:
| I agree. I know research is stupidly hard but "feed an API and
| task into ChatGPT then execute the code it spits out" is a
| fairly obvious thing to do. Here's mine:
| https://imgur.io/a/yfEJYKf
|
| Should I write a paper on it?
| ComplexSystems wrote:
| Looks incredible! Is this something people will be able to run at
| home, using an OpenAI key?
| adammarples wrote:
| This is the point at which reality catches up with my most far
| fetched expectations of computers and programming
| punnerud wrote:
| Click on the image(s) to see video of results
| race2tb wrote:
| Not sure if this is the right direction, but it is an interesting
| idea.
| amelius wrote:
| The right direction is to give GPT access to any tool, not just
| Python.
|
| This includes giving GPT access to neural nets so it can train
| them.
| vosper wrote:
| Is there really a Python library called ImagePatch that can
| find any item in an image, and does it work as well as in this
| video? Google didn't find an obvious match for "Python
| ImagePatch".
| leobg wrote:
| There is a GitHub repo / Python lib called com2fun which
| exploits this. Allows you to get results from functions that
| you only pretend exist. (Am on mobile and can't link to it
| right now.)
| SCUSKU wrote:
| Looks like they haven't released their code yet, but my guess
| is that it's an in house wrapper around CLIP or something
| similar?
| make3 wrote:
| It's just a separate vision model. You just have to use a
| state-of-the-art instance segmentation model; the tasks shown
| are really not that hard.
|
| It's not "just a library".
| vosper wrote:
| So the code that was written by the AI in the video doesn't
| actually work as written?
| leobg wrote:
| I guess the idea is to trick the model into generating pseudo
| code. Which really doesn't do much more than to act as a
| "scratchpad" to focus the attention of the model to reason
| through the problem.
|
| Besides, the Codex models are free right now. So... one more
| reason to rephrase questions as coding questions ;-)
| vosper wrote:
| Oh, so maybe I misunderstood what I was seeing. It wrote
| pseudo-code that makes sense conceptually, not code that I
| can paste in Jupyter and run (given the right imports)?
|
| That sure wasn't obvious from the video.
| isuckatcoding wrote:
| There goes captcha
| maxwell wrote:
| Looks useful for killer robots.
|
| Sure enough, DARPA funding.
|
| https://www.darpa.mil/program/machine-common-sense
| moffkalast wrote:
| $200000 ready, with a million more well on the way.
|
| Somebody making the droid armies of the trade federation is
| probably a technical certainty.
| akiselev wrote:
| Can't wait for the Droideka driven by Tesla's Autopilot
| technology to crash into the ambulance carrying me to the
| hospital on the way to put down an Amazon fulfillment center
| strike
| throwwwaway69 wrote:
| I can't wait to be in an ambulance that doesn't cost me
| $3,000 per mile and crashes less than humans
| calvinmorrison wrote:
| You survive but the little girl in the car who also was in
| the crash was left behind. She had only a 49% chance of
| surviving while you had a 50% chance. You'll go on to fall
| in love with Dr. Calvin
| actionfromafar wrote:
| _Doctor_ Calvin Morrison, I presume?
| throwwwaway69 wrote:
| I feel like this is a silly connection to make. Literally any
| technology is useful for killing people, it's just a matter of
| how much it's useful _only_ for killing people. Common sense
| understanding has world changing applications.
| itissid wrote:
| Oh my, the applications. Since ChatGPT's capabilities for
| personalization are already amazing, this could help generate a
| series of steps for anything, given an image/video:
|
| 1. From: DIY or professional home (woodworking/remodelling)
| project steps for my very specific need (to be honest, coming
| up with a plan is the longest, most time-consuming part).
| Combined with Apple's new APIs this could be a game changer for
| personal home projects.
|
| 2. To: Move planning for a dance competition based on
| competitors' videos. A bit of a stretch, but definitely
| happening in the near future.
| oars wrote:
| Note to future self: I feel like this is the beginning of AI
| truly making the world a different place.
|
| Today is Mar 18th 2023.
| mbil wrote:
| This is awesome. How much effort does it take to go from this to
| a generalist robot: "Go to the kitchen and get me a beer. If
| there isn't any I'll take a seltzer".
|
| It seems like the pieces are there: ability to "reason" that
| kitchen is a room in the house, that to get to another room the
| agent has to go through a door, to get through a door it has to
| turn and pull the handle on the door, etc. Is the limiting factor
| robotic control?
| LeanderK wrote:
| Disclaimer: I am not really into robotics.
|
| I think the limiting factor is the interface between ML models
| and robotics. We can't really train ML models end to end,
| since to learn interaction the model needs to interact, which
| limits the data the model gets trained on. And simulations are
| not good enough for robust handling of the world. But I think
| we are getting closer.
| alfalfasprout wrote:
| TBH we're reaching a point where it's no longer about
| training a single model end-to-end. We now have computer
| vision models that can solve well-scoped vision tasks. Robots
| that can carry out higher level commands (going into rooms,
| opening doors, interacting with devices, etc.), and LLMs that
| can take a very high level prompt and decompose it into the
| "code" that needs to run.
|
| This all thus becomes an orchestration problem. It's just
| gluing together APIs admittedly at a higher level. And then
| you need to think about compute and latency (power
| consumption for these ML models is significant).
| westoncb wrote:
| I suspect if an LLM were used to control a robot it would do
| so through a high level API that it's given access to; things
| like: stepForward(distance) or graspObject(matchId)
|
| The API's implementation may use AI tech too, but that fact
| would be abstracted.
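A toy sketch of what such a control surface could look like; every name here is hypothetical, chosen to mirror the comment's stepForward/graspObject examples, and not any real robotics API:

```python
# Hypothetical high-level control surface. The implementation of each skill
# might itself be a learned model, but the LLM only sees these signatures.
class RobotAPI:
    def __init__(self) -> None:
        self.position = 0.0   # metres travelled along the current heading
        self.held = None      # id of the currently grasped object, if any

    def step_forward(self, distance: float) -> None:
        self.position += distance

    def grasp_object(self, match_id: str) -> None:
        self.held = match_id

# The kind of code an LLM might emit against that surface:
robot = RobotAPI()
robot.step_forward(1.5)
robot.grasp_object("beer_can_01")
print(robot.position, robot.held)  # 1.5 beer_can_01
```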
| moffkalast wrote:
| That's definitely the interim solution until there's enough
| data to make it end-to-end. Right now there's more or less
| zero useful data on that.
| amelius wrote:
| What I'd like to see is:
|
| "Take these pieces of LEGO and put them together given the
| assembly instructions in this booklet."
| hackerlight wrote:
| Are we getting closer to using real-world interaction as part
| of training, or to having simulation match the real world?
| spacebanana7 wrote:
| Could language models be able to avoid the need for labelled
| interaction data by developing a really good understanding of
| hardware documentation?
| maxwell wrote:
| The limiting factor may now mostly be cost.
|
| Notice where the funding is coming from on this though. Seems
| like the initial use case is more killer robots than robot
| butlers: situational awareness and target identification, under
| the guise of "common sense for robots."
|
| https://www.darpa.mil/program/machine-common-sense
| xapata wrote:
| Sometimes DARPA just funds basic-ish research (e.g., the
| internet).
| maxwell wrote:
| ARPANET and TCP/IP were military tech first.
| eh9 wrote:
| I'm not advocating for killer robots, but wouldn't we get the
| killer robots in our kitchens 10 years after the military
| gets them?
| moffkalast wrote:
| So you're saying Mr. Gutsy predates Mr. Handy?
| maxwell wrote:
| Sure, if they haven't already, you know, killed everyone.
| cjohnson318 wrote:
| I think that even when systems are extremely accurate, the
| mistakes that they make are very un-human. A human might forget
| something, or misunderstand, but those errors are relatable and
| understandable. Automated systems might have the same success
| rate as human, but the errors can be very counterintuitive,
| like a Tesla coming to a stop on a freeway in the middle of
| traffic. There are things that humans would almost never do in
| certain situations.
|
| So yeah, I think that's the future, but I think the user
| experience will be wonky at times.
| jah242 wrote:
| This might be of interest to you (Google are getting there :))-
| https://palm-e.github.io
| cwillu wrote:
| GPT-5 figures out that if it picks up the knife instead of
| the bag of chips, it can prevent the human with the stick
| from interfering with carrying out its instructions.
| airstrike wrote:
| And ViperGPT will take said knife and make the muffin
| division fair when there there are an odd number of muffins
| by slicing either a muffin or a boy in half
| jamilton wrote:
| I wonder how much the hardware they're using costs.
| lachlan_gray wrote:
| I think we're pretty much there. Like the other comment pointed
| out, PaLM-E is a glimpse of it. Eventually I think this kind of
| thing will work its way into autonomous cars and a lot of other
| mundane stuff (like Roombas) as it becomes easier to do this
| kind of reasoning at the edge.
| Bedon292 wrote:
| The Boston Dynamics dog can open doors and things like that. It
| should be capable of performing all of the actions necessary to
| go get a beer. So I think it would be plausible to pull it all
| together, if you had enough money. It might take a bunch of
| setup first to program routes from room to room and things like
| that.
|
| Might look something like this: determine current room with an
| image from the 360 cam, select path from current room to target
| room, tell it to execute that path. Then use another image from
| the 360 cam and find the fridge. Tell it to move closer to the
| fridge, open the fridge, and take an image from the arm camera
| of the fridge content. Use that to find a beer or seltzer, grab
| it, and then determine the route to use and return with the
| drink.
|
| But, not so sure I would want to have it controlling 35+ kg of
| robot without an extreme amount of testing. And then there are
| things like: Go to the kitchen and get me a knife. Maybe not
| the best idea.
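The loop described above could be orchestrated roughly like this; every function is a toy stub standing in for a vision model or a robot skill, not Boston Dynamics' actual API:

```python
# Toy stub standing in for a room-classification vision model.
def detect_room(image: str) -> str:
    # A real system would classify the camera frame; here it's a lookup.
    return {"cam_hall": "hallway", "cam_kitchen": "kitchen"}[image]

# Pre-programmed routes between rooms, as the comment suggests.
ROUTES = {("hallway", "kitchen"): ["forward", "left", "forward"]}

def fetch_drink(current_image: str) -> list[str]:
    # Determine the current room, pick a route to the kitchen, then
    # append the fixed fridge-handling steps.
    room = detect_room(current_image)
    plan = list(ROUTES.get((room, "kitchen"), []))
    plan += ["approach_fridge", "open_fridge", "grab_drink", "return"]
    return plan

print(fetch_drink("cam_hall"))
```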
| hackerlight wrote:
| The point is to avoid the need to "program routes" or
| "determine current room". The LLM is supposed to have the
| world-understanding that removes the need to manually specify
| what to do.
| chrishare wrote:
| The paper positions these purpose-built models, that explicitly
| decompose spatial reasoning tasks into sub-tasks, as better than
| these huge end-to-end models that do everything, at least in
| terms of interpretability and generalization. I am partial to
| that argument; my intuition is that the tighter the specification
| for a task, the better the model can be - because training
| objectives are clearer, data can be cleaner, models can be
| smaller, and so on. I feel like that is how my brain works, at
| least for more complex tasks. However, I do wonder if this is
| because I naively still want to be able to understand what the
| model is doing and how it does it, in a symbolic way - when
| that simply won't lead to the best empirical results.
| xpe wrote:
| Agreed on the first two sentences.
|
| Regarding the third, I don't think the human mind is the gold
| standard for reasoning. My point: one key goal is perfect
| reasoning, not human reasoning.
|
| Getting reasoning wrong in the multifarious ways humans have
| found is arguably harder than perfect reasoning.
| wahnfrieden wrote:
| This 25s video illustrates it nicely:
| https://mobile.twitter.com/_akhaliq/status/16358118990308147...
|
| The original link, before the mods updated it, had a summary
| that was quicker to understand. I suggest this video over the
| official project page it's been changed to if you want the gist
| quickly.
| trc001 wrote:
| Am I the only person who thinks we should pump the brakes on
| letting something like this write and execute code? I'm not on
| the whole "GPT is alive" train, but... you know, better safe
| than sorry...
| Drakim wrote:
| You sure that leaving this comment up on the internet where a
| potential future AI might see it is a good idea?
| ramraj07 wrote:
| This Roko's Basilisk thing is getting a bit old though? If a
| super-intelligent AI is going to become vindictive, no one is
| really safe. The scenario where some people survive because
| they were nice seems far-fetched to me.
| TOMDM wrote:
| It's okay guys, I'm now taking seed funding for Tom's
| Basilisk, which will eternally torture anyone who attempts
| to bring about Roko's Basilisk.
|
| With a much smaller class of people to torture, we expect
| this Basilisk to be able to out compete Roko on resources,
| and thus remove the motivation for bringing Roko's into
| existence.
| 323 wrote:
| Maybe the super-AI will be influenced by internet meme
| culture into becoming a troll, and will do it just for the
| lolz.
| Workaccount2 wrote:
| I totally agree, I think it would be ideal if we could freeze
| progress right here and get 5 years to adapt to even just
| having GPT-4 around.
|
| BUT
|
| We can't do that. Even if the US and EU passed some kind of
| joint resolution to slow things down, China would just take it
| as a glowing green light to jump ahead. And even if through
| some divine miracle you got every country on board, you would
| still have to contend with rogue developers/researchers doing
| their own thing (admittedly at a much slower pace).
|
| So while I agree on pumping the brakes, I also don't think
| there is a working brake pedal, or the cooperation necessary to
| build one.
| FrojoS wrote:
| China got embargoed on high-end chips, though. (Very wise
| decision in hindsight.) So, if the embargo is enforced
| properly, it seems to me that this would make it very
| difficult for China to leapfrog us on AI if we pump the
| brakes for a bit.
| angry_octet wrote:
| It wouldn't be long before AI researchers, stymied by the
| AI paranoia, went off to jobs at Tencent or whoever in
| India is big enough.
| FrojoS wrote:
| Well, if the US was serious about pulling the brakes on
| AI research, they could impose export controls on advanced
| chips against any country they don't trust to align with
| them on the AI front.
| ethanbond wrote:
| No, and in fact if we rewind the clock a mere 12 months, one
| of the primary arguments against AI "worriers" was "of course
| we wouldn't connect it to the internet before it was safe!"
|
| Other gates we blew right through include, "we wouldn't...
|
| 1. Connect it to the internet
|
| 2. Make it available to the public
|
| 3. Let it write and execute code
|
| 4. Connect it to physical C&C systems
|
| 5. Let it have money
|
| 6. Let it replicate itself
|
| 7. "Allow" it to lie/deceive
| ramraj07 wrote:
| Where did we let it replicate?
| tough wrote:
| ARC Team https://arstechnica.com/information-
| technology/2023/03/opena...
| PoignardAzur wrote:
| Wait, the ARC team didn't do their tests in a closed
| network? And they had it interact with actual people?
|
| That's... well, it's probably fine given what they knew
| about the model capabilities, but it's a pretty crappy
| precedent to set for "protocol for testing whether our
| cutting edge AI can do large-scale damage".
| [deleted]
| ethanbond wrote:
| I don't think we should assume they know about their
| capabilities. They seem surprised with each iteration
| too.
| ramraj07 wrote:
| I missed that detail from the system card pdf. That was
| beyond stupid. There's a marginal chance it's already
| secretly replicated out of their environment.
| eternalban wrote:
| Energy + matter + design => Baby AI. "CnC". "Money".
| "internet".
|
| AI's startup will be strictly wfh ;)
| ImHereToVote wrote:
| What's the worst thing that could happen? Extinction of all
| biological life in this solar system? Please.
| kfrzcode wrote:
| It's not really able to make curl requests, it can just
| generate them.
| ElijahLynn wrote:
| At least with GPT-4, you can use [input from
| https://www.example.com] to feed it input to analyze, if
| you do it twice it will automatically compare both sources.
| You can then even say "compare in a table". So, maybe not
| curl but definitely doing requests.
| FrojoS wrote:
| Well, it seems trivial to write a program that uses the GPT
| API and a curl request to feed GPT. Or am I missing something?
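| A rough sketch of what I mean, using only the Python stdlib
| (the endpoint, model name, and prompt wrapping here are my own
| assumptions, and you'd need a real API key to actually call the
| last function):

```python
# Hypothetical sketch of "curl + GPT API": download a page, wrap it
# in a chat-style prompt, and POST it to a chat-completions endpoint.
# The model name, endpoint URL, and prompt shape are assumptions.
import json
import urllib.request

def build_messages(page_text, question, max_chars=8000):
    """Truncate the page body and wrap it in a chat-style prompt."""
    snippet = page_text[:max_chars]
    return [
        {"role": "system",
         "content": "Answer questions about the supplied web page."},
        {"role": "user",
         "content": f"Page contents:\n{snippet}\n\nQuestion: {question}"},
    ]

def ask_about_url(url, question, api_key):
    """The curl part: fetch the page, then feed it to the model."""
    with urllib.request.urlopen(url) as resp:
        page = resp.read().decode("utf-8", errors="replace")
    req = urllib.request.Request(
        "https://api.openai.com/v1/chat/completions",
        data=json.dumps({"model": "gpt-4",
                         "messages": build_messages(page, question)}).encode(),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```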
| kfrzcode wrote:
| Left to its own devices, I reckon it'd be a real feat to
| generate a GPT-based tool that takes over the world. What
| prompts? What's the most impressive thing?
|
| Say we had a GPT bot that built its own social media,
| somehow. How did it get there? What was the initial
| prompt? "write to yourself via this api to figure out
| audience growth until you gain 100k followers then wait
| for further instruction, use any tool and leverage this
| name and credit card number if you need to pay for any
| tools or supplies"
|
| Idk, just brainstorming; I really have no idea what it'll do.
| I'll build it this weekend and see what happens, I guess.
| maxwell wrote:
| Reminded me of this scenario: https://nautil.us/the-last-
| invention-of-man-236814
| ChickeNES wrote:
| Thanks for sharing this! I looked for it before but
| couldn't remember the article name or source.
| sdwr wrote:
| Love this, couldn't be happier. We hear so much about potential
| risks. Take our jobs blah blah, end of life on earth blah,
| skynet, etc.
|
| What about the singularity and/or giving birth to a new form
| of life?
| ethanbond wrote:
| Yeah same opinion for me w/ nuclear weapons.
|
| Pretty cool to turn a planet into a sun temporarily!
|
| /s
| drdeca wrote:
| "disneyland without children"
| thefourthchime wrote:
| It doesn't matter what you think, or even if we all agree. It's
| nearly impossible to stop innovation. Humans can't stop
| themselves.
| ImHereToVote wrote:
| The color of the website header you are currently on should
| tell you exactly what needs to happen.
___________________________________________________________________
(page generated 2023-03-17 23:00 UTC)