[HN Gopher] OmniParser for Pure Vision Based GUI Agent
___________________________________________________________________
OmniParser for Pure Vision Based GUI Agent
Author : fzliu
Score : 62 points
Date : 2024-10-25 18:54 UTC (4 hours ago)
(HTM) web link (microsoft.github.io)
| jauntywundrkind wrote:
| I have a bit of a vice: I enjoy some "idle" games. I have been
| meaning to do some very basic manual screen carving, OCR &
| computer vision to try to "read" my state in these games, & to
| have multi-actor "play" models for them, just for fun really & to
| decrease time sunk gaming (by spending significant time
| coding/learning).
|
| This certainly seems like it has a lot of promise to make that
| much, much, much easier. Game UIs are less uniform, so this
| might be harder to apply, but hopefully
| _adamb wrote:
| As someone who has done this to many games over a few decades,
| I can definitively say: 100% of the time, it ruins the fun of
| the game.
|
| I can't say exactly why. Maybe you feel like you haven't earned
| it. Maybe it's the idle nature of farming that we really
| enjoy...
| fragmede wrote:
| Depends what you consider fun, and how far you take it. Some
| people enjoy programming more than repetitive clicking in a
| GUI. For a clicker game, writing a bot lets you iterate on
| strategies more easily - is it faster to get to level 2 if I buy
| the upgrade for A or B first? For Trackmania, it lets you get
| a world record and a YouTube video with 14M views.
|
| https://youtu.be/Dw3BZ6O_8LY
| trq_ wrote:
| This is awesome, can't wait for evals against Claude Computer
| Use!
| amelius wrote:
| Can we first test this with basic sysadmin work in a simple
| shell?
| Smaug123 wrote:
| To a considerable extent, we are stuck in the world we live in;
| but I am reminded of a quote by Guillaume Allais:
|
| > My entire job seems to be repeating variations of "never start
| by forgetting the user's stated intent only to then attempt to
| guess it".
| akshayKMR wrote:
| Does it also tell the coordinates (x,y) of the annotated box
| w.r.t. the screenshot dimensions?
| s3tt3mbr1n1 wrote:
| Has anyone gotten this to work?
|
| Copying the repo and downloading the models, whether through
| HuggingFace or manually, does not seem to work; you get errors
| indicating missing files.
| amelius wrote:
| Can it detect ads and mask them out?
___________________________________________________________________
(page generated 2024-10-25 23:00 UTC)