[HN Gopher] OmniParser for Pure Vision Based GUI Agent
       ___________________________________________________________________
        
       OmniParser for Pure Vision Based GUI Agent
        
       Author : fzliu
       Score  : 62 points
       Date   : 2024-10-25 18:54 UTC (4 hours ago)
        
 (HTM) web link (microsoft.github.io)
 (TXT) w3m dump (microsoft.github.io)
        
       | jauntywundrkind wrote:
       | I have a bit of a vice: I enjoy some "idle" games. I have been
       | meaning to do some very basic manual screen carving, OCR, and
       | computer vision to "read" my state in these games and to build
       | multi-actor "play" models for them, mostly just for fun, and to
       | cut down the time I sink into gaming (by spending significant
       | time coding and learning instead).
       | 
       | This certainly seems promising for making that much, much, much
       | easier. Game UIs are less uniform, so this might be harder or
       | not easily applicable, but hopefully
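Here is a minimal sketch of the "screen carving" idea above. It crops a fixed region out of a screenshot and reads how full a progress bar is by matching pixel colors. The screenshot-as-list-of-rows representation, the bar color, the tolerance, and the coordinates are all hypothetical. A real bot would grab pixels with a screenshot library and calibrate the region per game.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each 0-255


def crop(image: List[List[Pixel]], left: int, top: int,
         width: int, height: int) -> List[List[Pixel]]:
    """Return a sub-rectangle of a row-major image (a list of pixel rows)."""
    return [row[left:left + width] for row in image[top:top + height]]


def is_filled(p: Pixel, target: Pixel, tol: int = 40) -> bool:
    """A pixel counts as 'filled' if every channel is within tol of the bar color."""
    return all(abs(a - b) <= tol for a, b in zip(p, target))


def bar_fill_fraction(region: List[List[Pixel]], target: Pixel) -> float:
    """Fraction of columns whose middle-row pixel matches the bar color."""
    mid = region[len(region) // 2]
    filled = sum(1 for p in mid if is_filled(p, target))
    return filled / len(mid)


# Hypothetical 20x5 "screenshot": a 10-pixel-wide bar at x=5..14, y=1..3,
# with its first 7 columns filled green.
GREEN, BLACK = (0, 200, 0), (0, 0, 0)
screen = [[BLACK] * 20 for _ in range(5)]
for y in range(1, 4):
    for x in range(5, 12):
        screen[y][x] = GREEN
region = crop(screen, left=5, top=1, width=10, height=3)
print(bar_fill_fraction(region, GREEN))  # 0.7
```

Sampling a single middle row keeps this robust to rounded bar corners. A color tolerance rather than an exact match absorbs gradients and compression noise.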
        
         | _adamb wrote:
         | As someone who has done this to many games over a few decades,
         | I can definitively say: 100% of the time, it ruins the fun of
         | the game.
         | 
         | I can't say exactly why. Maybe you feel like you haven't earned
         | it. Maybe it's the idle nature of farming that we really
         | enjoy...
        
           | fragmede wrote:
           | Depends what you consider fun, and how far you take it. Some
           | people enjoy programming more than repetitive clicking in a
         | GUI. For a clicker game, writing a bot lets you iterate on
         | strategies more easily: is it faster to get to level 2 if I buy
         | the upgrade for A or B first? For Trackmania, it lets you get
           | a world record and a YouTube video with 14M views.
           | 
           | https://youtu.be/Dw3BZ6O_8LY
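The "A or B first?" question is easy to answer once the game's rules are in code. This is a minimal sketch for a made-up idle game with constant income, two one-off upgrades, and a fixed gold cost for level 2. All names and numbers below are hypothetical. It simulates each purchase order and compares the time needed to reach the goal.

```python
from itertools import permutations
from typing import Dict, Tuple

# Hypothetical game rules: upgrade name -> (cost in gold, gold/second it adds)
UPGRADES: Dict[str, Tuple[float, float]] = {"A": (10.0, 1.0), "B": (50.0, 8.0)}
BASE_RATE = 1.0       # gold per second before any upgrades (assumed)
LEVEL_2_COST = 200.0  # gold needed to reach level 2 (assumed)


def time_to_goal(order: Tuple[str, ...]) -> float:
    """Seconds to reach level 2 if upgrades are bought in the given order."""
    t, gold, rate = 0.0, 0.0, BASE_RATE
    for name in order:
        cost, bonus = UPGRADES[name]
        wait = max(0.0, (cost - gold) / rate)  # idle until the upgrade is affordable
        t += wait
        gold += wait * rate - cost             # earn while waiting, then pay
        rate += bonus
    return t + max(0.0, (LEVEL_2_COST - gold) / rate)


for order in permutations(UPGRADES):
    print("".join(order), round(time_to_goal(order), 1))
# AB 55.0
# BA 71.1
```

With these numbers, buying the cheap upgrade A first wins because it speeds up saving for B. Real clicker games add compounding and cost scaling, which is where brute-forcing orders like this becomes more useful than intuition.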
        
       | trq_ wrote:
       | This is awesome, can't wait for evals against Claude Computer
       | Use!
        
         | amelius wrote:
         | Can we first test this with basic sysadmin work in a simple
         | shell?
        
       | Smaug123 wrote:
       | To a considerable extent, we are stuck in the world we live in;
       | but I am reminded of a quote by Guillaume Allais:
       | 
       | > My entire job seems to be repeating variations of "never start
       | by forgetting the user's stated intent only to then attempt to
       | guess it".
        
       | akshayKMR wrote:
       | Does it also give the (x, y) coordinates of each annotated box
       | relative to the screenshot dimensions?
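The thread doesn't answer this. This sketch assumes the parser reports each element's box normalized to [0, 1] as (x1, y1, x2, y2). That is an assumption, not something stated above, so check the repo's actual output format. Under it, recovering screen coordinates is just a scale by the screenshot size.

```python
from typing import Tuple

BBox = Tuple[float, float, float, float]  # (x1, y1, x2, y2), ASSUMED normalized to [0, 1]


def to_pixels(box: BBox, width: int, height: int) -> Tuple[int, int, int, int]:
    """Scale a normalized box to pixel coordinates on a width x height screenshot."""
    x1, y1, x2, y2 = box
    return (round(x1 * width), round(y1 * height),
            round(x2 * width), round(y2 * height))


def click_point(box: BBox, width: int, height: int) -> Tuple[int, int]:
    """Center of the box in pixels, e.g. where an agent would click."""
    x1, y1, x2, y2 = to_pixels(box, width, height)
    return ((x1 + x2) // 2, (y1 + y2) // 2)


print(to_pixels((0.25, 0.5, 0.75, 0.6), 1920, 1080))    # (480, 540, 1440, 648)
print(click_point((0.25, 0.5, 0.75, 0.6), 1920, 1080))  # (960, 594)
```

If the boxes already come back in pixels, only `click_point`'s midpoint step is needed. If they use another layout, such as (x, y, width, height), adjust the unpacking accordingly.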
        
       | s3tt3mbr1n1 wrote:
       | Has anyone gotten this to work?
       | 
       | Copying the repo and downloading the models, either through
       | Hugging Face or manually, does not seem to work; you get errors
       | about missing files.
        
       | amelius wrote:
       | Can it detect ads and mask them out?
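Nothing in the thread says the parser classifies ads. One way to approximate it is sketched below. It assumes you already have per-element pixel boxes and captions from some parser, and it paints over the boxes that a predicate flags. The element dict shape and the keyword heuristic are hypothetical stand-ins for a real ad classifier.

```python
from typing import Any, Callable, Dict, List, Tuple

Pixel = Tuple[int, int, int]
Element = Dict[str, Any]  # hypothetical: {"box": (x1, y1, x2, y2) in pixels, "caption": str}


def looks_like_ad(el: Element) -> bool:
    """Naive keyword heuristic standing in for a real ad classifier (hypothetical)."""
    caption = str(el["caption"]).lower()
    return any(word in caption for word in ("sponsored", "advertisement", "ad choices"))


def mask_boxes(image: List[List[Pixel]], elements: List[Element],
               is_ad: Callable[[Element], bool] = looks_like_ad,
               fill: Pixel = (128, 128, 128)) -> List[List[Pixel]]:
    """Return a copy of image with every box flagged by is_ad painted over with fill."""
    out = [row[:] for row in image]  # leave the caller's image untouched
    h, w = len(out), len(out[0])
    for el in elements:
        if not is_ad(el):
            continue
        x1, y1, x2, y2 = el["box"]
        for y in range(max(0, y1), min(h, y2)):      # clamp to image bounds
            for x in range(max(0, x1), min(w, x2)):
                out[y][x] = fill
    return out


WHITE = (255, 255, 255)
page = [[WHITE] * 8 for _ in range(4)]
elements = [
    {"box": (0, 0, 4, 2), "caption": "Sponsored: buy now"},
    {"box": (4, 2, 8, 4), "caption": "Submit button"},
]
masked = mask_boxes(page, elements)
print(masked[0][0], masked[3][7])  # (128, 128, 128) (255, 255, 255)
```

Keeping the classifier as a pluggable `is_ad` argument separates "where are the elements" (the parser's job) from "which ones are ads" (a policy decision).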
        
       ___________________________________________________________________
       (page generated 2024-10-25 23:00 UTC)