[HN Gopher] TaxyAI: Open-source browser automation with GPT-4
___________________________________________________________________
TaxyAI: Open-source browser automation with GPT-4
Author : kcorbitt
Score : 70 points
Date : 2023-03-28 17:07 UTC (5 hours ago)
(HTM) web link (github.com)
| seydor wrote:
| Wow this kind of thing makes plugins obsolete. I thought it would
| take more than a week
| krembanan wrote:
| This is very cool, impressive work in 2 weeks! Each action seems
| to have some delay after it, is there any reason for that? Is it
| because you are streaming the OpenAI response and performing the
| actions as they come? If not, I imagine streaming the query
| response and executing each action as soon as it is emitted would
| speed it up quite a bit?
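|
| A minimal sketch of the streaming idea raised here, under loud
| assumptions: it supposes the model emits several
| <Action>...</Action> blocks in one streamed reply, and the class
| name ActionStreamParser and the tag format are hypothetical, not
| taken from Taxy. It only shows how completed actions could be
| picked out of partial chunks as they arrive.

```typescript
// Hypothetical sketch: incremental extraction of actions from a streamed reply.
// Assumes (not confirmed by the project) that actions arrive wrapped in
// <Action>...</Action> tags, possibly split across chunk boundaries.
class ActionStreamParser {
  private buffer = "";

  // Feed one streamed chunk; returns the bodies of all actions that
  // became complete with this chunk, in order.
  feed(chunk: string): string[] {
    this.buffer += chunk;
    const open = "<Action>";
    const close = "</Action>";
    const completed: string[] = [];

    for (;;) {
      const start = this.buffer.indexOf(open);
      if (start < 0) break; // no action has started yet
      const end = this.buffer.indexOf(close, start + open.length);
      if (end < 0) break; // action started but is not finished yet
      completed.push(this.buffer.slice(start + open.length, end).trim());
      // Drop everything up to and including the closing tag.
      this.buffer = this.buffer.slice(end + close.length);
    }
    return completed;
  }
}
```

| One caveat with this approach: any action after the first was
| planned against a page that the earlier actions may already have
| changed, so executing a whole batch without re-reading the DOM
| trades accuracy for speed.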
| serjester wrote:
| Why use GPT-4? The latency is significantly worse than 3.5 and
| this seems simple enough that the performance delta is marginal.
| If I were going for robustness, I probably wouldn't be using AI in
| the first place.
|
| Edit: I noticed they support both but I'm assuming by the speed
| all the demos are using 3.5?
| Karrot_Kream wrote:
| I find that GPT-4 works much better with ReAct than GPT-3 for
| more complex tasks.
| dopeboy wrote:
| Anxiously been waiting for something like this - very cool.
|
| One use case I've had is that I hate spending time on my
| LinkedIn, Twitter, etc. newsfeeds. But there are a handful of
| people I care about and want to keep tabs on.
|
| Is there a way I could use TaxyAI to set up a role to monitor my
| LinkedIn newsfeed and keep tabs on certain people + topics and
| then email me a digest of that?
| ashcorbitt22 wrote:
| It was such an amazing, surreal experience using Taxy to complete
| a task! It made the task enjoyable and exciting!
| WonderBuilder wrote:
| This is amazing already! Very exciting. I'll make sure I follow
| this project's progress. It also reminds me of Adept and their
| goal with ACT-1. I still haven't seen their product launch,
| though...
| snihalani wrote:
| TAKE. MY. MONEY. NOW.
| Imnimo wrote:
| It will be interesting to see whether this sort of approach works
| better than something using GPT-4's vision capabilities.
| Obviously websites are built to be easy to use visually rather
| than easy to use via the DOM. On the other hand, it's much less
| clear how to ground action proposals in the visual domain - how
| do you ask GPT where on an image of the screen it wants to click?
| dpflan wrote:
| Curious: Can someone explain what they are excited to use this
| for? Can someone provide a large scale use-case/scenario?
| koch wrote:
| Filling out job applications using my resume
| kcorbitt wrote:
| Hey HN! My brother Arctic_fly and I spent the last two weeks
| since the GPT-4 launch building Taxy, an open source Chrome
| extension that lets you automate arbitrary tasks in your browser
| using GPT-4. You can see a few demos in the GitHub README, but
| basically it works like this:
|
| 1. You open the extension and write the task you'd like done (e.g.
| "schedule a meeting with David tomorrow at 2").
|
| 2. Taxy pulls the DOM of the current page, puts it through a
| pipeline to remove all non-semantic information, hidden elements,
| etc., and sends it to GPT-4 along with your text instructions.
|
| 3. GPT-4 tries to figure out what action to take. In our prompt
| we give it the option to either click an element or set an
| input's value. We use the ReAct paradigm
| (https://arxiv.org/abs/2210.03629) so it explains what it's
| trying to do before taking an action, which both makes it more
| accurate and helps with debugging.
|
| 4. Taxy parses GPT-4's response and performs the action requested
| on the page. It then goes back to step (2) and asks GPT-4 for the
| next action to take with the updated page DOM. It also sends the
| list of actions already taken as part of the current task so
| GPT-4 can detect if it's getting stuck in a loop and abort. :)
|
| 5. Once GPT-4 has decided the task is done or it can't make any
| more progress, it responds with a special action indicating it's
| done.
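|
| The loop in steps 2-5 can be sketched roughly as follows. This is
| not Taxy's actual code: the reply format
| (<Thought>...</Thought><Action>...</Action>), the action names
| click/setValue/finish/fail, the Page interface, and the step limit
| are all assumptions made for illustration, and the model call is
| synchronous here only to keep the sketch short.

```typescript
// Hypothetical sketch of a ReAct-style browser loop; all names are illustrative.
type Action =
  | { kind: "click"; elementId: number }
  | { kind: "setValue"; elementId: number; value: string }
  | { kind: "finish" } // assumed "task is done" action
  | { kind: "fail" }; // assumed "cannot make progress" action

interface Step {
  thought: string; // the model's stated reasoning (the "Re" in ReAct)
  action: Action; // what it decided to do (the "Act")
}

// Parse a reply of the assumed form:
//   <Thought>why</Thought><Action>click(12)</Action>
function parseReply(reply: string): Step | null {
  const thoughtMatch = /<Thought>([\s\S]*?)<\/Thought>/.exec(reply);
  const actionMatch = /<Action>([\s\S]*?)<\/Action>/.exec(reply);
  if (!thoughtMatch || !actionMatch) return null;
  const thought = (thoughtMatch[1] ?? "").trim();
  const call = (actionMatch[1] ?? "").trim();

  let m = /^click\((\d+)\)$/.exec(call);
  if (m) return { thought, action: { kind: "click", elementId: Number(m[1]) } };
  m = /^setValue\((\d+),\s*"([\s\S]*)"\)$/.exec(call);
  if (m) {
    return { thought, action: { kind: "setValue", elementId: Number(m[1]), value: m[2] ?? "" } };
  }
  if (call === "finish()") return { thought, action: { kind: "finish" } };
  if (call === "fail()") return { thought, action: { kind: "fail" } };
  return null; // unrecognized action
}

interface Page {
  simplifiedDom(): string; // step 2: cleaned-up DOM as text
  perform(action: Action): void; // step 4: apply the action to the page
}

// A real model call would be asynchronous; kept synchronous for brevity.
type Model = (prompt: string) => string;

interface TaskResult {
  status: "done" | "failed" | "stepLimit";
  history: Step[];
}

function runTask(task: string, page: Page, model: Model, maxSteps = 10): TaskResult {
  const history: Step[] = [];
  for (let stepIndex = 0; stepIndex < maxSteps; stepIndex++) {
    // Re-read the DOM every iteration and include the actions taken so far,
    // so the model can notice when it is looping.
    const prompt = [
      `Task: ${task}`,
      `Previous actions: ${JSON.stringify(history.map((s) => s.action))}`,
      `Page: ${page.simplifiedDom()}`,
    ].join("\n");

    const step = parseReply(model(prompt));
    if (step === null) return { status: "failed", history };
    history.push(step);
    if (step.action.kind === "finish") return { status: "done", history };
    if (step.action.kind === "fail") return { status: "failed", history };
    page.perform(step.action);
  }
  return { status: "stepLimit", history };
}
```

| Keeping the thought next to each action in the history is what
| makes a run easy to debug afterwards: every click has a stated
| reason attached. The hard step limit is an extra safeguard added
| in this sketch, on top of the model's own loop detection.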
|
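| For the DOM clean-up in step 2, here is a minimal sketch of what
| such a pipeline might do. It is a guess at the general technique,
| not Taxy's implementation: it works on a plain DomNode tree (a
| hypothetical stand-in for real browser elements, so it runs outside
| a browser), and the lists of kept attributes and dropped tags are
| assumptions.

```typescript
// Hypothetical stand-in for a browser element, so the sketch runs anywhere.
interface DomNode {
  tag: string;
  attrs: Record<string, string>;
  text: string; // direct text content, "" if none
  hidden: boolean; // e.g. display:none, as computed by the caller
  children: DomNode[];
}

// Assumed lists; a real pipeline would tune these.
const KEPT_ATTRS = ["id", "aria-label", "name", "placeholder", "role", "type", "value"];
const DROPPED_TAGS = ["script", "style", "svg", "noscript"];
const INTERACTIVE_TAGS = ["a", "button", "input", "select", "textarea"];

// Returns a reduced copy of the tree, or null if nothing useful remains.
function simplify(node: DomNode): DomNode | null {
  if (node.hidden || DROPPED_TAGS.indexOf(node.tag) >= 0) return null;

  // Keep only attributes that carry meaning for the task.
  const attrs: Record<string, string> = {};
  for (const key of Object.keys(node.attrs)) {
    const value = node.attrs[key];
    if (value !== undefined && KEPT_ATTRS.indexOf(key) >= 0) attrs[key] = value;
  }

  const children: DomNode[] = [];
  for (const child of node.children) {
    const kept = simplify(child);
    if (kept !== null) children.push(kept);
  }

  const text = node.text.trim();
  const bare = Object.keys(attrs).length === 0 && text === "";
  const interactive = INTERACTIVE_TAGS.indexOf(node.tag) >= 0;

  // A wrapper with nothing of its own and a single child collapses into it.
  const onlyChild = children[0];
  if (bare && !interactive && children.length === 1 && onlyChild !== undefined) {
    return onlyChild;
  }
  // An empty, non-interactive leaf carries no information.
  if (bare && !interactive && children.length === 0) return null;

  return { tag: node.tag, attrs, text, hidden: false, children };
}

// Serialize the reduced tree to compact HTML-like text for the prompt.
function serialize(node: DomNode): string {
  const attrs = Object.keys(node.attrs)
    .map((key) => ` ${key}="${node.attrs[key]}"`)
    .join("");
  const inner = node.text + node.children.map(serialize).join("");
  return `<${node.tag}${attrs}>${inner}</${node.tag}>`;
}
```

| The point of a pass like this is mostly token budget: styling
| classes, scripts, and layout wrappers can dominate a page's markup
| while telling the model nothing about what can be clicked or typed
| into.
|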
| Right now there are a lot of limitations, and this is more a
| "research preview" than a finished product. That said, I've found
| it surprisingly capable for a number of tasks, and I think it's
| in a stable enough place we can share. Happy to answer any
| questions!
| koolba wrote:
| Very cool. The "sending everything of relevance on the page to
| OpenAI" is of course creepy. But that's table stakes for anything
| like this until people can run them externally.
|
| This would make a cool "magic box" at the top of a web page.
| Type in what you want to do, it sends it to the server along with
| the DOM extract (same site server). Server asks magical LLM how
| to do it, and then spits it back to the client. So no plug-in
| needed and data flow would pass through the source server.
| Arctic_fly wrote:
| Already useful across a variety of domains, and it's early days
| yet!
|
| Just yesterday I used it to create a GitHub issue with minimal
| effort.
___________________________________________________________________
(page generated 2023-03-28 23:00 UTC)