[HN Gopher] Launch HN: Browser Use (YC W25) - open-source web ag...
___________________________________________________________________
Launch HN: Browser Use (YC W25) - open-source web agents
Hey HN, we're Gregor and Magnus, the founders of browser-use
(https://browser-use.com/), an easy way to connect AI agents with
the browser. Our agent library is open-source
(https://github.com/browser-use/browser-use) and we have what is
the biggest open-source community for browser agents. And now we
have a cloud offering--hence our Launch HN today! Check out this
video to see it in action:
https://preview.screen.studio/share/r1h4DuAk. There are lots more
demos at https://github.com/browser-use/browser-use on how we
control the web with prompts. We started coding a decade ago with
Selenium bots and macros to automate tasks. Then we both moved into
ML. Last November, we asked ourselves, "How hard could it be to
build the interface between LLMs and the web?" We launched on Show
HN (https://news.ycombinator.com/item?id=42052432) and have since
been addressing various challenges of browser automation, such as:
- Automation scripts break when the website changes
- Automation scripts are annoying to build
- Captchas and rate limits
- Parsing errors and API key management
- And perhaps worst of all, login screens.
People use us to fill out their forms, extract data
behind login walls, or automate their CRM. Others use the xPaths
browser-use clicked on and build their scripts faster, or directly
rerun the actions of browser-use deterministically. We're currently
working on robust task reruns, agent memory for long tasks,
parallelization for repetitive tasks, and many other sweet
improvements. One interesting aspect is that some companies now
want to change their UI to be more agent-friendly. Some developers
even replace ugly UIs with nice ones and use browser-use to copy
data over. Besides the open-source library, we have an API. We host
the browser and LLMs for you, handle proxy rotation and persistent
sessions, and let you run multiple instances in
parallel. We price at $30/month--significantly lower than OpenAI's
Operator. On the open-source side, browser use remains free. You
can use any LLM, from Gemini to Sonnet, Qwen, or even DeepSeek-R1.
It's licensed under MIT, giving you full freedom to customize it.
We'd love to hear from you--what automation challenges are you
facing? Any thoughts, questions, experiences are welcome!
Author : maggreenWAI
Score : 98 points
Date : 2025-02-25 15:45 UTC (7 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| tnolet wrote:
| How are you different from https://www.browserbase.com/ and their
| Stagehand framework? [0]
|
| [0]https://github.com/browserbase/stagehand
| baal80spam wrote:
| At first glance, browser-use is compatible with more
| models, and has (much) more GitHub stars ;)
|
| Coincidentally, I played with it over the last weekend using
| a Gemini model. It's quite promising!
| gregpr07 wrote:
| Yeah, we are much bigger and work on a higher level.
| Stagehand works step by step; we are trying to make end-to-end
| web agents.
| mritchie712 wrote:
| How do you keep your service from being blocked on LinkedIn?
|
| LinkedIn's API sucks. I run an analytics platform[0] that uses it
| and it only has 10% of what our customers are asking for. It'd be
| great to use browser-use, but in my experience, you run into all
| sorts of issues with browser automation on LinkedIn.
|
| 0 - https://www.definite.app/
| MagMueller wrote:
| If you run it locally, you can connect it to your real browser
| and user profile where you are already logged in. This works
| for me for LinkedIn automation, e.g., to send friend requests
| or answer messages.
|
| A bigger problem on LinkedIn for us is all the nested UI
| elements and different scrolling elements. With some
| configuration in our extraction layer in buildDomTree.js and
| some custom actions, I believe someone could build a really
| cool LinkedIn agent.
| edoceo wrote:
| Make a separate profile and launch that for the scrape. Don't
| have to gum up your primary profile
| moralestapia wrote:
| Ha!
|
| I just saw this win an AI Hackathon in Toronto but they said it
| was their own thing, quite dishonest. Everyone was rightfully
| impressed, me as well not gonna lie. I was a bit sus someone
| could come up with something like this in a weekend, but they
| were from U of Waterloo, Vector Institute and whatnot, so I said
| "maybe". Now I know they were just a bunch of scammers, sad.
|
| Anyway, this is a great project, congratulations. It's so good
| it's making other people win already, lol. I have _so many_ use
| cases for this. I truly wish you the best!
|
| Edit: Downvote me all you want, if you love scammers so much I
| can send you their contact so you can "invest" in their trash.
| Lol.
| giarc wrote:
| Did they claim the project as their own, or did they use the
| open source to build a project?
| moralestapia wrote:
| They claimed the project as their own, with a title like "AI
| agents that do things for you".
|
| One of the judges explicitly asked if they actually made this
| thing or was it something else like "a video" showing what it
| would be like.
|
| One of the team members confidently replied it was real and
| that they made it all during the weekend.
|
| It was a bit too good to be true.
|
| Edit: I found a video of the thing. I initially posted it
| here but decided to delete it, the reason for that is I don't
| think they deserve to be publicly shamed. We were all having
| fun and they probably got a little carried away. If any of
| them sees this just don't do that next time. Play fair.
| baal80spam wrote:
| The audacity. Imagine if someone googled and exposed them.
| abrichr wrote:
| Which hackathon?
| taytus wrote:
| Most hackathons are like that.
| MagMueller wrote:
| For me, it simply demonstrates how easy and fast you can build
| these tools now. We have many fellow YC founders who build
| great products on top of browser-use. They don't have to quote
| us. I think it's awesome to enable so many new startup ideas.
| arjunchint wrote:
| Have you inspected or thought through the security of your open
| source library?
|
| You are using debugger tools such as CDP, launching playwright
| without a sandbox, and guiding users to launch Chrome in debugger
| mode to connect to browser-use on their main browser.
|
| The debugging tools you use have active exploits that Google
| doesn't fix because they are supposed to be for debugging and not
| for production/general use. This, combined with your other two
| design choices, lets an exploit escalate and infect the user's main
| machine.
|
| Have you considered not using all these debugging permissions to
| productionize your service?
| gregpr07 wrote:
| How would that work? Can you control the browser without debug
| mode? In production especially, the browsers are running
| on single-instance Docker containers anyway, so the file system is not
| accessible... are there exploits that can do harm from a virtual
| machine?
| lostmsu wrote:
| Embed a WebView instead of launching browser?
| arjunchint wrote:
| Yes, I was able to figure out a secure way to control the
| browser with AI Agents at rtrvr.ai without using debugger
| permissions/tools so it is most definitely possible.
|
| By "in production" I meant how you are advising
| your users to set up the local installation. Even if you
| launch browser-use locally within a container, if you are
| restarting the user's Chrome in debug mode and controlling it
| with CDP from within the container, then the door is wide
| open to exploits and the container doesn't do anything?!
| cookiengineer wrote:
| I've been following your progress for a while now and I'm super
| impressed how far you've got already.
|
| Are you working on unifying the tools that the LLM uses with the
| MCP / model context protocol?
|
| As far as I understand, lots of other providers (like
| Bolt/Stackblitz etc) are migrating towards this. Currently,
| there's not many tools available in the upstream specification
| other than File I/O and some minor interactions for system-use -
| but it would be pretty awesome if tools and services (like say, a
| website service) could be reflected there as it would save a lot
| of development overhead for the "LLM bindings".
|
| Very interesting stuff you're building!
| gregpr07 wrote:
| hmm, I thought about this a lot. But tbh I think MCP is sort of
| a gimmick... probably the better way is for agents just to
| understand the HTTP APIs directly. Maybe I'm wrong, very happy
| to be convinced otherwise. Do you think an MCP server for the
| cloud version would be useful?
| nostrebored wrote:
| strong agree with this -- outside of integration with Claude
| Desktop, I don't understand why you would use MCP rather than a
| dedicated API endpoint.
| gregpr07 wrote:
| What's your take - how can we expose Browser Use to as many
| use cases as possible? Is there an easier way than an OpenAPI
| config?
| anyekwest wrote:
| These guys are goated
| xnx wrote:
| Is it possible to mix browser-use with traditional DOM/XPath/CSS-
| selector automation? e.g. Have certain automation steps that are
| more fuzzy/AI like "click on the image of a cat"
| gregpr07 wrote:
| We are experimenting with this. Currently the library API is
| very raw, but this is technically possible (we introduced the notion
| of initial actions, which are just deterministic actions run before
| the LLM kicks in) - https://github.com/browser-use/browser-
| use/blob/main/example....
|
| The other way to achieve this with Browser Use is to save the
| history from `history = agent.run()` and rerun it with
| `agent.rerun_history(history)`.
|
| I'd love to see if this can be of any use to you!
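The record-and-rerun idea in this reply can be sketched without any browser library. The following is a minimal illustration under stated assumptions, not browser-use's implementation: `Step`, `FakePage`, `replay`, and `label_resolver` are hypothetical names, the page is an in-memory stand-in, and the "fuzzy" resolver is a toy substitute for an LLM call. It shows deterministic replay of recorded selectors, with a fallback that is consulted only when a selector has gone stale.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Step:
    """One recorded action (hypothetical shape, not browser-use's history format)."""
    action: str                  # e.g. "click" or "type"
    selector: str                # selector recorded on the first run
    description: str             # natural-language intent, used only on fallback
    value: Optional[str] = None  # text to type, if any


class FakePage:
    """In-memory stand-in for a browser page so the sketch runs anywhere."""

    def __init__(self, elements: Dict[str, str]):
        self.elements = elements  # selector -> human-readable label
        self.log: List[str] = []

    def has(self, selector: str) -> bool:
        return selector in self.elements

    def perform(self, action: str, selector: str, value: Optional[str]) -> None:
        self.log.append(f"{action}:{selector}" + (f"={value}" if value else ""))


def replay(
    steps: List[Step],
    page: FakePage,
    resolve: Callable[[FakePage, str], str],
) -> int:
    """Replay recorded steps; call `resolve` only when a selector is missing.

    Returns the number of steps that needed the fallback.
    """
    fallbacks = 0
    for step in steps:
        selector = step.selector
        if not page.has(selector):
            selector = resolve(page, step.description)
            fallbacks += 1
        page.perform(step.action, selector, step.value)
    return fallbacks


def label_resolver(page: FakePage, description: str) -> str:
    """Toy 'fuzzy' resolver: find an element whose label appears in the description."""
    for selector, label in page.elements.items():
        if label.lower() in description.lower():
            return selector
    raise LookupError(f"no element matches: {description}")


if __name__ == "__main__":
    recorded = [
        Step("type", "#q", "type into the search box", "cats"),
        Step("click", "#go", "click the search button"),
    ]
    # The site renamed "#go" to "#submit-v2" since the recording was made.
    page = FakePage({"#q": "search box", "#submit-v2": "search button"})
    print(replay(recorded, page, label_resolver), page.log)
```

The design choice here is that the expensive, non-deterministic component is only on the failure path, so an unchanged site replays with zero fallback calls.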
| jackienotchan wrote:
| AI agents have led to a big surge in scraping/crawling activity
| on the web, and many don't use proper user agents and don't stick
| to any scraping best practices that the industry has developed
| over the past two decades (robots.txt, rate limits). This comes
| with negative side effects for website owners (costs, downtime,
| etc.), as repeatedly reported on HN.
|
| Do you have any built-in features that address these issues?
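The two practices named in this comment, honoring robots.txt and rate limiting, can be sketched with the Python standard library. This is an illustrative sketch, not a browser-use feature: `PoliteGate`, `build_rules`, and the `example-agent` user-agent string are assumptions made up for the example. Only `urllib.robotparser.RobotFileParser` and its `parse`, `can_fetch`, and `crawl_delay` methods are real standard-library API.

```python
import time
from typing import Callable, List, Optional
from urllib.robotparser import RobotFileParser

USER_AGENT = "example-agent"  # hypothetical user-agent string


def build_rules(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt content that was fetched elsewhere (no network here)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


class PoliteGate:
    """Checks robots.txt permission and spaces requests by the crawl delay."""

    def __init__(
        self,
        rules: RobotFileParser,
        user_agent: str,
        default_delay: float = 1.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ):
        self.rules = rules
        self.user_agent = user_agent
        delay = rules.crawl_delay(user_agent)  # None if robots.txt sets none
        self.delay = float(delay) if delay is not None else default_delay
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None

    def allowed(self, url: str) -> bool:
        return self.rules.can_fetch(self.user_agent, url)

    def wait_turn(self) -> None:
        """Block until at least `delay` seconds have passed since the last request."""
        now = self._clock()
        if self._last is not None:
            remaining = self.delay - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


if __name__ == "__main__":
    rules = build_rules("User-agent: *\nDisallow: /private/\nCrawl-delay: 2\n")
    gate = PoliteGate(rules, USER_AGENT)
    for url in ["https://example.com/public", "https://example.com/private/data"]:
        if gate.allowed(url):
            gate.wait_turn()
            print("would fetch", url)
        else:
            print("skipping", url)
```

The clock and sleep functions are injectable so the pacing logic can be exercised without real waiting.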
| MagMueller wrote:
| Yes, some hosting services have experienced a 100%-1000%
| increase in hosting costs.
|
| On most platforms, browser use only requires the interactive
| elements, which we extract, and does not need images or videos.
| We have not yet implemented this optimization, but it will
| reduce costs for both parties.
|
| Our goal is to abstract backend functionality from webpages. We
| could cache this, and only update the cache if eTags change.
|
| Websites that really don't want us will come up with audio
| captchas and new creative methods.
|
| Agents are different from bots. Agents are intended as a direct
| user clone and could also bring revenue to websites.
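The ETag-based caching mentioned in this reply follows the standard HTTP conditional-request pattern: send the stored ETag in `If-None-Match`, and reuse the cached body when the server answers `304 Not Modified`. Below is a minimal sketch of that pattern, not anything browser-use ships. `ETagCache`, `FakeServer`, and the `(status, etag, body)` tuple are hypothetical names chosen for the example, and the server is simulated so the block runs offline.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

# Hypothetical fetch result: (HTTP status, ETag header, response body).
Response = Tuple[int, Optional[str], Optional[str]]
# A fetcher takes a URL and the If-None-Match value (or None).
Fetcher = Callable[[str, Optional[str]], Response]


@dataclass
class CacheEntry:
    etag: str
    body: str


class ETagCache:
    """Reuses a cached body whenever the server reports the ETag is unchanged."""

    def __init__(self, fetch: Fetcher):
        self._fetch = fetch
        self._entries: Dict[str, CacheEntry] = {}

    def get(self, url: str) -> str:
        cached = self._entries.get(url)
        status, etag, body = self._fetch(url, cached.etag if cached else None)
        if status == 304 and cached is not None:
            return cached.body  # unchanged: no body was transferred
        if status == 200 and body is not None:
            if etag is not None:
                self._entries[url] = CacheEntry(etag, body)
            return body
        raise RuntimeError(f"unexpected status {status} for {url}")


class FakeServer:
    """Simulated origin server that honors If-None-Match."""

    def __init__(self) -> None:
        self.etag = '"v1"'
        self.body = "page v1"
        self.full_responses = 0  # how many times the full body was sent

    def fetch(self, url: str, if_none_match: Optional[str]) -> Response:
        if if_none_match == self.etag:
            return (304, self.etag, None)
        self.full_responses += 1
        return (200, self.etag, self.body)


if __name__ == "__main__":
    server = FakeServer()
    cache = ETagCache(server.fetch)
    cache.get("https://example.com/")
    cache.get("https://example.com/")  # served from cache after a 304
    print("full responses sent:", server.full_responses)
```

With a real HTTP client, the fetcher would set the `If-None-Match` request header and read the `ETag` response header; the cache logic stays the same.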
| erellsworth wrote:
| >Websites that really don't want us will come up with audio
| captchas and new creative methods.
|
| Which you or other AIs will then figure a way around. You
| literally mention "extract data behind login walls" as one of
| your use cases so it sounds like you just don't give a shit
| about the websites you are impacting.
|
| It's like saying, "If you really don't want me to break into
| your house and rifle through your stuff, you should just buy
| a more expensive security system."
| gregpr07 wrote:
| imo if the website doesn't want us there the long term
| value is anyway not great (maybe the exception is SERP APIs or
| something similar, which live exclusively because the Google
| search API is brutally expensive).
|
| > extract data behind login walls
|
| We mean this more from a perspective of companies wanting
| it, but there is a login wall. For example (actual
| customer) - "I am a compliance company that has a system from
| 2001 and interacting with it is really painful. Let's use
| Browser Use to use the search bar, download data and report
| back to me".
|
| I believe in the long run agents will have to pay for the
| data from website providers, and then the incentives are
| once again aligned.
| erellsworth wrote:
| > imo if the website doesn't want us there the long term
| value is anyway not great
|
| Wat? You're saying if a website doesn't want you
| scraping their data then that data has low long-term
| value? Or are you saying something else because that
| makes no fucking sense.
| OsrsNeedsf2P wrote:
| Does anyone have experience comparing this to Skyvern[0]? I
| originally thought the $30/month would be the killer feature, but
| it's only $30 worth of credits. Otherwise they both seem to have
| the same offering
|
| [0] https://www.skyvern.com/
| gregpr07 wrote:
| I think our cloud is much simpler (just one prompt and go). But
| it's also sort of a different service. The main differences
| come from the open source side - we are essentially building
| more of a framework for anyone to use and they are just a web
| app.
| darepublic wrote:
| The title says make your website more accessible for agents...
| But then the quick start seemingly just acts from the agentic
| side to find a post on Reddit. So I didn't fully grok what this
| is about. My initial guess is you use agents on a website, allow
| them to think long, then come up with some selectors to speed up
| subsequent tries. But it's really not clear to me
| tsvoboda wrote:
| pretty sick stuff guys, excited to see what you accomplish
| dogman123 wrote:
| i tried the reddit quickstart example in the repo and it seemed
| to be incapable of completing the task.
|
| https://pastebin.com/PnLnQ3kY
| gregpr07 wrote:
| hmm interesting - sometimes it definitely fails yes. Will take
| a look!
|
| btw - our biggest challenge is exactly this, solving thousands
| of issues that arise on the fly.
| dogman123 wrote:
| fwiw, i had it do something _far_ more complex that i am
| currently dealing with at work and it performed perfectly in
| my few test cases. i see very heavy use of this tool in my
| future. just figured i'd give a shout about the quickstart not
| functioning as planned :)
| rvz wrote:
| > On the open-source side, browser use remains free. You can use
| any LLM, from Gemini to Sonnet, Qwen, or even DeepSeek-R1. It's
| licensed under MIT, giving you full freedom to customize it.
|
| As this project is MIT, that means companies like Amazon can
| deploy a managed version and can compete against you with prices
| going close to zero in their free-tier and with higher quotas
| than what you are offering.
|
| I predict that this project is likely going to change to AGPL or
| a new business license to combat this.
| hakaneskici wrote:
| What is your overall vision and roadmap about automated testing
| for web apps by bringing value from AI into the process? When I
| worked on the accessibilityinsights.io team, dealing with
| inconsistent or complicated xPaths was also an issue. Is AI
| vision helping there much?
___________________________________________________________________
(page generated 2025-02-25 23:00 UTC)