[HN Gopher] Puppeteer Support for Firefox
___________________________________________________________________
Puppeteer Support for Firefox
Author : cpeterso
Score : 309 points
Date : 2024-08-07 16:19 UTC (6 hours ago)
(HTM) web link (hacks.mozilla.org)
(TXT) w3m dump (hacks.mozilla.org)
| hugs wrote:
| Ranked #4 on HN at the moment and no comments. So I'll just say
| hi. (Selenium project creator here. I had nothing to do with this
| announcement, but feel free to ask me anything!)
|
| My hot take on things: When the Puppeteer team left Google to
| join Microsoft and continue the project as Playwright, that left
| Google high and dry. I don't think Google truly realized how
| complementary a browser automation tool is to an AI-agent
| strategy. Similar to how they also fumbled the bag on transformer
| technology. (The T in GPT)... So Google had a choice, abandon
| Puppeteer and be dependent on MS/Playwright... or find a path
| forward for Puppeteer. WebDriver BiDi takes all the chocolatey
| goodness of the Chrome DevTools Protocol (CDP) that Puppeteer
| (and Playwright) are built on... and moves that forward in a
| standard way (building on the earlier success of the W3C
| WebDriver process that browser vendors and members of the
| Selenium project started years ago.)
|
| Great to see there's still a market for cross-industry standards
| and collaboration with this announcement from Mozilla today.
| localfirst wrote:
| is it possible to now use Puppeteer from inside the browser? or
| do security concerns restrict this?
|
| what does Webdriver Bidi do and what do you mean by "taking the
| good stuff from CDP"
|
| I don't want to run my scrapes in the cloud and pay a monthly
| fee
|
| I want to run them locally. I want to run LLM locally too.
|
| I'm sick of SaaS
| hugs wrote:
| Puppeteer controls a browser... from the outside... like a
| puppeteer controls a puppet. Other tools like Cypress (and
| ironically the very first version of Selenium 20 years ago)
| drive the browser from the inside using JavaScript. But we
| abandoned that "inside out" approach in later versions of
| Selenium because of the limitations imposed by the browser JS
| security sandbox. Cypress is still trying to make it work and
| I wish them luck.
|
| You could probably figure out how to connect Llama to
| Puppeteer. (If no one has done it, yet, that would be an
| awesome project.)
| localfirst wrote:
| I see. I'm still looking for a way to control the browser from
| the inside via a browser extension. Very tough problem to
| solve.
| fitsumbelay wrote:
| I do a lot of quick manual scrapes via devtools
|
| you could try this
|
| Chrome web scraper extension -
| https://chromewebstore.google.com/detail/web-scraper-
| free-we...
| hugs wrote:
| Yup. Lately, I've been doing it a completely different
| way (but still from the outside)... Using a Raspberry Pi
| as a fake keyboard and mouse. (Makes more sense in the
| context of mobile automation than desktop.)
|
| What's good for security is generally bad for
| automation... and trying to automate from inside a
| heavily secured sandbox is... frustrating. It works a
| little bit (as Cypress folks more recently learned), but
| you can never get to 100% covering all the things you'd
| want to cover. Driving from the outside is easier... but
| still not easy!
| localfirst wrote:
| interesting, so you are emulating hardware inputs from the RPi
|
| how is it reading what's on the screen? computer vision?
| hugs wrote:
| Not to make this an ad for my project, but I'm starting
| to document it more here: https://valetnet.dev/
|
| The Raspberry Pi is configured to use the USB HID
| protocol to look and act like a mouse and keyboard when
| plugged into a phone. (Android and iOS now support mouse
| and keyboard inputs). For video, we have two models:
|
| - "Valet Link" uses an HDMI capture card (and a multi-
| port dongle) to pull the video signal directly from the
| phone if available. (This applies to all iPhones and
| high-end Samsung phones.)
|
| - "Valet Vision" uses the Raspberry Pi V3 camera
| positioned 200mm above the phone to grab the video that
| way. Kinda crazy, but it works when HDMI output is not
| available. The whole thing is also enclosed in a black
| box so light from the environment doesn't affect the
| video capture.
|
| Then once we have an image, yes, you use whatever library
| you want to process and understand what's in the image. I
| currently use OpenCV and Tesseract (with Python). Could
| probably write a book about the lessons learned getting a
| "vision first" approach to automation working (as opposed
| to the lower-level Puppeteer/Playwright/Selenium/Appium
| way to do it).
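
The "fake keyboard" half of the setup hugs describes can be sketched as follows. This is a minimal illustration, not the Valet code: it assumes the standard 8-byte USB HID boot-keyboard report layout, handles ASCII letters only, and the names `key_report` and `type_text` are hypothetical. On a Raspberry Pi configured as a USB gadget the reports would typically be written to a device node such as `/dev/hidg0`; that path is an assumption about the setup, so the sketch takes a `write` callable instead of opening any device.

```python
# Assumption: standard USB HID boot-keyboard report, 8 bytes long:
#   byte 0 = modifier bits, byte 1 = reserved, bytes 2-7 = pressed key codes.
LEFT_SHIFT = 0x02   # modifier bit for the left Shift key
KEY_A = 0x04        # HID usage IDs for 'a'..'z' run from 0x04 to 0x1d
RELEASE = bytes(8)  # an all-zero report means "no keys pressed"


def key_report(char: str) -> bytes:
    """Build the key-press report for a single ASCII letter."""
    if len(char) != 1 or not char.isascii() or not char.isalpha():
        raise ValueError("this sketch only handles single ASCII letters")
    modifier = LEFT_SHIFT if char.isupper() else 0
    keycode = KEY_A + ord(char.lower()) - ord("a")
    return bytes([modifier, 0, keycode, 0, 0, 0, 0, 0])


def type_text(text: str, write) -> None:
    """Send a press report followed by a release report for each letter.

    `write` is any callable that accepts bytes, for example the write
    method of a gadget device opened in binary mode (an assumption here).
    """
    for char in text:
        write(key_report(char))
        write(RELEASE)
```

Passing `list.append` as `write` collects the reports for inspection without any hardware attached, which is also how the logic can be checked before plugging into a phone.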
| weaksauce wrote:
| are you using native messaging? there's a way to bridge a
| program running with full permissions inside the computer
| that could use puppeteer or the like.
| https://developer.mozilla.org/en-US/docs/Mozilla/Add-
| ons/Web...
|
| seems like it wouldn't be that hard to sync the two but
| the devil is in the details. also installing the native
| script is outside the purview of the webext so you need
| to have an installer.
| namukang wrote:
| I do this for https://browserflow.app (and the AI version
| in development at https://browserbot.ai) via the
| chrome.debugger API: https://developer.chrome.com/docs/ex
| tensions/reference/api/d...
| fitsumbelay wrote:
| webdriver bidi info
| -https://www.youtube.com/watch?v=6oXic6dcn9w
|
| local scraping howto - https://www.freecodecamp.org/news/web-
| scraping-in-javascript...
|
| local LLM framework - https://ollama.com/
| jgraham wrote:
| > Is it possible to now use Puppeteer from inside the
| browser?
|
| Talking about WebDriver (BiDi) in general rather than
| Puppeteer specifically, it depends what exactly you mean.
|
| Classic WebDriver is an HTTP-based protocol. WebDriver BiDi
| uses websockets (although other transports are a possibility
| for the future). Script running inside the browser can create
| HTTP connections and websocket connections, so you
| can create a web page that implements a WebDriver or
| WebDriver BiDi client. But of course you need to have a
| browser to connect to, and that needs to be configured to
| actually allow connections from your host; for obvious
| security reasons that's not allowed by default.
|
| This sounds a bit obscure, but it can be useful. Firefox
| devtools is implemented in HTML+JS in the browser (like the
| rest of the Firefox UI), and can connect to a different
| Firefox instance (e.g. for debugging mobile Firefox from
| desktop). The default runner for web-platform-tests drives
| the browser from the outside (typically) using WebDriver, but
| it also provides an API so the in-browser tests can access
| some WebDriver commands.
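
A minimal sketch of the transport difference jgraham describes above, using only the Python standard library. It builds messages and sends nothing over the network. The `BidiSession` class and the helper names are hypothetical; the endpoint path, the `browsingContext.navigate` and `session.subscribe` commands, and the `log.entryAdded` event follow the WebDriver and WebDriver BiDi specifications as I understand them, so check the specs before relying on the exact fields.

```python
import json


def classic_navigate(session_id: str, url: str) -> dict:
    """Classic WebDriver: one HTTP request per command, one response back."""
    return {
        "method": "POST",
        "path": f"/session/{session_id}/url",
        "body": json.dumps({"url": url}),
    }


class BidiSession:
    """WebDriver BiDi: JSON messages exchanged over a single websocket."""

    def __init__(self) -> None:
        self._next_id = 0

    def command(self, method: str, params: dict) -> str:
        # Each command carries an id so the reply can be matched to it later.
        self._next_id += 1
        return json.dumps(
            {"id": self._next_id, "method": method, "params": params}
        )

    def navigate(self, context_id: str, url: str) -> str:
        return self.command(
            "browsingContext.navigate",
            {"context": context_id, "url": url, "wait": "complete"},
        )

    def subscribe(self, events: list) -> str:
        return self.command("session.subscribe", {"events": events})


def classify(raw: str) -> str:
    """Tell browser-initiated events apart from replies to our commands."""
    msg = json.loads(raw)
    if msg.get("type") == "event":
        return f"event:{msg['method']}"
    return f"reply-to:{msg.get('id')}"
```

The "bidirectional" part is what `classify` hints at: after `subscribe(["log.entryAdded"])`, the browser can push event messages at any time, which the request/response shape of classic WebDriver has no room for.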
| hoten wrote:
| Yes. I'm not aware of any documentation walking one through
| it though.
|
| There is an extension API that exposes a CDP connection [1][2]
|
| You can create a Puppeteer.Browser given a CDP connection.
|
| You can bundle Puppeteer in a browser (we do this in
| Lighthouse/Chrome DevTools[3]).
|
| These two things are probably enough to get it working, though
| it may be limited to the active tab.
|
| [1] https://chromedevtools.github.io/devtools-
| protocol/#:~:text=...
|
| [2] https://stackoverflow.com/a/55284340/24042444
|
| [3] https://source.chromium.org/chromium/chromium/src/+/main:
| thi...
| SomaticPirate wrote:
| If I wanted to write some simple web-automation as a DevOps
| engineer with little javascript (or webdev experience at all)
| what tool would you recommend?
|
| Some example use cases would be writing some basic tests to
| validate a UI or automate some form-filling on a javascript
| based website with no API.
| hugs wrote:
| Unironically, ask ChatGPT (or your favorite LLM) to create a
| hello world WebDriver or Puppeteer script (and installation
| instructions) and go from there.
| righthand wrote:
| "Go ask ChatGPT" is the new "RTFM".
| hugs wrote:
| sorry, not sorry?
| distortedsignal wrote:
| I don't think they're criticizing - I think it's an
| observation.
|
| It makes a lot of sense, and we're early-ish to the tech
| cycle. Reading the Manual/Google/ChatGPT are all just
| tools in the toolbelt. If you (an expert) are giving this
| advice, it should become mainstream soon-ish.
| 0x1ch wrote:
| I think this is where personal problem solving skills
| matter. I use ChatGPT to start off a lot of new ideas or
| projects with unfamiliar tools or libraries I will be
| using, however the result isn't always good. From here, a
| good developer will take the information from the A.I
| tool and look further into current documentation to
| supplement.
|
| If you can't distinguish bad from good with LLMs, you
| might as well be throwing crap at the wall hoping it will
| stick.
| tssge wrote:
| >If you can't distinguish bad from good with LLMs, you
| might as well be throwing crap at the wall hoping it will
| stick.
|
| This is why I think LLMs are more of a tool for the
| expert rather than for the novice.
|
| They give more speedup the more experience one has on the
| subject in question. An experienced dev can usually spot
| bad advice with little effort, while a junior dev might
| believe almost any advice due to the lack of experience
| to question things. The same goes for asking the right
| questions.
| progmetaldev wrote:
| This is where I tell younger people thinking about
| getting into computer science or development that there
| is still a huge need for those skills. I think AI is a
| long way off from taking away problem solving skills.
| Most of us that have had the (dis)pleasure of needing to
| repeatedly change and build on our prompts to get close
| to what we're looking for will be familiar with this.
| Without the general problem solving skills we've
| developed, at best we're going to luck out and get just
| the right solution, but more than likely will at best
| have a solution that only gets partially towards what we
| actually need. Solutions will often be inefficient or
| subtly wrong in ways that still require knowledge in the
| technology/language being produced by the LLM. I even
| tell my teenage son that if he really does enjoy coding
| and wishes to pursue it as a career, that he should go
| for it. I shouldn't be, but I'm constantly astounded by
| the number of people that take output from an LLM without
| checking for validity.
| devsda wrote:
| I think it's the new "search/lookup xyz on Google".
|
| Because Google search and search in general is no longer
| reliable or predictable and top results are likely to be
| ads or seo optimized fluff pieces, it is hard to make a
| search recommendation these days.
|
| For now, ChatGPT is the new no-nonsense search
| engine (with caveats).
| abdusco wrote:
| Use Playwright's code generator, which turns page
| interactions into code.
|
| https://playwright.dev/python/docs/codegen-intro
| anothername12 wrote:
| Is the WebDriver standard a good one? (Relative to playwright I
| guess) I seem to recall some pains implementing it a few years
| ago.
| huy-nguyen wrote:
| What's the relationship between Selenium, Puppeteer and
| Webdriver BiDi? I'm a happy user of Playwright. Is there any
| reason why I should consider Selenium or Puppeteer?
| imiric wrote:
| > Is there any reason why I should consider Selenium or
| Puppeteer?
|
| I'm not a heavy user of these tools, but I've dabbled in this
| space.
|
| I think Playwright is far ahead as far as features and
| robustness go compared to alternatives. Firefox has been
| supported for a long time, as well as other features
| mentioned in this announcement like network interception and
| preload scripts. CDP in general is much more mature than
| WebDriver BiDi. Playwright also has a more modern API, with
| official bindings in several languages.
|
| One benefit of WebDriver BiDi is that it's in process of
| becoming a W3C standard, which might lead to wider adoption
| eventually.
|
| But today, I don't see a reason to use anything other than
| Playwright. Happy to read alternative opinions, though.
| Vinnl wrote:
| I think Playwright depends on forking the browsers to support
| the features they need, so that may be less stable than using
| a standard explicitly supported by the browsers, and/or more
| representative of realistic browser use.
| hugs wrote:
| Maybe you don't want to live in a world where Microsoft owns
| everything (again)?
| fitsumbelay wrote:
| Been waiting for this. This _rocks_
| mstijak wrote:
| Are there any advantages to using Firefox over Chrome for
| exporting PDFs with Puppeteer?
| lol768 wrote:
| I've found Firefox to produce better PDFs than Chrome does, for
| what it's worth. There are some CSS properties that Chrome/Skia
| doesn't honour properly (e.g. repeating-linear-gradient) or
| ends up generating PDFs from that don't work universally.
| freedomben wrote:
| Indeed, Firefox uses PDF.js which I've found to produce
| really good results.
| mook wrote:
| Doesn't PDF.js go the other way (convert a PDF into HTML-
| and-friends for display in a browser, instead of "printing"
| a page into a PDF)?
|
| I haven't dug into it and am quite possibly incorrect,
| hence the request for confirmation!
| whatnotests2 wrote:
| For an alternative approach, try browserbase.com
|
| * https://browserbase.com/
| cebert wrote:
| Playwright is such a good experience. I don't understand why
| you would need something like browserbase.
| e12e wrote:
| What are reasons to prefer puppeteer to playwright which supports
| many browsers?
|
| > Cross-browser. Playwright supports all modern rendering engines
| including Chromium, WebKit, and Firefox.
|
| https://playwright.dev/
| Vinnl wrote:
| I said this in a subthread:
|
| > I think Playwright depends on forking the browsers to support
| the features they need, so that may be less stable than using a
| standard explicitly supported by the browsers, and/or more
| representative of realistic browser use.
|
| (And for Safari/WebKit to support it as well, but I'm not
| holding my breath for that one.) Though I hope Playwright will
| adopt BiDi at some point as well, as its testing features and
| API are really nice.
| yoavm wrote:
| I know this isn't what the WebDriver BiDi protocol is for, but I
| feel like it's 90% there to being a protocol through which you
| can create browsers, with swappable engines. Gecko has come a
| long way since Servo, and it's actually quite performant these
| days. The sad thing is that it's so much easier to create a
| Chromium-based browser than it is to create a Gecko-based one.
| But with APIs for navigating, intercepting requests, reading the
| console, executing JS - why not just embed the thing, remove all
| the browser chrome around it, and let us create customized
| browsers?
| djbusby wrote:
| I have dreamed about a swappable engine.
|
| Like, a wrapper that does my history and tabs and bookmarks -
| but lets me move from rendering in Chrome or Gecko or Servo or
| whatever.
| sorenjan wrote:
| There used to be an extension for Firefox called "IE Tab for
| Firefox" that used the IE rendering engine inside a Firefox
| tab, for sites that only worked in IE.
| hyzyla wrote:
| The same idea with built-in Internet Explorer in Microsoft
| Edge, where you can switch to Internet Explorer mode and
| open websites that only work correctly in Internet Explorer
| burntcaramel wrote:
| This is great! I'm curious about the accessibility tree noted in
| the unsupported-for-now APIs. Accessing the accessibility tree
| was something that was in Playwright for the big 3 engines but
| got removed about a year ago. I think it was partly because as
| noted it was a dump of engine-specific internal data structures:
| "page.accessibility.snapshot returns a dump of the Chromium
| accessibility tree".
|
| I'd like to advocate for more focus on these accessibility trees.
| They are a distillation of every semantic element on the page,
| which makes them fantastic for snapshot "tests" or BDD tests.
|
| My dream would be these accessibility trees one day become
| standardized across the major browser engines. And perhaps from a
| web dev point-of-view accessible from the other layers like CSS
| and DOM.
___________________________________________________________________
(page generated 2024-08-07 23:00 UTC)