[HN Gopher] Helium: Lighter Web Automation with Python
___________________________________________________________________
Helium: Lighter Web Automation with Python
Author : mherrmann
Score : 135 points
Date : 2024-12-11 12:11 UTC (10 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| fermigier wrote:
| "We shut down the company at the end of 2019 and I felt it would
| be a shame if Helium simply disappeared from the face of the
| earth."
|
| I appreciate the effort. Thank you M. Herrmann.
| wslh wrote:
| How does it compare with the "usual suspects"? I mean Playwright,
| Selenium, Cypress, and Puppeteer.
| mherrmann wrote:
| It's more high-level. Instead of saying "click element with ID
| xv9873", you can say "click Download".
| Yossarrian22 wrote:
| That's how Playwright works too
| mherrmann wrote:
| Doesn't work for logging into HN:
|
|     from playwright.sync_api import sync_playwright
|     playwright = sync_playwright().start()
|     browser = playwright.chromium.launch()
|     page = browser.new_page()
|     page.goto('https://news.ycombinator.com/login?goto=news')
|     page.get_by_label('username').fill('mherrmann')
|     # playwright._impl._errors.TimeoutError: Locator.fill:
|     # Timeout 30000ms exceeded.
|
| I suspect Playwright expects there to be a <label> for an
| <input> element.
|
| It does work with Helium:
|
|     from helium import *
|     start_chrome('https://news.ycombinator.com/login?goto=news')
|     write('mherrmann', into='username')
|
| The two scripts are equivalent, except Helium's works and
| is half as long.
| mdaniel wrote:
| Depending on when you tried that, there is no such label
| "username" in
| view-source:https://news.ycombinator.com/login?goto=news
|
|     <table border="0">
|     <tr><td>username:</td><td><input type="text" name="acct"
|       size="20" autocorrect="off" spellcheck="false"
|       autocapitalize="off" autofocus="true"></td></tr>
|     <tr><td>password:</td><td><input type="password" name="pw"
|       size="20"></td></tr>
|     </table><br>
|
| as they only put _text_ username not <label> and the
| input is named "acct" (without even the common decency to
| include autocomplete=username)
|
| So _if_ your script really did write that string into what
| it thinks is 'username' then that's arguably one more thing
| to debug when its wizardry goes awry in some unknown way
| mherrmann wrote:
| Yes, there is no such <label>. There is only such <td>.
| And Playwright isn't smart enough to understand that that
| is still the label for the <input> element. Helium _is_
| smart enough.
|
| _So if your script really did write that string into what
| it thinks is 'username' then that's arguably one more thing
| to debug when its wizardry goes awry in some unknown way_
|
| I tested my script. It writes into the correct field. The
| logic is not hard: "Find an element to the right of the
| given label." If there are multiple, then Helium uses the
| one that's closest to the last element it interacted
| with. That's just how a human would do it. It works
| surprisingly well. In many years of using Helium, I
| barely recall this causing problems once.
|
| Try it before judging. You will be surprised by how well
| it works.
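
The lookup rule described above ("find an element to the right of the given label, and if there are several, prefer the one closest to the last element interacted with") can be illustrated with plain geometry. The sketch below is not Helium's implementation; the `Box` type, the function names, and the pixel coordinates are all hypothetical and only model the idea on bounding boxes.

```python
from dataclasses import dataclass
from math import hypot
from typing import Optional


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box of an on-screen element, in pixels (hypothetical)."""
    name: str
    left: float
    top: float
    width: float
    height: float

    @property
    def right(self) -> float:
        return self.left + self.width

    @property
    def bottom(self) -> float:
        return self.top + self.height

    @property
    def center(self) -> tuple:
        return (self.left + self.width / 2, self.top + self.height / 2)


def is_right_of(candidate: Box, label: Box) -> bool:
    """True if candidate starts at or past the label's right edge and shares its row."""
    shares_row = candidate.top < label.bottom and label.top < candidate.bottom
    return candidate.left >= label.right and shares_row


def pick_field(label: Box, fields: list, last_used: Optional[Box] = None) -> Optional[Box]:
    """Pick a field to the right of label; among several, take the one
    closest to last_used (or to the label itself if nothing was used yet)."""
    matches = [f for f in fields if is_right_of(f, label)]
    if not matches:
        return None
    anchor = last_used if last_used is not None else label
    ax, ay = anchor.center
    return min(matches, key=lambda f: hypot(f.center[0] - ax, f.center[1] - ay))


# Rough model of the HN login table: text cells on the left, inputs on the right.
username_label = Box("username:", 0, 0, 80, 20)
password_label = Box("password:", 0, 30, 80, 20)
acct = Box("acct", 90, 0, 160, 20)
pw = Box("pw", 90, 30, 160, 20)

print(pick_field(username_label, [acct, pw]).name)  # prints acct
print(pick_field(password_label, [acct, pw]).name)  # prints pw
```

A real tool would also have to find the label text in the DOM and read element positions from the browser, which this sketch leaves out.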
| mdaniel wrote:
| > And Playwright isn't smart enough to understand that
| that is still the label for the <input> element. Helium
| is smart enough.
|
| I'm glad you like your project, and I'm sure there are
| others who will similarly enjoy that kind of magick.
| However, it's super disingenuous to write an example that
| asks a standards based API to find a non-existent element
| and then clutch pearls because it didn't find a non-
| existent element. page.get_by_label("I dunno, I didn't
| read, do what I am thinking").fill('lol') similarly would
| not work but that's not the awesome dunk you think it is
| mherrmann wrote:
| It's not disingenuous. The HN example was literally the
| first one I tried. It's what happens in the real world.
| The real world doesn't adhere to standards, much of the
| time.
| bdcravens wrote:
| Just like what you've done with Selenium, such a wrapper
| could be written for Playwright (I think that's what most
| developers end up doing anyways, just in a more domain-
| specific manner)
| TimTheTinker wrote:
| This project was started _long_ before Playwright
| existed.
|
| It's an OSS tool that had very good reason to be made the
| way it was at that time, and continues to be useful (in
| my opinion).
| bdcravens wrote:
| It is a useful tool, similar to others that accomplish
| the same task (WATIR and Capybara in Ruby, for example).
| My point was that the comparison of a wrapper to an
| underlying library is a bit apples to oranges, as a
| similar wrapper could be written for Playwright as well.
| I haven't looked at the code, but I assume Helium's API
| could be used to support Playwright (which itself was an
| evolution of Puppeteer).
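
As a rough illustration of the "similar wrapper for Playwright" idea, the sketch below builds a Helium-style `write()` on top of a Playwright `page` object. It assumes Playwright's CSS layout pseudo-class `:right-of()` and its `:text()` pseudo-class; the helper names are hypothetical, and whether this picks the intended field on a given page (including the HN login form) has not been verified here.

```python
from typing import Any


def right_of_text_selector(label: str, tag: str = "input") -> str:
    """Build a Playwright CSS selector for `tag` elements that sit to the
    right of an element containing the text `label`."""
    escaped = label.replace("\\", "\\\\").replace('"', '\\"')
    return f'{tag}:right-of(:text("{escaped}"))'


def write(page: Any, text: str, into: str) -> None:
    """Helium-style write(): fill the first input matched to the right of
    the visible text `into`. `page` is expected to be a Playwright Page."""
    page.locator(right_of_text_selector(into)).first.fill(text)


print(right_of_text_selector("username"))
```

With a real Playwright page, the call would look like `write(page, 'mherrmann', into='username')`. Layout-based matching depends on how the page is rendered, so it is best treated as a convenience rather than a guarantee.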
| vunderba wrote:
| I like what you're doing with Helium, and while you are
| technically correct that it's half as long - IMHO it's a
| bit disingenuous considering that in any _meaningful_ web
| automation script, you'd only need to put in the
| initialization code a single time, e.g.:
|
|     from playwright.sync_api import sync_playwright
|     playwright = sync_playwright().start()
|     browser = playwright.chromium.launch()
|     page = browser.new_page()
| languagehacker wrote:
| Importing * is universally discouraged by most Python linters and
| best practice docs. You can always "import helium as h" if you're
| looking to type less.
|
| This looks largely like common workarounds that most people will
| write using Python-based browser automation. Most of the time, we
| accept that those capabilities aren't there by default because
| they are not explicit enough and can result in bugs and undefined
| behavior even when the elements that we expect to be on the page
| are actually there.
|
| Given the adage "explicit is better than implicit", I worry that
| a layer like this might create more trouble than it's worth for
| the sake of readability. When we get into the nitty-gritty of
| browser automation, it might just make it harder to debug than
| going straight to Selenium or Playwright.
| mherrmann wrote:
| _Importing * is universally discouraged by most Python linters
| and best practice docs._
|
| Yup, I would never do it in a .py file. But I do it all of the
| time in the interpreter, which is what the video shows.
|
| _This looks largely like common workarounds that most people
| will write using Python-based browser automation. Most of the
| time, we accept that those capabilities aren't there by
| default because they are not explicit enough and can result in
| bugs and undefined behavior even when the elements that we
| expect to be on the page are actually there._
|
| It sounds like you haven't tried Helium yet. I think you
| should, and see for yourself whether the trade-off you talk
| about actually exists.
|
| _Given the adage "explicit is better than implicit", I worry
| that a layer like this might create more trouble than it's
| worth for the sake of readability._
|
| You could make the same argument about using C / assembly
| instead of Python. I suggest you try Helium before making
| statements about the "trouble" it may create. I believe you
| will find that there is no trouble.
| quickvi wrote:
| for lightweight automation outside the browser:
|
| https://github.com/elyase/screenium
| okso wrote:
| macOS only (uses Apple Vision framework)
| erikcw wrote:
| I've used SikuliX[0] in the past for similar purposes.
| Unfortunately the author hasn't had much time to maintain it
| recently.
|
| [0] https://github.com/RaiMan/SikuliX1
| oulipo wrote:
| Very cool! Could be a kind of open-source, text-based (eg
| recipes are .md with instructions) version of KeyboardMaestro!
|
| I'd love to see such an "open automation" format (could even be
| more general than pure software, could also automate your IoT
| or whatever, through extensions)
|
| eg you could have a file "Type my bank login password" for bank
| websites which don't let you use keyboard input but force you
| to click on stuff, like a self-documented script using .md with
| code
|
|     # Type my bank login password
|
|     ## Trigger
|
|     ```trigger:hotkey
|     key: cmd+l
|     filter: frontmost-app=Chrome and chrome.tab.url=~mybank.com/login
|     ```
|
|     ## Deps
|
|     ```ensure-deps
|     shell-runner>=1.*
|     screen-ocr>=1.*
|     python-runner>=1.*
|     ```
|
|     Ensure that my system has the proper extensions for the
|     framework, to run all tasks
|
|     ## What it does
|
|     This automation lets me input my password in a "click-only"
|     input for my lousy bank UI
|
|     ```run:shell /bin/sh:capture-output=password
|     echo $(op --vault personal --site mybank)
|     ```
|
|     (the above runs the shell script and captures the output as a
|     "password" variable I can use in other scripts below)
|
|     ```run:screen-ocr:capture-output=ocr-result
|     window:chrome
|     ```
|
|     ...go on scripting using typescript/python to locate the
|     numbers in the ocr-result
| nkrisc wrote:
| Having done some ad-hoc, temporary automation with Selenium in
| the past (to help fellow, less technically-inclined designers) I
| wish I had this at the time.
|
| Looks like a nice, almost natural language-like API around what
| is otherwise a quite cumbersome API.
| crazymoka wrote:
| Can it be headless?
| mherrmann wrote:
| Yes: start_chrome(headless=True)
| __mharrison__ wrote:
| Thanks for posting. All this AI has been interested in scraping
| personal sites.
| mherrmann wrote:
| I have actually been wondering whether Helium's more high-level
| API lends itself well to use by AI.
| grantc wrote:
| This. Seems like you could wedge this and a model into a
| scrappy version of computer use for browsers.
|
| Fwiw, thanks for contributing this. It seems apt for a number
| of repetitive things I probably do dozens of times a week and
| don't even notice as cruft anymore.
|
| I'm not sure why there were such hot takes on what this is or
| isn't. Maybe Big Selenium crisis actors? You made something
| cool, you shared it w/ world -- that should be the system
| prompt for people posting about it in my kinder world of
| things.
| wokwokwok wrote:
| How can a wrapper around selenium be lighter than it?
|
| A wrapper around an API is by definition heavier (more code, more
| functions) than using the lower level api.
|
| It's not using less resources.
|
| It's not faster (it has implicit waiting).
|
| It's not less code; it's literally a superset of selenium?
|
| Feels like "selenium framework" is more accurate than
| "lightweight web automation"?
|
| Anyway, there's no fixing automation tests with fancy APIs.
|
| No matter what you try to do, if people are only interested in
| writing quick dirty scripts, you're doomed to a pile of stupid
| spaghetti no matter what system or framework you have.
|
| If you want sustainable automation, you have to do Real Software
| Engineering and write actual composable modules; and you can do
| that in anything, even raw selenium.
|
| So... I'd be more interested if this was pitched as "composable
| lego for building automation" ...
|
| ...but, personally, as it stands all I can really see is "makes
| easy things easier with sensible defaults".
|
| That's nice for getting started; but getting started is not the
| problem with automation tests.
|
| It's maintaining them.
| mherrmann wrote:
| Its use can be lighter. That is, the wrapper can be easier to
| use.
|
| Helium helps with maintaining automation tests as well.
| _click("Compose")_ is infinitely more maintainable than
| _document.getElementById("eIu7Db").click()_. (I just took this
| example from Gmail's web interface.)
| wokwokwok wrote:
| How do you compose low level operations like "click here"
| into composable modules like:
|
| loginAsUser(user)
|
| id = createBooking(user)
|
| loginAsAdmin()
|
| approveBooking(id)
|
| ?
|
| Is it the same as selenium? Do whatever you want yourself?
|
| That's what I'm talking about. Unless you have high level
| composable modules that let you express high level test
| activities then your tests will always fall apart.
|
| The syntax of the _low level operations_ doesn't matter
| because you will never ever care about a click("compose").
|
| That's not a test.
|
| A test might be:
|
| createEmail()
|
| attachFile(...)
|
| ... whatever your bespoke business requirements are.
|
| Having fancy wrappers?
|
| Is it nicer? Sure.
|
| Does it meaningfully improve the tests, maintaining tests?
|
| Nope.
|
| Because at the end of the day the low level operations _will_
| be bespoke, nasty, messy and different for each website;
| that's why you wrap them up in functions and _compose them_.
|
| At least, in my experience; this looks a lot like cypress; a
| high level set of operations with sensible defaults for easy
| tasks.
|
| ...but, practically, I'm skeptical that hiding the low level
| nasty details actually makes them go away; it's smoothing
| them over for the "happy path"; but automation tests are like
| 90% edge cases.
|
| > Its use can be lighter
|
| I don't think that's the generally accepted meaning of a
| light weight framework.
|
| ...but eh, fair enough. I understand what you mean.
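
To make the "compose low-level operations into modules" point concrete, here is a minimal Python sketch. Everything in it is hypothetical: the `Browser` protocol, the labels, and the example.com URL are invented for illustration, and the low-level calls could be backed by Helium, Selenium, or Playwright equally well. The `RecordingBrowser` stands in for a real browser so the composition can be exercised without one.

```python
from typing import Protocol


class Browser(Protocol):
    """The few low-level actions the modules below rely on (hypothetical)."""

    def go_to(self, url: str) -> None: ...
    def write(self, text: str, into: str) -> None: ...
    def click(self, label: str) -> None: ...


def login_as_user(browser: Browser, user: str, password: str) -> None:
    """High-level step: log in through a (made-up) login form."""
    browser.go_to("https://example.com/login")
    browser.write(user, into="Username")
    browser.write(password, into="Password")
    browser.click("Log in")


def create_booking(browser: Browser, room: str) -> None:
    """High-level step: create a booking for a room."""
    browser.click("New booking")
    browser.write(room, into="Room")
    browser.click("Save")


class RecordingBrowser:
    """Test double that records calls instead of driving a real browser."""

    def __init__(self) -> None:
        self.calls: list = []

    def go_to(self, url: str) -> None:
        self.calls.append(("go_to", url))

    def write(self, text: str, into: str) -> None:
        self.calls.append(("write", text, into))

    def click(self, label: str) -> None:
        self.calls.append(("click", label))


# A test reads as a sequence of business-level steps.
browser = RecordingBrowser()
login_as_user(browser, "alice", "s3cret")
create_booking(browser, "Room 12")
print(len(browser.calls))  # prints 7
```

The per-site messiness stays inside `login_as_user` and `create_booking`; which library performs the clicks is a detail behind the `Browser` interface.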
| mherrmann wrote:
| > I'm skeptical that hiding the low level nasty details
| actually makes them go away
|
| It makes 90+% of them go away. That's a big win. Try it.
| Zardoz84 wrote:
| what you are asking is GEB
| n144q wrote:
| That's just some superficial changes that often lead to
| confusion and other negative consequences down the road,
| especially when not handled carefully.
|
| I would much rather directly rely on Selenium's stable APIs
| than someone else's wrapped APIs that are opinionated and could
| be incomplete, incorrect, outdated and potentially
| unmaintained someday. There are always many more resources
| put into Selenium than into these add-ons.
|
| If I really want, I can choose a few APIs that I actually use
| and wrap them within my codebase. That's more reliable than
| this.
| mherrmann wrote:
| You can freely mix Helium and Selenium API calls. You don't
| lose any of the power you are describing when you use
| Helium.
| pryelluw wrote:
| > but, personally, as it stands all I can really see is "makes
| easy things easier with sensible defaults".
|
| "Lighter" may be used as an alternative adjective to the word
| easy or easier. Your post, which comes off as very rude, misses
| the point of how the project is marketed.
|
| At least the OP did not call it Python automation for humans
| ...
| edm0nd wrote:
| Very neat!
|
| Roll in a captcha-solving service like DeathByCaptcha or
| AntiCaptcha and you've got yourself a quick and easy script that
| can do anything on any website regardless of captchas.
| bg24 wrote:
| Nice work! I looked at the cheatsheet, and it is not obvious to
| me how to go through two factor authentication during login.
| mherrmann wrote:
| Thanks! Helium only automates browsers. If the 2FA is happening
| in the browser, then you can use Helium to automate the flow.
| If it's outside, then that part cannot be handled by Helium.
| slt2021 wrote:
| Thank you for sharing this project, this is really good
| giis wrote:
| Looks nice. Is it possible to call start_chrome() with a specific
| Chrome profile name, or to re-use an existing open Firefox/Chrome
| session and launch a new tab with a specific domain?
| mherrmann wrote:
| I don't know. Please check if Selenium supports this and if
| yes, use Helium's _set_driver(...)_ or _options_ argument to
| _start_chrome(...)_.
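
One possible way to try the _options_ route is sketched below. It assumes Chrome's `--user-data-dir` and `--profile-directory` command-line switches, Selenium's `ChromeOptions.add_argument`, and the `options` argument of Helium's `start_chrome`. The helper names, the path, and the profile name are hypothetical, and the combination has not been tested against Helium here.

```python
def chrome_profile_args(user_data_dir: str, profile_name: str) -> list:
    """Chrome command-line switches that select an existing profile."""
    return [
        f"--user-data-dir={user_data_dir}",
        f"--profile-directory={profile_name}",
    ]


def start_chrome_with_profile(url: str, user_data_dir: str, profile_name: str):
    """Start Chrome through Helium with the given profile (untested sketch)."""
    # Imported lazily so chrome_profile_args() works without Helium/Selenium installed.
    from helium import start_chrome
    from selenium.webdriver import ChromeOptions

    options = ChromeOptions()
    for arg in chrome_profile_args(user_data_dir, profile_name):
        options.add_argument(arg)
    return start_chrome(url, options=options)


# Hypothetical path and profile name, for illustration only.
print(chrome_profile_args("/home/me/.config/google-chrome", "Profile 1"))
```

Chrome generally will not share a user data directory with another running Chrome instance, so the profile usually needs to be closed first. Attaching to an already open browser session is a separate question that this sketch does not cover.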
| bilater wrote:
| Nice - I can see some cool agentic flows created using this. A
| thing I want to look into is creating a sandbox instance
| (Ubuntu?) and letting an agent do its thing. Could be collecting
| data or answering questions and I can pull up the window to check
| in from time to time. It'll be like having an assistant.
| Byte64 wrote:
| This is so cool
| bryanrasmussen wrote:
| How easy is it to detect that this is automation as opposed to a
| real user? I suppose probably pretty easy, so not sure if it is
| useful if I want to automate the web for things I do every day as
| I would really be running the risk of turning off access to those
| things if they determined I am automating them.
| bdcravens wrote:
| This is a wrapper on top of Selenium, so unless the library
| implements additional techniques to improve stealth, it's on
| par with Selenium's detectability (which as you pointed out can
| be detected easily enough)
___________________________________________________________________
(page generated 2024-12-11 23:01 UTC)