[HN Gopher] Show HN: An API that takes a URL and returns a file ...
___________________________________________________________________
Show HN: An API that takes a URL and returns a file with browser
screenshots
Author : gkamer8
Score : 84 points
Date : 2025-02-06 18:48 UTC (4 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| tantaman wrote:
| us ai?
| ge96 wrote:
| the very same
| bangaladore wrote:
| The website [1] is very strange. What does U.S. stand for? If I
| were to stumble on this I'd assume it was a phishing / scam
| website trying to impersonate the government. Bad vibes all
| around.
|
| [1] - https://us.ai/
| gkamer8 wrote:
| I'll try to improve the vibes :(
|
| I've been working at this startup for almost two years now
| and that page and branding etc has been changing a lot as you
| can imagine ...
| bangaladore wrote:
| But what is the branding?
|
| United States AI?
|
| Like the premise of the company name is bad. Real bad.
| bbor wrote:
| A) Thanks for sharing your OSS with the world!!
|
| B) I'm also a little confused. Surely that domain cost(s)
| $$$ -- why not go with a cute "us" branding rather than
| "U.S."? Unless you're looking to sell in other countries
| where maybe U.S. expertise is a selling point, this
| definitely comes across like you're pretending to be part
| of the government.
|
| EDIT: For comparison, we.ai costs $500,000/y (!!!)
|
| EDIT2: It looks like you're positioning yourself as a
| defense/govt contractor, thus the branding? That's
| certainly cool, but IMHO, if I were you and owned that
| domain, I'd offer it to Palantir for $$$$$ and just go with
| your second choice. They're currently starting in on a
| whole genocide/global war thing, so they have cash to burn!
| gkamer8 wrote:
| Hi thanks! The domain actually used to be a redirect link
| to U.S. Automotive Industries (a trade publication). I
| reached out to them and got a deal, so it was a lot for
| me but not, like, we.ai expensive lol.
|
| The name was always a corporate placeholder and I liked
| the idea of US Steel or General Electric type names. Some
| startups have done similar things, and many people
| actually like the name a ton. But I know it's
| controversial and so any products I made have their own
| names and branding that's pretty separate (see: Abbey).
|
| Over the past few months I've gone the gov contracting
| route and the name actually made some sense, so I've used
| it raw. Still, the plan is to get a DBA in the near
| future and switch it up. Thanks for the advice!
| tolerance wrote:
| The similarities with WhiteHouse.gov's design can't be much
| help either, I imagine.
| bangaladore wrote:
| I thought the same, but I didn't double check so I did not
| mention it.
| xp84 wrote:
| Just a totally normal domain for an Anguillan perspective on
| all things America
| johnmaguire wrote:
| Unrelated, but I continue to be confused by "ClaudeMind", a
| JetBrains Plugin by "73signals":
| https://plugins.jetbrains.com/plugin/25082-claudemind
|
| Their website doesn't even mention 73signals:
| https://claudemind.com/
|
| Surely Anthropic must have an issue with this use of their
| trademark? And 73signals seems so similar to 37signals as to
| be intentional.
| wildzzz wrote:
| It's one guy running his little AI startup fresh out of
| college. Claims to be a former national security analyst but
| makes no such claim on his LinkedIn.
| gkamer8 wrote:
| Thanks for the catch on my LinkedIn, I really should have
| that there now. It was originally something I kept private.
| xnx wrote:
| For anyone who might not be aware, Chrome also has the ability to
| save screenshots from the command line using: chrome --headless
| --screenshot="path/to/save/screenshot.png" --disable-gpu
| --window-size=1280,720 "https://www.example.com"
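|
| A minimal sketch of driving the command above from Python, for
| batch use. The binary name "chrome" is an assumption (it differs by
| platform and install), and the helper names are hypothetical.
| --disable-gpu is left off by default and can be switched on.

```python
import subprocess


def chrome_screenshot_argv(url, out_path, width=1280, height=720,
                           binary="chrome", disable_gpu=False):
    """Build the argv list for a headless Chrome screenshot.

    `binary` is an assumed executable name; adjust it for your system.
    """
    argv = [
        binary,
        "--headless",
        f"--screenshot={out_path}",
        f"--window-size={width},{height}",
    ]
    if disable_gpu:
        # Optional flag taken from the command quoted in the thread.
        argv.append("--disable-gpu")
    argv.append(url)
    return argv


def take_screenshot(url, out_path, **kwargs):
    """Run Chrome and raise if it exits with a non-zero status."""
    # Passing a list (no shell) avoids quoting problems with URLs.
    subprocess.run(chrome_screenshot_argv(url, out_path, **kwargs),
                   check=True, timeout=60)
```

| Usage would look like take_screenshot("https://www.example.com",
| "shot.png"), assuming Chrome is on the PATH under that name.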
| martinbaun wrote:
| Oh man, I needed this so many times and didn't even think of doing
| it like this. I tried using Selenium and all different external
| services. Thank you!
|
| Works in chromium as well.
| azhenley wrote:
| Very nice, I didn't know this. I used pyppeteer and selenium
| for this previously which seemed excessive.
| Onavo wrote:
| What features won't work without GPU?
| xnx wrote:
| LMGTFY (Let Me Gemini That For You) :-)
| https://gemini.google.com/share/e9428bb57a22
|
| I've used that when running unattended batches, but worth
| trying with and without disabling to see what is best for
| your use case.
| dingnuts wrote:
| oh good an AI summary with none of the facts checked,
| literally more useless than the old lmgtfy and somehow more
| rude
|
| "here's some output that looks relevant to your question
| but I couldn't even be arsed to look any of it up, or copy
| paste it, or confirm its validity"
| kylecazar wrote:
| This flag isn't valid anymore in the new chrome headless.
| Disable GPU doesn't exist unless you're on the old version (and
| then, it was meant as a workaround for Windows users only).
|
| I've used this via selenium not too long ago
| cmgriffing wrote:
| Quick note: when trying to do full page screenshots, Chrome
| does a screenshot of the current view, then scrolls and does
| another screenshot. This can cause some interesting artifacts
| when rendering pages with scroll behaviors.
|
| Firefox does a proper full page screenshot and even allows you
| to set a higher DPR (device pixel ratio) value. I use this a lot when making video
| content.
|
| Check out some of the args in FF using: `:screenshot --help`
| input_sh wrote:
| Firefox equivalent: firefox -screenshot
| file.png https://example.com --window-size=1280,720
|
| A bit annoyingly, it won't work if you have Firefox already
| open.
| cmgriffing wrote:
| LOL, you and I posted very similar replies at the same time.
| blueflow wrote:
| > it won't work if you have Firefox already open
|
| Now go ahead and try to work out how you could isolate these
| instances so they cannot see each other. This leads into a rabbit
| hole of bad design.
| UnlockedSecrets wrote:
| Does it work if you use a different profile with -p?
| paulryanrogers wrote:
| Maybe with --no-remote
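|
| A rough sketch combining the two suggestions above: a throwaway
| profile plus -no-remote. Whether this actually avoids the
| "Firefox already open" problem is an untested assumption, and the
| helper names are hypothetical.

```python
import subprocess
import tempfile


def firefox_screenshot_argv(url, out_path, profile_dir,
                            width=1280, height=720, binary="firefox"):
    """Build the argv list for a Firefox screenshot in its own profile.

    `binary` is an assumed executable name; adjust it for your system.
    """
    return [
        binary,
        "-no-remote",             # do not hand off to a running instance
        "-profile", profile_dir,  # separate profile directory
        "-screenshot", out_path,
        url,
        f"--window-size={width},{height}",
    ]


def take_screenshot(url, out_path):
    """Run Firefox with a temporary profile that is deleted afterwards."""
    with tempfile.TemporaryDirectory() as profile_dir:
        subprocess.run(firefox_screenshot_argv(url, out_path, profile_dir),
                       check=True, timeout=60)
```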
| amelius wrote:
| > A bit annoyingly, it won't work if you have Firefox already
| open.
|
| I hate it when applications do this.
| aspeckt-112 wrote:
| I'm looking forward to giving this a go. Great idea!
| manmal wrote:
| Being a bit frustrated with Linkwarden's resource usage, I've
| thought about making my own self hosted bookmarking service. This
| could be a low effort way of loading screenshots for these links,
| very cool! It'll be interesting to see how many concurrent
| requests this can process.
| synthomat wrote:
| That's nice and everything but what to do about the EU cookie
| banners? Does hosting outside of the EU help?
| gkamer8 wrote:
| Yeah the EU cookie banners are annoying, I'm hoping to do some
| automation to click out of them before taking the screenshots
| cjr wrote:
| There are browser extensions you could run, like consent-o-matic,
| to try to click and hide the cookie banners from your
| screenshots:
|
| https://chromewebstore.google.com/detail/consent-o-matic/mdj...
|
| Otherwise using a combination of well-known class names,
| 'accept' strings, and heuristics such as z-index, position:
| fixed/sticky etc can also narrow down the number of likely
| elements that could be modals/banners.
|
| You could also ask a vision model whether a screenshot has a
| cookie banner, and ask for co-ordinates to remove it,
| although this could get expensive at scale!
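|
| The class-name / 'accept' string / z-index / position heuristics
| above could be sketched as a simple scoring function. This is only
| an illustration: the Element record, the hint lists, the weights
| and the threshold are all made-up assumptions, and a real version
| would read these values from the live DOM.

```python
from dataclasses import dataclass


@dataclass
class Element:
    """Hypothetical snapshot of one DOM element's relevant properties."""
    class_names: str
    text: str
    position: str  # computed CSS position, e.g. "fixed"
    z_index: int


# Assumed hint lists; tune them for the sites you care about.
BANNER_CLASS_HINTS = ("cookie", "consent", "gdpr", "banner")
ACCEPT_STRINGS = ("accept", "agree", "allow all")


def banner_score(el):
    """Add up weak signals that an element is a cookie banner."""
    score = 0
    if any(hint in el.class_names.lower() for hint in BANNER_CLASS_HINTS):
        score += 2
    if any(s in el.text.lower() for s in ACCEPT_STRINGS):
        score += 2
    if el.position in ("fixed", "sticky"):
        score += 1
    if el.z_index >= 1000:
        score += 1
    return score


def likely_banners(elements, threshold=4):
    """Keep only elements whose combined score reaches the threshold."""
    return [el for el in elements if banner_score(el) >= threshold]
```

| Requiring several signals at once is what keeps a sticky site
| header, or an article that merely contains the word "accept",
| from being treated as a banner.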
| gkamer8 wrote:
| Thanks, that's a great idea! I was originally going to go
| the vision model route because I'd also like people to be
| able to send instructions to sign in with some credentials
| (like when visiting the nytimes or something).
| artur_makly wrote:
| yeah that's what we basically did here at
| https://VisualSitemaps.com, but it can also quickly
| become over-the-top, and you may end up removing important
| content. That's why in the end we added a second option to
| just manually enter CSS classes.
| cess11 wrote:
| No. Tell the services you're using to stop with the malicious
| compliance.
| busymom0 wrote:
| Would recommend using SeleniumBase's CDP mode to search for
| those substrings, click accept on those cookie banners and then
| take screenshot.
| quink wrote:
| > SCREENSHOT_JPEG_QUALITY
|
| Not two words that should be near each other, and JPEG is the
| only option.
|
| Almost like it's designed to nerd-snipe someone into a PR to
| change the format based on Accept headers.
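|
| A sketch of what picking the format from the Accept header could
| look like. This is not the project's code: the thread says JPEG is
| currently the only option, so the PNG and WebP entries and the
| function names here are hypothetical.

```python
# Hypothetical list of formats the server could produce.
SUPPORTED = ("image/jpeg", "image/png", "image/webp")


def parse_accept(accept_header):
    """Return {media_range: q} parsed from an Accept header value."""
    prefs = {}
    for part in accept_header.split(","):
        fields = part.strip().split(";")
        media = fields[0].strip().lower()
        if not media:
            continue
        q = 1.0  # q defaults to 1 when it is not given
        for param in fields[1:]:
            name, _, value = param.strip().partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q as "not acceptable"
        prefs[media] = q
    return prefs


def quality_for(candidate, prefs):
    """Most specific matching range wins: exact, then type/*, then */*."""
    main_type = candidate.split("/")[0]
    for pattern in (candidate, f"{main_type}/*", "*/*"):
        if pattern in prefs:
            return prefs[pattern]
    return 0.0


def pick_image_format(accept_header, supported=SUPPORTED,
                      default="image/jpeg"):
    """Choose the supported type with the highest q; ties keep list order."""
    prefs = parse_accept(accept_header or "")
    best, best_q = default, 0.0
    for candidate in supported:
        q = quality_for(candidate, prefs)
        if q > best_q:
            best, best_q = candidate, q
    return best
```

| Falling back to JPEG when nothing matches keeps the current
| behaviour for clients that send no Accept header at all.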
| gkamer8 wrote:
| > Almost like it's designed to nerd-snipe someone into a PR to
| change the format based on Accept headers
|
| pls
| mpetrovich wrote:
| Reminds me of this open source library I wrote to do the same
| thing: https://github.com/nextbigsoundinc/imagely
|
| It uses puppeteer and chrome headless behind the scenes.
| joshstrange wrote:
| This is cool but at this point MCP is the clear choice for
| exposing tools to LLMs, I'm sure someone will write a wrapper
| around this to provide the same functionality as an MCP-SSE
| server.
|
| I want to try this out though and see how I like it compared to
| the MCP Puppeteer I'm using now (which does a great job of
| visiting pages, taking screenshots, interacting with the page,
| etc).
| jot wrote:
| If you're worried about the security risks, edge cases,
| maintenance pain and scaling challenges of self hosting there are
| various solid hosted alternatives:
|
| - https://browserless.io - low level browser control
|
| - https://scrapingbee.com - scraping specialists
|
| - https://urlbox.com - screenshot specialists*
|
| They're all profitable and have been around for years so you can
| depend on the businesses and the tech.
|
| * Disclosure: I work on this one and was a customer before I
| joined the team.
| edm0nd wrote:
| https://www.scraperapi.com/ is good too. Been using them to
| scrape via their API on websites that have a lot of captchas or
| anti scraping tech like DataDome.
| rustdeveloper wrote:
| Happy to suggest another web scraping API alternative I rely
| on: https://scrapingfish.com
| bbor wrote:
| Do these services respect norobot manifests? Isn't this all
| kinda... illegal...? Or at least non-consensual?
| basilgohar wrote:
| robots.txt isn't legally binding. I am interested to know if
| and how services even interact with it. It's more like a clue
| on where the interesting content for scrapers is on your site.
| This is how I imagine it goes:
|
| "Hey, don't scrape the data here."
|
| "You know what? I'll scrape it even harder!"
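|
| For a scraper that does want to honor robots.txt, Python's
| standard library can do the check. A minimal sketch; the bot name,
| URLs and example rules are made up for illustration, and nothing
| here says what the hosted services above actually do.

```python
from urllib.robotparser import RobotFileParser


def allowed_by_robots(robots_txt, user_agent, url):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Made-up rules for illustration only.
EXAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""
```

| In a real crawler you would fetch the site's /robots.txt once,
| cache it, and call the check before each request.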
| morbusfonticuli wrote:
| Similar project: gowitness [1].
|
| A really cool tool I recently discovered. Besides scraping and
| taking screenshots of websites and saving them in multiple
| formats (including sqlite3), it can grab and save the headers,
| console logs & cookies and has a super cool web GUI to access all
| data and compare e.g. the different records.
|
| I'm planning to build my personal archive.org/waybackmachine-like
| web-log tool via gowitness in the not-so-distant future.
|
| [1] https://github.com/sensepost/gowitness
| westurner wrote:
| simonw/shot-scraper has a number of cli args, a GitHub actions
| repo template, and docs:
| https://shot-scraper.datasette.io/en/stable/
|
| From https://news.ycombinator.com/item?id=30681242 :
|
| > _Awesome Visual Regression Testing > lists quite a few tools
| and online services: https://github.com/mojoaxel/awesome-regression-testing_
|
| > _"visual-regression": https://github.com/topics/visual-regression_
___________________________________________________________________
(page generated 2025-02-06 23:00 UTC)