[HN Gopher] Forking Chrome to turn HTML into SVG
___________________________________________________________________
Forking Chrome to turn HTML into SVG
Author : fathyb
Score : 222 points
Date : 2022-11-13 17:06 UTC (5 hours ago)
(HTM) web link (fathy.fr)
(TXT) w3m dump (fathy.fr)
| lifthrasiir wrote:
| > What if we could also vectorize 2D <canvas> elements controlled
| by JavaScript? Turns out, Chromium has this capability built-in
| for printing:
|
| I'm very surprised to hear this. So printing, either to PDF or to
| actual printers, may reveal more information about what was drawn
| to the canvas than normal display, especially if no effort has
| been made to remove overdrawn paint records. That can have an
| interesting, if only hypothetical, consequence...
| tyingq wrote:
| Pretty sure canvas.toDataURL() is or was a fingerprinting
| method.
| andybak wrote:
| Being fair, if anyone is sending anything to the client and
| assumes it's not visible then they are fair game.
|
| I just hope the code around password entry fields is carefully
| audited. That's all on the client.
| kevincox wrote:
| Yes, but there is more to it than that. What if I, as the
| client, try to print a page or export to PDF? I think that
| there is nothing sensitive visible on the page so I share the
| result. It turns out that there was actually sensitive info
| in the canvas that was not visible due to something like
| overdraw.
|
| As a simple example imagine that an image is drawn to the
| canvas and then blacked out. You wouldn't expect that the
| saved PDF may contain those as separate layers.
|
| Of course this highlights an existing issue with complex
| formats. You need to be very careful before sharing complex
| documents.
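|
| The overdraw scenario above can be sketched in a few lines. This is
| a toy model, not Chromium's or Skia's actual data structures: the
| PaintRecord class and the hidden_records helper are hypothetical
| names, and every draw call is assumed to be an axis-aligned
| rectangle. It shows that a recorded drawing keeps records that a
| later opaque record completely covers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PaintRecord:
    """One recorded draw call, simplified to a labelled rectangle (hypothetical)."""
    label: str
    x: float
    y: float
    width: float
    height: float
    opaque: bool = True


def covers(top: PaintRecord, bottom: PaintRecord) -> bool:
    """True if `top` is opaque and its rectangle fully contains `bottom`'s."""
    return (
        top.opaque
        and top.x <= bottom.x
        and top.y <= bottom.y
        and top.x + top.width >= bottom.x + bottom.width
        and top.y + top.height >= bottom.y + bottom.height
    )


def hidden_records(records: list[PaintRecord]) -> list[PaintRecord]:
    """Records that some later opaque record paints over completely."""
    return [
        record
        for index, record in enumerate(records)
        if any(covers(later, record) for later in records[index + 1:])
    ]


# An image is drawn, then blacked out, then a caption is drawn elsewhere.
recording = [
    PaintRecord("secret image", 10, 10, 100, 50),
    PaintRecord("black box", 0, 0, 200, 100),
    PaintRecord("caption", 0, 120, 200, 20),
]

# Invisible on screen, but still present in the recording.
print([record.label for record in hidden_records(recording)])
```

| A rasterized screenshot would contain only the black box and the
| caption; a vector export that replays every record would carry the
| covered image along with them.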
| version_five wrote:
| There's a story from years ago (I couldn't find it) about
| some government or legal documents having info redacted, but
| whoever did it just used some pdf editing tool to draw black
| boxes over the redacted parts, so all the info was still in
| the pdfs.
|
| Edit: I found this but I'm not sure it's the one I'm
| remembering: https://www.techdirt.com/2014/01/28/new-york-
| times-suffers-r...
| perth wrote:
| This happened with the Maxwell redacted court docs
| mk_stjames wrote:
| This is exactly the case. I've done conversions before where it
| was possible to see and extract underlying, hidden elements,
| that were not visible or even detectable in the rendered
| webpage in a browser.
|
| This is actually a somewhat common method when it comes to a
| bit of corporate sleuthing.. anytime you see a pretty website
| with vector-y graphics, maybe engineering-drawing
| representations.. if the data hasn't been stripped completely
| or redrawn you can extract information that otherwise people
| would assume unknowable.
|
| In a recent example... I did this on a startup company's page
| involving a product where they had a CAD-like side view drawing
| of one of their products... but the base file (in this case it
| was an SVG) driving the page actually contained multiple hidden
| views of the same product and other products and at the 'real'
| precision of what likely was a DXF export from a CAD program,
| given to the web team. This allowed a critical dimension of an
| unannounced product to be precisely determined (to three
| significant figures) which was a spec that had not been
| publicly released...
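|
| A minimal sketch of that kind of inspection, using only Python's
| standard library. The sample markup and the element ids are made up
| for illustration, and the check is an assumption about how content
| might be hidden: it only looks at `display` and `visibility` set
| directly on an element or in its inline `style`, not at external
| stylesheets.

```python
import xml.etree.ElementTree as ET

SVG_NS = "{http://www.w3.org/2000/svg}"


def is_hidden(element: ET.Element) -> bool:
    """True if the element itself is marked as not displayed or not visible."""
    style = element.get("style", "").replace(" ", "").lower()
    return (
        element.get("display") == "none"
        or element.get("visibility") == "hidden"
        or "display:none" in style
        or "visibility:hidden" in style
    )


def hidden_elements(svg_text: str) -> list[tuple[str, str]]:
    """Return (tag, id) pairs for every element that is explicitly hidden."""
    root = ET.fromstring(svg_text)
    return [
        (element.tag.replace(SVG_NS, ""), element.get("id", ""))
        for element in root.iter()
        if is_hidden(element)
    ]


# Made-up sample: one visible view and two hidden ones.
sample = """<svg xmlns="http://www.w3.org/2000/svg">
  <g id="side-view"><path d="M0 0 L120.5 0"/></g>
  <g id="top-view" display="none"><path d="M0 0 L87.25 40"/></g>
  <g id="draft" style="visibility: hidden"><rect width="5" height="5"/></g>
</svg>"""

print(hidden_elements(sample))
```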
| aidos wrote:
| Totally. We see architectural drawings that go through a number
| of revisions and it's not uncommon for designers to simply
| cover a whole section with a white box and then draw on top of
| it.
|
| Also within PDFs (and svgs) you normally clip the area you're
| going to draw into to bound it (sort of like overflow:hidden)
| and anything outside of that doesn't display, but it's still
| there and accessible.
|
| I marvel more at the fact that software is capable of figuring
| out all the occlusions so you can print the stuff on a plotter.
| CAD drawings have up to 2M individual vectors in them. It's
| impressive that it works at all, to be honest.
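|
| The clipping point can be illustrated with a short sketch: dropping
| the `clip-path` attributes from an SVG makes whatever was drawn
| outside the clip region render again. The strip_clipping helper and
| the sample markup are hypothetical, and the sketch only handles
| `clip-path` given as an attribute, not clipping set through CSS.

```python
import xml.etree.ElementTree as ET

SVG_NAMESPACE = "http://www.w3.org/2000/svg"


def strip_clipping(svg_text: str) -> tuple[str, int]:
    """Remove clip-path attributes; return the new markup and the count removed."""
    # Keep the default SVG namespace unprefixed when serializing.
    ET.register_namespace("", SVG_NAMESPACE)
    root = ET.fromstring(svg_text)
    removed = 0
    for element in root.iter():
        if "clip-path" in element.attrib:
            del element.attrib["clip-path"]
            removed += 1
    return ET.tostring(root, encoding="unicode"), removed


# Made-up sample: the second rect lies entirely outside the clip region.
sample = """<svg xmlns="http://www.w3.org/2000/svg" width="100" height="100">
  <clipPath id="page"><rect width="100" height="100"/></clipPath>
  <g clip-path="url(#page)">
    <rect id="visible" x="10" y="10" width="20" height="20"/>
    <rect id="clipped-away" x="500" y="10" width="20" height="20"/>
  </g>
</svg>"""

unclipped, removed = strip_clipping(sample)
print(removed)
print(unclipped)
```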
| steren wrote:
| Much cleaner than my hack of Chrome -> PDF -> Inkscape -> SVG
|
| https://labs.steren.fr/2020/05/08/screenshot-as-svg/
| cjr wrote:
| ha, I'm also guilty of using this method on https://urlbox.io
| to power our SVG screenshots.
|
| To be honest, it works quite well, but there are quite a few
| bugs in chromium's pdf rendering, especially when it comes to
| determining the correct page width to apply media queries to,
| which sometimes affects the accuracy of these SVGs.
| SigmundA wrote:
| Reminds me of https://github.com/gliffy/canvas2svg at a different
| level of abstraction.
|
| I believe PDF.js incorporated some form of canvas2svg to try and get
| an SVG backend working, which would allow high-resolution printing
| to PDF, but I'm not sure where that's at. I believe printing through
| PDF.js is blurry due to memory constraints, since with the normal
| canvas backend PDF pages just end up as bitmaps sent to the printer.
|
| SVG ends up staying vector through Chromium's print pipeline
| resulting in much less memory usage while having much higher dpi
| final output. I would imagine this is due to SVG being turned
| into Skia drawing commands that end up as PDF that then gets
| printed through PDFium?
| pornel wrote:
| It'd be wonderful if this could be integrated into the browser
| and the OS to provide SVG screenshots.
| hedora wrote:
| I'd love to see some sort of caching proxy that did this for news
| stories, etc.
|
| Basically, convert everything to an archival format, then I'll
| browse the archive instead of whatever adversarial server side /
| javascript junk the site is serving.
| bawolff wrote:
| Well both pdf and svg support javascript (albeit pdf is
| extremely limited)
| ccouzens wrote:
| If you could proxy the page to SVG without Javascript, couldn't
| you also proxy the page to HTML without Javascript?
|
| Either way, you'd probably want your proxy to wait for any
| onload Javascript to run before snapshotting the page.
| metayrnc wrote:
| Can someone give some example usecases? I am curious as to how
| this is used. Thank you.
| commotionfever wrote:
| if it works how i think it does, this could be really nice to
| cook up some infographics in a css framework like tailwind.
| then make some svgs for a github readme
|
| for example i made this one[1] with tailwind but i just ended
| up taking a png screenshot
|
| [1]
| https://github.com/sentriz/socr/blob/master/.github/socr.png
| danielvaughn wrote:
| IMO the use case is limited but interesting. The most obvious
| would be product screenshots for landing pages, although
| typically design tools handle that well enough.
|
| I'm currently building a web-app for building web pages, and
| I'd love for the user to be able to view a thumbnail gallery of
| all the pages they've built. This tool would allow me to build
| a zooming feature pretty easily.
|
| Outside of those two, I'd imagine the use cases are fairly
| limited.
| btown wrote:
| The thing about having access to the Skia render graph is
| that all of a sudden you're no longer limited to product
| screenshots and screen recordings. Imagine a pipeline where
| you can export someone's interaction session with a site,
| pixel-perfect, into DaVinci Resolve or Blender or Unity as a
| fully annotated DOM-advised render node hierarchy, with
| consistent node identities over time, of every rendered
| element on the page as it changes across frames. That's _way_
| more powerful than just pixels.
|
| Imagine flying through your site in 3D (or even VR) with full
| control over timing, being able to explode and un-explode
| your DOM elements as they transition into being - the type of
| thing that only Apple would do for their WWDC demos with
| dedicated visualization teams.
|
| The start is to be able to see the rendering engine as a
| generator for not just raster data over time, but vector data
| over time. Of course, there's a lot of work to do from there,
| but this is the core leap.
| Scalene2 wrote:
| Great for screenshots to render in a video.
| convolvatron wrote:
| if we can reduce the size of the basis footprint for a browser
| implementation, we can more easily produce new browsers (i.e by
| implementing a fully general Path, and font rendering)
| mrkramer wrote:
| That would be cool if actually converting HTML to SVG would
| save you bandwidth and all the rest that goes with web
| requests. Imagine a web browser that only supports SVG and
| converts all HTML to SVG; then when browsing the web you would
| only look at screenshots of websites and webpages. This would
| be something like a read-only browser. It is already
| possible[0], though it is not enabled by default in Chrome,
| nor is it an exclusive feature.
|
| [0] https://frankgroeneveld.nl/2021/08/24/most-underused-
| browser...
| GranPC wrote:
| I do something similar - but using the Print command and
| converting the PDF to SVG - to import websites into Blender for
| flashy animations. This allows me to neatly animate things
| in/out, and zoom into details without pixelation.
| yvoschaap wrote:
| I tried something similar to render thumbnails of
| websites (at a very small file size). E.g.
| https://twitter.com/yvoschaap/status/1446397003316047872
| mrkramer wrote:
| Isn't this something like archive.ph is doing? Snapshotting
| and screenshotting websites. I'm referring both to you and
| the op.
| dj_gitmo wrote:
| https://archive.ph/1NNZr
|
| That looks like a Web Archive (WARC) and a PNG screen shot.
| I think you can make a screenshot with CasperJS. The WARC
| can be created by wget.
| mrkramer wrote:
| Yea you are right about PNG but wrong about WARC.
| Archive.ph doesn't use WARC.
| dredmorbius wrote:
| What does it use, if you know?
|
| Source?
| mrkramer wrote:
| Their FAQ says: https://archive.ph/faq#:~:text=Which%20pa
| rts%20of,of%201024x....
|
| Wikipedia says: https://en.wikipedia.org/wiki/Archive.tod
| ay#:~:text=Web%20pa....
|
| So I assumed they don't use it, but idk for sure.
| marginalia_nu wrote:
| How small are you getting them? I'm straight up
| screenshotting websites (e.g.
| https://search.marginalia.nu/screenshot/245804). Seem to come
| in at on average 17 Kb, based on a sample size of 550K
| screenshots.
| codetrotter wrote:
| That is super neat! Did you end up having any
| users/customers?
| mk_stjames wrote:
| I've done this for a project long ago, incredibly lazily, by
| using chrome/chromium to PDF and piping to a PDF to SVG tool.
| There are a few PDF to SVG pathways, I remember it using Cairo
| and the whole thing was quick and consistent.
| crazygringo wrote:
| That was my first thought as well.
|
| I'm genuinely curious if there are any advantages in
| Chrome->SVG as opposed to Chrome->PDF->SVG.
|
| Are there any graphical effects (e.g. produced by CSS, like
| blurry text shadows or something) that PDF can't render without
| falling back to bitmap but SVG can?
|
| Or is there other data that SVG usefully preserves that PDF
| discards, such as actual source text strings used for text? (As
| opposed to PDF where getting text out, e.g. when copying to
| clipboard, usually involves a lot of ugly "reverse
| engineering".)
| femto113 wrote:
| I think the path is more clearly thought of as HTML+CSS ->
| display list -> *. The display list is some abstract
| definition of what needs to be drawn by a renderer. In theory
| anything that fully describes all possible operations works,
| including bespoke things like SkPicture or general purpose
| graphical languages like SVG or PostScript. In practice
| there's never a single language that can describe everything,
| because display capabilities evolve and new operations are
| added all the time (e.g. advanced typography features for
| fonts). PDF can cover a really broad set of use cases, but it
| also wasn't designed as an intermediate format (it was
| closely tied to the PDF reader) so it's easier to get into
| than out of. SVG is possibly a better candidate, as it is
| already used effectively as an intermediate representation
| (e.g. D3.js "renders" to SVG).
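|
| The "display list -> *" idea can be sketched as a list of drawing
| operations plus one backend that serializes them. This is an
| illustration only: FillRect, DrawText and to_svg are hypothetical
| names, not SkPicture or any Chromium API, and a real display list
| has far more operations than these two.

```python
from dataclasses import dataclass
from xml.sax.saxutils import escape, quoteattr


@dataclass(frozen=True)
class FillRect:
    """Hypothetical display-list op: fill a rectangle with a color."""
    x: float
    y: float
    width: float
    height: float
    color: str


@dataclass(frozen=True)
class DrawText:
    """Hypothetical display-list op: draw a text run at a baseline position."""
    x: float
    y: float
    text: str
    font_size: float


def to_svg(display_list, width: int, height: int) -> str:
    """One possible backend: replay each op as an SVG element, in paint order."""
    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">'
    ]
    for op in display_list:
        if isinstance(op, FillRect):
            parts.append(
                f'<rect x="{op.x}" y="{op.y}" width="{op.width}" '
                f'height="{op.height}" fill={quoteattr(op.color)}/>'
            )
        elif isinstance(op, DrawText):
            parts.append(
                f'<text x="{op.x}" y="{op.y}" font-size="{op.font_size}">'
                f"{escape(op.text)}</text>"
            )
        else:
            # A backend has to decide what to do with ops it cannot express.
            raise TypeError(f"unsupported operation: {op!r}")
    parts.append("</svg>")
    return "\n".join(parts)


page = [
    FillRect(0, 0, 200, 100, "#eee"),
    DrawText(10, 30, "Hello & <world>", 16),
]
svg = to_svg(page, 200, 100)
print(svg)
```

| The same list could be handed to a raster or PDF backend instead;
| the `else` branch is where a format that lacks an operation would
| have to fall back to something else, such as a bitmap.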
| aidos wrote:
| Neither PDF nor SVG does text layout. They're both pretty
| similar really, though the pdf spec is really deep and broad
| to cater for a million things.
|
| My advice to everyone re PDFs is to crack them open by
| running `mutool clean -d file.pdf` and opening the output in a text
| editor. They're just a tree (well, graph, I guess) of obvious
| objects.
|
| Ps: mutool convert does a good job of converting from pdf to
| svg in a fairly faithful way.
| DrewADesign wrote:
| Do you mean there's no _dynamic_ text layout? SVG and PDF
| have perfect text placement capability, but I've never
| even looked to see if they support defining broadly
| applicable rules for text presentation.
| aidos wrote:
| I mean there's no layout engine to do things like
| wrapping and line height etc. Everything is explicitly
| positioned.
|
| PDF seems a bit more bonkers because you render text as
| strings of glyphs and the conversion back to text is an
| afterthought. There's a ToUnicode map that says "glyph 8
| in the embedded font is an 'X'" but that's there for copy
| pasting / searching - not for rendering. PDFs are built
| to render glyphs at positions.
|
| Edit: to go full meta, there are Type3 fonts where each
| glyph itself is defined as a PDF graphics stream. Which
| actually leads you in to what's inside a font. Guess
| what? lots of them look just like PDFs inside, because
| the glyphs are defined in PostScript. Fonts and PDFs
| kinda grew up together, and once you start digging into
| them the similarities are striking.
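|
| To make the "glyphs at positions" point concrete, here is a toy
| model of getting text back out. It is not a PDF parser: PlacedGlyph,
| extract_text and the glyph ids are hypothetical, the to_unicode dict
| stands in for a font's ToUnicode map, and y is assumed to grow
| downward so that sorting by y gives reading order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlacedGlyph:
    """A glyph id painted at an explicit position (hypothetical model)."""
    glyph_id: int
    x: float
    y: float


def extract_text(glyphs, to_unicode, line_tolerance=2.0) -> str:
    """Rebuild text by grouping glyphs into lines by y, then ordering by x."""
    lines: list[list[PlacedGlyph]] = []
    for glyph in sorted(glyphs, key=lambda g: g.y):
        # Same line if the y position is close to the line's first glyph.
        if lines and abs(glyph.y - lines[-1][0].y) <= line_tolerance:
            lines[-1].append(glyph)
        else:
            lines.append([glyph])
    return "\n".join(
        "".join(
            # Glyphs without a mapping cannot be turned back into text.
            to_unicode.get(glyph.glyph_id, "\ufffd")
            for glyph in sorted(line, key=lambda g: g.x)
        )
        for line in lines
    )


# Stand-in for a ToUnicode map: "glyph 8 in the embedded font is an 'X'".
to_unicode = {8: "X", 3: "O", 5: "K"}

# Paint order is unrelated to reading order; glyph 99 has no mapping.
glyphs = [
    PlacedGlyph(3, 20, 10),
    PlacedGlyph(8, 10, 10.5),
    PlacedGlyph(5, 10, 30),
    PlacedGlyph(99, 20, 30),
]
text = extract_text(glyphs, to_unicode)
print(text)
```

| Rendering never needs the mapping; only copying and searching do,
| which is why a missing or wrong entry leaves the page looking fine
| while the extracted text comes out broken.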
| ccouzens wrote:
| If you print to PDF, you'll have the page's print css
| applied. And it is probably paginated.
|
| If you go direct to SVG the capture will use the screen css
| and not be paginated.
| mk_stjames wrote:
| Yes, this. In the project I was doing, using chromium as a
| command-line interface I remember having options to do the
| pagination to a custom resolution, which I used to define a
| render 'window' as if the browser screen was on something
| like a 1600x18000 monitor. So I had the entire webpage
| displayed like a full scroll without page breaking like it
| would have if you just printed a PDF from Chrome. And this
| allowed me to then extract this giant full length vector
| graphics result of diagrams and text into a single SVG that
| was perfectly spaced and rendered in the aspect ratio I
| wanted.
| cjr wrote:
| It's also possible to emulate screen media queries[0] so
| that the pdf output uses the regular screen css.
|
| [0] https://chromedevtools.github.io/devtools-
| protocol/tot/Emula...
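|
| A sketch of what that could look like as raw DevTools Protocol
| messages. The link above is truncated, so treating it as
| Emulation.setEmulatedMedia is an assumption. The
| build_capture_commands helper is hypothetical, and the 96 CSS pixels
| per inch conversion and the single tall page are assumptions rather
| than settings taken from the thread. The code only builds the JSON
| payloads; sending them over a DevTools WebSocket is left out.

```python
import json
from itertools import count

# Assumption: paper size is given in inches, at 96 CSS pixels per inch.
CSS_PIXELS_PER_INCH = 96


def build_capture_commands(width_px: int, height_px: int) -> list[dict]:
    """Build CDP messages: emulate screen media, then print one tall page."""
    ids = count(1)
    return [
        {
            "id": next(ids),
            "method": "Emulation.setEmulatedMedia",
            "params": {"media": "screen"},
        },
        {
            "id": next(ids),
            "method": "Page.printToPDF",
            "params": {
                "paperWidth": width_px / CSS_PIXELS_PER_INCH,
                "paperHeight": height_px / CSS_PIXELS_PER_INCH,
                "marginTop": 0,
                "marginBottom": 0,
                "marginLeft": 0,
                "marginRight": 0,
                "printBackground": True,
            },
        },
    ]


# The 1600x18000 "monitor" mentioned earlier in the thread.
commands = build_capture_commands(1600, 18000)
for command in commands:
    print(json.dumps(command))
```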
| aidos wrote:
| I've been down a bit of this rabbit hole before. We work with
| PDFs, svgs, fonts and chromium too. While I don't have any need
| for this tool itself, I'd highly recommend flicking through this
| article as a nice overview of the graphics / font pipeline.
| bscphil wrote:
| Semi-related, there's this browser extension that somehow manages
| to mangle HTML into SVG with pretty good accuracy.
| https://addons.mozilla.org/en-US/firefox/addon/svg-screensho...
|
| I do stuff like this (vector representations of the DOM) for
| taking screenshots. Why?
|
| 1. High resolution screenshots are great when you're sharing from
| a low resolution device, or when you need to scale them up. I've
| seen enough crappy screenshots of Twitter in YouTube videos to
| last me the rest of my life.
|
| 2. If your device does sub-pixel anti-aliasing, then your
| screenshots all have noticeable color fringing around their text.
| The text rendering is done well before the data hits the buffer
| that the screenshot is capturing. A fun party trick is to
| identify someone's OS based purely on a screenshot of some text
| on a webpage.
|
| 3. On Linux (and maybe elsewhere, IDK), color correction (e.g.
| gamut mapping) is done (in X11) before the pixels get to the
| buffer that you capture. So with most screenshot tools, you end
| up capturing a bunch of distorted colors which you then have to
| map back to sRGB if you want them to look right in color
| calibrated software.
|
| You can frequently get away with printing a PDF and then
| rendering that out to a large PNG. In some cases, though,
| figuring out how to set the page size to match what you see on
| the screen can be near-impossible, and more importantly in
| Firefox there's no way to disable print media CSS when printing a
| PDF. (You can do this in Chromium.) If you need to edit the image
| afterwards or want to put it on a website or something, this is
| far easier to do with the SVG format than with PDF.
| jancsika wrote:
| > Recently, an experimental SVG back-end has been added to Skia.
|
| That's curious.
|
| Anyone know why?
| return_to_monke wrote:
| While I am not a Skia person, a use case I could imagine is
| (flutter) web apps.
|
| Flutter currently has 2 ways to run something on the web:
|
| 1. CanvasKit. Primarily, this uses WebGL. Though, the app has to
| download a kind of WebGL runtime on the first launch, iirc. If
| the browser does not support OpenGL, it will use Skia with a
| Canvas frontend, leading to blurry and poor performance results.
|
| 2. webRender. This is Flutter's way of trying to make an HTML
| DOM, but it's not that great either. It's inconsistent with the
| rest of the Flutter implementations, and has performance issues
| because it's not really mature/optimized and has a virtual DOM.
|
| I think an exciting use case would be something like option 1:
| instead of the blurry image and bad performance of canvas
| redrawing, it might try to manipulate an SVG in the browser. This
| is pure speculation tho, correct me if I'm wrong.
| TheRealPomax wrote:
| Calling it "recently" is a bit of a misnomer. The
| "experimental/svg/model/..." content was added almost five
| years ago.
| simpleintheory wrote:
| Interesting. Wonder how easy it would be to generalise this:
| turn it into an API that gives out some image data that could in
| turn be converted to PDF, SVG, PNG, you name it... not sure
| how the data would be structured though.
| imhoguy wrote:
| You can make PDF or PNG from SVG.
| fathyb wrote:
| I had a lot of people reach out this weekend for PDF support,
| so I'm planning on implementing it with PNG support this week.
| Thanks to Skia, it should just require a few lines of code.
| justinclift wrote:
| PDF or PNG? It's not clear from your comment. :)
___________________________________________________________________
(page generated 2022-11-13 23:00 UTC)