[HN Gopher] Web Browser Engineering
       ___________________________________________________________________
        
       Web Browser Engineering
        
       Author : djoldman
       Score  : 331 points
       Date   : 2021-10-17 17:33 UTC (2 days ago)
        
 (HTM) web link (browser.engineering)
 (TXT) w3m dump (browser.engineering)
        
       | butz wrote:
       | Neat, but what about integrating Widevine support?
        
       | simonw wrote:
       | I was interested to see that this uses the DukPy wrapper around
       | Duktape for the JavaScript interpreter:
       | https://browser.engineering/scripts.html
       | 
       | This made me start digging into whether this was considered a
       | "safe" way of executing untrusted JavaScript in a sandbox.
       | 
       | It's not completely clear to me if DukPy currently attempts safe
       | evaluation - it's missing options for setting time or memory
       | limits on executed code for example:
       | https://github.com/amol-/dukpy
       | 
       | There's a QuickJS Python wrapper here which offers those limits:
       | https://github.com/PetterS/quickjs
       | 
       | I'm pretty paranoid though any time it comes to security and
       | dependencies written in C, so I'd love to see a Python wrapper
       | around a JavaScript engine that has safe sandbox execution as a
       | key goal plus an extensive track record to back it up!
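A minimal sketch of what "time or memory limits on executed code" can look like when the engine wrapper does not offer them. This is not a DukPy or QuickJS API: it swaps in OS-level limits on a child process, using only the Python standard library. The helper name `run_limited` and its defaults are hypothetical, and the memory cap relies on the POSIX-only `resource` module, so it is skipped elsewhere.

```python
import subprocess
import sys

try:
    import resource  # POSIX only; absent on Windows
except ImportError:
    resource = None


def run_limited(argv, timeout_s=5.0, max_bytes=1 << 30):
    """Run argv in a child process under a wall-clock and memory cap.

    Returns (status, stdout) where status is "ok", "error", or "timeout".
    """

    def apply_limits():
        # Runs in the child just before exec: lower the soft address-space
        # limit, never exceeding the hard limit already in force.
        _, hard = resource.getrlimit(resource.RLIMIT_AS)
        cap = max_bytes if hard == resource.RLIM_INFINITY else min(max_bytes, hard)
        resource.setrlimit(resource.RLIMIT_AS, (cap, hard))

    try:
        proc = subprocess.run(
            argv,
            capture_output=True,
            text=True,
            timeout=timeout_s,  # the child is killed when this expires
            preexec_fn=apply_limits if resource else None,
        )
    except subprocess.TimeoutExpired:
        return ("timeout", "")
    return ("ok" if proc.returncode == 0 else "error", proc.stdout.strip())


if __name__ == "__main__":
    # A well-behaved child, then one that never terminates on its own.
    print(run_limited([sys.executable, "-c", "print(1 + 1)"]))
    print(run_limited([sys.executable, "-c", "while True: pass"], timeout_s=0.5))
```

Process-level limits like these only bound resource use; they are not a sandbox and say nothing about memory-safety bugs in a C engine, which is the other half of the concern above.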
        
         | ameliaquining wrote:
         | If you want battle-hardened, I figure you can't do better than
         | V8. Here's a Python wrapper that I've poked at a bit (it's not
         | quite 100% feature-complete but it seems to essentially work):
         | https://github.com/sqreen/PyMiniRacer
        
           | simonw wrote:
           | That looks really good - especially how they've managed to
           | bundle a pre-compiled v8 into a 4MB Python wheel:
           | https://blog.sqreen.com/embedding-javascript-into-python/
           | 
           | The time limit and memory limit support looks good too:
           | https://github.com/sqreen/PyMiniRacer/blob/f7b9da0d4987ca7d1...
        
         | devwastaken wrote:
         | I don't see any specific claims on isolation/memory safety or
         | safety in general on Duktape's page. Both V8 and SpiderMonkey
         | actively fix new JS vulnerabilities, and V8 isolates are used
         | successfully in the wild. Cloudflare Workers is an example.
        
       | ngai_aku wrote:
       | Awesome to see this here! The course that accompanied this
       | textbook was among my favorites
        
       | eatonphil wrote:
       | I can't wait for Browser Engineering to show up as a university
       | course a la compilers, operating systems, networks, etc.
        
         | amelius wrote:
         | A browser is basically an operating system inside an operating
         | system.
         | 
         | The funny thing is we can have the MINIX microkernel
         | discussions all over again :)
        
           | gnull wrote:
           | Looking at the book's table of contents, I disagree. Browsers
           | may resemble OSes by the size of the code base or by the
           | amount of optimization involved, or by their importance for
           | the modern world, but not by the types of technologies
           | involved. I doubt the Browsers course will intersect a lot
           | with an OSes course.
           | 
           | EDIT: Reading your comment again I suspect you might have been
           | joking :)
        
             | groby_b wrote:
             | Which part of OSs do you expect not to be covered?
             | 
             | There's IPC. There's memory management. There's process
             | management. There's network management. There's security.
             | There's device management.
             | 
             | They all happen at a slightly higher layer, but they all
             | exist similar to an OS. (I'm not sure if the higher layer
             | makes it easier or harder to understand - but in terms of
             | what you need to know, an OS class or three is definitely
             | helpful)
        
               | gnull wrote:
               | > IPC, memory management, process management, network
               | management.
               | 
               | I imagine the non-trivial parts of these are done for the
               | JS VM (correct me if I'm wrong), and therefore a VM
               | design course would have more intersection with Browsers
               | with respect to these disciplines than an OS course.
               | 
               | > security
               | 
               | This one is everywhere, it has no special connection to
               | OSes.
               | 
               | > They all happen at a slightly higher layer
               | 
               | Slightly?! That's an understatement of the week! The
               | difference in abstraction levels is huge here and the
               | specifics of the two levels are very, very different.
               | 
               | > in terms of what you need to know, an OS class or three
               | is definitely helpful
               | 
               | Sure. But I think it's as useful as any systems
               | programming course. I can agree that systems programming
               | is a good preliminary for both Browsers and OSes, and
               | learning either of the two will teach you a good deal
               | about systems programming, but I doubt they will repeat
               | each other.
        
               | groby_b wrote:
               | These are all part of the browser outside the VM too.
               | 
               | Multi-process architecture requires you to think deeply
               | about IPC.
               | 
               | Memory management is all over the place - there isn't a
               | browser without custom allocators, investment into GC,
               | etc.
               | 
               | Process management -> see multi-process architecture.
               | 
               | Network management: Browsers need to handle a tremendous
               | amount of network issues. I mean... that's what they do.
               | Outside of the VM, too.
               | 
               | As for "security is everywhere" - the whole point of an
               | OS (and a browser) is to make it possible for security to
               | be everywhere. To provide the primitives that you can
               | securely build on.
               | 
               | > Slightly?! That's an understatement of the week!
               | 
               | Not really, no. I've worked on embedded networking
               | stacks, on full-fledged OSs, and on browsers. I stand by
               | "slightly". Yes, granted, a browser doesn't get quite as
               | bit-fiddly as an on-the-metal OS, but it's a matter of
               | degrees, not quality.
               | 
               | Can you work on many areas of a browser without ever
               | touching OS-like code? Absolutely. This particular book
               | has a good chance of avoiding most, because it focuses on
               | the rendering part.
               | 
               | But a browser, as a whole, provides an abstracted
               | platform just like an OS. And it echoes many concepts, if
               | in slightly different forms.
        
         | chrishtr wrote:
         | Author here.
         | 
         | That's exactly what we are hoping for.
         | 
         | http://browser.engineering/preface.html
         | 
         | So far, my co-author Pavel has taught from this book multiple
         | times (including this semester). In the spring at least one
         | other university will offer a course. We'll list all known
         | course offerings on the website.
         | 
         | Also, if anyone would like to teach from this book, please get
         | in touch!
        
       | varispeed wrote:
       | Many years ago, probably 20, I went on a task of implementing a
       | web browser. I remember I gave up at rendering tables. I couldn't
       | wrap my head around how to properly size them. It quickly became
       | extremely complex to address edge cases, and I eventually gave up
       | when I couldn't understand what was going on after a two-week
       | break. Probably if I had money and was able to commit full time I
       | could eventually get it, but I had to focus on commercial work
       | and putting food on my table.
       | 
       | edit: it's a great article! But nothing on rendering tables :-)
        
         | dmitriid wrote:
         | > Many years ago, probably 20, I went on a task of implementing
         | a web browser. I remember I gave up at rendering tables.
         | 
         | The HTML5 effort has cleaned up a lot of behaviors and specified
         | how browser tags should behave. So it is, possibly, an
         | approachable task now. Still daunting though.
        
           | pavpanchekha wrote:
           | Tables are still not that clean! But luckily tables are way
           | way less important than they were in the past, so much so
           | that browser differences in table rendering leave most pages
           | readable.
        
       | baybal2 wrote:
       | It's now nearly impossible to build a web browser from scratch
       | because of the runaway explosion of web browser features and
       | proprietary API extensions.
       | 
       | W3C here is unfortunately part of the problem.
       | 
       | Standardisation is good, but letting Google pour streams of
       | half-assedly written RFCs onto other browser makers is not good.
       | 
       | Non-enforcement of standards is also bad, and it's bad to extend
       | W3C privileges to companies who themselves selectively implement
       | their own proposals, so others' browsers can't match their
       | behaviour.
        
         | jahewson wrote:
         | Actually a from-scratch web browser is being built:
         | 
         | https://www.fastcompany.com/90611677/flow-ekioh-web-browser-...
         | 
         | Also you should take a look at the WHATWG because it's far more
         | relevant than the W3C nowadays.
        
           | dmitriid wrote:
           | "The total word count of the W3C specification catalogue is
           | 114 million words at the time of writing. If you added the
           | combined word counts of the C11, C++17, UEFI, USB 3.2, and
           | POSIX specifications, all 8,754 published RFCs, and the
           | combined word counts of everything on Wikipedia's list of
           | longest novels, you would be 12 million words short of the
           | W3C specifications"
           | 
           | https://drewdevault.com/2020/03/18/Reckless-limitless-scope....
           | 
           | No idea how Flow does it, but building a browser is nearly
           | impossible.
        
             | fabrice_d wrote:
             | It's well known that Drew Devault count is meaningless
             | since it includes dupes, drafts, and unrelated specs.
             | Still, the space to cover for a from-scratch browser is
             | huge.
             | 
             | Flow didn't start "from scratch" recently, it's an
             | evolution of a primarily SVG+CSS renderer for set top
             | boxes. They also re-use SpiderMonkey as their JavaScript
             | engine.
        
               | dmitriid wrote:
               | > It's well known that Drew Devault count is meaningless
               | since it includes dupes, drafts, and unrelated specs.
               | 
               | It's not meaningless. Because in order to implement a
               | browser, you have to figure out which of them are dupes,
               | deprecated, drafts etc.
               | 
               | And even that won't help you. Because a huge amount of
               | "deprecated" standards are in the browsers. A huge amount
               | of stuff in the browsers is still at the "community
               | draft" stage, and yes, you have to implement that, too.
               | 
               | Microsoft simply gave up, forked Chromium... And they
               | still can't keep up:
               | https://web-confluence.appspot.com/#!/confluence
        
               | fabrice_d wrote:
               | The Edge graph stops before they switched to Chromium
               | though...
        
               | dmitriid wrote:
               | Yes, that was my mistake. Doesn't make my words any less
               | true.
        
               | carapace wrote:
               | > it includes dupes, drafts, and unrelated specs.
               | 
               | Even if it's overblown by, say, three times, that's still
               | over thirty million words.
        
               | fabrice_d wrote:
               | Note that a large effort has been made to make specs more
               | precise, to be easier to implement in an interoperable
               | way.
               | 
               | That contributes to "word bloat", but it's not
               | necessarily a bad thing. Picking the right metrics is not
               | always that easy!
        
               | dmitriid wrote:
               | > Note that a large effort has been made to make specs
               | more precise, to be easier to implement in an
               | interoperable way
               | 
               | This is true for HTML5, which defined full browser
               | behaviour, including things like improperly closed and
               | improperly nested tags.
               | 
               | Many, many other specs? Not so much. Especially the crap
               | that Chrome has been pumping out the past several years.
        
           | carapace wrote:
           | Flow browser isn't FOSS.
           | 
           | > WHATWG [is] far more relevant than the W3C nowadays.
           | 
           | Which is arguably part of the problem.
        
       | kizer wrote:
       | Love first principles stuff. Great job.
        
       | ofou wrote:
       | Does this build a web browser, like the build-your-own-x movement?
        
         | pavpanchekha wrote:
         | Author here--the browser the book works through is, uhh, pretty
         | limited, so I don't imagine you'd want to actually use it for
         | web browsing. It's more like writing your own toy compiler or
         | operating system, to learn how they work.
        
           | ofou wrote:
           | It's perfect then! I'll read and work the exercises.
           | 
           | Thanks for the answer!
        
       | jturpin wrote:
       | This is awesome. I've always wanted to know how the actual layout
       | portion works (or at least, can work in a simple way). I think
       | these kinds of resources are really valuable and people should be
       | empowered to make bespoke-ish web renderers as the need arises.
        
         | pavpanchekha wrote:
         | Author here. Layout is my favorite part of the browser--my
         | dissertation is largely about formalizing browser layout--and
         | so far the book only covers the basics, like the layout tree
         | and how layout is computed with multiple tree traversals, but
         | even understanding those basics I think gives you a huge
         | advantage when thinking about web page layout tasks.
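The idea of a layout tree computed with multiple tree traversals can be sketched in a few lines. This is not the book's code: it is a deliberately simplified block-only model in which every box takes the full width of its parent and children stack vertically. The `Box` class and the three pass functions are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Box:
    height: int = 0  # intrinsic height; only meaningful for leaves
    children: list = field(default_factory=list)
    x: int = 0
    y: int = 0
    width: int = 0


def layout_widths(box, width):
    """Pass 1 (top-down): each block takes the full width of its parent."""
    box.width = width
    for child in box.children:
        layout_widths(child, width)


def layout_heights(box):
    """Pass 2 (bottom-up): a container is as tall as its stacked children."""
    if box.children:
        box.height = sum(layout_heights(child) for child in box.children)
    return box.height


def layout_positions(box, x=0, y=0):
    """Pass 3 (top-down): place each child directly below the previous one."""
    box.x, box.y = x, y
    cursor = y
    for child in box.children:
        layout_positions(child, x, cursor)
        cursor += child.height


def layout(root, viewport_width):
    layout_widths(root, viewport_width)
    layout_heights(root)
    layout_positions(root)
    return root


if __name__ == "__main__":
    page = Box(children=[Box(height=20), Box(children=[Box(height=10), Box(height=5)])])
    layout(page, 800)
    for box in (page, *page.children, *page.children[1].children):
        print(box.x, box.y, box.width, box.height)
```

The passes are split because the information flows in different directions: widths come down from the viewport, heights come up from the content, and positions can only be assigned once heights are known.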
        
       | game_the0ry wrote:
       | As a front end developer, I am really happy to see resources like
       | this.
       | 
       | Developing for the browser is a real challenge. I think working
       | with HTML / CSS / JS has been a neglected skill for a long time -
       | most software engineers look down on that type of work and it's
       | rarely covered in comp sci course work.
       | 
       | Still, it's good to see a lot of progress has been made, this book
       | included.
       | 
       | My only critique - why use Python instead of Node.js?
        
         | pavpanchekha wrote:
         | Author here. I wrote up my answer here:
         | http://browser.engineering/blog/why-python.html
         | 
         | Basically: server-side JavaScript is just not as widely known
         | as Python, and it'd be additionally confusing when our browser
         | starts running JavaScript. And in-browser JavaScript is a bit
         | too restricted (by things like the same-origin policy) to do
         | the whole thing inside a browser.
        
           | gnull wrote:
           | Great blog design, btw! I love the way it displays footnotes
           | (if you can call them that now) on page margins.
           | 
           | Does it have RSS?
        
             | pavpanchekha wrote:
             | Yep! You can point your RSS reader at
             | https://browser.engineering/rss.xml
        
           | game_the0ry wrote:
           | I see. I tried searching the table of contents as to "why
           | python" and could not find it, but that link does more than
           | enough to explain the "why."
           | 
           | I am resisting the urge to disagree, but since you did
           | (literally) write a book about building a browser, I will
           | defer to your expertise and try to learn from you :)
        
           | bogomipz wrote:
           | This looks fantastic! Really excited to see this. I'm looking
           | forward to reading the updates as well. Cheers.
        
         | tablespoon wrote:
         | > Developing for the browser is a real challenge. I think
         | working with HTML / CSS / JS has been a neglected skill for a
         | long time - most software engineers look down on that type of
         | work and it's rarely covered in comp sci course work.
         | 
         | IMHO, those are job skills and not comp sci topics. They
         | shouldn't be part of a degree program (except the most
         | superficial treatment required to get some ugly UI up that may
         | be required for something else). You have your whole career to
         | pick them up.
        
         | dgb23 wrote:
         | I agree, JS _the language_ would be a more "obvious" choice for
         | this, since it is both generally much faster and more popular
         | in web development. I assume they are using Python _the
         | ecosystem_ here. It probably comes with packages better suited
         | for rendering specifically?
         | 
         | I don't think either is a super compelling choice anyways for
         | this type of work. I think you want to use a systems language
         | here. However Python is completely fine as a teaching language.
         | Pretty much anyone who knows a similarly structured language
         | can read it. And there is very little noise. So it can serve as
         | a good reference if you want to follow along with a different
         | language.
        
           | traverseda wrote:
           | I think writing this in JavaScript would add some confusion,
           | as people learning about this might have a hard time
           | understanding exactly where the JavaScript is running.
        
       | richm44 wrote:
       | This video is good background for KHTML/WebKit, on which Chrome
       | was based: https://www.youtube.com/watch?v=Tldf1rT0Rn0
        
       | harry-wood wrote:
       | Building a browser as a desktop application would be quite hard,
       | but I reckon I could do it as a web application.
        
         | posedge wrote:
         | Maybe we should have a SaaS browser.
        
       | amelius wrote:
       | Is this written by actual web browser engineers? If so, what
       | fields did they specialize in?
        
         | bla3 wrote:
         | https://mobile.twitter.com/chrishtr says "Rendering lead for
         | Chrome", so at least one of the authors seems to do this
         | professionally.
        
           | imajoredinecon wrote:
           | More about his team's work:
           | https://www.chromium.org/teams/rendering
        
         | chrishtr wrote:
         | Author here.
         | 
         | I'm the rendering lead for Chrome, and know quite a lot about
         | how it works. I also recently wrote a series of articles about
         | the new rendering architecture of Chromium, see here:
         | 
         | https://developer.chrome.com/blog/renderingng/
         | 
         | Pavel is a professor at the University of Utah and has
         | extensively studied CSS from an academic point of view. He also
         | has a lot of experience teaching the material and making it
         | accessible to students.
        
       | tester756 wrote:
       | I recommend this:
       | 
       | https://www.html5rocks.com/en/tutorials/internals/howbrowser...
        
         | chrishtr wrote:
         | Author here.
         | 
         | That article, along with a number of other resources, is
         | listed here:
         | 
         | https://browser.engineering/bibliography.html
         | 
         | In my view, a critical part of really learning how something as
         | complicated as a browser works is trying to build it
         | yourself. That's why our book is oriented around building a
         | browser as you go.
        
         | bogomipz wrote:
         | This also looks great. Might you or anyone else know where to
         | find the accompanying video? I get "Sorry This video does not
         | exist." for the embedded video link.
        
       | a_c wrote:
       | Love the bibliography section. I have always wanted to
       | reinterpret HTML into other representations. These resources give
       | me a good reference.
        
       ___________________________________________________________________
       (page generated 2021-10-19 23:01 UTC)