[HN Gopher] Laptop to Lambda: Outsourcing Everyday Jobs to Thous...
       ___________________________________________________________________
        
       Laptop to Lambda: Outsourcing Everyday Jobs to Thousands of
       Transient Containers
        
       Author : mlerner
       Score  : 99 points
       Date   : 2021-07-25 16:06 UTC (6 hours ago)
        
 (HTM) web link (www.micahlerner.com)
 (TXT) w3m dump (www.micahlerner.com)
        
       | chubot wrote:
       | 2019 thread on the paper:
       | https://news.ycombinator.com/item?id=20433315
       | 
       | Another review: https://buttondown.email/nelhage/archive/papers-
       | i-love-gg/
        
       | [deleted]
        
       | keithwinstein wrote:
       | Thank you for this nice writeup! This paper was led by my student
       | Sadjad Fouladi (https://sadjad.org/), part of a broader theme of
       | coercing a "purely functional-ish" design onto everyday
       | applications. There's a less academic-ese version with a few
       | extended results that was published in ;login: magazine (https://
       | www.usenix.org/system/files/login/articles/login_fal...). There
       | was also a good analysis here
       | (https://buttondown.email/nelhage/archive/papers-i-love-gg/) and
       | don't miss https://buttondown.email/nelhage/archive/http-
       | pipelining-s3-... .
       | 
       | Some of Sadjad's other work has included:
       | 
       | - ExCamera, which somewhat kicked off the trend of "fire up 4,000
       | lambda workers in a burst, all working on one job" -- for things
       | like making a neural network search a video frame-by-frame, video
       | compression in parallel at sub-GOP granularity, etc.
       | (https://news.ycombinator.com/item?id=16197253)
       | 
       | - Salsify, which reused the "purely functional" video codec from
       | ExCamera to improve WebRTC/Zoom-style live video
       | (https://news.ycombinator.com/item?id=16964112 ,
       | https://news.ycombinator.com/item?id=20794541). Sadjad is giving
       | an Applied Networking Research Prize talk about this work at IETF
       | tomorrow.
       | 
       | - 3D ray-tracing (running PBRT on thousands of Lambdas, sending
       | rays across the network), SMT/SAT solving, etc.
       | 
       | We're working to extend this line of work towards a more general,
       | Wasm-based, "purely functional" operating system where most
       | computations operate on content-addressed data and are content-
        | addressed themselves, and determinism and reproducibility are
       | properties guaranteed by the OS. Sort of analogous to how the
       | operating systems of today (try to) guarantee memory isolation
       | between processes. Imagine, e.g., a Git repository where you
       | could represent the fact that "blob <x> is the result of running
       | computation <y> given tree <z> as input," and anybody can verify
       | that result, or rebase the computation to run on top of their own
       | input. If you're interested in this general area, please consider
       | doing a PhD at Stanford and/or get in touch -- I'm hiring.
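        | 
        | (A tiny Python sketch of that idea -- purely illustrative,
        | every name here is invented -- a claim keyed by the hashes of
        | the function and its input, which any third party holding
        | both can re-run and verify:)
        | 
        |     import hashlib
        | 
        |     def h(data: bytes) -> str:
        |         return hashlib.sha256(data).hexdigest()
        | 
        |     # Claim: running fn on inp yields out, all three named by
        |     # their content hashes (like blobs in a Git object store).
        |     def claim(fn: bytes, inp: bytes, out: bytes) -> dict:
        |         return {"fn": h(fn), "input": h(inp), "output": h(out)}
        | 
        |     # Anyone holding fn and inp can verify (or refute) the
        |     # claim by re-running a deterministic evaluator `run`.
        |     def verify(c: dict, fn: bytes, inp: bytes, run) -> bool:
        |         assert h(fn) == c["fn"] and h(inp) == c["input"]
        |         return h(run(fn, inp)) == c["output"]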
        
         | boulos wrote:
         | Hi, Keith! Glad to see you're still enjoying hipster compute
         | :).
         | 
          | What do you think about Cloudflare Workers, fly.io, and
         | similar "run pure-ish functions anywhere"? I no longer have any
         | skin in the game, but it seems to me that "ignoring locality"
         | just means having to reinvent locality later on.
        
           | keithwinstein wrote:
           | Heya, great to see you pop up here! I gotta be honest -- I
           | think EC2 (and, in general, doing computation in units of
           | VMware/Xen-style virtual PCs) is the actual hipster compute
           | substrate. AWS Lambda feels closer to cgi-bin from 1995, i.e.
           | back when things still made sense. (Have you ever joined a
           | tech company and been handed a 10 gigabyte VM image that uses
           | a Vagrant pipeline to provision itself so you can get a
           | working dev environment, except the pipeline only works if
           | 100% of its 10,000 downloads succeed, so the whole thing is
           | super-flaky, but nobody at the company knows because they
           | only ran it once when they first joined and have just kept
           | the same local dev VM ever since? That's what hipster compute
           | means to me.)
           | 
            | All that aside, Cloudflare Workers/fly.io/Fastly
            | Compute@Edge/Lucet/Google Cloud Run seem really cool, and the
           | resulting work on Wasm and its ecosystem is fantastic, but
           | they're also not exactly what excites me. Deploying code
           | close to the edge (or "anywhere" in particular) isn't very
           | important if the application only makes one round-trip. Even
           | if my code is pure, it's not like fly.io is willing to sign a
           | certificate saying, "We evaluated function <y> on input <z>
           | and the correct answer is <x>, signed Fly Inc., and if you
           | can prove us wrong in the next 10 years, our insurance
           | company will pay you $1 million from our E&O policy." Which
           | would really be cool. And, I don't know of people spinning up
           | 4,000 nodes on those systems in 100 ms to do a 1-second-long
           | computation. I haven't seen any of the providers or outsiders
           | benchmarking the "burst-to-N,000-nodes latency" numbers
           | averaged over many trials at various times of day. (We
           | measured GKE a small number of times in the gg paper [fig. 7]
           | and found it to be... really slow at that particular metric.)
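            | 
            | (A rough sketch of such a measurement, for anyone curious
            | -- it assumes boto3 plus a pre-deployed trivial Lambda
            | function; the "noop" name, region, and burst size are all
            | placeholders:)
            | 
            |     import time, boto3
            |     from concurrent.futures import ThreadPoolExecutor
            |     from botocore.config import Config
            | 
            |     client = boto3.client(
            |         "lambda", region_name="us-west-2",
            |         config=Config(max_pool_connections=1000))
            | 
            |     def invoke(_):
            |         # synchronous round-trip to one worker
            |         client.invoke(FunctionName="noop",
            |                       InvocationType="RequestResponse")
            | 
            |     start = time.time()
            |     with ThreadPoolExecutor(max_workers=1000) as pool:
            |         list(pool.map(invoke, range(1000)))
            |     print(f"burst of 1000: {time.time() - start:.2f}s")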
           | 
           | I don't think we want to ignore locality! But I do want the
           | OS to be able to secure access to thousands of cores in <1
           | second for <10 second duration workloads, and I think many
           | applications would be willing to _compromise_ on locality, or
            | accept heterogeneous/irregular locality, in exchange for
           | that. I'd still love _visibility_ into the locality I end up
            | with, I'd love not to have to do flaky NAT-traversal hacks
           | to get direct communication among nodes, and I could imagine
           | the application bidding more to persuade the infrastructure
           | owner to provide computation in larger units (i.e. more cores
           | on fewer machines, machines in a placement group with full
           | bisection bandwidth, etc.), which is sort of where Lambda
           | seems to be heading already.
           | 
           | (Long term, I don't really think applications should be
           | renting cores and RAM per unit time and thinking about
           | locality; I'd love to be dealing with the infrastructure
           | provider in terms of some higher-level abstraction, because
           | then you could imagine the provider might be genuinely
           | incentivized to discover better ways of computing the same
           | answer, to our mutual benefit.)
        
             | r3trohack3r wrote:
             | I'm loving this train of thought Keith.
             | 
             | What are your thoughts on program correctness and runaway
              | cost? I'm a little uncomfortable running a workload that
             | could scale unexpectedly to a denial of wallet.
             | 
             | For this research, how did you enforce bounds on your
             | workload to prevent exceeding your funding budget? Is the
             | whole compute graph calculated locally? The recursive
              | workloads seem particularly anxiety-inducing.
        
         | seg_lol wrote:
         | I too have been thinking about <<general, Wasm-based, "purely
         | functional">> content addressed computation.
         | 
          | I think it can support both legacy applications and purely
          | functional uses. I really want to support both; the case
         | for linking against any git commit and doing live differential
         | testing is really enticing. I toyed with a serverless
         | deployment system years ago where code was callable by githash
         | and was run directly from git. One could execute any version at
         | any time. This system would be able to automatically rerun
         | executions against new code to track regressions across many
         | dimensions. On failure, the system could fall back to older
         | code paths. TBD how to manage modularity and coherence across
          | sets of functions; restart might need to happen at a much
         | higher level.
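          | 
          | (Roughly, the githash dispatch could look like this -- a
          | sketch only, with the file path and the "handler" entry
          | point invented:)
          | 
          |     import subprocess
          | 
          |     def run_at_commit(commit: str, path: str, *args):
          |         # fetch the file exactly as it was at that commit
          |         src = subprocess.run(
          |             ["git", "show", f"{commit}:{path}"],
          |             capture_output=True, text=True,
          |             check=True).stdout
          |         scope = {}
          |         exec(src, scope)  # load that version of the code
          |         return scope["handler"](*args)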
         | 
         | For processing an input stream, I think the lambdas would need
          | to be tail-recursive so that the internal state could be
          | externally checkpointed:
         | stream_setup/process_chunk/stream_close. process_chunk would
         | need to emit either a total copy of its internal state, or a
         | token linking to persistent storage.
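          | 
          | In sketch form (all signatures invented):
          | 
          |     def stream_setup() -> bytes:
          |         # initial state, fully serialized
          |         return b""
          | 
          |     def process_chunk(state: bytes, chunk: bytes):
          |         # pure step: (state, chunk) -> (new_state, output);
          |         # new_state is either the whole serialized state or
          |         # a token pointing into persistent storage
          |         new_state = state + chunk  # stand-in for real logic
          |         return new_state, chunk.upper()
          | 
          |     def stream_close(state: bytes) -> bytes:
          |         return state
          | 
          | Because each step is pure and the state crosses the call
          | boundary explicitly, the runtime can checkpoint or restart
          | at any chunk boundary.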
         | 
          | Curious what your current set of basis functions is and how
         | failures are accounted for?
        
       | haolez wrote:
       | This could be very useful for quantum chemistry simulations,
       | which are generally parallelizable and very CPU intensive. If gg
       | gets tweaked to support MPI, this niche could have a
       | breakthrough!
        
       | z3ncyberpunk wrote:
       | Disgustingly wasteful
        
       | JZL003 wrote:
        | My work uses GCP, not AWS, so I've been experimenting with
        | Google Cloud Run (it's actually parallelizing R code, so I
        | need the Docker container infra). My only problem is that I
        | have very bursty usage and the auto-scaling is too slow. I
        | made one attempt [1] to encourage larger allocation but don't
        | know another way. Do people have experience with this?
       | 
        | [1] Slightly costly, but ~5 minutes before I need it, I set the
       | minimum instance size to a larger number so it starts ramping up,
       | then when I'm done I lower it
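        | 
        | (Scripted, the pre-warm trick is roughly this -- the
        | --min-instances flag is real; the service name, region, and
        | instance counts are placeholders:)
        | 
        |     import subprocess
        | 
        |     def set_min_instances(n: int):
        |         subprocess.run(
        |             ["gcloud", "run", "services", "update",
        |              "my-service", "--region=us-central1",
        |              f"--min-instances={n}"], check=True)
        | 
        |     set_min_instances(50)  # ~5 minutes before the burst
        |     # ... run the bursty workload ...
        |     set_min_instances(0)   # scale back down afterwards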
        
       | neolog wrote:
       | Hi Micah, I'd like to follow these posts but I don't like signing
       | up. Would you mind adding an Atom/RSS feed?
        
         | mlerner wrote:
         | Thanks for the feedback! Does this Atom feed work for you?
         | https://www.micahlerner.com/feed.xml
        
           | neolog wrote:
           | Yep, thanks.
        
         | MarkSweep wrote:
          | There is a feed. It is titled "untitled" though, so it may be
         | hard to find in your feed reader after you add it:
         | 
         | https://www.micahlerner.com/feed.xml
        
           | gumby wrote:
           | Good find -- I searched in the page source and there was no
           | reference to it.
        
         | gumby wrote:
          | I filed a GitHub issue asking the author to enable the RSS feed
         | which their web tool (Hugo) has built in.
        
           | mlerner wrote:
           | Thanks!
        
       ___________________________________________________________________
       (page generated 2021-07-25 23:01 UTC)