[HN Gopher] Unfashionably secure: why we use isolated VMs
       ___________________________________________________________________
        
       Unfashionably secure: why we use isolated VMs
        
       Author : mh_
       Score  : 157 points
       Date   : 2024-07-25 17:00 UTC (6 hours ago)
        
 (HTM) web link (blog.thinkst.com)
 (TXT) w3m dump (blog.thinkst.com)
        
       | PedroBatista wrote:
       | As a permanent "out of style" curmudgeon in the last ~15 years, I
       | like that people are discovering that maybe VMs are in fact the
       | best approach for a lot of workloads and the LXC cottage industry
       | and Docker industrial complex that developed around solving
       | problems created by themselves or solved decades ago might need
       | to take a hike.
       | 
       | Modern "containers" were invented to make things more
       | reproducible ( check ) and simplify dev and deployments ( NOT
       | check ).
       | 
       | Personally FreeBSD Jails / Solaris Zones are the thing I like to
       | dream are pretty much as secure as a VM and a perfect fit for a
       | sane dev and ops workflow, I didn't dig too deep into this is
       | practice, maybe I'm afraid to learn the contrary, but I hope not.
       | 
       | Either way Docker is "fine" but WAY overused and overrated IMO.
        
         | ganoushoreilly wrote:
         | Docker is great, way overused 100%. I believe a lot of it
         | started as "cost savings" on resource usage. Then it became the
         | trendy thing for "scalability".
         | 
         | When home enthusiasts build multi container stacks for their
         | project website, it gets a bit much.
        
           | applied_heat wrote:
           | Solves dependency version hell also
        
             | theLiminator wrote:
             | Solves it in the same sense that it's a giant lockfile. It
             | doesn't solve the other half where updates can horribly
             | break your system and you run into transitive version
             | clashes.
        
               | bornfreddy wrote:
               | But at least you can revert back to the original
               | configuration (as you can with VM, too).
        
               | Spivak wrote:
               | It solves it in the sense that it empowers the devs to
               | update their dependencies on their own time and ops can
               | update the underlying infrastructure fearlessly. It
               | turned a coordination problem into a non-problem.
        
             | sitkack wrote:
             | It doesn't solve it, it makes it tractable so you can use
             | the scientific method to fix problems as opposed to voodoo.
        
         | everforward wrote:
         | > Modern "containers" were invented to make thinks more
         | reproducible ( check ) and simplify dev and deployments ( NOT
         | check ).
         | 
         | I do strongly believe deployments of containers are easier. If
         | you want something that parallels a raw VM, you can "docker
         | run" the image. Things like k8s can definitely be complicated,
         | but the parallel there is more like running a whole ESXi
         | cluster. Having done both, there's really only a marginal
         | difference in complexity between k8s and an ESXi cluster
         | supporting a similar feature set.
         | 
         | The dev simplification is supposed to be "stop dealing with
         | tickets from people with weird environments", though it
         | admittedly often doesn't apply to internal application where
         | devs have some control over the environment.
         | 
         | > Personally FreeBSD Jails / Solaris Zones are the thing I like
         | to dream are pretty much as secure as a VM and a perfect fit
         | for a sane dev and ops workflow
         | 
         | I would be interested to hear how you use them. From my
         | perspective, raw jails/zones are missing features and
         | implementing those features on top of them ends up basically
         | back at Docker (probably minus the virtual networking). E.g.
         | jails need some way to get new copies of the code that runs in
         | them, so you can either use Docker or write some custom
         | Ansible/Chef/etc that does basically the same thing.
         | 
         | Maybe I'm wrong, and there is some zen to be found in raw-er
         | tools.
        
         | dboreham wrote:
         | For me it's about the ROAC property (Runs On Any Computer). I
         | prefer working with stuff that I can run. Running software is
         | live software, working software, loved software. Software that
         | only works in weird places is bad, at least for me. Docker is
         | pretty crappy in most respects, but it has the ROAC going for
         | it.
         | 
         | I would _love_ to have a  "docker-like thing" (with ROAC) that
         | used VMs not containers (or some other isolation tech that
         | works). But afaik that thing does not yet exist. Yes there are
         | several "container-tool, but we made it use VMs" (firecracker
         | and downline), but they all need weirdo special setup, won't
         | run on my laptop, or a generic Digitalocean VM.
        
           | 01HNNWZ0MV43FF wrote:
           | Yeah that's kind of a crummy tradeoff.
           | 
           | Docker is "Runs on any Linux, mostly, if you have a new
           | enough kernel" meaning it packages a big VM anyway for
           | Windows and macOS
           | 
           | VMs are "Runs on anything! ... Sorta, mostly, if you have VM
           | acceleration" meaning you have to pick a VM software and hope
           | the VM doesn't crash for no reason. (I have real bad luck
           | with UTM and VirtualBox on my Macbook host for some reason.)
           | 
           | All I want is everything - An APE-like program that runs on
           | any OS, maybe has shims for slightly-old kernels, doesn't
           | need a big installation step, and runs any useful guest OS.
           | (i.e. Linux)
        
             | neaanopri wrote:
             | The modern developer yearns for Java
        
               | smallmancontrov wrote:
               | I had to use eclipse the other day. How the hell is it
               | just as slow and clunky as I remember from 20 years ago?
               | Does it exist in a pocket dimension where Moore's Law
               | doesn't apply?
        
               | TurningCanadian wrote:
               | That's not Java's fault though. IntelliJ IDEA is also
               | built on Java and runs just fine.
        
               | qwery wrote:
               | I think it's pretty remarkable to see any application in
               | continuous use for so long, especially with so few
               | changes[0] -- Eclipse must be doing something right!
               | 
               | Maintaining (if not actively improving/developing) a
               | piece of useful software without performance
               | _degradation_ -- that 's a win.
               | 
               | Keeping that up for decades? That's exceptional.
               | 
               | [0] "so few changes": I'm not commenting on the amount of
               | work done on the project or claiming that there is no
               | useful/visible features added or upgrades, but referring
               | to Eclipse of today feeling like the same application as
               | it always did, and that Eclipse hasn't had multiple
               | alarmingly frequent "reboots", "overhauls", etc.
               | 
               | [?] keeping performance constant over the last decade or
               | two is a win, relatively speaking, anyway
        
               | dijit wrote:
               | I agree, that you've pointed it out to me makes it
               | obvious that this is not the norm, and we _should_
               | celebrate this.
               | 
               | I'm reminded of Casey Muratori's rant on Visual Studio; a
               | program that largely feels like it hasn't changed much
               | but clearly has regressed in performance massively;
               | https://www.youtube.com/watch?v=GC-0tCy4P1U
        
               | password4321 wrote:
               | > _without performance degradation_
               | 
               | Not accounting for Moore's Law, yikes. Need a comparison
               | adjusted for "today's dollars".
        
               | gryfft wrote:
               | Maybe just the JVM.
        
               | mschuster91 wrote:
               | Java's ecosystem is just as bad. Gradle is insanely
               | flexible but people create abominations out of it, Maven
               | is extremely rigid so people resort to even worse
               | abominations to get basic shit done.
        
             | compsciphd wrote:
             | docker is your userspace program carries all its user space
             | dependencies with it and doesn't depend on the userspace
             | configuration of the underlying system.
             | 
             | What I argued in my paper is that systems like docker (i.e.
             | what I created before it), improve over VMs and (even
             | Zones/ZFS) in their ability to really run ephemeral
             | computation. i.e. if it takes microseconds to setup the
             | container file system, you can run a boatload of
             | heterogeneous containers even if they only needed to run
             | for very shot periods of time). Solaris Zones/ZFS didn't
             | lend itself to heterogeneous environments, but simply
             | cloning as single homogeneous environment, while VMs
             | suffered from that problem, they also (at least at the
             | time, much improved as of late) required a reasonably long
             | bootup time.
        
           | ThreatSystems wrote:
           | Vagrant / Packer?
        
             | gavindean90 wrote:
             | With all the mind share that terraform gets you would thing
             | vagrant would at least be known but alas
        
               | tptacek wrote:
               | Somebody educate me about the problem Packer would solve
               | for you in 2024?
        
               | kasey_junk wrote:
               | I think the thread is more about how docker was a
               | reaction to the vagrant/packer ecosystem that was deemed
               | overweight but was in many ways was a "docker like thing"
               | but VMs.
        
               | tptacek wrote:
               | Oh, yeah, I'm not trying to prosecute, I've just always
               | been Packer-curious.
        
               | yjftsjthsd-h wrote:
               | What's a better way to make VM images?
        
         | gryfft wrote:
         | I've been meaning to do a bhyve deep dive for years, my gut
         | feelings being much the same as yours. Would appreciate any
         | recommended reading.
        
           | Gud wrote:
           | Read the fine manual and handbook.
        
         | nimish wrote:
         | Clear Containers/Kata Containers/firecracker VMs showed that
         | there isn't really a dichotomy here. Why we aren't all using HW
         | assisted containers is a mystery.
        
           | turtlebits wrote:
           | Engineers are lazy, especially Ops. Until it's easier to get
           | up and running and there are tangible benefits, people won't
           | care.
        
           | tptacek wrote:
           | It's not at all mysterious: to run hardware-virtualized
           | containers, you need your compute hosted on a platform that
           | will allow KVM. That's a small, expensive, tenuously
           | available subset of AWS, which is by far the dominant compute
           | platform.
        
             | Spivak wrote:
             | So... Lambda, Fargate, and EC2. The only thing you can't
             | really do this with is EKS.
             | 
             | Like Firecracker was made by AWS to run containers on their
             | global scale KVM, EC2.
        
               | tptacek wrote:
               | Lambda and Fargate are implementations of the idea, not a
               | way for you yourself to do any kind of KVM container
               | provisioning. You can't generally do this on EC2; you
               | need special instances for it.
               | 
               | For a variety of reasons, I'm pretty familiar with
               | Firecracker.
        
         | turtlebits wrote:
         | Honestly, it really doesn't matter whether it's VMs or Docker.
         | The docker/container DX is so much better than VMWare/QEMU/etc.
         | Make it easy to run workloads in VMs/Firecracker/etc and you'll
         | see people migrate.
        
           | packetlost wrote:
           | I mean, Vagrant was basically docker before docker. People
           | used it. But it turns out the overhead over booting a full VM
           | + kernel adds latency which is undesirable for development
           | workloads. The techniques used by firecracker could be used,
           | but I suspect the overhead of allocating a namespace and
           | loading a process will always be less than even restoring
           | from a frozen VM, so I wouldn't hold my breath on it swinging
           | back in VM's direction for developer workloads ever.
        
             | yjftsjthsd-h wrote:
             | It would be interesting to see a microvm
             | (kata/firecracker/etc.) version of vagrant. And open
             | source, of course. I can't see any technical reason why it
             | would be particularly difficult.
        
         | compsciphd wrote:
         | As the person who created docker (well, before docker - see
         | https://www.usenix.org/legacy/events/atc10/tech/full_papers/...
         | and compare to docker), I argued that it wasn't just good for
         | containers, but could be used to improve VM management as well
         | (i.e. a single VM per running image - seehttps://www.usenix.org
         | /legacy/events/lisa11/tech/full_papers...)
         | 
         | I then went onto built a system with kubernetes that enabled
         | one to run "kubernetes pods" in independent VMs -
         | https://github.com/apporbit/infranetes (as well as create
         | hybrid "legacy" VM / "modern" container deployments all managed
         | via kubernetes.)
         | 
         | - as a total aside (while I toot my own hort on the topic of
         | papers I wrote or contributed to), note the reviewer of this
         | paper that originally used the term Pod for a running container
         | -
         | https://www.usenix.org/legacy/events/osdi02/tech/full_papers...
         | - explains where Kubernetes got the term from.
         | 
         | I'd argue that FreeBSD Jails / Solaris Zones (Solaris Zone/ZFS
         | inspired my original work) really aren't any more secure than
         | containers on linux, as they all suffer from the same
         | fundamental problem of the entire kernel being part of one's
         | "tcb", so any security advantage they have is simply due lack
         | of bugs, not simply a better design.
        
           | ysnp wrote:
           | Would you say approaches like gvisor or nabla containers
           | provide more/enough evolution on the security front? Or is
           | there something new on the horizon that excites you more as a
           | prospect?
        
             | compsciphd wrote:
             | been out of the space for a bit (though interviewing again,
             | so might get back into it), gvisor at least as the
             | "userspace" hypervisor, seemed to provide minimal value vs
             | modern hypervisor systems with low overhead / quick boot
             | VMs (ala firecracker). With that said, I only looked at it
             | years ago, so I could very well be out of date on it.
             | 
             | Wasn't aware of Nabla, but they seem to be going with the
             | unikernel approach (based on a cursory look at them).
             | Unikernels have been "popular" (i.e. multiple attempts) in
             | the space (mostly to basically run a single process app
             | without any context switches), but it creates a process
             | that is fundamentally different than what you develop and
             | is therefore harder to debug.
             | 
             | while the unikernels might be useful in the high frequency
             | trading space (where any time savings are highly valued),
             | I'm personally more skeptical of them in regular world
             | usage (and to an extent, I think history has born this out,
             | as it doesn't feel like any of the attempts at it, has
             | gotten real traction)
        
               | tptacek wrote:
               | Modern gVisor uses KVM, not ptrace, for this reason.
        
               | compsciphd wrote:
               | so I did a check, it would seem that gvisor with kvm,
               | mostly works for bare metal, not on existing VMs (nested
               | virtualization).
               | 
               | https://gvisor.dev/docs/architecture_guide/platforms/
               | 
               | "Note that while running within a nested VM is feasible
               | with the KVM platform, the systrap platform will often
               | provide better performance in such a setup, due to the
               | overhead of nested virtualization."
               | 
               | I'd argue then for most people (unless have your own
               | baremetal hyperscaler farm), one would end up using
               | gvisor without kvm, but speaking from a place of
               | ignorance here, so feel free to correct me.
        
             | the_duke wrote:
             | GVisor basically works by intercepting all Linux syscalls,
             | and emulating a good chunk of the Linux kernel in userspace
             | code. In theory this allows lowering the overhead per VM,
             | and more fine-grained introspection and rate limiting /
             | balancing across VMs, because not every VM needs to run
             | it's own kernel that only interacts with the environment
             | through hardware interfaces. Interaction happens through
             | the Linux syscall ABI instead.
             | 
             | From an isolation perspective it's not more secure than a
             | VM, but less, because GVisor needs to implement it's own
             | security sandbox to isolate memory, networking, syscalls,
             | etc, and still has to rely on the kernel for various
             | things.
             | 
             | It's probably more secure than containers though, because
             | the kernel abstraction layer is separate from the actual
             | host kernel and runs in userspace - if you trust the
             | implementation... using a memory-safe language helps there.
             | (Go)
             | 
             | The increased introspectioncapabiltiy would make it easier
             | to detect abuse and to limit available resources on a more
             | fine-grained level though.
             | 
             | Note also that GVisor has quite a lot of overhead for
             | syscalls, because they need to be piped through various
             | abstraction layers.
        
               | compsciphd wrote:
               | I actually wonder how much "overhead" a VM actually has.
               | i.e. a linux kernel that doesn't do anything (say perhaps
               | just boots to an init that mounts proc and every n
               | seconds read in/prints out /proc/meminfo) how much memory
               | would the kernel actually be using?
               | 
               | So if processes in gvisor map to processes on the
               | underlying kernel, I'd agree it gives one a better
               | ability to introspect (at least in an easy manner).
               | 
               | It gives me an idea that I'd think would be interesting
               | (I think this has been done, but it escapes me where), to
               | have a tool that is external to the VM (runs on the
               | hypervisor host) that essentially has "read only" access
               | to the kernel running in the VM to provide visibility
               | into what's running on the machine without an agent
               | running within the VM itself. i.e. something that knows
               | where the processes list is, and can walk it to enumerate
               | what's running on the system.
               | 
               | I can imagine the difficulties in implementing such a
               | thing (especially on a multi cpu VM), where even if you
               | could snapshot the kernel memory state efficiently, it be
               | difficult to do it in a manner that provided a
               | "safe/consistent" view. It might be interesting if the
               | kernel itself could make a hypercall into the hypervisor
               | at points of consistency (say when finished making an
               | update and about to unlock the resource) to tell the tool
               | when the data can be collected.
        
               | xtacy wrote:
               | > to have a tool that is external to the VM (runs on the
               | hypervisor host) that essentially has "read only" access
               | to the kernel running on the VM to provide visibility
               | into what's running on the machine without an agent
               | running within the VM itself
               | 
               | Not quite what you are after, but comes close ... you
               | could run gdb on the kernel in this fashion and inspect,
               | pause, step through kernel code:
               | https://stackoverflow.com/questions/11408041/how-to-
               | debug-th....
        
               | stacktrust wrote:
               | https://github.com/Wenzel/pyvmidbg
               | LibVMI-based debug server, implemented in Python.
               | Building a guest aware, stealth and agentless full-system
               | debugger.. GDB stub allows you to debug a remote process
               | running in a VM with your favorite GDB frontend. By
               | leveraging virtual machine introspection, the stub
               | remains stealth and requires no modification of the
               | guest.
               | 
               | more: https://github.com/topics/virtual-machine-
               | introspection
        
               | ecnahc515 wrote:
               | > I actually wonder how much "overhead" a VM actually
               | has. i.e. a linux kernel that doesn't do anything (say
               | perhaps just boots to an init that mounts proc and every
               | n seconds read in/prints out /proc/meminfo) how much
               | memory would the kernel actually be using?
               | 
               | There's already some memory sharing available using DAX
               | in Kata Containers at least: https://github.com/kata-
               | containers/kata-containers/blob/main...
        
           | bombela wrote:
           | > As the person who created docker (well, before docker - see
           | https://www.usenix.org/legacy/events/atc10/tech/full_papers/.
           | .. and compare to docker)
           | 
           | I picked the name and wrote the first prototype (python2) of
           | Docker in 2012. I had not read your document (dated 2010). I
           | didn't really read English that well at the time, I probably
           | wouldn't have been able to understand it anyways.
           | 
           | https://en.wikipedia.org/wiki/Multiple_discovery
           | 
           | More details for the curious: I wrote the design doc and
           | implemented the prototype. But not in a vacuum. It was a lot
           | work with Andrea, Jerome and Gabriel. Ultimately, we all
           | liked the name Docker. The prototype already had the notion
           | of layers, lifetime management of containers and other
           | fundamentals. It exposed an API (over TCP with zerorpc). We
           | were working on container orchestration, and we needed a
           | daemon to manage the life cycle of containers on every
           | machine.
        
             | compsciphd wrote:
             | I'd note I didn't say you copied it, just that I created it
             | first (i.e. "compare paper to docker". also, as you note,
             | its possible someone else did it too, but at least my
             | conception got through academic peer-review / patent
             | office, yeah, there's a patent, never been attempted to be
             | enforced though to my knowledge).
             | 
             | when I describe my work (I actually should have used quotes
             | here), I generally give air quotes when saying it, or say
             | "proto docker", as it provides context for what I did
             | (there's also a lot of people who view docker as synonymous
             | with containerization as a whole, and I say that containers
             | existed way before me). I generally try to approach it
             | humbly, but I am proud that I predicted and built what the
             | industry seemingly needed (or at least is heavily using).
             | 
             | people have asked me why I didn't pursue it as a company,
             | and my answer is a) I'm not much of an entrepreneur (main
             | answer), and b) I felt it was a feature, not a "product",
             | and would therefore only really profitable for those that
             | had a product that could use it as a feature (which one
             | could argue that product turned out to be clouds, i.e. they
             | are the ones really making money off this feature). or as
             | someone once said a feature isn't necessarily a product and
             | a product isn't necessarily a company.
        
         | anonfordays wrote:
         | >Personally FreeBSD Jails / Solaris Zones are the thing I like
         | to dream are pretty much as secure as a VM and a perfect fit
         | for a sane dev and ops workflow, I didn't dig too deep into
         | this is practice, maybe I'm afraid to learn the contrary, but I
         | hope not
         | 
         | Having run both at scale, I can confirm and assure you they are
         | not as secure as VMs and did not produce sane devops workflows.
         | Not that Docker is much better, but it _is_ better from the
         | devops workflow perspective, and IMHO that 's why Docker "won"
         | and took over the industry.
        
           | kkfx wrote:
           | A sane DevOps workflow is with declarative systems like NixOS
           | or Guix System, definitively not on a VM infra in practice
           | regularly not up to date, full of useless deps, on a host
           | definitively not up to date, with the entire infra typically
           | not much managed nor manageable and with an immense attack
           | surface...
           | 
           | VMs are useful for those who live on the shoulder of someone
           | else (i.e. *aaS) witch is ALL but insecure.
        
             | secondcoming wrote:
             | I'm not sure what you're referring to here?
             | 
             | Our cloud machines are largely VMs. Deployments mean
             | building a new image and telling GCP to deploy that as
             | machines come and go due to scaling. The software is up to
             | date, dependencies are managed via ansible.
             | 
             | Maybe you think VMs means monoliths? That doesn't have to
             | be the case.
        
             | nine_k wrote:
             | VMs are useful when you don't own or rent dedicated
             | hardware. Which is a lot of cases, especially when your
             | load varies seriously over the day or week.
             | 
             | And even if you do manage dedicated servers, it's often
             | wise to use VMs on them to better isolate parts of the
             | system, aka limit the blast radius.
        
         | analognoise wrote:
         | What do you think of Nix/NixOS?
        
           | egberts1 wrote:
           | Nix is trying to be like macOS's DMG but its image file is
           | bit more parse-able.
        
           | reddit_clone wrote:
           | But that comes _after_ you have chosen VMs over Containers
           | yes?
           | 
           | If you are using VMs, I think NixOs/Guix is a good choice.
           | Reproducible builds, Immutable OS, Immutable binaries and
           | Dead easy rollback.
           | 
           | It still looks somewhat futuristic. Hopefully gets traction.
        
             | solarpunk wrote:
             | if you're using nixos, just to do provisioning, I would
             | argue OStree is a better fit.
        
             | bspammer wrote:
             | Nix is actually a really nice tool for building docker
             | images: https://xeiaso.net/talks/2024/nix-docker-build/
        
         | vundercind wrote:
         | Docker's the best cross-distro rolling-release package manager
         | and init system for services--staying strictly out of managing
         | the base system, which is great--that I know of. I don't know
         | of anything that's even close, really.
         | 
         | All the other stuff about it is way less important to me than
         | that part.
        
           | pxc wrote:
           | [delayed]
        
         | topspin wrote:
         | Isn't this discussion based on a false dichotomy? I, too, use
         | VMs to isolate customers, and I use containers within those
         | VMs, either with or without k8s. These tools solve different
         | problems. Containers solve software management, whereas VMs
         | provide a high degree of isolation.
         | 
         | Container orchestration is where I see the great mistake in all
         | of this. I consider everything running in a k8s cluster to be
         | one "blast domain." Containers can be escaped. Faulty
         | containers impact everyone relying on a cluster. Container
         | orchestration is the thing I believe is "overused." It was
         | designed to solve "hyper" scale problems, and it's being
         | misused in far more modest use cases where VMs should prevail.
         | I believe the existence of container orchestration and its
         | misapplication has retarded the development of good VM tools: I
         | dream of tools that create, deploy and manage entire VMs with
         | the same ease as Docker, and that these tools have not matured
         | and gained popularity because container orchestration is so
         | easily misapplied.
         | 
         | Strongly disagree about containers and dev/deployment ("NOT
         | check"). I can no longer imagine development without
         | containers: it would be intolerable. Container repos are a
         | godsend for deployment.
        
         | tptacek wrote:
         | Jails/Zones are not pretty much as secure as a VM. They're
         | materially less secure: they leave cotenant workloads sharing a
         | single kernel (not just the tiny slice of the kernel KVM
         | manages). Most kernel LPEs are probably "Jail" escapes, and
         | it's not feasible to filter them out with system call
         | sandboxing, because LPEs occur in innocuous system calls, too.
        
         | tomjen3 wrote:
         | If anything Docker is underused. You should have a very good
         | reason to make a deploy that is not Docker, or (if you really
         | need the extra security) a VM that runs one thing only (and is
         | essentially a more resource-hungry Docker).
         | 
         | If you don't, then it becomes much harder to answer the
         | question of what exactly is deployed on a given server and what
         | it takes to bring it up again if it goes down hard. If you but
         | everything in Docker files, then the answer is whatever is set
         | in the latest docker-compose file.
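         | A minimal, hypothetical compose file illustrating the point
         | (service names, image, registry, and ports below are made up,
         | but everything the server runs is declared in one place):

```yaml
# Hypothetical docker-compose.yml: the whole answer to "what is
# deployed on this box" lives in one declarative file.
services:
  web:
    image: registry.example.com/acme/web:1.4.2   # pinned tag, not :latest
    ports:
      - "8080:8080"
    environment:
      DATABASE_URL: postgres://db:5432/app
    depends_on:
      - db
  db:
    image: postgres:16
    volumes:
      - db-data:/var/lib/postgresql/data         # survives recreation
volumes:
  db-data:
```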
        
         | m463 wrote:
         | I've always hated the docker model of the image namespace. It's
         | like those cloud-based routers you can buy.
         | 
         | Docker actively prevents you from having a private repo. They
         | don't want you to point away from their cloud.
         | 
         | Redhat understood this and podman allows you to have a private
         | docker infrastructure, disconnected from docker hub.
         | 
         | For my personal stuff, I would like to use "FROM scratch" and
         | build my personal containers in my own ecosystem.
        
           | Carrok wrote:
           | > Docker actively prevents you from having a private repo.
           | 
           | In what ways? I use private repos daily with no issues.
        
         | ranger207 wrote:
         | Docker's good at packaging, and Kubernetes is good at providing
         | a single API to do all the infra stuff like scheduling,
         | storage, and networking. I think that if someone sat down and
         | tried to create an idealized VM management solution that covered
         | everything between "dev pushes changes" to "user requests
         | website" then it'd probably have a single image for each VM to
         | run (like Docker has a single image for each container to run)
         | then management of VM hosts, storage, networking, and
         | scheduling VMs to run on which host would wind up looking a lot
         | like k8s. You could certainly do that with VMs but for various
         | path dependency reasons people do that with containers instead
         | and nobody's got a well adopted system for doing the same with
         | VMs
        
       | osigurdson wrote:
       | When thinking about multi-tenancy, remember that your bank
       | doesn't have a special VM or container, just for you.
        
         | 01HNNWZ0MV43FF wrote:
         | My bank doesn't even have 2FA
        
           | jmnicolas wrote:
           | Mine neither, and they use a 6-digit PIN code! This is
           | ridiculous, in comparison my home wifi password is 60+ random
           | chars long.
        
             | leononame wrote:
             | But they do ask for only two digits of the PIN on each try,
             | and they probably will lock your account after three
             | incorrect attempts. Not saying 6 digits is secure, but it's
             | better than everyone using "password" if they have a strict
             | policy on incorrect attempts.
             | 
             | And don't they have 2FA for executing transactions?
             | 
             | I'm pretty sure banks are some of the most targeted IT
             | systems. I don't trust them blindly, but when it comes to
             | online security, I trust that they built a system that's
             | reasonably well secured, and in other cases I'd get my money
             | back, similar to credit cards.
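             | The lockout arithmetic, sketched (assuming the same two
             | digits are requested until answered correctly and the
             | attacker never repeats a guess):

```python
# Bank asks for 2 of the 6 PIN digits; account locks after 3 failures.
combos = 10 ** 2                   # 100 possible values for the two digits
attempts = 3                       # lockout threshold
p_blind_guess = attempts / combos  # distinct guesses against one challenge
print(p_blind_guess)               # 0.03 -> a 3% chance per lockout window
```

             | Weak in absolute terms, but the lockout, not the key
             | length, is doing the work.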
        
         | dspillett wrote:
         | No, but they do have their own VM/container(s) separate from
         | all the other banks that use the same service, with persisted
         | data in their own storage account with its own encryption keys,
         | etc.
         | 
         | We deal with banks in DayJob - they have separate
         | VMs/containers for their own UAT & training environments, and
         | when the same bank works in multiple regulatory
         | jurisdictions, they usually have the systems servicing those
         | separated too, as if they were completely separate entities
         | (only bringing aggregate data back together for higher-up
         | reporting purposes).
        
       | jonathanlydall wrote:
       | Sure, it's an option which eliminates the possibility of certain
       | types of errors, but it's costing you the ability to pool
       | computing resources as efficiently as you could have with a
       | multi-tenant approach.
       | 
       | The author did acknowledge it's a trade off, but the economics of
       | this trade off may or may not make sense depending on how much
       | you need to charge your customers to remain competitive with
       | competing offerings.
        
       | bobbob1921 wrote:
       | My big struggle with docker/containers vs VMs is the storage
       | layer (on containers). I'm sure it's mostly lack of experience /
       | knowledge on my end, but I never have a doubt or concern that my
       | storage is persistent and clearly defined when using a VM based
       | workload. I cannot say the same for my docker/container based
       | workloads, I'm always a tad concerned about the persistence of
       | storage, (or the resource management in regards to storage). This
       | becomes even more true as you deal with networked storage on both
       | platforms
        
         | imp0cat wrote:
         | Mount those paths that you care about to local filesystem.
         | Otherwise, you're always one `docker system prune -a -f
         | --volumes` from a disaster.
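         | A sketch of that advice as a compose fragment (image and
         | paths are hypothetical); the bind mount puts the data on the
         | host filesystem, outside anything `docker system prune`
         | touches:

```yaml
services:
  myapp:
    image: registry.example.com/myapp:1.0
    volumes:
      - /srv/myapp/data:/var/lib/myapp   # host-path : container-path
```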
        
         | amluto wrote:
         | It absolutely boggles my mind that read-only mode is not the
         | default in Docker. By default, every container has an extra,
         | unnamed, writable volume: its own root. Typo in your volume
         | mount? You're writing to root, and you _will_ lose data.
         | 
         | Of course, once this is fixed and you start using read-only
         | containers, one wonders why "container" exists as a persistent,
         | named concept.
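         | What that fix looks like in compose terms, with hypothetical
         | names: with `read_only: true`, a typo'd mount target means the
         | app's writes fail loudly with EROFS instead of landing silently
         | in the container's throwaway root layer.

```yaml
services:
  app:
    image: registry.example.com/app:2.1
    read_only: true             # root filesystem is immutable
    tmpfs:
      - /tmp                    # scratch space allowed to vanish
    volumes:
      - app-data:/var/lib/app   # the only writable, persistent path
volumes:
  app-data:
```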
        
       | fsckboy wrote:
       | just as a meta idea, i'm mystified that systems folks find it
       | impossible to create protected mode operating systems that are
       | protected, and then we all engage in wasteful kluges like VMs.
       | 
       | i'm not anti-VM, they're great technology, i just don't think it
       | should be the only way to get protection. VMs are incredibly
       | inefficient... what's that you say, they're not? ok, then why
       | aren't they integrated into protected mode OSes so that they will
       | actually be protected?
        
         | bigbones wrote:
         | Because it would defeat the purpose. Turns out we don't trust
         | the systems folks all that much.
        
         | Veserv wrote:
         | You are right, but not in the way you think. You are completely
         | correct that techniques that can make virtual machines secure
         | could also be applied to make operating systems secure. So, if
         | we have secure virtual machines, why do we not have secure
         | operating systems?
         | 
         | Trick question. The virtual machines are not secure, they are
         | just more obscure. There are plenty of virtual machine escapes
         | that invalidate the security of the system and allow lateral
         | takeover. The only real difference is that hypervisors are
         | generally configured to have less ambient sharing by default,
         | so you need actual hypervisor vulnerabilities rather than just
         | relying on dumb service configuration.
         | 
         | To quote Theo de Raadt (OpenBSD BDFL) [1]:
         | 
         | "x86 virtualization is about basically placing another nearly
         | full kernel, full of new bugs, on top of a nasty x86
         | architecture which barely has correct page protection. Then
         | running your operating system on the other side of this brand
         | new pile of shit.
         | 
         | You are absolutely deluded, if not stupid, if you think that a
         | worldwide collection of software engineers who can't write
         | operating systems or applications without security holes, can
         | then turn around and suddenly write virtualization layers
         | without security holes.
         | 
         | You've seen something on the shelf, and it has all sorts of
         | pretty colours, and you've bought it."
         | 
         | [1] https://marc.info/?l=openbsd-misc&m=119318909016582
        
           | elric wrote:
           | Hah, I was going to post the same quote when I read the
           | parent comment. Glad to see I'm not the only grump who
           | remembers TDR quotes.
           | 
           | But he's right. And with the endless stream of leaky CPUs and
           | memory (spectre, rowhammer, etc) he's even more right now
           | than he was 17 years ago.
           | 
           | There are all kinds of things being done to mitigate multi-
           | tenant security risks in the Confidential Computing space
           | (with Trusted Execution Environments, Homomorphic Encryption,
           | or even Secure Multiparty Computation), but these are all
           | incredibly complex and largely bolted on to an insecure base.
           | 
           | It's just really, *really*, hard to make something non-
           | trivial fully secure. "It depends on your threat model" used
           | to be a valid statement, but with everyone running all of
           | their code on top of basically 3 platforms owned by
           | megacorps, I'm not sure even that is true anymore.
        
             | tptacek wrote:
             | Microarchitectural attacks are an even bigger problem for
             | shared-kernel multitenant systems!
        
           | tptacek wrote:
           | This Theo quote from 18 years ago gets brought up a lot. It's
           | referring to a different era in virtualization (it
           | practically predates KVM, and certainly widespread use of
           | KVM). You can more or less assume he's talking about running
           | things under VMWare.
           | 
           | In the interim:
           | 
           | * The Linux virtualization interface has been standardized
           | --- everything uses the same small KVM interface
           | 
           | * Security research matured and, in particular, mobile device
           | jailbreaking have made the LPE attack surface relevant, so
           | people have audited and fuzzed the living hell out of KVM
           | 
           | * Maximalist C/C++ hypervisors have been replaced by
           | lightweight virtualization, whose codebases are generally
           | written in memory-safe Rust.
           | 
           | At the very least, the "nearly full kernel" thing is totally
           | false now; that "extra" kernel (the userland hypervisor) is
           | now probably the most trustworthy component in the whole
           | system.
           | 
           | I would be surprised if even Theo stuck up for that argument
           | today, but if he did, I think he'd probably get rinsed.
        
             | Veserv wrote:
             | Are you claiming it has no security vulnerabilities? If
             | yes, care to present a proof. If no, then please estimate
             | how big of a bug bounty would result in a reported critical
             | vulnerability.
             | 
             | If I put up a 1 G$ bug bounty, do you think somebody would
             | be able to claim it within a year? How about 10 M$? Please
             | justify this in light of Google only offering 250 k$ [1]
             | for a vulnerability that would totally compromise the
             | security foundation of the multi-billion (trillion?) dollar
             | Google Cloud.
             | 
             | Please also justify why the number you present is adequate
             | for securing the foundation of the multi-trillion dollar
             | cloud industry. I will accept that element on its face if
             | you say the cost would be 10 G$, but then I will demand
             | basic proof such as formal proofs of correctness.
             | 
             | [1] https://security.googleblog.com/2024/06/virtual-escape-
             | real-...
        
               | tptacek wrote:
               | I have no idea who you're talking to, but nobody on this
               | thread has claimed anything has "no security
               | vulnerabilities". If you think there isn't an implicit
               | 7-figure bounty on KVM escapes, we are operating from
               | premises too far apart for further discussion to be
               | productive.
               | 
               | My bigger problem though: I gave you a bunch of
               | substantive, axiomatic arguments, and you responded to
               | none of them. Of the three of them, which were you
               | already aware of? How did your opinion change after
               | learning about the other ones? You cited a 2007 Theo
               | argument in 2024, so I'm going to have trouble with the
               | idea that you were aware of all of them; again, I think
               | even Theo would be correcting your original post.
               | 
               |  _later_
               | 
               | You've written about the vulnerability brokers you know
               | in other posts here; I assume we can just have a
               | substantive, systems based debate about this claim,
               | without needing to cite Theo or Joanna Rutkowska or
               | whatever.
        
               | Veserv wrote:
               | You presented arguments, but did not present any
               | substantive, quantitative effects attributed to those
               | changes. You have presented no quantitative means of
               | evaluating security.
               | 
               | Furthermore, you have presented no empirical evidence
               | that those changes actually result in _meaningful_
               | security. No, I do not mean "better", I mean meaningful,
               | as in can protect against commercially-motivated hackers.
               | 
               | None of the systems actually certified to protect against
               | state-actors used such a nonsensical process as imagining
               | improvements and then just assuming things are better.
               | Show a proof of correctness and a NSA pentest that fails
               | to find any vulnerabilities, then we can start talking.
               | Barring that, the explicit, above-board bug bounty
               | provides an okay lower bound on security. You really need
               | a more stable process, but it is at least a starting
               | point.
               | 
               | And besides that, a 7-figure number is paltry. Google
               | Cloud brings in, what, 11 figures? The operations of a
               | billion dollar company should not be secured to a level
               | of only a few million dollars.
               | 
               | So again, proofs of correctness and demonstrated
               | protection against teams with tens to hundreds of
               | millions in budget (i.e. a team of 5 competent offensive
               | security specialists for 2 years, NSO group for a year,
               | etc.). Anything less is insufficient to bet trillions of
               | dollars of commerce and countless lives on.
        
               | tptacek wrote:
               | So that's a no, then.
               | 
               | Actual LOL at "an NSA pentest".
               | 
               |  _Slightly later_
               | 
               | A friend points out I'm being too harsh here, and that
               | lots of products do in fact get NSA pentests. They just
               | never get the pentest report. We regret the error.
        
           | fsflover wrote:
           | Show me a recent escape from VT-d and then you will have a
           | point.
        
             | Veserv wrote:
             | VT-x. You should get the name of the technology right
             | before defending it. VT-d is the I/O virtualization
             | technology.
             | 
             | When did it become customary to defend people making claims
             | of security instead of laughing in their face even though
             | history shows such claims to be an endless clown
             | parade?
             | 
             | How about you present the extraordinary evidence needed to
             | support the extraordinary claim that there are no
             | vulnerabilities? I will accept simple forms of proof such
             | as a formal proof of correctness or a 10 M$ bug bounty
             | that goes unclaimed.
        
               | tptacek wrote:
               | Not an especially impressive flex, but I'm not above
               | trying to dunk on people for misspelling things either,
               | so I'm not going to high-horse you about it (obviously i
               | am).
               | 
               | The history of KVM and hardware virtualization is not an
               | endless clown parade.
               | 
               | Find a vulnerability researcher to talk to about OpenBSD
               | sometime, though.
               | 
               | https://isopenbsdsecu.re/
        
               | Veserv wrote:
               | OpenBSD is not secure by any measure. That Theo happens
               | to be right about the endless clown parade is independent
               | of his ability to develop a secure operating system.
               | 
               | I mean, jeez, even Joanna Rutkowska acknowledges the
               | foundations are iffy enough to only justify claiming
               | "reasonably secure" for Qubes OS.
               | 
               | You are making an extraordinary claim of security which
               | stands diametrically opposed to the consensus that things
               | are easily hacked. You need to present extraordinary
               | evidence to support such a claim. You can see my other
               | reply for what I would consider minimal criteria for
               | evidence.
        
               | tptacek wrote:
               | So far all I'm seeing here are appeals to the names of
               | people who I don't believe agree with your take. You're
               | going to need to actually defend the argument you made.
        
               | yjftsjthsd-h wrote:
               | > Find a vulnerability researcher to talk to about
               | OpenBSD sometime, though.
               | 
               | > https://isopenbsdsecu.re/
               | 
               | Notice that at no point does anyone actually show up with
               | a working exploit.
        
         | toast0 wrote:
         | Windows has Virtualization Based Security [1], where if your
         | system has the right hardware and the right settings, it will
         | use the virtualization support to get you a more protected
         | environment. IO-MMU seems like it was designed for
         | virtualization, but you can use it in a non-virtualized setting
         | too, etc.
         | 
         | [1] https://learn.microsoft.com/en-us/windows-
         | hardware/design/de...
        
         | ploxiln wrote:
         | The industry tends to do this everywhere: we have a system to
         | contain things, we made a mess of it, now we want to contain
         | separate instances of the systems.
         | 
         | For example, in AWS or GCP, you can isolate stuff for different
         | environments or teams with security groups and IAM policies.
         | You can separate them with separate VPCs that can't talk to
         | each other. In GCP you can separate them with "projects". But
         | soon that's not enough, companies want separate AWS accounts
         | for separate teams or environments, and they need to be grouped
         | under a parent org account, and you can have policies that
         | grant ability to assume roles cross-account ... then you need
         | separate associated groups of AWS accounts for separate
         | divisions!
         | 
         | It really never ends, companies will always want to take
         | whatever nested mess they have, and instead of cleaning it up,
         | just nest it one level further. That's why we'll be running
         | wasm in separate processes in separate containers in separate
         | VMs on many-core servers (probably managed with another level
         | of virtualization, but who can tell).
        
       | stacktrust wrote:
       | A modern virtualization architecture can be found in the OSS pKVM
       | L0 nested hypervisor for Android Virtualization Framework, which
       | has some architectural overlap with HP/Bromium AX L0 + [Hyper-V |
       | KVM | Xen] L1 + uXen L2 micro-VMs with copy-on-write memory.
       | 
       | A Bromium demo circa 2014 was a web browser where every tab was
       | an isolated VM, and every HTTP request was an isolated VM.
       | Hundreds of VMs could be launched in a couple of hundred
       | milliseconds. Firecracker has some overlap.
       | 
       |  _> Lastly, this approach is almost certainly more expensive. Our
       | instances sit idle for the most part and we pay EC2 a pretty
       | penny for the privilege._
       | 
       | With many near-idle server VMs running identical code for each
       | customer, there may be an opportunity to use copy-on-memory-write
       | VMs with fast restore of unique memory state, using the
       | techniques employed in live migration.
       | 
       | Xen/uXen/AX:
       | https://www.platformsecuritysummit.com/2018/speaker/pratt/
       | 
       | pKVM: https://www.youtube.com/watch?v=9npebeVFbFw
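       | For a sense of how small the per-VM definition is in this
       | style, here is roughly what a minimal Firecracker machine
       | config looks like (the kernel and rootfs paths are
       | placeholders):

```json
{
  "boot-source": {
    "kernel_image_path": "vmlinux",
    "boot_args": "console=ttyS0 reboot=k panic=1"
  },
  "drives": [
    {
      "drive_id": "rootfs",
      "path_on_host": "rootfs.ext4",
      "is_root_device": true,
      "is_read_only": false
    }
  ],
  "machine-config": {
    "vcpu_count": 1,
    "mem_size_mib": 128
  }
}
```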
        
       | jefurii wrote:
       | Using VMs as the unit allows them to move to another provider if
       | they need to. They could even move to something like an on-prem
       | Oxide rack if they wanted. [Yes I know, TFA lists this as a
       | "false benefit" i.e. something they think doesn't benefit them.]
        
       | mikewarot wrote:
       | It's nice to see the Principle Of Least Access (POLA) in
       | practical use. Some day, we'll have operating systems that
       | respect it as well.
       | 
       | As more people wake up to the realization that we shouldn't trust
       | code, I expect that the number of civilization wide outages will
       | decrease.
       | 
       | Working in the cloud, they're not going to be able to use my
       | other favorite security tool, the data diode, which can
       | positively block ingress of control while still allowing
       | egress of reporting data.
        
         | fsflover wrote:
         | > Some day, we'll have operating systems that respect it as
         | well.
         | 
         | Qubes OS has been relying on it for many years. My daily
         | driver, can't recommend it enough.
        
         | nrr wrote:
         | If you're coming by after the fact and scratching your head at
         | what a data diode is, Wikipedia's page on the subject is a
         | decent crib document.
         | <https://en.wikipedia.org/wiki/Unidirectional_network>
        
       | SunlitCat wrote:
       | VMs are awesome for what they can offer. Docker (and the like)
       | are kinda a lean VM for a specific tool scenario.
       | 
       | What I would like to see is more app virtualization
       | software, which isolates the app from the underlying OS enough
       | to provide a safe enough cage for the app.
       | 
       | I know there are some commercial offerings out there (and a free
       | one), but maybe someone who has some opinions about them, or
       | knows some additional ones, can chime in?
        
         | stacktrust wrote:
         | HP business PCs ship with SureClick based on OSS uXen,
         | https://news.ycombinator.com/item?id=41071884
        
           | SunlitCat wrote:
           | Thank you for sharing, didn't know that one!
        
             | stacktrust wrote:
             | It's from the original Xen team. Subsequently cloned by MS
             | as MDAG (Defender Application Guard).
        
               | SunlitCat wrote:
               | Cool! I know MDAG and actually it's a pretty neat
               | concept, kinda.
        
         | peddling-brink wrote:
         | That's what containers attempt to do. But it's not perfect.
         | Adding a layer like gvisor helps, but again the app is still
         | interacting with the host kernel so kernel exploits are still
         | possible. What additional sandboxing are you thinking of?
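         | Concretely, wiring gVisor into Docker is roughly a one-file
         | change in /etc/docker/daemon.json (the runsc path below is an
         | assumption about the install location):

```json
{
  "runtimes": {
    "runsc": {
      "path": "/usr/local/bin/runsc"
    }
  }
}
```

         | After a daemon restart, individual containers opt in with
         | `docker run --runtime=runsc ...` -- but the point stands:
         | runsc intercepts syscalls, it doesn't remove the host kernel
         | from the trust boundary.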
        
           | SunlitCat wrote:
           | Maybe I am a bit naive, but in my mind it's just a simple
           | piece of software running between the OS and the tool in
           | question, which runs said tool in some kind of virtualization,
           | passing all requests on to the OS after checking what they
           | might want to do.
           | 
           | I know that's what said tools are offering, but installing
           | (and running) Docker on Windows feels like loading up a whole
           | other OS inside the OS, so that even VM software looks lean
           | by comparison!
           | 
           | But I admit, that I have no real experience with docker and
           | the like.
        
       | smitty1e wrote:
       | > Switching to another provider would be non-trivial, and I don't
       | see the VM as a real benefit in this regard. The barrier to
       | switching is still incredibly high.
       | 
       | This point is made in the context of VM bits, but that switching
       | cost could (in theory, haven't done it myself) be mitigated
       | using, e.g. Terraform.
       | 
       | The brace-for-shock barrier at the enterprise level is going to
       | be exfiltrating all of that valuable data. Bezos is running a
       | Hotel California for that data: "You can checkout any time you
       | like, but you can never leave" (easily).
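       | The Terraform mitigation is real but partial: resource shapes
       | are provider-specific, so "portable" mostly means "scripted". A
       | hypothetical instance declaration (AMI and names are
       | placeholders):

```hcl
# Hypothetical: the VM itself is easy to re-declare elsewhere;
# the data inside it is what the Hotel California keeps.
resource "aws_instance" "tenant_vm" {
  ami           = "ami-0abc1234def567890"   # placeholder AMI
  instance_type = "t3.small"
  tags = {
    Tenant = "customer-42"
  }
}
```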
        
         | tetha wrote:
         | Heh. We're in the process of moving a service for a few of our
         | larger customers over due to some variety of emergencies, let's
         | keep it at that.
         | 
         | It took us 2-3 days of hustling to get the stuff running and
         | production ready and providing the right answers. This is the
         | "Terraform and Ansible-Stuff" stage of a real failover. In a
         | full infrastructure failover, I'd expect it to take us 1-2 very
         | long days to get 80% running and then up to a week to be fully
         | back on track and another week of shaking out strange issues.
         | And then a week or two of low-availability from the ops-team.
         | 
         | However, for 3 large customers using that product,
         | cybersecurity and compliance said no. They said no about 5-6
         | weeks ago and project to have an answer somewhere within the
         | next 1-2 months. Until then, the amount of workarounds and
         | frustration growing around it is rather scary. I hope I can
         | contain it to some places in which there is no permanent damage
         | for the infrastructure.
         | 
         | Tech isn't necessarily the hardest thing in some spaces.
        
       | kkfx wrote:
       | The more stuff you add, the more attack surface you have.
       | Virtualized infra is a commercial need, an IT and operations
       | OBSCENITY that is definitively never safe in practice.
        
       | tptacek wrote:
       | The cool kids have been combining containers and hardware
       | virtualization for something like 10 years now (back to QEMU-Lite
       | and kvmtool). Don't use containers if the abstraction gets in
       | your way, of course, but if they work for you --- as a mechanism
       | for packaging and shipping software and coordinating deployments
       | --- there's no reason you need to roll all the way back to
       | individually managed EC2 instances.
       | 
       | A short survey on this stuff:
       | 
       | https://fly.io/blog/sandboxing-and-workload-isolation/
        
         | mwcampbell wrote:
         | Since you're here, I was just thinking about how feasible it
         | would be to run a microVM-per-tenant setup like this on Fly. I
         | guess it would require some automation to create a Fly app for
         | each customer. Is this something you all have thought about?
        
           | tptacek wrote:
           | Extraordinarily easy. It's a design goal of the system. I
           | don't want to crud up the thread; this whole "container vs.
           | VM vs. dedicated hardware" debate is dear to my heart. But
           | feel free to drop me a line if you're interested in our take
           | on it.
        
       | vin10 wrote:
       | > If you wouldn't trust running it on your host, you probably
       | shouldn't run it in a container as well.
       | 
       | - From a Docker/Moby Maintainer
        
       | ploxiln wrote:
       | > we operate in networks where outbound MQTT and HTTPS is simply
       | not allowed (which is why we rely on encrypted DNS traffic for
       | device-to-Console communication)
       | 
       | HTTPS is not allowed (locked down for security!), so
       | communication is smuggled over DNS? uhh ... I suspect that a lot
       | of what the customer "security" departments do, doesn't really
       | make sense ...
        
       | er4hn wrote:
       | One thing I wasn't able to grok from the article is orchestration
       | of VMs. Are they using AWS to manage the VM lifecycles, restart
       | them, etc?
       | 
       | Last time I looked into this for on-prem the solutions seemed
       | very enterprise, pay the big bux, focused. Not a lot in the OSS
       | space. What do people use for on-prem VM orchestration that is
       | OSS?
        
         | jinzo wrote:
         | Depends on your scale, but I've used oVirt and Proxmox in the
         | past, and they were (especially oVirt) very enterprisey but OSS.
        
       ___________________________________________________________________
       (page generated 2024-07-25 23:05 UTC)