[HN Gopher] Unfashionably secure: why we use isolated VMs
___________________________________________________________________
Unfashionably secure: why we use isolated VMs
Author : mh_
Score : 157 points
Date : 2024-07-25 17:00 UTC (6 hours ago)
(HTM) web link (blog.thinkst.com)
(TXT) w3m dump (blog.thinkst.com)
| PedroBatista wrote:
| As a permanent "out of style" curmudgeon for the last ~15 years,
| I like that people are discovering that maybe VMs are in fact the
| best approach for a lot of workloads, and that the LXC cottage
| industry and Docker industrial complex that developed around
| solving problems they created themselves (or that were solved
| decades ago) might need to take a hike.
|
| Modern "containers" were invented to make things more
| reproducible ( check ) and simplify dev and deployments ( NOT
| check ).
|
| Personally, FreeBSD Jails / Solaris Zones are the thing I like to
| dream are pretty much as secure as a VM and a perfect fit for a
| sane dev and ops workflow. I didn't dig too deep into this in
| practice; maybe I'm afraid to learn the contrary, but I hope not.
|
| Either way Docker is "fine" but WAY overused and overrated IMO.
| ganoushoreilly wrote:
| Docker is great, way overused 100%. I believe a lot of it
| started as "cost savings" on resource usage. Then it became the
| trendy thing for "scalability".
|
| When home enthusiasts build multi container stacks for their
| project website, it gets a bit much.
| applied_heat wrote:
| Solves dependency version hell also
| theLiminator wrote:
| Solves it in the same sense that it's a giant lockfile. It
| doesn't solve the other half where updates can horribly
| break your system and you run into transitive version
| clashes.
| bornfreddy wrote:
| But at least you can revert back to the original
| configuration (as you can with a VM, too).
| Spivak wrote:
| It solves it in the sense that it empowers the devs to
| update their dependencies on their own time and ops can
| update the underlying infrastructure fearlessly. It
| turned a coordination problem into a non-problem.
| sitkack wrote:
| It doesn't solve it, it makes it tractable so you can use
| the scientific method to fix problems as opposed to voodoo.
| everforward wrote:
| > Modern "containers" were invented to make thinks more
| reproducible ( check ) and simplify dev and deployments ( NOT
| check ).
|
| I do strongly believe deployments of containers are easier. If
| you want something that parallels a raw VM, you can "docker
| run" the image. Things like k8s can definitely be complicated,
| but the parallel there is more like running a whole ESXi
| cluster. Having done both, there's really only a marginal
| difference in complexity between k8s and an ESXi cluster
| supporting a similar feature set.
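|
| E.g., the "docker run" parallel, roughly (image and registry
| names here are made up):
|
|   docker pull registry.example.com/myapp:1.2.3
|   docker run -d --name myapp --restart unless-stopped \
|     -p 8080:8080 registry.example.com/myapp:1.2.3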
|
| The dev simplification is supposed to be "stop dealing with
| tickets from people with weird environments", though it
| admittedly often doesn't apply to internal applications where
| devs have some control over the environment.
|
| > Personally FreeBSD Jails / Solaris Zones are the thing I like
| to dream are pretty much as secure as a VM and a perfect fit
| for a sane dev and ops workflow
|
| I would be interested to hear how you use them. From my
| perspective, raw jails/zones are missing features and
| implementing those features on top of them ends up basically
| back at Docker (probably minus the virtual networking). E.g.
| jails need some way to get new copies of the code that runs in
| them, so you can either use Docker or write some custom
| Ansible/Chef/etc that does basically the same thing.
|
| Maybe I'm wrong, and there is some zen to be found in raw-er
| tools.
| dboreham wrote:
| For me it's about the ROAC property (Runs On Any Computer). I
| prefer working with stuff that I can run. Running software is
| live software, working software, loved software. Software that
| only works in weird places is bad, at least for me. Docker is
| pretty crappy in most respects, but it has the ROAC going for
| it.
|
| I would _love_ to have a "docker-like thing" (with ROAC) that
| used VMs, not containers (or some other isolation tech that
| works). But afaik that thing does not yet exist. Yes, there are
| several "container tool, but we made it use VMs" options
| (firecracker and downline), but they all need weirdo special
| setup, and won't run on my laptop or a generic DigitalOcean VM.
| 01HNNWZ0MV43FF wrote:
| Yeah that's kind of a crummy tradeoff.
|
| Docker is "Runs on any Linux, mostly, if you have a new
| enough kernel" meaning it packages a big VM anyway for
| Windows and macOS
|
| VMs are "Runs on anything! ... Sorta, mostly, if you have VM
| acceleration" meaning you have to pick a VM software and hope
| the VM doesn't crash for no reason. (I have real bad luck
| with UTM and VirtualBox on my Macbook host for some reason.)
|
| All I want is everything - An APE-like program that runs on
| any OS, maybe has shims for slightly-old kernels, doesn't
| need a big installation step, and runs any useful guest OS.
| (i.e. Linux)
| neaanopri wrote:
| The modern developer yearns for Java
| smallmancontrov wrote:
| I had to use eclipse the other day. How the hell is it
| just as slow and clunky as I remember from 20 years ago?
| Does it exist in a pocket dimension where Moore's Law
| doesn't apply?
| TurningCanadian wrote:
| That's not Java's fault though. IntelliJ IDEA is also
| built on Java and runs just fine.
| qwery wrote:
| I think it's pretty remarkable to see any application in
| continuous use for so long, especially with so few
| changes[0] -- Eclipse must be doing something right!
|
| Maintaining (if not actively improving/developing) a
| piece of useful software without performance
| _degradation_ -- that's a win.
|
| Keeping that up for decades? That's exceptional.
|
| [0] "so few changes": I'm not commenting on the amount of
| work done on the project or claiming that there is no
| useful/visible features added or upgrades, but referring
| to Eclipse of today feeling like the same application as
| it always did, and that Eclipse hasn't had multiple
| alarmingly frequent "reboots", "overhauls", etc.
|
| [?] keeping performance constant over the last decade or
| two is a win, relatively speaking, anyway
| dijit wrote:
| I agree; that you've pointed it out to me makes it
| obvious that this is not the norm, and we _should_
| celebrate this.
|
| I'm reminded of Casey Muratori's rant on Visual Studio; a
| program that largely feels like it hasn't changed much
| but clearly has regressed in performance massively;
| https://www.youtube.com/watch?v=GC-0tCy4P1U
| password4321 wrote:
| > _without performance degradation_
|
| Not accounting for Moore's Law, yikes. Need a comparison
| adjusted for "today's dollars".
| gryfft wrote:
| Maybe just the JVM.
| mschuster91 wrote:
| Java's ecosystem is just as bad. Gradle is insanely
| flexible but people create abominations out of it, Maven
| is extremely rigid so people resort to even worse
| abominations to get basic shit done.
| compsciphd wrote:
| docker is: your userspace program carries all its userspace
| dependencies with it and doesn't depend on the userspace
| configuration of the underlying system.
|
| What I argued in my paper is that systems like docker (i.e.
| what I created before it) improve over VMs (and even
| Zones/ZFS) in their ability to really run ephemeral
| computation. i.e. if it takes microseconds to set up the
| container file system, you can run a boatload of
| heterogeneous containers even if they only need to run
| for very short periods of time. Solaris Zones/ZFS didn't
| lend themselves to heterogeneous environments, but simply to
| cloning a single homogeneous environment, while VMs not only
| suffered from that problem, they also (at least at the
| time, much improved as of late) required a reasonably long
| bootup time.
| ThreatSystems wrote:
| Vagrant / Packer?
| gavindean90 wrote:
| With all the mind share that terraform gets, you would think
| vagrant would at least be known, but alas.
| tptacek wrote:
| Somebody educate me about the problem Packer would solve
| for you in 2024?
| kasey_junk wrote:
| I think the thread is more about how docker was a
| reaction to the vagrant/packer ecosystem, which was deemed
| overweight but was in many ways a "docker-like thing"
| with VMs.
| tptacek wrote:
| Oh, yeah, I'm not trying to prosecute, I've just always
| been Packer-curious.
| yjftsjthsd-h wrote:
| What's a better way to make VM images?
| gryfft wrote:
| I've been meaning to do a bhyve deep dive for years, my gut
| feelings being much the same as yours. Would appreciate any
| recommended reading.
| Gud wrote:
| Read the fine manual and handbook.
| nimish wrote:
| Clear Containers/Kata Containers/firecracker VMs showed that
| there isn't really a dichotomy here. Why we aren't all using HW
| assisted containers is a mystery.
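|
| With Kata installed and registered as a Docker runtime, it's
| one flag away (a sketch, assuming a configured kata-runtime):
|
|   # uname -r reports the guest kernel, not the host's,
|   # showing the container really runs in a lightweight VM
|   docker run --rm --runtime kata-runtime alpine uname -r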
| turtlebits wrote:
| Engineers are lazy, especially Ops. Until it's easier to get
| up and running and there are tangible benefits, people won't
| care.
| tptacek wrote:
| It's not at all mysterious: to run hardware-virtualized
| containers, you need your compute hosted on a platform that
| will allow KVM. That's a small, expensive, tenuously
| available subset of AWS, which is by far the dominant compute
| platform.
| Spivak wrote:
| So... Lambda, Fargate, and EC2. The only thing you can't
| really do this with is EKS.
|
| Like Firecracker was made by AWS to run containers on their
| global scale KVM, EC2.
| tptacek wrote:
| Lambda and Fargate are implementations of the idea, not a
| way for you yourself to do any kind of KVM container
| provisioning. You can't generally do this on EC2; you
| need special instances for it.
|
| For a variety of reasons, I'm pretty familiar with
| Firecracker.
| turtlebits wrote:
| Honestly, it really doesn't matter whether it's VMs or Docker.
| The docker/container DX is so much better than VMWare/QEMU/etc.
| Make it easy to run workloads in VMs/Firecracker/etc and you'll
| see people migrate.
| packetlost wrote:
| I mean, Vagrant was basically docker before docker. People
| used it. But it turns out the overhead of booting a full VM
| + kernel adds latency, which is undesirable for development
| workloads. The techniques used by firecracker could be used,
| but I suspect the overhead of allocating a namespace and
| loading a process will always be less than even restoring
| from a frozen VM, so I wouldn't hold my breath on it swinging
| back in VMs' direction for developer workloads ever.
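|
| The workflows look similar; the startup costs don't (box and
| image names are just common examples):
|
|   vagrant init hashicorp/bionic64 && vagrant up  # full VM boot
|   docker run --rm -it ubuntu:18.04 bash          # namespace, sub-second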
| yjftsjthsd-h wrote:
| It would be interesting to see a microvm
| (kata/firecracker/etc.) version of vagrant. And open
| source, of course. I can't see any technical reason why it
| would be particularly difficult.
| compsciphd wrote:
| As the person who created docker (well, before docker - see
| https://www.usenix.org/legacy/events/atc10/tech/full_papers/...
| and compare to docker), I argued that it wasn't just good for
| containers, but could be used to improve VM management as well
| (i.e. a single VM per running image - see
| https://www.usenix.org/legacy/events/lisa11/tech/full_papers...)
|
| I then went on to build a system with kubernetes that enabled
| one to run "kubernetes pods" in independent VMs -
| https://github.com/apporbit/infranetes (as well as create
| hybrid "legacy" VM / "modern" container deployments all managed
| via kubernetes.)
|
| - as a total aside (while I toot my own horn on the topic of
| papers I wrote or contributed to), note the reviewer of this
| paper originally used the term Pod for a running container
| -
| https://www.usenix.org/legacy/events/osdi02/tech/full_papers...
| - which explains where Kubernetes got the term from.
|
| I'd argue that FreeBSD Jails / Solaris Zones (Solaris
| Zones/ZFS inspired my original work) really aren't any more
| secure than containers on linux, as they all suffer from the
| same fundamental problem of the entire kernel being part of
| one's "tcb", so any security advantage they have is simply due
| to a lack of bugs, not a better design.
| ysnp wrote:
| Would you say approaches like gvisor or nabla containers
| provide more/enough evolution on the security front? Or is
| there something new on the horizon that excites you more as a
| prospect?
| compsciphd wrote:
| been out of the space for a bit (though interviewing again,
| so might get back into it). gvisor, at least as the
| "userspace" hypervisor, seemed to provide minimal value vs
| modern hypervisor systems with low overhead / quick boot
| VMs (ala firecracker). With that said, I only looked at it
| years ago, so I could very well be out of date on it.
|
| Wasn't aware of Nabla, but they seem to be going with the
| unikernel approach (based on a cursory look at them).
| Unikernels have been "popular" (i.e. multiple attempts) in
| the space (mostly to basically run a single process app
| without any context switches), but it creates a process
| that is fundamentally different than what you develop and
| is therefore harder to debug.
|
| while the unikernels might be useful in the high frequency
| trading space (where any time savings are highly valued),
| I'm personally more skeptical of them in regular world
| usage (and to an extent, I think history has borne this out,
| as it doesn't feel like any of the attempts at it have
| gotten real traction)
| tptacek wrote:
| Modern gVisor uses KVM, not ptrace, for this reason.
| compsciphd wrote:
| so I did a check; it would seem that gvisor with kvm
| mostly works on bare metal, not within existing VMs (nested
| virtualization).
|
| https://gvisor.dev/docs/architecture_guide/platforms/
|
| "Note that while running within a nested VM is feasible
| with the KVM platform, the systrap platform will often
| provide better performance in such a setup, due to the
| overhead of nested virtualization."
|
| I'd argue then for most people (unless have your own
| baremetal hyperscaler farm), one would end up using
| gvisor without kvm, but speaking from a place of
| ignorance here, so feel free to correct me.
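|
| For reference, the platform is chosen when registering runsc
| with Docker; a sketch with the usual paths:
|
|   cat >/etc/docker/daemon.json <<'EOF'
|   {
|     "runtimes": {
|       "runsc": { "path": "/usr/local/bin/runsc",
|                  "runtimeArgs": ["--platform=systrap"] }
|     }
|   }
|   EOF
|   # restart dockerd, then:
|   docker run --rm --runtime=runsc alpine dmesg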
| the_duke wrote:
| GVisor basically works by intercepting all Linux syscalls,
| and emulating a good chunk of the Linux kernel in userspace
| code. In theory this allows lowering the overhead per VM,
| and more fine-grained introspection and rate limiting /
| balancing across VMs, because not every VM needs to run
| its own kernel that only interacts with the environment
| through hardware interfaces. Interaction happens through
| the Linux syscall ABI instead.
|
| From an isolation perspective it's not more secure than a
| VM, but less, because GVisor needs to implement its own
| security sandbox to isolate memory, networking, syscalls,
| etc, and still has to rely on the kernel for various
| things.
|
| It's probably more secure than containers though, because
| the kernel abstraction layer is separate from the actual
| host kernel and runs in userspace - if you trust the
| implementation... using a memory-safe language helps there.
| (Go)
|
| The increased introspection capability would make it easier
| to detect abuse and to limit available resources on a more
| fine-grained level though.
|
| Note also that GVisor has quite a lot of overhead for
| syscalls, because they need to be piped through various
| abstraction layers.
| compsciphd wrote:
| I actually wonder how much "overhead" a VM actually has.
| i.e. a linux kernel that doesn't do anything (say perhaps
| just boots to an init that mounts proc and every n
| seconds read in/prints out /proc/meminfo) how much memory
| would the kernel actually be using?
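|
| Something like this as the guest's entire /init would do for
| that experiment (just a sketch):
|
|   #!/bin/sh
|   mount -t proc proc /proc
|   while true; do
|     grep -E 'MemTotal|MemFree' /proc/meminfo
|     sleep 5
|   done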
|
| So if processes in gvisor map to processes on the
| underlying kernel, I'd agree it gives one a better
| ability to introspect (at least in an easy manner).
|
| It gives me an idea that I'd think would be interesting
| (I think this has been done, but it escapes me where), to
| have a tool that is external to the VM (runs on the
| hypervisor host) that essentially has "read only" access
| to the kernel running in the VM to provide visibility
| into what's running on the machine without an agent
| running within the VM itself. i.e. something that knows
| where the processes list is, and can walk it to enumerate
| what's running on the system.
|
| I can imagine the difficulties in implementing such a
| thing (especially on a multi cpu VM), where even if you
| could snapshot the kernel memory state efficiently, it'd be
| difficult to do it in a manner that provided a
| "safe/consistent" view. It might be interesting if the
| kernel itself could make a hypercall into the hypervisor
| at points of consistency (say when finished making an
| update and about to unlock the resource) to tell the tool
| when the data can be collected.
| xtacy wrote:
| > to have a tool that is external to the VM (runs on the
| hypervisor host) that essentially has "read only" access
| to the kernel running on the VM to provide visibility
| into what's running on the machine without an agent
| running within the VM itself
|
| Not quite what you are after, but comes close ... you
| could run gdb on the kernel in this fashion and inspect,
| pause, step through kernel code:
| https://stackoverflow.com/questions/11408041/how-to-
| debug-th....
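|
| For a qemu guest it's built in: -s exposes a gdb stub on
| :1234 and -S pauses the vCPU at reset (assuming a kernel
| built with debug info):
|
|   qemu-system-x86_64 -kernel bzImage -s -S &
|   gdb vmlinux -ex 'target remote :1234'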
| stacktrust wrote:
| https://github.com/Wenzel/pyvmidbg
| LibVMI-based debug server, implemented in Python.
| Building a guest aware, stealth and agentless full-system
| debugger.. GDB stub allows you to debug a remote process
| running in a VM with your favorite GDB frontend. By
| leveraging virtual machine introspection, the stub
| remains stealth and requires no modification of the
| guest.
|
| more: https://github.com/topics/virtual-machine-
| introspection
| ecnahc515 wrote:
| > I actually wonder how much "overhead" a VM actually
| has. i.e. a linux kernel that doesn't do anything (say
| perhaps just boots to an init that mounts proc and every
| n seconds read in/prints out /proc/meminfo) how much
| memory would the kernel actually be using?
|
| There's already some memory sharing available using DAX
| in Kata Containers at least: https://github.com/kata-
| containers/kata-containers/blob/main...
| bombela wrote:
| > As the person who created docker (well, before docker - see
| https://www.usenix.org/legacy/events/atc10/tech/full_papers/.
| .. and compare to docker)
|
| I picked the name and wrote the first prototype (python2) of
| Docker in 2012. I had not read your document (dated 2010). I
| didn't really read English that well at the time, I probably
| wouldn't have been able to understand it anyways.
|
| https://en.wikipedia.org/wiki/Multiple_discovery
|
| More details for the curious: I wrote the design doc and
| implemented the prototype. But not in a vacuum. It was a lot
| of work with Andrea, Jerome and Gabriel. Ultimately, we all
| liked the name Docker. The prototype already had the notion
| of layers, lifetime management of containers and other
| fundamentals. It exposed an API (over TCP with zerorpc). We
| were working on container orchestration, and we needed a
| daemon to manage the life cycle of containers on every
| machine.
| compsciphd wrote:
| I'd note I didn't say you copied it, just that I created it
| first (i.e. "compare paper to docker"; also, as you note,
| it's possible someone else did it too, but at least my
| conception got through academic peer review / the patent
| office. yeah, there's a patent; no attempt has ever been
| made to enforce it, to my knowledge).
|
| when I describe my work (I actually should have used quotes
| here), I generally give air quotes when saying it, or say
| "proto docker", as it provides context for what I did
| (there's also a lot of people who view docker as synonymous
| with containerization as a whole, and I say that containers
| existed way before me). I generally try to approach it
| humbly, but I am proud that I predicted and built what the
| industry seemingly needed (or at least is heavily using).
|
| people have asked me why I didn't pursue it as a company,
| and my answer is a) I'm not much of an entrepreneur (main
| answer), and b) I felt it was a feature, not a "product",
| and would therefore only really be profitable for those that
| had a product that could use it as a feature (which one
| could argue that product turned out to be clouds, i.e. they
| are the ones really making money off this feature). or as
| someone once said a feature isn't necessarily a product and
| a product isn't necessarily a company.
| anonfordays wrote:
| >Personally FreeBSD Jails / Solaris Zones are the thing I like
| to dream are pretty much as secure as a VM and a perfect fit
| for a sane dev and ops workflow, I didn't dig too deep into
| this in practice, maybe I'm afraid to learn the contrary, but I
| hope not
|
| Having run both at scale, I can confirm and assure you they are
| not as secure as VMs and did not produce sane devops workflows.
| Not that Docker is much better, but it _is_ better from the
| devops workflow perspective, and IMHO that 's why Docker "won"
| and took over the industry.
| kkfx wrote:
| A sane DevOps workflow is built with declarative systems like
| NixOS or Guix System, definitively not on a VM infra that in
| practice is regularly not up to date, full of useless deps, on
| a host that is definitively not up to date, with the entire
| infra typically not much managed nor manageable, and with an
| immense attack surface...
|
| VMs are useful for those who live on the shoulders of someone
| else (i.e. *aaS), which is anything but secure.
| secondcoming wrote:
| I'm not sure what you're referring to here?
|
| Our cloud machines are largely VMs. Deployments mean
| building a new image and telling GCP to deploy that as
| machines come and go due to scaling. The software is up to
| date, dependencies are managed via ansible.
|
| Maybe you think VMs means monoliths? That doesn't have to
| be the case.
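|
| The rollout itself is a couple of commands (names made up,
| zone/project flags elided):
|
|   gcloud compute images create myapp-v42 --source-disk=builder
|   gcloud compute instance-templates create myapp-v42 \
|     --image=myapp-v42
|   gcloud compute instance-groups managed rolling-action \
|     start-update my-group --version=template=myapp-v42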
| nine_k wrote:
| VMs are useful when you don't own or rent dedicated
| hardware. Which is a lot of cases, especially when your
| load varies seriously over the day or week.
|
| And even if you do manage dedicated servers, it's often
| wise to use VMs on them to better isolate parts of the
| system, aka limit the blast radius.
| analognoise wrote:
| What do you think of Nix/NixOS?
| egberts1 wrote:
| Nix is trying to be like macOS's DMG, but its image file is a
| bit more parse-able.
| reddit_clone wrote:
| But that comes _after_ you have chosen VMs over Containers
| yes?
|
| If you are using VMs, I think NixOS/Guix is a good choice.
| Reproducible builds, Immutable OS, Immutable binaries and
| Dead easy rollback.
|
| It still looks somewhat futuristic. Hopefully gets traction.
| solarpunk wrote:
| if you're using nixos just to do provisioning, I would
| argue OSTree is a better fit.
| bspammer wrote:
| Nix is actually a really nice tool for building docker
| images: https://xeiaso.net/talks/2024/nix-docker-build/
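|
| e.g., with a flake that exposes a dockerTools image (the
| attribute name is whatever you defined):
|
|   nix build .#dockerImage
|   docker load < result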
| vundercind wrote:
| Docker's the best cross-distro rolling-release package manager
| and init system for services--staying strictly out of managing
| the base system, which is great--that I know of. I don't know
| of anything that's even close, really.
|
| All the other stuff about it is way less important to me than
| that part.
| topspin wrote:
| Isn't this discussion based on a false dichotomy? I, too, use
| VMs to isolate customers, and I use containers within those
| VMs, either with or without k8s. These tools solve different
| problems. Containers solve software management, whereas VMs
| provide a high degree of isolation.
|
| Container orchestration is where I see the great mistake in all
| of this. I consider everything running in a k8s cluster to be
| one "blast domain." Containers can be escaped. Faulty
| containers impact everyone relying on a cluster. Container
| orchestration is the thing I believe is "overused." It was
| designed to solve "hyper" scale problems, and it's being
| misused in far more modest use cases where VMs should prevail.
| I believe the existence of container orchestration and its
| misapplication has retarded the development of good VM tools.
| I dream of tools that create, deploy and manage entire VMs
| with the same ease as Docker; I believe these tools have not
| matured and gained popularity because container orchestration
| is so easily misapplied.
|
| Strongly disagree about containers and dev/deployment ("NOT
| check"). I can no longer imagine development without
| containers: it would be intolerable. Container repos are a
| godsend for deployment.
| tptacek wrote:
| Jails/Zones are not pretty much as secure as a VM. They're
| materially less secure: they leave cotenant workloads sharing a
| single kernel (not just the tiny slice of the kernel KVM
| manages). Most kernel LPEs are probably "Jail" escapes, and
| it's not feasible to filter them out with system call
| sandboxing, because LPEs occur in innocuous system calls, too.
| tomjen3 wrote:
| If anything, Docker is underused. You should have a very good
| reason to make a deploy that is not Docker, or (if you really
| need the extra security) a VM that runs one thing only (and so
| is essentially a more resource-hungry Docker).
|
| If you don't, then it becomes much harder to answer the
| question of what exactly is deployed on a given server and what
| it takes to bring it up again if it goes down hard. If you put
| everything in Dockerfiles, then the answer is whatever is set
| in the latest docker-compose file.
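|
| Concretely, assuming everything lives in one compose file:
|
|   docker compose ps      # what exactly is deployed here?
|   docker compose up -d   # bring it all back after a hard crash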
| m463 wrote:
| I've always hated the docker model of the image namespace. It's
| like those cloud-based routers you can buy.
|
| Docker actively prevents you from having a private repo. They
| don't want you to point away from their cloud.
|
| Redhat understood this and podman allows you to have a private
| docker infrastructure, disconnected from docker hub.
|
| For my personal stuff, I would like to use "FROM scratch" and
| build my personal containers in my own ecosystem.
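|
| What I want is roughly this (a sketch; image name made up):
|
|   docker run -d -p 5000:5000 --name registry registry:2
|   docker build -t localhost:5000/myimage .   # FROM scratch base
|   docker push localhost:5000/myimage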
| Carrok wrote:
| > Docker actively prevents you from having a private repo.
|
| In what ways? I use private repos daily with no issues.
| ranger207 wrote:
| Docker's good at packaging, and Kubernetes is good at providing
| a single API to do all the infra stuff like scheduling,
| storage, and networking. I think that if someone sat down and
| tried to create an idealized VM management solution that
| covered everything between "dev pushes changes" and "user
| requests website", then it'd probably have a single image for
| each VM to run (like Docker has a single image for each
| container to run), and its management of VM hosts, storage,
| networking, and scheduling of which host each VM runs on would
| wind up looking a lot like k8s. You could certainly do that
| with VMs, but for various path-dependency reasons people do
| that with containers instead, and nobody's got a well-adopted
| system for doing the same with VMs.
| osigurdson wrote:
| When thinking about multi-tenancy, remember that your bank
| doesn't have a special VM or container, just for you.
| 01HNNWZ0MV43FF wrote:
| My bank doesn't even have 2FA
| jmnicolas wrote:
| Mine neither, and they use a 6-digit pincode! This is
| ridiculous, in comparison my home wifi password is 60+ random
| chars long.
| leononame wrote:
| But they do ask you only two digits of the pin on each try,
| and they probably will lock your account after three
| incorrect attempts. Not saying 6 digits is secure, but it's
| better than everyone using "password", if they have a strict
| policy on incorrect attempts.
|
| And don't they have 2FA for executing transactions?
|
| I'm pretty sure banks are some of the most targeted IT
| systems. I don't trust them blindly, but when it comes to
| online security, I trust that they built a system that's
| reasonably well secured; and in other cases, I'd get my money
| back, similar to credit cards.
| dspillett wrote:
| No, but they do have their own VM/container(s) separate from
| all the other banks that use the same service, with persisted
| data in their own storage account with its own encryption keys,
| etc.
|
| We deal with banks in DayJob - they have separate
| VMs/containers for their own UAT & training environments, and
| when the same bank works in multiple regulatory
| jurisdictions, they usually have the systems servicing those
| separated too, as if they were completely separate entities
| (only bringing aggregate data back together for higher-up
| reporting purposes).
| jonathanlydall wrote:
| Sure, it's an option which eliminates the possibility of certain
| types of errors, but it's costing you the ability to pool
| computing resources as efficiently as you could have with a
| multi-tenant approach.
|
| The author did acknowledge it's a trade off, but the economics of
| this trade off may or may not make sense depending on how much
| you need to charge your customers to remain competitive with
| competing offerings.
| bobbob1921 wrote:
| My big struggle with docker/containers vs VMs is the storage
| layer (on containers). I'm sure it's mostly lack of experience /
| knowledge on my end, but I never have a doubt or concern that my
| storage is persistent and clearly defined when using a VM based
| workload. I cannot say the same for my docker/container based
| workloads, I'm always a tad concerned about the persistence of
| storage, (or the resource management in regards to storage). This
| becomes even more true as you deal with networked storage on both
| platforms
| imp0cat wrote:
| Mount the paths that you care about onto the local filesystem.
| Otherwise, you're always one `docker system prune -a -f
| --volumes` away from a disaster.
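|
| e.g. (paths made up):
|
|   docker run -d -v /srv/pg:/var/lib/postgresql/data postgres:16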
| amluto wrote:
| It absolutely boggles my mind that read-only mode is not the
| default in Docker. By default, every container has an extra,
| unnamed, writable volume: its own root. Typo in your volume
| mount? You're writing to root, and you _will_ lose data.
|
| Of course, once this is fixed and you start using read-only
| containers, one wonders why "container" exists as a persistent,
| named concept.
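|
| For the curious, the opt-in looks like this (image and volume
| names made up):
|
|   docker run --read-only --tmpfs /tmp \
|     -v appdata:/var/lib/app myapp:latest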
| fsckboy wrote:
| just as a meta idea, i'm mystified that systems folks find it
| impossible to create protected mode operating systems that are
| protected, and then we all engage in wasteful kluges like VMs.
|
| i'm not anti-VM, they're great technology, i just don't think it
| should be the only way to get protection. VMs are incredibly
| inefficient... what's that you say, they're not? ok, then why
| aren't they integrated into protected mode OSes so that they will
| actually be protected?
| bigbones wrote:
| Because it would defeat the purpose. Turns out we don't trust
| the systems folks all that much
| Veserv wrote:
| You are right, but not in the way you think. You are completely
| correct that techniques that can make virtual machines secure
| could also be applied to make operating systems secure. So, if
| we have secure virtual machines, why do we not have secure
| operating systems?
|
| Trick question. The virtual machines are not secure, they are
| just more obscure. There are plenty of virtual machine escapes
| that invalidate the security of the system and allow lateral
| takeover. The only real difference is that hypervisors are
| generally configured to have less ambient sharing by default,
| so you need actual hypervisor vulnerabilities rather than just
| relying on dumb service configuration.
|
| To quote Theo de Raadt (OpenBSD BDFL) [1]:
|
| "x86 virtualization is about basically placing another nearly
| full kernel, full of new bugs, on top of a nasty x86
| architecture which barely has correct page protection. Then
| running your operating system on the other side of this brand
| new pile of shit.
|
| You are absolutely deluded, if not stupid, if you think that a
| worldwide collection of software engineers who can't write
| operating systems or applications without security holes, can
| then turn around and suddenly write virtualization layers
| without security holes.
|
| You've seen something on the shelf, and it has all sorts of
| pretty colours, and you've bought it."
|
| [1] https://marc.info/?l=openbsd-misc&m=119318909016582
| elric wrote:
| Hah, I was going to post the same quote when I read the
| parent comment. Glad to see I'm not the only grump who
| remembers TDR quotes.
|
| But he's right. And with the endless stream of leaky CPUs and
| memory (spectre, rowhammer, etc) he's even more right now
| than he was 17 years ago.
|
| There are all kinds of things being done to mitigate multi-
| tenant security risks in the Confidential Computing space
| (with Trusted Execution Environments, Homomorphic Encryption,
| or even Secure Multiparty Computation), but these are all
| incredibly complex and largely bolted on to an insecure base.
|
| It's just really, *really*, hard to make something non-
| trivial fully secure. "It depends on your threat model" used
| to be a valid statement, but with everyone running all of
| their code on top of basically 3 platforms owned by
| megacorps, I'm not sure even that is true anymore.
| tptacek wrote:
| Microarchitectural attacks are an even bigger problem for
| shared-kernel multitenant systems!
| tptacek wrote:
| This Theo quote from 18 years ago gets brought up a lot. It's
| referring to a different era in virtualization (it
| practically predates KVM, and certainly widespread use of
| KVM). You can more or less assume he's talking about running
| things under VMWare.
|
| In the interim:
|
| * The Linux virtualization interface has been standardized
| --- everything uses the same small KVM interface
|
| * Security research matured and, in particular, mobile device
| jailbreaking have made the LPE attack surface relevant, so
| people have audited and fuzzed the living hell out of KVM
|
| * Maximalist C/C++ hypervisors have been replaced by
| lightweight virtualization, whose codebases are generally
| written in memory-safe Rust.
|
| At the very least, the "nearly full kernel" thing is totally
| false now; that "extra" kernel (the userland hypervisor) is
| now probably the most trustworthy component in the whole
| system.
|
| I would be surprised if even Theo stuck up for that argument
| today, but if he did, I think he'd probably get rinsed.
| Veserv wrote:
| Are you claiming it has no security vulnerabilities? If
| yes, care to present a proof. If no, then please estimate
| how big of a bug bounty would result in a reported critical
| vulnerability.
|
| If I put up a 1 G$ bug bounty, do you think somebody would
| be able to claim it within a year? How about 10 M$? Please
| justify this in light of Google only offering 250 k$ [1]
| for a vulnerability that would totally compromise the
| security foundation of the multi-billion (trillion?) dollar
| Google Cloud.
|
| Please also justify why the number you present is adequate
| for securing the foundation of the multi-trillion dollar
| cloud industry. I will accept that element on its face if
| you say the cost would be 10 G$, but then I will demand
| basic proof such as formal proofs of correctness.
|
| [1] https://security.googleblog.com/2024/06/virtual-escape-
| real-...
| tptacek wrote:
| I have no idea who you're talking to, but nobody on this
| thread has claimed anything has "no security
| vulnerabilities". If you think there isn't an implicit
| 7-figure bounty on KVM escapes, we are operating from
| premises too far apart for further discussion to be
| productive.
|
| My bigger problem though: I gave you a bunch of
| substantive, axiomatic arguments, and you responded to
| none of them. Of the three of them, which were you
| already aware of? How did your opinion change after
| learning about the other ones? You cited a 2007 Theo
| argument in 2024, so I'm going to have trouble with the
| idea that you were aware of all of them; again, I think
| even Theo would be correcting your original post.
|
| _later_
|
| You've written about the vulnerability brokers you know
| in other posts here; I assume we can just have a
| substantive, systems based debate about this claim,
| without needing to cite Theo or Joanna Rutkowska or
| whatever.
| Veserv wrote:
| You presented arguments, but did not present any
| substantive, quantitative effects attributed to those
| changes. You have presented no quantitative means of
| evaluating security.
|
| Furthermore, you have presented no empirical evidence
| that those changes actually result in _meaningful_
| security. No, I do not mean "better", I mean meaningful,
| as in can protect against commercially-motivated hackers.
|
| None of the systems actually certified to protect against
| state-actors used such a nonsensical process as imagining
| improvements and then just assuming things are better.
| Show a proof of correctness and a NSA pentest that fails
| to find any vulnerabilities, then we can start talking.
| Barring that, the explicit, above-board bug bounty
| provides an okay lower bound on security. You really need
| a more stable process, but it is at least a starting
| point.
|
| And besides that, a 7-figure number is paltry. Google
| Cloud brings in, what, 11 figures? The operations of a
| billion dollar company should not be secured to a level
| of only a few million dollars.
|
| So again, proofs of correctness and demonstrated
| protection against teams with tens to hundreds of
| millions in budget (i.e team of 5 competent offensive
| security specialists for 2 years, NSO group for a year,
| etc.). Anything less is insufficient to bet trillions of
| dollars of commerce and countless lives on.
| tptacek wrote:
| So that's a no, then.
|
| Actual LOL at "an NSA pentest".
|
| _Slightly later_
|
| A friend points out I'm being too harsh here, and that
| lots of products do in fact get NSA pentests. They just
| never get the pentest report. We regret the error.
| fsflover wrote:
| Show me a recent escape from VT-d and then you will have a
| point.
| Veserv wrote:
| VT-x. You should get the name of the technology right
| before defending it. VT-d is the I/O virtualization
| technology.
|
| When did it become customary to defend people making claims
| of security instead of laughing in their face even though
| history shows such claims to be an endless clown
| parade?
|
| How about you present the extraordinary evidence needed to
| support the extraordinary claim that there are no
| vulnerabilities? I will accept simple forms of proof such
| as a formal proof of correctness or a unclaimed 10 M$ bug
| bounty that has never been claimed.
| tptacek wrote:
| Not an especially impressive flex, but I'm not above
| trying to dunk on people for misspelling things either,
| so I'm not going to high-horse you about it (obviously i
| am).
|
| The history of KVM and hardware virtualization is not an
| endless clown parade.
|
| Find a vulnerability researcher to talk to about OpenBSD
| sometime, though.
|
| https://isopenbsdsecu.re/
| Veserv wrote:
| OpenBSD is not secure by any measure. That Theo happens
| to be right about the endless clown parade is independent
| of his ability to develop a secure operating system.
|
| I mean, jeez, even Joanna Rutkowska acknowledges the
| foundations are iffy enough to only justify claiming
| "reasonably secure" for Qubes OS.
|
| You are making an extraordinary claim of security which
| stands diametrically opposed to the consensus that things
| are easily hacked. You need to present extraordinary
| evidence to support such a claim. You can see my other
| reply for what I would consider minimal criteria for
| evidence.
| tptacek wrote:
| So far all I'm seeing here are appeals to the names of
| people who I don't believe agree with your take. You're
| going to need to actually defend the argument you made.
| yjftsjthsd-h wrote:
| > Find a vulnerability researcher to talk to about
| OpenBSD sometime, though.
|
| > https://isopenbsdsecu.re/
|
| Notice that at no point does anyone actually show up with
| a working exploit.
| toast0 wrote:
| Windows has Virtualization Based Security [1], where if your
| system has the right hardware and the right settings, it will
| use the virtualization support to get you a more protected
| environment. IO-MMU seems like it was designed for
| virtualization, but you can use it in a non-virtualized setting
| too, etc.
|
| [1] https://learn.microsoft.com/en-us/windows-
| hardware/design/de...
| ploxiln wrote:
| The industry tends to do this everywhere: we have a system to
| contain things, we made a mess of it, now we want to contain
| separate instances of the systems.
|
| For example, in AWS or GCP, you can isolate stuff for different
| environments or teams with security groups and IAM policies.
| You can separate them with separate VPCs that can't talk to
| each other. In GCP you can separate them with "projects". But
| soon that's not enough, companies want separate AWS accounts
| for separate teams or environments, and they need to be grouped
| under a parent org account, and you can have policies that
| grant ability to assume roles cross-account ... then you need
| separate associated groups of AWS accounts for separate
| divisions!
|
| It really never ends, companies will always want to take
| whatever nested mess they have, and instead of cleaning it up,
| just nest it one level further. That's why we'll be running
| wasm in separate processes in separate containers in separate
| VMs on many-core servers (probably managed with another level
| of virtualization, but who can tell).
| stacktrust wrote:
| A modern virtualization architecture can be found in the OSS pKVM
| L0 nested hypervisor for Android Virtualization Framework, which
| has some architectural overlap with HP/Bromium AX L0 + [Hyper-V |
| KVM | Xen] L1 + uXen L2 micro-VMs with copy-on-write memory.
|
| A Bromium demo circa 2014 was a web browser where every tab was
| an isolated VM, and every HTTP request was an isolated VM.
| Hundreds of VMs could be launched in a couple of hundred
| milliseconds. Firecracker has some overlap.
|
| _> Lastly, this approach is almost certainly more expensive. Our
| instances sit idle for the most part and we pay EC2 a pretty
| penny for the privilege._
|
| With many near-idle server VMs running identical code for each
| customer, there may be an opportunity to use copy-on-memory-write
| VMs with fast restore of unique memory state, using the
| techniques employed in live migration.
|
| Xen/uXen/AX:
| https://www.platformsecuritysummit.com/2018/speaker/pratt/
|
| pKVM: https://www.youtube.com/watch?v=9npebeVFbFw
| jefurii wrote:
| Using VMs as the unit allows them to move to another provider if
| they need to. They could even move to something like an on-prem
| Oxide rack if they wanted. [Yes I know, TFA lists this as a
| "false benefit" i.e. something they think doesn't benefit them.]
| mikewarot wrote:
| It's nice to see the Principle of Least Authority (POLA) in
| practical use. Some day, we'll have operating systems that
| respect it as well.
|
| As more people wake up to the realization that we shouldn't trust
| code, I expect that the number of civilization-wide outages will
| decrease.
|
| Working in the cloud, they're not going to be able to use my
| other favorite security tool, the data diode, which can
| positively block ingress of control while still allowing
| egress of reporting data.
| fsflover wrote:
| > Some day, we'll have operating systems that respect it as
| well.
|
| Qubes OS has been relying on it for many years. My daily
| driver, can't recommend it enough.
| nrr wrote:
| If you're coming by after the fact and scratching your head at
| what a data diode is, Wikipedia's page on the subject is a
| decent crib document.
| <https://en.wikipedia.org/wiki/Unidirectional_network>
| SunlitCat wrote:
| VMs are awesome for what they can offer. Docker (and the like)
| are kinda a lean VM for a specific tool scenario.
|
| What I would like to see would be more app virtualization
| software which isolates the app from the underlying OS enough
| to provide a safe enough cage for the app.
|
| I know there are some commercial offerings out there (and a
| free one), but maybe someone who has opinions about them, or
| knows some additional ones, can chime in?
| stacktrust wrote:
| HP business PCs ship with SureClick based on OSS uXen,
| https://news.ycombinator.com/item?id=41071884
| SunlitCat wrote:
| Thank you for sharing, didn't know that one!
| stacktrust wrote:
| It's from the original Xen team. Subsequently cloned by MS
| as MDAG (Defender Application Guard).
| SunlitCat wrote:
| Cool! I know MDAG and actually it's a pretty neat
| concept, kinda.
| peddling-brink wrote:
| That's what containers attempt to do. But it's not perfect.
| Adding a layer like gvisor helps, but again the app is still
| interacting with the host kernel so kernel exploits are still
| possible. What additional sandboxing are you thinking of?
| SunlitCat wrote:
| Maybe I am a bit naive, but in my mind it's just a simple
| piece of software sitting between the OS and the tool in
| question, which runs said tool in some kind of
| virtualization, passing all requests to the OS after a check
| of what they might want to do.
|
| I know that's what said tools are offering, but installing
| (and running) docker on Windows feels like loading up a whole
| other OS inside the OS, so that even VM software looks lean
| compared to that!
|
| But I admit, that I have no real experience with docker and
| the like.
| smitty1e wrote:
| > Switching to another provider would be non-trivial, and I don't
| see the VM as a real benefit in this regard. The barrier to
| switching is still incredibly high.
|
| This point is made in the context of VM bits, but that switching
| cost could (in theory, haven't done it myself) be mitigated
| using, e.g. Terraform.
|
| The brace-for-shock barrier at the enterprise level is going to
| be exfiltrating all of that valuable data. Bezos is running a
| Hotel California for that data: "You can checkout any time you
| like, but you can never leave" (easily).
| tetha wrote:
| Heh. We're in the process of moving a service for a few of our
| larger customers over due to some variety of emergencies, let's
| keep it at that.
|
| It took us 2-3 days of hustling to get the stuff running and
| production ready and providing the right answers. This is the
| "Terraform and Ansible-Stuff" stage of a real failover. In a
| full infrastructure failover, I'd expect it to take us 1-2 very
| long days to get 80% running and then up to a week to be fully
| back on track and another week of shaking out strange issues.
| And then a week or two of low-availability from the ops-team.
|
| However, for 3 large customers using that product,
| cybersecurity and compliance said no. They said no about 5-6
| weeks ago and project to have an answer somewhere within the
| next 1-2 months. Until then, the amount of workarounds and
| frustration growing around it is rather scary. I hope I can
| contain it to some places in which there is no permanent damage
| for the infrastructure.
|
| Tech isn't necessarily the hardest thing in some spaces.
| kkfx wrote:
| The more stuff you add, the more attack surface you have.
| Virtualized infra is a commercial need, an IT and Operations
| OBSCENITY, definitively never safe in practice.
| tptacek wrote:
| The cool kids have been combining containers and hardware
| virtualization for something like 10 years now (back to QEMU-Lite
| and kvmtool). Don't use containers if the abstraction gets in
| your way, of course, but if they work for you --- as a mechanism
| for packaging and shipping software and coordinating deployments
| --- there's no reason you need to roll all the way back to
| individually managed EC2 instances.
|
| A short survey on this stuff:
|
| https://fly.io/blog/sandboxing-and-workload-isolation/
| mwcampbell wrote:
| Since you're here, I was just thinking about how feasible it
| would be to run a microVM-per-tenant setup like this on Fly. I
| guess it would require some automation to create a Fly app for
| each customer. Is this something you all have thought about?
| tptacek wrote:
| Extraordinarily easy. It's a design goal of the system. I
| don't want to crud up the thread; this whole "container vs.
| VM vs. dedicated hardware" debate is dear to my heart. But
| feel free to drop me a line if you're interested in our take
| on it.
| vin10 wrote:
| > If you wouldn't trust running it on your host, you probably
| shouldn't run it in a container as well.
|
| - From a Docker/Moby Maintainer
| ploxiln wrote:
| > we operate in networks where outbound MQTT and HTTPS is simply
| not allowed (which is why we rely on encrypted DNS traffic for
| device-to-Console communication)
|
| HTTPS is not allowed (locked down for security!), so
| communication is smuggled over DNS? uhh ... I suspect that a lot
| of what the customer "security" departments do, doesn't really
| make sense ...
| er4hn wrote:
| One thing I wasn't able to grok from the article is orchestration
| of VMs. Are they using AWS to manage the VM lifecycles, restart
| them, etc?
|
| Last time I looked into this for on-prem the solutions seemed
| very enterprise, pay the big bux, focused. Not a lot in the OSS
| space. What do people use for on-prem VM orchestration that is
| OSS?
| jinzo wrote:
| Depends on what your scale is, but I used oVirt and Proxmox in
| the past, and it was (especially oVirt) very enterprisey but
| OSS.
___________________________________________________________________
(page generated 2024-07-25 23:05 UTC)