[HN Gopher] Why Oxide Chose Illumos
___________________________________________________________________
Why Oxide Chose Illumos
Author : kblissett
Score : 209 points
Date : 2024-09-11 21:22 UTC (1 day ago)
(HTM) web link (rfd.shared.oxide.computer)
(TXT) w3m dump (rfd.shared.oxide.computer)
| Rendello wrote:
| I like the RFDs. Oxide just did a podcast episode on the process:
|
| https://oxide.computer/podcasts/oxide-and-friends/2065190
| rtpg wrote:
| > There is not a significant difference in functionality between
| the illumos and FreeBSD implementations, since pulling patches
| downstream has not been a significant burden. Conversely, the
| more advanced OS primitives in illumos have resulted in certain
| bugs being fixed only there, having been difficult to upstream to
| FreeBSD.
|
| curious about what bugs are being thought of there. Sounds like a
| very interesting situation to be in
| taspeotis wrote:
| I kagi'd Illumos and apparently Bryan Cantrill was a maintainer.
|
| Bryan Cantrill is CTO of Oxide [1].
|
| I assume that has no bearing on the choice, otherwise it would be
| mentioned in the discussion.
|
| [1] https://bcantrill.dtrace.org/2019/12/02/the-soul-of-a-new-
| co...
| gyre007 wrote:
| Yeah I came here to say that Bryan worked at Sun so why do they
| even need to write this post (yes, I appreciate the technical
| reasons, just wanted to highlight the fact via a subtle dig
| :-))
| sausagefeet wrote:
| This isn't a blog post from Oxide; it's a link to their
| internal RFD which they use to make decisions.
| gyre007 wrote:
| I never said it was a post by Oxide.
| sausagefeet wrote:
| Early Oxide founders came from Joyent which was an illumos shop
| and Cantrill is quite vocal about the history of Solaris,
| OpenSolaris, and illumos.
| codetrotter wrote:
| > Joyent which was an illumos shop
|
| And before that, they used to run FreeBSD.
|
| Mentioned for example in this comment by Bryan Cantrill a
| decade ago:
|
| https://news.ycombinator.com/item?id=6254092
|
| > [...] Speaking only for us (I work for Joyent), we have
| deployed hundreds of thousands of zones into production over
| the years -- and Joyent was running with FreeBSD jails before
| that [...]
|
| And I've seen some other primary sources (people who worked
| at Joyent) write that online too.
|
| And Bryan Cantrill, and several other people, came from Sun
| Microsystems to Joyent. Though I've never seen it mentioned
| which order that happened in; was it people from Sun that
| joined Joyent and then Joyent switched from FreeBSD to
| Illumos and created SmartOS? Or had Joyent already switched
| to Illumos before the people that came from Sun joined?
|
| I would actually really enjoy a long documentary or talk from
| some people that worked at Joyent about the history of the
| company, how they were using FreeBSD and when they switched
| to Illumos and so on.
| panick21_ wrote:
| Joyent was using Solaris before Bryan worked there. Listen
| to this podcast with Bryan and his co-founder about
| their origin story:
|
| https://www.youtube.com/watch?v=eVkIKm9pkPY
|
| This is about as good as you are going to get on the topic of
| Joyent history.
| codetrotter wrote:
| Thank you, I will watch that right away :)
| selykg wrote:
| Joyent also merged with TextDrive, which is where the
| FreeBSD part came from. TextDrive was an early Rails host,
| and could even do it in a shared hosting environment, which
| is where I think a lot of the original user base came from
| (also TextPattern)
|
| As I recall they were also the original host of Twitter,
| which if I recall was Rails back in the day.
| throw0101b wrote:
| > _As I recall [Joyent] were also the original host of
| Twitter, which if I recall was Rails back in the day._
|
| Up until 2008:
|
| * https://web.archive.org/web/20080201142828/http://www.j
| oyeur...
| mzi wrote:
| @bcantrill is the CTO of Oxide.
| taspeotis wrote:
| Yup, thanks
| panick21_ wrote:
| Bryan Cantrill also ported KVM to Illumos. At Joyent they had
| plenty of experience with KVM. See:
|
| https://www.youtube.com/watch?v=cwAfJywzk8o
|
| As far as I know, Bryan didn't personally work on the porting
| of bhyve (this might be wrong).
|
| So if anything, that would point to KVM as the 'familiar' thing
| given how many former Joyent people were there.
| bonzini wrote:
| KVM got more and more integrated with the rest of Linux as
| more virtualization features became general system features
| (e.g. posted interrupts). Also Google and Amazon are working
| more upstream and the pace of development increased a lot.
|
| Keeping a KVM port up to date is a huge effort compared to
| bhyve, and they probably had learnt that in the years between
| the porting of KVM and the founding of Oxide.
| elijahwright wrote:
| Where is Max Bruning these days?
| bonzini wrote:
| > QEMU is often the subject of bugs affecting its reliability and
| security.
|
| {{citation needed}}?
|
| When I ran the numbers in 2019, there hadn't been guest
| exploitable vulnerabilities that affected devices normally used
| for IaaS for 3 years. Pretty much every cloud outside the big
| three (AWS, GCE, Azure) runs on QEMU.
|
| Here's a talk I gave about it that includes that analysis:
|
| slides - https://kvm-forum.qemu.org/2019/kvmforum19-bloat.pdf
|
| video - https://youtu.be/5TY7m1AneRY?si=Sj0DFpRav7PAzQ0Y
| _rs wrote:
| I thought AWS uses KVM, which is the same hypervisor that QEMU would
| use? Or am I mistaken?
| bonzini wrote:
| AWS uses KVM in the kernel but they have a different, non-
| open source userspace stack for EC2; plus Firecracker which
| is open source but is only used for Lambda, and runs on EC2
| bare metal instances.
|
| Google also uses KVM with a variety of userspace stacks: a
| proprietary one (tied to a lot of internal Google
| infrastructure but overall a lot more similar to QEMU than
| Amazon's) for GCE, gVisor for AppEngine or whatever it is
| called these days, crosvm for ChromeOS, and QEMU for Android
| Emulator.
| tptacek wrote:
| Lambda and Fargate.
| dastbe wrote:
| unless something has changed in the past year, fargate
| still runs each task in a single use ec2 vm with no
| further isolation around containers in a task.
| my123 wrote:
| It was true for Fargate some time ago, but it has not been
| true for quite a while now. All Fargate tasks run on EC2
| instances today.
| easton wrote:
| ...which is probably the reason why task launches take
| 3-5 business weeks
| tptacek wrote:
| Ah, interesting. Thanks for the correction!
| 9front wrote:
| EC2 instances are using the Xen hypervisor. At least that's
| what's reported by hostnamectl.
| wmf wrote:
| EC2 migrated off Xen around ten years ago. Only really
| old instances should be using Xen or Xen emulation.
| 9front wrote:
| I'm puzzled by your comment. On an EC2 instance of AL2023
| deployed in the us-east-1 region, this is the output of
| hostnamectl:
|
|     [ec2-user][~]$ hostnamectl
|        Static hostname: ip-x-x-x-x.ec2.internal
|              Icon name: computer-vm
|                Chassis: vm
|             Machine ID: ec2d54f27fc534ea74980638ccc33d96
|                Boot ID: 6caf18b7ed3647819c1985c11f128142
|         Virtualization: xen
|       Operating System: Amazon Linux 2023.5.20240903
|            CPE OS Name: cpe:2.3:o:amazon:amazon_linux:2023
|                 Kernel: Linux 6.1.106-116.188.amzn2023.x86_64
|           Architecture: x86-64
|        Hardware Vendor: Xen
|         Hardware Model: HVM domU
|       Firmware Version: 4.11.amazon
| wmf wrote:
| What instance type is it?
| bonzini wrote:
| KVM can emulate the Xen hypercall interface. Amazon is
| not using Xen anymore.
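|
| (For anyone wanting to check which interface a guest is
| actually being offered, independent of the DMI strings that
| hostnamectl echoes: the CPUID hypervisor leaf carries whatever
| vendor signature the hypervisor chooses to advertise. A
| minimal Rust sketch of the generic x86 convention, nothing
| AWS-specific:)
|
|     // Leaf 0x4000_0000 returns the advertised signature in
|     // EBX/ECX/EDX, e.g. "KVMKVMKVM", "XenVMMXenVMM" or
|     // "Microsoft Hv".
|     #[cfg(target_arch = "x86_64")]
|     fn hypervisor_signature() -> Option<String> {
|         use std::arch::x86_64::__cpuid;
|         // CPUID leaf 1, ECX bit 31: "hypervisor present" bit.
|         if unsafe { __cpuid(1) }.ecx & (1 << 31) == 0 {
|             return None;
|         }
|         let r = unsafe { __cpuid(0x4000_0000) };
|         let mut bytes = Vec::new();
|         for reg in [r.ebx, r.ecx, r.edx] {
|             bytes.extend_from_slice(&reg.to_le_bytes());
|         }
|         let s = String::from_utf8_lossy(&bytes).into_owned();
|         Some(s.trim_end_matches('\0').to_string())
|     }
|
|     fn main() {
|         #[cfg(target_arch = "x86_64")]
|         println!("{:?}", hypervisor_signature());
|     }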
| simcop2387 wrote:
| I'm not quite sure of the current status, but it was
| reported back in 2017 that they were moving off Xen
|
| https://www.theregister.com/2017/11/07/aws_writes_new_kvm
| _ba...
|
| It could be that the migration isn't complete and is still
| tied to specific machine types, or there's something they've
| done to make it still report to the guest that it's Xen-based
| for some compatibility reasons.
| blaerk wrote:
| I think some older instance types are still on xen, later
| types run kvm (code named nitro.. perhaps?). I can't
| remember the exact type but last year we ran into some
| weird issues related to some kernel regression that only
| affected some instances in our fleet; it turns out they
| were all the same type and apparently ran on Xen,
| according to AWS support.
| daneel_w wrote:
| QEMU can use a number of different hypervisors, KVM and Xen
| being the two most common ones. Additionally it can also
| _emulate_ any architecture if one would want/need that.
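|
| (Concretely, the split is between which -accel back end QEMU
| is pointed at and full-system TCG emulation of a foreign
| architecture. A rough Rust sketch of two such launches; the
| flags are standard QEMU options, but the image and kernel
| paths are placeholders:)
|
|     use std::process::Command;
|
|     fn main() -> std::io::Result<()> {
|         // x86-64 guest accelerated by the host's KVM module.
|         Command::new("qemu-system-x86_64")
|             .args(["-accel", "kvm", "-cpu", "host", "-m", "2048"])
|             .args(["-drive", "file=guest.img,format=raw"])
|             .spawn()?;
|
|         // aarch64 guest on an x86 host: no hypervisor at all,
|         // every guest instruction is emulated by TCG.
|         Command::new("qemu-system-aarch64")
|             .args(["-accel", "tcg", "-machine", "virt"])
|             .args(["-cpu", "cortex-a72", "-m", "1024"])
|             .args(["-kernel", "Image"])
|             .spawn()?;
|         Ok(())
|     }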
| TimTheTinker wrote:
| > When I ran the numbers in 2019, there hadn't been guest
| exploitable vulnerabilities that affected devices normally used
| for IaaS for 3 years.
|
| So there existed known guest-exploitable vulnerabilities as
| recently as 8 years ago. Maybe that, combined with the fact
| that QEMU is _not_ written in Rust, is what is causing Oxide to
| decide against QEMU.
|
| I think it's fair to say that any sufficiently large codebase
| originally written in C or C++ has memory safety bugs. Yes, the
| Oxide RFD author may be phrasing this using weasel words; and
| memory safety bugs may not be exploitable at a given point in a
| codebase's history. But I don't think that makes Oxide's
| decision invalid.
| bonzini wrote:
| That would be a damn good record though, wouldn't it? (I am
| fairly sure that more were found since, but the point is that
| these are pretty rare). Firecracker, which is written in
| Rust, had one in 2019:
| https://www.cve.org/CVERecord?id=CVE-2019-18960
|
| Also QEMU's fuzzing is very sophisticated. Most recent
| vulnerabilities were found that way rather than by security
| researchers, which I don't think is the case for
| "competitors".
| TimTheTinker wrote:
| You're not wrong, and that is very impressive. There's
| nothing like well-applied fuzzing to improve security.
|
| But I still don't think that makes Oxide's decision or my
| comment necessarily invalid, if only because of an _a
| priori_ decision to stick with Rust system-wide -- it
| raises the floor on software quality.
| akira2501 wrote:
| > it raises the floor on software quality.
|
| Languages cannot possibly do this.
| TimTheTinker wrote:
| I believe TypeScript and Rust are both strong examples of
| languages that do this (for different reasons and in
| different ways).
|
| It's also possible for a language to raise the ceiling of
| software quality, and Zig is an excellent example.
|
| I'm thinking of "floors" and "ceilings" as the outer
| bounds of what happens in real, everyday life within
| particular software ecosystems in terms of software
| quality. By "quality" I mean all of capabilities,
| performance, and absence of problems.
|
| It takes a team of _great_ engineers (and management
| willing to take a risk) to benefit from a raised ceiling.
| TigerBeetle[0] is an example of what happens when you
| pair a great team, great research, and a high-ceiling
| language.
|
| [0] https://tigerbeetle.com/
| akira2501 wrote:
| > possible for a language to raise the ceiling of
| software quality
|
| Cargo is widely recognized as low quality. The thesis
| fails within its own standard packaging. It's possible
| for a language to be used by _more people_ and thus raise
| the quality _in aggregate_ of produced software but the
| language itself has no bearing on quality in any
| objective measure.
|
| > to benefit from a raised ceiling
|
| You're explicitly putting the cart before the horse here.
| The more reasonable assertion is that it takes good
| people to get good results regardless of the quality of
| the tool. Acolytes are uncomfortable saying this because
| it also destroys the negative case, which is, it would be
| impossible to write quality software in a previous
| generation language.
|
| > TigerBeetle[0] is an example
|
| Of a protocol and a particular implementation of that
| protocol. It has client libraries in multiple languages.
| This has no bearing on this point.
| sophacles wrote:
| > Cargo is widely recognized as low quality.
|
| Can you point me to both of:
|
| * why it's considered low quality
|
| * evidence of this "wide regard"
|
| Other than random weirdos who think allowing dependencies
| is a bad practice because you could hurt yourself, while
| extolling the virtues of undefined behavior - I've never
| heard much serious criticism of it.
| akira2501 wrote:
| > why it's considered low quality
|
| Other software providing the same features produces better
| results for those users. Its dependency management is
| fundamentally broken and causes builds to be much slower
| than they could otherwise be. It lacks namespaces, which is
| a lesson well learned before the first line of Cargo was
| ever written.
|
| I could go on.
|
| > evidence of this "wide regard"
|
| We are on the internet. If you doubt me you can easily
| falsify this yourself. Or you could discover something
| you've been ignorant of up until now. Try "rust cargo
| sucks" as a search motif.
|
| > random weirdos
|
| Which may or may not be true, but you believe it, and yet
| you use your time to comment to us. This is more of a
| criticism of yourself than of me; however, I do
| appreciate your attempt to be insulting and dismissive.
| sophacles wrote:
| I'm not attempting to insult you; I didn't know you held
| such a hypocritical position - sorry that pointing out how
| weird it is for someone working in a field so dependent on
| logic to hold such a self-contradictory position insults
| you. Maybe instead of weird I should use the words unusual
| and unexpected. My bad.
|
| You're right, I'm being dismissive of weasely unbacked
| claims of "wide regard". It's very clear now that you
| can't back your claim and I can safely ignore your entire
| argument as unfounded. Thanks for confirming!
| otabdeveloper4 wrote:
| Heresy! Software written in Rust _never_ has security
| vulnerabilities or bugs. The borrow checker means you don't
| have to worry about security, Rust handles it for you
| automatically so you can go shopping.
| ameliaquining wrote:
| I do think that only having one CVE in six years is a
| pretty decent record, especially since that vulnerability
| probably didn't grant arbitrary code execution in
| practice.
|
| Rust is an important part of how Firecracker pulls this
| off, but it's not the only part. Another important part
| is that it's a much smaller codebase than QEMU, so there
| are fewer places for bugs to hide. (This, in turn, is
| possible in part because Firecracker deliberately doesn't
| implement any features that aren't necessary for its core
| use case of server-side workload isolation, whereas QEMU
| aims to be usable for anything that you might want to use
| a VM for.)
| sophacles wrote:
| Why is it that the only people who say this at all are people
| saying it sarcastically or quoting fictional strawmen
| (and can never seem to provide evidence of it being said
| in earnest)?
| dvdbloc wrote:
| What do the big three use?
| paxys wrote:
| AWS - Nitro (based on KVM)
|
| Google - "KVM-based hypervisor"
|
| Azure - Hyper-V
|
| You can of course assume that all of them heavily customize
| the underlying implementation for their own needs and for
| their own hardware. And then they have stuff like
| Firecracker, GVisor etc. layered on top depending on the
| product line.
| daneel_w wrote:
| Some more data:
|
| Oracle Cloud - QEMU/KVM
|
| Scaleway - QEMU/KVM
| bonzini wrote:
| IBM cloud, DigitalOcean, Linode, OVH, Hetzner,...
| anonfordays wrote:
| >Pretty much every cloud outside the big three (AWS, GCE,
| Azure) runs on QEMU.
|
| QEMU typically uses KVM for the hypervisor, so the
| vulnerabilities will be in KVM anyway. The big three all use KVM
| now. Oxide decided to go with bhyve instead of KVM.
| bonzini wrote:
| No, QEMU is a huge C program which can have its own
| vulnerabilities.
|
| Usually QEMU runs heavily confined, but remote code execution
| in QEMU (remote = "from the guest") can be a first step
| towards exploiting a more serious local escalation via a
| kernel vulnerability. This second vulnerability can be in KVM
| or in any other part of the kernel.
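|
| ("Heavily confined" in practice usually means some combination
| of QEMU's seccomp sandbox, privilege dropping and a chroot,
| plus whatever the management layer adds (SELinux/AppArmor
| under libvirt). A hedged Rust sketch of the sort of flags a
| launcher might pass -- treat the exact option spellings as
| version-dependent, and the user, paths and image as
| placeholders:)
|
|     use std::process::Command;
|
|     fn main() -> std::io::Result<()> {
|         let status = Command::new("qemu-system-x86_64")
|             .args(["-accel", "kvm", "-m", "2048"])
|             // seccomp filter: deny obsolete syscalls, privilege
|             // elevation and spawning of new processes.
|             .args(["-sandbox", concat!(
|                 "on,obsolete=deny,elevateprivileges=deny,",
|                 "spawn=deny")])
|             // drop root and pivot into an empty directory
|             .args(["-runas", "qemu-guest"])
|             .args(["-chroot", "/var/empty"])
|             .args(["-drive", "file=guest.img,format=raw"])
|             .status()?;
|         println!("qemu exited: {status}");
|         Ok(())
|     }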
| 6c696e7578 wrote:
| Azure uses Hyper-V; unless things have changed massively, the
| Linux they run for infra and customers is on Hyper-V.
| cmeacham98 wrote:
| > The big three all use KVM now.
|
| This isn't true - Azure uses Hyper-V
| (https://learn.microsoft.com/en-
| us/azure/security/fundamental...), and AWS uses an in-house
| hypervisor called Nitro (https://aws.amazon.com/ec2/nitro/).
| anonfordays wrote:
| >This isn't true - Azure uses Hyper-V
|
| I thought Azure was moving/moved to KVM for Linux, but I
| was wrong.
|
| >AWS uses an in-house hypervisor called Nitro
|
| Nitro uses KVM under the hood.
| hinkley wrote:
| If they are being precise, then "reliability and security"
| means something different than "security and reliability".
|
| How many reliability bugs has QEMU experienced in this time?
|
| The manpower to go on site and deal with in-the-field problems
| could be crippling. You often pick the boring problems for this
| reason. High touch is super expensive. Just look at Ferrari.
| sausagefeet wrote:
| While it's fair to say this does describe why Illumos was chosen,
| the actual RFD title is not presented and it is about Host OS +
| Virtualization software choice.
|
| Even if you think it's a foregone conclusion given the history of
| bcantrill and other founders of Oxide, there absolutely is value
| in putting the decision to paper and trying to provide a
| rationale, because then it can be challenged.
|
| The company I co-founded does an RFD process as well and even if
| there is 99% chance that we're going to use the thing we've
| always used, if you're a serious person, the act of expressing it
| is useful and sometimes you even change your own mind thanks to
| the process.
| transpute wrote:
| _> Xen: Large and complicated (by dom0) codebase, discarded for
| KVM by AMZN_
|
| 1. Xen Type-1 hypervisor is smaller than KVM/QEMU.
| 2. Xen "dom0" = Linux/FreeBSD/OpenSolaris. KVM/bhyve also need
| host OS.
| 3. AMZN KVM-subset: x86 cpu/mem virt, blk/net via Arm Nitro
| hardware.
| 4. bhyve is Type-2.
| 5. Xen has Type-2 (uXen).
| 6. Xen dom0/host can be disaggregated (Hyperlaunch), unlike KVM.
| 7. pKVM (Arm/Android) is smaller than KVM/Xen.
|
| _> The Service Management Facility (SMF) is responsible for the
| supervision of services under illumos.. a [Linux] robust
| infrastructure product would likely end up using few if any of
| the components provided by the systemd project, despite there now
| being something like a hundred of them. Instead, more traditional
| components would need to be revived, or thoroughly bespoke
| software would need to be developed, in order to avoid the
| technological and political issues with this increasingly
| dominant force in the Linux ecosystem._
|
| Is this an argument for Illumos over Linux, or for translating
| SMF to Linux?
| bonzini wrote:
| Talking about "technological and political issues" without
| mentioning any, or without mentioning _which_ components would
| need to be revived, sounds a lot like FUD unfortunately. Mixing
| and matching traditional and systemd components is super
| common, for example Fedora and RHEL use chrony instead of
| timesyncd, and NetworkManager instead of networkd.
| actionfromafar wrote:
| I read it as "we can sit in this more quiet room where people
| don't rave about systemd all day long".
| bonzini wrote:
| But do they? Oxide targets the enterprise, and people there
| don't care that much about how the underlying OS works.
| It's been ten years since a RHEL release started using
| systemd and there has been no exodus to either Windows or
| Illumos.
|
| I don't mean FUD in a disparaging sense, more like literal
| fear of the unknown causing people to be excessively
| cautious. I wouldn't have any problem with Oxide saying "we
| went for what we know best"; there's no need to pretend that
| so much more research went into the decision.
| panick21_ wrote:
| The underlying hypervisor on Oxide isn't exposed to the
| consumers of the API. Just like on Amazon.
|
| I think arguably the choice of bhyve over KVM was the more
| fundamental reason, and bhyve doesn't run on Linux
| anyway.
| bonzini wrote:
| Exactly, then why would they be dragged into the
| systemd-or-not-systemd discussion? If you want to use Linux, use
| either Debian or the CentOS hyperscaler spin (the one
| that Meta uses) and call it a day.
|
| I am obviously biased as I am a KVM (and QEMU) developer
| myself, but I don't see any other plausible reason other
| than "we know the Illumos userspace best". Founder mode
| and all that.
|
| As to their choice of hypervisor, to be honest KVM on
| Illumos was probably not a great idea to begin with,
| therefore they used bhyve.
| jclulow wrote:
| FWIW, founder mode didn't exist five years ago when we
| were getting started! More seriously, though, this
| document (which I helped write) is an attempt
| specifically to avoid classic FUD tropes. It's not
| perfect, but it reflects certainly aspects of my lived
| experience in trying to get pieces of the Linux ecosystem
| to work in production settings.
|
| While it's true that I'm a dyed-in-the-wool illumos
| person, being in the core team and so on, I have Linux
| desktops, and the occasional Linux system in lab
| environments. I have been supporting customers with all
| sorts of environments that I don't get to choose for most
| of my career, including Linux and Windows systems. At
| Joyent most of our customers were running hardware
| virtualised Linux and Windows guests, so it's not like I
| haven't had a fair amount of exposure. I've even spent
| several days getting SCO OpenServer to run under our KVM,
| for a customer, because I apparently make bad life
| choices!
|
| As for not discussing the social and political stuff in
| any depth, I felt at the time (and still do today) that
| so much ink had been spilled by all manner of folks talking
| about LKML or systemd project behaviour over the last
| decade that it was probably a distraction to do anything
| other than mention it in passing. As I believe I said in
| the podcast we did about this RFD recently: I'm not sure
| if this decision would be right for anybody else or not,
| but I believe it was and is right for us. I'm not trying
| to sell you, or anybody else, on making the same calls.
| This is just how we made our decision.
| bonzini wrote:
| Founder mode existed, it just didn't have a catchy name.
| And I absolutely believe that it was the right choice for
| your team, exactly for "founder mode" reasons.
|
| In other words, I don't think that the social or
| technological reasons in the document were that strong,
| _and that's fine_. Rather, my external armchair
| impression is simply that OS and hypervisor were not
| something where you were willing to spend precious "risk
| points", and that's the right thing to do given that you
| had a lot more places that were an absolute jump in the
| dark.
| InvaderFizz wrote:
| I would agree with that. Given the history of the Oxide
| team, they chose what they viewed as the best technology
| for THEM, as maintainers. The rest is mostly
| justification of that.
|
| That's just fine, as long as they're not choosing a
| clearly inferior long term option. The technically
| superior solution is not always the right solution for
| your organization given the priorities and capabilities
| of your team, and that's just fine! (I have no opinion on
| KVM vs bhyve, I don't know either deep enough to form
| one. I'm talking in general.)
| wmf wrote:
| Instead people rave about Solaris.
| packetlost wrote:
| The Oxide folks are rather vocal about their distaste for the
| Linux Foundation. FWIW I think they went with the right
| choice for them considering they'd rather sign up for
| maintaining the entire thing themselves than saddle
| themselves with the baggage of a Linux fork or upstreaming.
| netbsdusers wrote:
| > Talking about "technological and political issues" without
| mentioning any
|
| I don't know why you think none were mentioned - to name one,
| they link a GitHub issue created against the systemd
| repository by a Googler complaining that systemd is
| inappropriately using Google's NTP servers, which at the time
| were not a public service, and kindly asking for systemd to
| stop using them.
|
| This request was refused and the issue was closed and locked.
|
| Behaviour like this from the systemd maintainers can only
| appear bizarre, childish, and unreasonable to any
| unprejudiced observer, putting their character and integrity
| into question and casting doubt on whether they should be
| trusted with the maintenance of software so integral to at
| least a reasonably large minority of modern Linux systems.
| inftech wrote:
| And people forget that this behavior of systemd devs is
| present in lots of other core projects of the Linux
| ecosystem.
|
| Unfortunately this makes modern Linux not reliable.
| suprjami wrote:
| systemd made the time servers a compile-time option and added
| a warning if distros are using the default time servers:
|
| https://github.com/systemd/systemd/pull/554
|
| What's your suggested alternative?
|
| Using pool.ntp.org requires a vendor zone. systemd does not
| consider itself a vendor, it's the distros shipping systemd
| which are the vendor and should register and use their own
| vendor zone.
|
| I don't care about systemd either way, but your own false
| representation of facts makes your last paragraph apply to
| your "argument".
| bigfatkitten wrote:
| > What's your suggested alternative?
|
| That if they do not wish to ship a safe default, they do
| not ship a default at all.
| suprjami wrote:
| That would have been my preference too.
| evandrofisico wrote:
| I've been using xen in production for at least 18 years, and
| although there has been some development, it is extremely hard
| to get actual documentation on how to do things with it.
|
| There is no place documenting how to integrate the
| Dom0less/Hyperlaunch in a distribution or how to build
| infrastructure with it; at best you will find a GitHub repo,
| with the last commit dated 4 years ago, with little to no
| information on what to do with the code.
| dijit wrote:
| Honestly, SMF is superior to SystemD and it's ironic that it
| came earlier (and that shows, given that it uses XML as its
| configuration language.. ick).
|
| However, two things are an issue:
|
| 1) The CDDL license of SMF makes it difficult to use, or at
| least that's what I was told when I asked someone why SMF
| wasn't ported to Linux in 2009.
|
| 2) SystemD is _it_ now. It's too complicated to replace and
| software has become hopelessly dependent on its existence,
| which is what I mentioned was my largest worry with a
| monoculture and I was routinely dismissed.
|
| So, to answer your question. The argument must be: IllumOS over
| Linux.
| transpute wrote:
| _> software has become hopelessly dependent on its existence_
|
| With some effort, Devuan has managed to support multiple init
| systems, at least for the software packaged by Devuan/Debian.
|
| _> SMF is superior to SystemD ... [CDDL]_
|
| OSS workalike opportunity, as new Devuan init system?
|
| _> The argument must be: IllumOS over Linux._
|
| Thanks :)
| jclulow wrote:
| SMF is OSS. The CDDL is an OSI approved licence. I'm not
| aware of any reason one couldn't readily ship user mode
| CDDL software in a Linux distribution; you don't even have
| the usual (often specious) arguments about linking and
| derivative works and so on in that case.
| saagarjha wrote:
| Unrelated, but is this a homegrown publishing platform?
| crb wrote:
| Yes; it's referred to in Oxide's RFD about RFDs [1]
| https://rfd.shared.oxide.computer/rfd/0001 but the referenced
| URL is 404 unless you're an Oxide employee.
|
| [1]
| https://rfd.shared.oxide.computer/rfd/0001#_shared_rfd_rende...
| [2] https://github.com/oxidecomputer/rfd/blob/master/src
| dcre wrote:
| That link is out of date. The site and backend are now open
| source. Only the repo containing the RFD contents is private.
|
| https://github.com/oxidecomputer/rfd-site
|
| https://github.com/oxidecomputer/rfd-api
| benjaminleonard wrote:
| Yep, you can see a little more on the
| [blog](https://oxide.computer/blog/a-tool-for-discussion) or
| the most recent
| [podcast](https://oxide.computer/podcasts/oxide-and-
| friends/2065190). The API and site repos are also public.
| ReleaseCandidat wrote:
| Instead of stating more or less irrelevant reasons, I'd prefer to
| read something like "I am (have been?) one of the core
| maintainers and know Illumos and Bhyve, so even if there were
| 'objectively' better choices, our familiarity with the OS and
| hypervisor trumps that". An "I like $A, always use $A and have
| experience using $A" is almost always a better argument than "$A
| is better than $B because $BLA", because that doesn't tell me
| anything about the depth of knowledge of using $A and $B or the
| knowledge of the subject of the decision - there is a reason
| half of Google's results are some kind of "comparison" spam.
| actionfromafar wrote:
| But everyone at Oxide already knows that back story. At least
| if you list some other reasons you can have a discussion
| about technical merits if you want to.
| ReleaseCandidat wrote:
| But that doesn't make sense if you have specialists for $A
| that also like to work with $A. Why should I as a customer
| trust Illumos/Bhyve developers that are using Linux/KVM
| instead of "real" Linux/KVM developers? The only thing that
| such a decision would tell me is to not even think about
| using Illumos or Bhyve.
|
| The difference between
|
| "Buy our Illumos/Bhyve solution! Why? I have been an
| Illumos/Bhyve Maintainer!"
|
| and
|
| "Buy our Linux/KVM solution! Why? I have been an
| Illumos/Bhyve Maintainer!"
|
| should make my point a bit clearer.
| panick21_ wrote:
| Those are not the only options. You can have KVM on
| Illumos, or Bhyve on FreeBSD.
|
| And finding people to hire that know Linux/KVM wouldn't be
| a problem for them.
|
| This evaluation was done years ago and they added like 50
| people since then.
|
| Saying 'We have a great KVM Team but our CEO was once an
| Illumos developer' is perfectly reasonable.
|
| And as I point out in my other comment, the former Joyent
| people likely know more about KVM than anything else anyway.
| So it would be:
|
| "Buy our KVM Solution, we have KVM experts"
|
| But they evaluated that Bhyve was better than KVM despite
| that.
| ReleaseCandidat wrote:
| > "Buy our KVM Solution, we have KVM experts"
|
| Of course, but that is less of a unique selling point.
|
| > But they evaluated that Bhyve was better than KVM
| despite that.
|
| If you are selling Bhyve you better say that whether it's
| true or not. So why should I, as a reader or employee or
| customer, trust them?
| panick21_ wrote:
| But Bryan also ported KVM to Illumos. And Joyent used KVM and
| they supported KVM there for years; I assume Bryan knows more
| about KVM than Bhyve, as he seemed very hands-on in the
| implementation (there is a nice talk on YouTube). So the idea
| that he isn't familiar with KVM isn't the case. Based on that,
| between KVM and Bhyve on Illumos, KVM would suggest itself.
|
| In the long term, if $A is actually better than $B, then it
| makes sense to start with $A even if you don't know $A. Because
| if you are trying to build a company that is hopefully making
| billions in revenue in the future, the long term matters a
| great deal.
|
| Now the question is: can you objectively figure out whether $A
| or $B is better, and how much time does it take to figure that
| out? The familiarity of the team is one consideration but not the most
| important one.
|
| Trying to be objective about this, instead of just saying 'I
| know $A' seems quite like a smart thing to do. And writing it
| down also seems smart.
|
| In a few years you can look back and actually say, was our
| analysis correct, if no what did we misjudge. And then you can
| learn from that.
|
| If you just go with familiarity you are basically saying 'our
| failure was predetermined so we did nothing wrong', when you
| clearly did go wrong.
| jclulow wrote:
| For what it's worth, we at _Joyent_ were seriously investing
| in bhyve as our next generation of hypervisor for quite a
| while. We had been diverging from upstream KVM, and most
| especially upstream QEMU, for a long time, and bhyve was a
| better fit for us for a variety of reasons. We adopted a port
| that had begun at Pluribus, another company that was doing
| things with OpenSolaris and eventually illumos, and Bryan
| led us through that period as well.
| ComputerGuru wrote:
| Are you/will you be upstreaming fixes and/or improvements
| to Bhyve?
| jclulow wrote:
| Yes, my personal goal is to ensure that basically
| everything we do in the Oxide "stlouis" branch of illumos
| eventually goes upstream to illumos-gate where it filters
| down to everyone else!
| pmooney wrote:
| Improvements and fixes to illumos bhyve are almost
| entirely done in upstream illumos-gate, rather than the
| Oxide downstream.
|
| Upstreaming those changes into FreeBSD bhyve is a more
| complicated situation, given that illumos has diverged
| from upstream over the years due to differing opinions
| about certain interfaces.
| specialist wrote:
| > _Trying to be objective about this... And writing it down
| also seems smart._
|
| Mosdef.
|
| IIRC, these RFDs are part of Oxide's commitment to FOSS and
| radical openness.
|
| Whatever decision is ultimately made, for better or worse,
| having that written record allows the future team(s) to pick
| up the discussion where it previously left off.
|
| Working on a team that didn't have sacred cows, an
| inscrutable backstory ("hmmm, I dunno why, that's just how it
| is. if it ain't broke, don't fix it."), and gatekeepers would
| be _so great_.
| blinkingled wrote:
| I guess that's one way of keeping Solaris alive :)
| daneel_w wrote:
| _" * Emerging VMMs (OpenBSD's vmm, etc): Haven't been proven in
| production"_
|
| It's a small operation, but https://openbsd.amsterdam/ has
| absolutely proven that OpenBSD's hypervisor is production-capable
| _in terms of stability_ - but there are indeed other problems
| that rule it out at scale.
|
| For those who are unfamiliar with OpenBSD: the primary caveat is
| that its hypervisor can so far only provide guests with a single
| CPU core.
| jclulow wrote:
| Yes, to be clear this is not meant to be a criticism of
| software quality at OpenBSD! Though I don't necessarily always
| agree with the leadership style, I have big respect for their
| engineering efforts and obviously as another relatively niche
| UNIX I feel a certain kinship! That part of the document was
| also written some years ago, much closer to 2018 when that
| service got started than now, so it's conceivable that we
| wouldn't have said the same thing today.
|
| I will say, though, that single VCPU guests would not have met
| our immediate needs in the Oxide product!
| notaplumber1 wrote:
| > I will say, though, that single VCPU guests would not have
| met our immediate needs in the Oxide product!
|
| Could Oxide not have helped push multi-vcpu guests out the
| door by sponsoring one of the main developers working on it,
| or contributing to development? From a secure design
| perspective, OpenBSD's vmd is a lot more appealing than bhyve
| is today.
|
| I saw recently that AMD SEV (Secure Encrypted Virtualization)
| was added, which seems compelling for Oxide's AMD based
| platform. Has Oxide added support for that to their bhyve
| fork yet?
| pmooney wrote:
| > Could Oxide not have helped push multi-vcpu guests out
| the door by sponsoring one of the main developers working
| on it, or contributing to development?
|
| Being that vmd's values are aligned with OpenBSD's
| (security above all else), it is probably not a good fit
| for what Oxide is trying to achieve. Last I looked at vmd
| (circa 2019), it was doing essentially all device emulation
| in userspace. While it makes total sense to keep as much
| logic as possible out of ring-0 (again, emphasis on
| security), doing so comes with some substantial performance
| costs. Heavily used devices, such as the APIC, will incur
| pretty significant overhead if the emulation requires round
| trips out to userspace on top of the cost of VM exits.
|
| > I saw recently that AMD SEV (Secure Encrypted
| Virtualization) was added, which seems compelling for
| Oxide's AMD based platform. Has Oxide added support for
| that to their bhyve fork yet?
|
| SEV complicates things like the ability to live-migrate
| guests between systems.
| tonyg wrote:
| > Nested virtualisation [...] challenging to emulate the
| underlying interfaces with flawless fidelity [...] dreadful
| performance
|
| It is so sad that we've ended up with designs where this is the
| case. There is no _intrinsic_ reason why nested virtualization
| should be hard to implement or should perform poorly. Path
| dependence strikes again.
| bonzini wrote:
| It doesn't perform poorly, in fact. It can be tuned to 90% of
| non-nested virtualization performance, and for workloads where
| it doesn't, that's more than anything else a testimony to how
| close virtualized performance is to bare metal.
|
| That said, it does add a lot of complexity.
| fefe23 wrote:
| These sound like reasons you retconned so it sounds like you
| didn't choose Illumos because your founder used to work at Sun
| and Joyent before. :-)
|
| Frankly I don't understand why they blogged that at all. It reeks
| of desperation, like they feel they need to defend their choice.
| They don't.
|
| It also should not matter to their customers. They get exposed
| APIs and don't have to care about the implementation details.
| jclulow wrote:
| It's not a blog post, it's an RFD. We have a strong focus on
| writing as part of thinking and making decisions, and when we
| can, we like to publish our decision making documents in the
| spirit of open source. This is not a defence of our position so
| much as a record of the process through which we arrived at it.
| This is true of our other RFDs as well, which you can see on
| the site there.
|
| > It also should not matter to their customers. They get
| exposed APIs and don't have to care about the implementation
| details.
|
| Yes, the whole product is definitely designed that way
| intentionally. Customers get abstracted control of compute and
| storage resources through cloud style APIs. From their
| perspective it's a cloud appliance. It's only from our
| perspective as the people building it that it's a UNIX system.
| stonogo wrote:
| So at no point did anyone even suspect that Illumos was under
| consideration because it's been corporate leadership's pet
| project for decades? That seems like a wild thing to omit
| from the "RFD" process. Or were some topics not open to the
| "RFD" process?
| jclulow wrote:
| We are trying to build a business here. The goal is to sell
| racks and racks of computers to people, not build a
| menagerie of curiosities and fund personal projects.
| Everything we've written here is real, at least from our
| point of view. If we didn't think it would work, why would
| we throw our own business, and equity, and so on, away?
|
| The reason I continue to invest myself, if nothing else, in
| illumos, is because I genuinely believe it represents a
| better aggregate trade off for production work than the
| available alternatives. This document is an attempt to
| distill _why that is_ , not an attempt to cover up a
| personal preference. I do have a personal preference, and
| I'm not shy about it -- but that preference is based on
| tangible experiences over twenty years!
| leoh wrote:
| I'd love to use Illumos, but a lack of arm64 support is a non-
| starter
| jclulow wrote:
| Folks are working on it! I believe it boots on some small
| systems and under QEMU, but it's still relatively early days.
| I'm excited for the port to eventually make it into the gate,
| though!
| geerlingguy wrote:
| In before someone asks about riscv64 ;)
| ComputerGuru wrote:
| I don't mean to downplay the importance for you personally but
| I do want to clarify that while it might be a non-starter for
| you, all of arm64 is so new that it's hardly a non-starter for
| anyone considering putting it into (traditional) production.
| timenova wrote:
| You're right; however, I was looking for the same information
| to maybe try it on an RPi to learn more about Illumos.
| BirAdam wrote:
| I think a bigger reason for Oxide using Illumos is that many of
| the people over there are former Sun folks.
| throw0101b wrote:
| Somewhat related, they discussed why they chose to use ZFS for
| their storage backend as opposed to (say) Ceph in a podcast
| episode:
|
| * https://www.youtube.com/watch?v=UvEKSqBBcZw
|
| Certainly they already had experience with ZFS (as it is built
| into Illumos/Solaris), but as it was told to them by someone they
| trusted who ran a lot of Ceph: " _Ceph is operated, not shipped
| [like ZFS]_ ".
|
| There's more care-and-feeding required for it, and they probably
| don't want that as they want to treat the product in a more
| appliance/toaster-like fashion.
| pclmulqdq wrote:
| Ceph is sadly not very good at what it does. The big clouds
| have internal versions of object store that are _far_ better
| (no single point of failure, much better error recovery story,
| etc.). ZFS solves a different problem, though. ZFS is a full-
| featured filesystem. Like Ceph it is also vulnerable to single
| points of failure.
| throw0101b wrote:
| > _The big clouds have internal versions of object store that
| are far better (no single point of failure, much better error
| recovery story, etc.)._
|
| There are different levels of scalability needs. CERN has
| over a dozen (Ceph) clusters with over 100PB of total data as
| of 2023:
|
| * https://www.youtube.com/watch?v=bl6H888k51w
|
| Certainly there are some number of folks that need more than
| that, but I don't think there are many.
|
| > _Like Ceph it is also vulnerable to single points of
| failure._
|
| The SPOF for ZFS is the host (unless you replicate, e.g.,
| _zfs send_ ).
|
| What is SPOF of Ceph? You can have multiple monitors,
| managers, and MDSes.
| pclmulqdq wrote:
| Single-monitor is a common way to run Ceph. On top of that,
| many cluster configurations cause the whole thing to slow
| to a crawl when a very small minority of nodes go down.
| Never mind packet loss, bad switches, and other sorts of
| weird failure mechanisms. Ceph in general is pretty bad at
| operating in degraded modes. ZFS and systems like Tectonic
| (FB) and Colossus (Google) do much better when things
| aren't going perfectly.
|
| Do you know how many administrators CERN has for its Ceph
| clusters? Google operates Colossus at ~1000x that size with
| a team of 20-30 SREs (almost all of whom aren't spending
| their time doing operations).
| anonfordays wrote:
| ZFS and Ceph is apples to oranges. ZFS is scoped to a single
| host, Ceph can span data centers.
| ComputerGuru wrote:
| It's very possible to run a light/small layer on top of ZFS
| (either a userspace daemon or via FUSE) to get you most of the
| way to scaling ZFS-backed object storage within or across
| data centers depending on what specific availability metrics
| you need.
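|
| (The "light layer" really can be small if all you need is
| put/get and durability is delegated to ZFS underneath. A toy
| Rust sketch, assuming /tank/objects is the mountpoint of a ZFS
| dataset; no replication, auth, or multi-node anything, which
| is exactly the part Ceph exists to solve:)
|
|     use std::fs::{self, File};
|     use std::io::Write;
|     use std::path::PathBuf;
|
|     // Hypothetical ZFS dataset mountpoint.
|     const ROOT: &str = "/tank/objects";
|
|     fn object_path(key: &str) -> PathBuf {
|         // Naive mapping; a real layer would shard and sanitize.
|         PathBuf::from(ROOT).join(key)
|     }
|
|     fn put(key: &str, data: &[u8]) -> std::io::Result<()> {
|         let path = object_path(key);
|         let tmp = path.with_extension("tmp");
|         let mut f = File::create(&tmp)?;
|         f.write_all(data)?;
|         f.sync_all()?;            // let ZFS commit the data
|         fs::rename(&tmp, &path)   // atomic publish
|     }
|
|     fn get(key: &str) -> std::io::Result<Vec<u8>> {
|         fs::read(object_path(key))
|     }
|
|     fn main() -> std::io::Result<()> {
|         put("hello", b"world")?;
|         println!("{}", String::from_utf8_lossy(&get("hello")?));
|         Ok(())
|     }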
| anonfordays wrote:
| That's true for any filesystem, not specific to ZFS. ZFS is
| not a clustered or multi-host filesystem.
| seabrookmx wrote:
| What does this light/small layer look like?
|
| In my experience you need something like GlusterFS which I
| wouldn't call "light".
| throw0101b wrote:
| > _ZFS and Ceph is apples to oranges._
|
| Oxide is shipping an on-prem 'cloud appliance'. From the
| customer's/user's perspective of calling an API asking for
| storage, it does not matter what the backend is--apple or
| orange--as long as "fruit" (i.e., a logical bag of a certain
| size to hold bits) is the result that they get back.
| anonfordays wrote:
| Yes, it could be NTFS behind the scenes, but this is still
| an apples to oranges comparison because the storage service
| Oxide created is Crucible[0], not ZFS. Crucible is more of
| an apples to apples comparison with Ceph.
|
| [0] https://github.com/oxidecomputer/crucible
| wmf wrote:
| You mean they use Crucible instead of Ceph?
| jonstewart wrote:
| Illumos makes sense as a host OS--it's capable, they know it,
| they can make sure it works well on their hardware, and
| virtualization means users don't need that much familiarity with
| it.
|
| If I were Oxide, though, I'd be _sprinting_ to seamless VMWare
| support. Broadcom has turned into a modern-day Oracle (but
| dumber??) and many customers will migrate in the next two years.
| Even if those legacy VMs aren't "hyperscale", there's going to be
| lots of budget devoted to moving off VMWare.
| parasubvert wrote:
| Oracle is a $53 billion company and never had a mass exodus,
| just fewer greenfield deployments.
|
| Broadcom also isn't all that dumb, VMware was fat and lazy and
| customers were coddled for a very long time. They've made a bet
| that it's sticky. The competition isn't as weak as they
| thought, that's true, but it will take 5+ years to catch up,
| not 2 years, in general. Broadcom was betting on it taking 10
| years: plenty of time to squeeze out margins. Customers have
| been trying and failing to eliminate the vTax since OpenStack.
| Red Hat and Microsoft are the main viable alternatives.
| kayo_20211030 wrote:
| Is the date on this piece correct?
|
| The section about Rust as a first class citizen seems to contain
| references to its potential use in Linux that are a few years
| out of date, with nothing more current than 2021.
|
| > As of March 2021, work on a prototype for writing Linux drivers
| in Rust is happening in the linux-next tree.
| kayo_20211030 wrote:
| nm, I read the postscript. The RFD was from 2021. I wonder how
| correct it was, and whether decisions made based on it were
| good ones or bad ones.
| alberth wrote:
| Isn't it simply that Oxide's founders are old Sun engineers, and
| Illumos is the open source spinoff of their old work?
| sophacles wrote:
| According to the founders and early engineers on their podcast
| - no, they tried to fairly evaluate all the OSes and were
| willing to go with other options.
|
| Practically speaking, it's hard to do it completely objectively
| and the in-house expertise probably colored the decision.
| pclmulqdq wrote:
| Tried to, sure, but when you evaluate other products strictly
| against the criteria under which you built your own version,
| you know what the conclusion will be. Never mind that you are
| carrying your blind spots with you. I would say that there
| was an attempt to evaluate other products, but not so much an
| attempt to be objective in that evaluation.
|
| In general, being on your own private tech island is a tough
| thing to do, but many engineers would rather do that than
| swallow their pride.
| Aissen wrote:
| Point 1.1 about QEMU seems even less relevant today, with QEMU
| adding support for the microvm machines, hence greatly reducing
| the amount of exposed code. And as bonzini said in the thread,
| the recent vulnerability track record is not so bad.
| magicalhippo wrote:
| Been running Bhyve on FreeBSD (technically FreeNAS). Found PCIe
| pass-through of NVMe drives was fairly straightforward once the
| correct incantations were found, but network speed to host has
| been fairly abysmal. On my admittedly aging Threadripper 1920X, I
| can only get ~2-3 Gbps peak from a Linux guest.
|
| That's with virtio, the virtual intel "card" is even slower.
|
| They went with Illumos though, so curious if the poor performance
| is a FreeBSD-specific thing.
| bitfilped wrote:
| It's been a minute since I messed with bhyve on FreeBSD, but
| I'm pretty sure you have to switch out the networking stack to
| something like Netgraph if you intend to use fast networking.
| craftkiller wrote:
| Hmmm I'm not the OP, but I run my personal site on a
| kubernetes cluster hosted in bhyve VMs running Debian on a
| FreeBSD machine using netgraph for the networking. I just
| tested by launching iperf3 on the FreeBSD host and launching
| an alpine linux pod in the cluster, and I only got ~4Gbit/s.
| This is surprising to me since netgraph is supposed to be
| capable of much faster networking but I guess this is going
| through multiple additional layers that may have slowed it
| down (off the top of my head: kubernetes with flannel,
| iptables in the VM, bhyve, and pf on the FreeBSD host).
| bitfilped wrote:
| Do you know if you're still using if_bridge? I remembered
| this article from klara that goes a bit more into the
| details. https://klarasystems.com/articles/using-netgraph-
| for-freebsd...
| ComputerGuru wrote:
| I just spun up a VNET jail (so it should be essentially using
| the same network stack and networking isolation level as a
| bhyve guest would) and tested with iperf3 and without any
| tweaking or optimization and without even using jumbo frames
| I'm able to get 24+ Gbps with iperf3 (32k window size, tcp,
| single stream) between host/guest over the bridged and
| virtualized network interface. My test hardware is older than
| yours, it's a Xeon E5-1650 v3 and this is even with nested
| virtualization since the "host" is actually an ESXi guest
| running pf!
|
| But I think you might be right about something because, playing
| with it some more, I'm seeing an asymmetry in network I/O
| speeds; when I use `iperf3 -R` from the VNET jail to make the
| host connect to the guest and send data instead of the other
| way around, I get very inconsistent results with bursts of 2
| Gbps traffic and then entire seconds without any data
| transferred (regardless of buffer size). I'd need to do a
| packet capture to figure out what is happening but it doesn't
| look like the default configuration performs very well at all!
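|
| (For anyone who wants to reproduce this kind of host/guest
| comparison without iperf3, the measurement itself is simple; a
| crude single-stream Rust sketch -- it won't match iperf3's
| accuracy or options, and the port and sizes are arbitrary:)
|
|     // Run "server" on one end and "client <addr>" on the
|     // other; swap roles to compare the two directions.
|     use std::env;
|     use std::io::{Read, Write};
|     use std::net::{TcpListener, TcpStream};
|     use std::time::Instant;
|
|     const PORT: u16 = 52001;
|     const CHUNK: usize = 1 << 20;   // 1 MiB writes
|     const TOTAL: u64 = 4 << 30;     // send 4 GiB, then close
|
|     fn server() -> std::io::Result<()> {
|         let listener = TcpListener::bind(("0.0.0.0", PORT))?;
|         let (mut sock, peer) = listener.accept()?;
|         let mut buf = vec![0u8; CHUNK];
|         let mut received: u64 = 0;
|         let start = Instant::now();
|         loop {
|             let n = sock.read(&mut buf)?;
|             if n == 0 { break; }
|             received += n as u64;
|         }
|         let secs = start.elapsed().as_secs_f64();
|         println!("{peer}: {:.2} Gbit/s",
|                  received as f64 * 8.0 / secs / 1e9);
|         Ok(())
|     }
|
|     fn client(addr: &str) -> std::io::Result<()> {
|         let mut sock = TcpStream::connect((addr, PORT))?;
|         let buf = vec![0u8; CHUNK];
|         let mut sent: u64 = 0;
|         while sent < TOTAL {
|             sock.write_all(&buf)?;
|             sent += CHUNK as u64;
|         }
|         Ok(())
|     }
|
|     fn main() -> std::io::Result<()> {
|         let args: Vec<String> = env::args().collect();
|         match args.get(1).map(String::as_str) {
|             Some("server") => server(),
|             Some("client") => client(&args[2]),
|             _ => {
|                 eprintln!("usage: server | client <addr>");
|                 Ok(())
|             }
|         }
|     }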
| tcdent wrote:
| Linux has a rich ecosystem, but the toolkit is haphazard and a
| little shaky. Sure, everyone uses it, because when we last
| evaluated our options (in like 2009) it was still the most robust
| solution. That may no longer be the case.
|
| Given all of that, and taking into account building a product on
| top of it, and thus needing to support it and stand behind it,
| Linux wasn't the best choice. Looking ahead (in terms of decades)
| and not just shipping a product now, it was found that an
| alternate ecosystem existed to support that.
|
| Culture of the community, design principles, and maintainability are
| all things to consider beyond just "is it popular".
|
| Exciting times in computing once again!
| craftkiller wrote:
| I wonder if CockroachDB abandoning the open source license[0]
| will have an impact on their choice to use it. It looks like the
| RFD was posted 1 day before the license switch[1], and the RFD
| has a section on licenses stating they intended to stick to the
| OSS build:
|
| > To mitigate all this, we're intending to stick with the OSS
| build, which includes no CCL code.
|
| [0] https://news.ycombinator.com/item?id=41256222
|
| [1] https://rfd.shared.oxide.computer/rfd/0110
| implr wrote:
| They already have another RFD for this:
| https://rfd.shared.oxide.computer/rfd/0508
|
| and on HN: https://news.ycombinator.com/item?id=41268043
| Rendello wrote:
| And a podcast episode: https://oxide.computer/podcasts/oxide-
| and-friends/2052742
| yellowapple wrote:
| I'm surprised that KVM on Illumos wasn't in the running,
| especially with SmartOS setting that as precedent (even if bhyve
| is preferred nowadays).
| mechanicker wrote:
| Wonder if this is more due to Bhyve being developed on FreeBSD
| and Illumos deriving from a common BSD ancestor?
|
| I know NetApp (stack based on FreeBSD) contributed significantly
| to Bhyve when they were exploring options to virtualize Data
| ONTAP (C mode)
|
| https://forums.freebsd.org/threads/bhyve-the-freebsd-hypervi...
| jclulow wrote:
| While we have a common ancestor in the original UNIX, so much
| of illumos is really more from our SVR4 heritage -- but then
| also so much of _that_ has been substantially reworked since
| then anyway.
| computersuck wrote:
| Because CTO Bryan Cantrill, who was a core contributor to illumos
| anonnon wrote:
| Ctrl+f Cantrill
|
| > Phrase not found
|
| Bryan Cantrill, ex-Sun dev, ex-Joyent CTO, now CTO of Oxide, is
| the reason they chose Illumos. Oxide is primarily an attempt to
| give Solaris (albeit Rustified) a second life, similar to Joyent
| before. The company even cites Sun co-founder Scott McNealy for
| its principles:
|
| https://oxide.computer/principles
|
| >"Kick butt, have fun, don't cheat, love our customers and change
| computing forever."
|
| >If this sounds familiar, it's because it's essentially Scott
| McNealy's coda for Sun Microsystems.
| JonChesterfield wrote:
| Illumos and ZFS sounds completely sensible for a company that
| runs on specific hardware. They mention the specific EPYC CPU
| their systems are running on, which suggests they're all
| ~identical.
|
| Linux has a massive advantage when it comes to hardware support
| for all kinds of esoteric devices. If you don't need that, and
| you've got engineers that are capable of patching the OS to
| support your hardware, yep, have at it. Good call.
___________________________________________________________________
(page generated 2024-09-12 23:01 UTC)