[HN Gopher] Why we use our own hardware
       ___________________________________________________________________
        
       Why we use our own hardware
        
       Author : nmjenkins
       Score  : 716 points
       Date   : 2024-12-22 08:36 UTC (14 hours ago)
        
 (HTM) web link (www.fastmail.com)
 (TXT) w3m dump (www.fastmail.com)
        
       | oldpersonintx wrote:
       | longtime FM user here
       | 
       | good on them, understanding infrastructure and cost/benefit is
       | essential in any business you hope to run for the long haul
        
       | bartvk wrote:
       | Such an awesome article. I like how they didn't just go with the
       | Cloud wave but kept sysadmin'ing, like ol' Unix graybeards. Two
       | interesting things they wrote about their SSDs:
       | 
       | 1) "At this rate, we'll replace these [SSD] drives due to
       | increased drive sizes, or entirely new physical drive formats
        | (such as E3.S, which appears to finally be gaining traction) long
       | before they get close to their rated write capacity."
       | 
       | and
       | 
       | 2) "We've also anecdotally found SSDs just to be much more
       | reliable compared to HDDs (..) easily less than one tenth the
       | failure rate we used to have with HDDs."
        
         | tgv wrote:
          | To avoid sysadmin tasks and keep costs down, you've got to
          | go so deep into the cloud that it becomes just another
          | arcane skill set. I run most of my stuff on virtual Linux
          | servers, but some on AWS, and that's hard to learn and
          | doesn't transfer to GCP or Azure. Unless your needs are
          | extreme, I think sysadmin'ing is the easier route in most
          | cases.
        
           | baxtr wrote:
           | I predict a slow but unstoppable comeback of the sysadmin job
           | over the next 5-10 years.
        
             | homebrewer wrote:
             | It never disappeared in some places. In my region there's
             | been zero interest in "the cloud" because of physical
             | remoteness from all major GCP/AWS/Azure datacenters
             | (resulting in high latency), for compliance reasons, and
             | because it's easier and faster to solve problems by dealing
             | with a local company than pleading with a global giant that
             | gives zero shits about you because you're less than a
             | rounding error in its books.
        
           | wongarsu wrote:
           | For so many things the cloud isn't really easier or cheaper,
           | and most cloud providers stopped advertising it as such. My
           | assumption is that cloud adoption is mainly driven by 3
           | forces:
           | 
           | - for small companies: free credits
           | 
            | - for large companies: moving prices as far away as possible
            | from the deploy button, allowing dev and IT to just deploy
            | stuff without purchase orders
            | 
            | - self-perpetuating due to hype, CV-driven development, and
            | ease of hiring
            | 
            | All of these are decent reasons, but none of them may apply
            | to a company like Fastmail.
        
             | graemep wrote:
              | Also CYA. If you run your own servers and something goes
              | wrong, it's your fault. If it's an outage at AWS, it's
              | their fault.
              | 
              | There's also a huge element of following the crowd,
              | branding that non-technical management is familiar with,
              | and so on. I have also found some developers (front end
              | devs, or back end devs who do not have sysadmin skills)
              | feel cloud is the safe choice. This is very common for
              | small companies as they may have limited sysadmin skills
              | (people who know how to keep Windows desktops running are
              | not likely to be who you want deploying servers) and a web
              | GUI _looks_ a lot easier to learn.
        
               | dietr1ch wrote:
                | > If it's an outage at AWS, it's their fault.
               | 
               | Well, still your fault, but easy to judo the risk into
               | clients saying supporting multi-cloud is expensive and
               | not a priority.
        
               | graemep wrote:
               | Management in many places will not even know what multi-
               | cloud is (or even multi-region).
               | 
               | As Cloudstrike showed, if you follow the crowd and tick
               | the right boxes you will not be blamed.
        
               | bobnamob wrote:
               | nit: Crowdstrike
               | 
               | Unless the incident is now being referred to as
               | "Cloudstrike", in which case, eww
        
               | dietr1ch wrote:
               | Yeah, he meant Crowdstrike. Cloudstrike is the name of a
               | future security incident affecting multiple cloud
                | providers. I can't disclose more details.
        
             | ghaff wrote:
              | There are other, if often at least tangentially related,
              | reasons, but more than I can do justice to in a comment.
             | 
             | Many people largely got a lot of things wrong about cloud
             | that I've been meaning to write about for a while. I'll get
             | to it after the holidays. But probably none more than the
             | idea that massive centralized computing (which was wrongly
             | characterized as a utility like the electric grid) would
             | have economics with which more local computing options
             | could never compete.
        
             | Winsaucerer wrote:
             | I'm very interested in approaches that avoid cloud, so
             | please don't read this as me saying cloud is superior. I
             | can think of some other advantages of cloud:
             | 
              | - easy to set up different permissions for users
             | (authorisation considerations).
             | 
             | - able to transfer assets to another owner (e.g., if
             | there's a sale of a business) without needing to move
             | physical hardware.
             | 
             | - other outsiders (consultants, auditors, whatever) can
             | come in and verify the security (or other) of your setup,
             | because it's using a standard well known cloud platform.
        
               | wongarsu wrote:
                | Those are valid reasons, but not always as
                | straightforward:
               | 
                | > easy to set up different permissions for users
               | (authorisation considerations)
               | 
               | Centralized permission management is an advantage of the
               | cloud. At the same time it's easy to do wrong. Without
               | the cloud you usually have more piecemeal solutions
               | depending on segmenting network access and using the
               | permission systems of each service
               | 
               | > able to transfer assets to another owner (e.g., if
               | there's a sale of a business) without needing to move
               | physical hardware
               | 
               | The obvious solution here is to not own your hardware but
               | to rent dedicated servers. Removes some of the
               | maintenance burden, and the servers can be moved between
               | entities as you like. The cloud does give you more
               | granularity though
               | 
               | > other outsiders (consultants, auditors, whatever) can
               | come in and verify the security (or other) of your setup,
               | because it's using a standard well known cloud platform
               | 
               | There is a huge cottage industry of software trying to
               | scan for security issues in your cloud setups. On the one
               | hand that's an advantage of a unified interface, on the
               | other hand a lot of those issues wouldn't occur outside
               | the cloud. In any case, verifying security isn't easy in
               | or out of the cloud. But if you have an auditor that is
               | used to cloud deployments it will be easier to satisfy
               | them there, that's certainly true
        
             | oftenwrong wrote:
             | In small companies, cloud also provides the ability to work
             | around technical debt and to reduce risk.
             | 
              | For example, I have seen several cases where a poorly
              | designed system unexpectedly used too much memory and
              | there was no time to fix it, so the company increased
             | the memory on all instances with a few clicks. When you
             | need to do this immediately to avoid a botched release that
             | has already been called "successful" and announced as such
             | to stakeholders, that is a capability that saves the day.
             | 
             | An example of de-risking is using a cloud filesystem like
             | EFS to provide a pseudo-infinite volume. No risk of an
             | outage due to an unexpectedly full disk.
             | 
             | Another example would be using a managed database system
             | like RDS vs self-managing the same RDBMS: using the managed
             | version saves on labor and reduces risk for things like
             | upgrades. What would ordinarily be a significant effort for
             | a small company becomes automatic, and RDS includes various
             | sanity checks to help prevent you from making mistakes.
             | 
             | The reality of the industry is that many companies are just
             | trying to hit the next milestone of their business by a
             | deadline, and the cloud can help despite the downsides.
        
               | sgarland wrote:
                | > For example, I have seen several cases where a poorly
                | designed system unexpectedly used too much memory
               | 
               | > using a managed database system like RDS vs self-
               | managing the same RDBMS: using the managed version saves
               | on labor
               | 
               | As a DBRE / SRE, I can confidently assert that belief in
               | the latter is often directly responsible for the former.
               | AWS is quite clear in their shared responsibility model
               | [0] that you are still responsible for making sound
               | decisions, tuning various configurations, etc. Having
               | staff that knows how to do these things often prevents
               | the poor decisions from being made in the first place.
               | 
               | [0]: https://aws.amazon.com/compliance/shared-
               | responsibility-mode...
        
               | graemep wrote:
               | Not a DB admin, but I do install and manage DBs for small
               | clients.
               | 
               | My experience is that AWS makes the easy things easy and
               | the difficult things difficult, and the knowledge is not
               | transferable.
               | 
               | With a CLI or non-cloud management tools I can create,
               | admin and upgrade a database (or anything else) exactly
               | the same way, locally, on a local VM, and on a cloud VM
               | from any provider (including AWS). Doing it with a
               | managed database means learning how the provider does it
               | - which takes longer and I personally find it more
               | difficult (and stressful).
               | 
                | What I cannot do as well as a real DB admin could is
                | things like tuning. It's not really an issue for small
                | clients (a few generic changes to scale settings to
                | available resources are enough - and cheaper than paying
                | someone to tune it). Come to think of it, I do not even
                | know how to make those changes on AWS and just hope the
                | defaults match the size of RDS you are paying for (and
                | change when you scale up?).
                | 
                | Having written the above, I am now doubting whether I
                | have done the right thing in the past.
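                | 
                | (For illustration only, not a recommendation: on a
                | self-managed PostgreSQL box, the "few generic changes"
                | usually look something like the sketch below. The
                | values are hypothetical and should be sized to the
                | machine's RAM.
                | 
                |     # rough sketch for a box with ~16 GB RAM
                |     psql -c "ALTER SYSTEM SET shared_buffers='4GB'"
                |     psql -c "ALTER SYSTEM SET effective_cache_size='12GB'"
                |     sudo systemctl restart postgresql
                | 
                |     # on RDS the equivalent lives in a parameter group,
                |     # e.g. via: aws rds modify-db-parameter-group ...
                | 
                | On RDS the defaults do scale with the instance class,
                | but anything beyond that is still on you.)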
        
             | nine_k wrote:
             | A cloud is really easy to get started with.
             | 
             | Free tiers, startup credits, easily available managed
             | databases, queues, object storage, lambdas, load-balancing,
             | DNS, TLS, specialist stuff like OCR. It's easy to prototype
             | something, run for free or for peanuts, start getting some
             | revenue.
             | 
             | Then, as you grow, the costs become steeper, but migrating
             | off of the cloud looks even more expensive, especially if
             | you have accumulated a lot of data (egress costs you,
             | especially from AWS). Congrats, you have become the
             | desirable, typical cloud customer.
        
           | graemep wrote:
           | > it becomes just another arcane skill set
           | 
            | It's an arcane skill set with a GUI. That makes it _look_
            | much easier to learn.
        
         | edward28 wrote:
         | The power of Moore's law.
        
         | jeffbee wrote:
         | I don't see how point 2 could have come as a surprise to
         | anyone.
        
         | kwillets wrote:
          | SSDs are also a bit of an Achilles' heel for AWS -- they have
          | their own Nitro firmware for wear levelling and key rotations,
          | due to the hazards of multitenancy. It's possible for one EC2
          | tenant to use up all the write cycles and then pass the drive
          | to another, and encryption with key rotation is required to
          | keep data from leaking across tenant changes. It's also
          | slower.
         | 
         | We had one outage where key rotation had been enabled on
         | reboot, so data partitions were lost after what should have
         | been a routine crash. Overall, for data warehousing, our
         | failure rate on on-prem (DC-hosted) hardware was lower IME.
        
       | louwrentius wrote:
       | I like this writeup, informative and to-the-point.
       | 
       | Today, the cloud isn't about other people's hardware.
       | 
       | It's about infrastructure being an API call away. Not just
       | virtual machines but also databases, load-balancers, storage, and
       | so on.
       | 
        | The cost isn't the DC or the hardware, but the hours spent on
       | operations.
       | 
       | And you can abuse developers to do operations on the side :-)
        
         | zelphirkalt wrote:
          | And then come the weird aspects of bad cloud service providers
          | like IONOS: broken OS images; a provisioning API that is a
          | bottleneck, where what other people do and how much they do
          | can slow down your own provisioning; network interfaces that
          | can take minutes to create via their API, with customer
          | service saying "That's how it is, cannot change it."; and a
          | very shitty web user interface that desperately tries to be a
          | single page app yet has all the default browser functionality,
          | like the back button, broken. Yet they still cost literally
          | 10x what Hetzner Cloud costs, while Hetzner basically does
          | everything better.
         | 
         | And then it is still also about other people's hardware in
         | addition to that.
        
       | goldeneye13_ wrote:
        | Didn't see this in the article: do they have multi-AZ
        | redundancy? I.e. if the entire RAID goes up in flames, what's
        | the recovery process?
        
         | comboy wrote:
         | Yeah, that makes me feel uneasy as a long time fastmail user.
        
         | cyrnel wrote:
         | Looks like they do mention that elsewhere:
         | https://www.fastmail.com/features/reliability/
         | 
         | > Fastmail has some of the best uptime in the business, plus a
         | comprehensive multi data center backup system. It starts with
         | real-time replication to geographically dispersed data centers,
         | with additional daily backups and checksummed copies of
         | everything. Redundant mirrors allow us to failover a server or
         | even entire rack in the case of hardware failure, keeping your
         | mail running.
        
         | Amfy wrote:
         | I believe they replicate from NJ to WA (Seattle). At least
         | that's something they spoke about many years ago.
        
         | sufehmi wrote:
         | https://www.fastmail.com/blog/throwback-security-confidentia...
        
       | jmakov wrote:
       | Would be interesting to know how files get stored. They don't
       | mention any distributed FS solutions like SeaweedFS so once a
       | drive is full, does the file get sent to another one via some
       | service? Also ZFS seems an odd choice since deletions (esp of
       | small files) at +80% full drive are crazy slow.
        
         | shrubble wrote:
          | The open-source Cyrus IMAP server, which they mention using,
          | has replication built-in. ZFS also has built-in replication
         | available.
         | 
         | Deletion of files depends on how they have configured the
         | message store - they may be storing a lot of data into a
         | database, for example.
        
           | mastax wrote:
           | ZFS replication is quite unreliable when used with ZFS native
           | encryption, in my experience. Didn't lose data but constant
           | bugs.
        
         | ackshi wrote:
         | Keeping enough free space should be much less of a problem with
         | SSDs. They can tune it so the array needs to be 95% full before
         | the slower best-fit allocator kicks in.
         | https://openzfs.readthedocs.io/en/latest/performance-tuning....
         | 
         | I think that 80% figure is from when drives were much smaller
         | and finding free space over that threshold with the first-fit
         | allocator was harder.
        
         | ryao wrote:
          | Unlike ext4, which locks the directory when unlinking, ZFS
          | is able to scale on parallel unlinking. Specifically, ZFS
          | has range locks that permit directory entries to be removed
          | in parallel from the extendible hash trees that store them.
          | While this is relatively slow for sequential workloads, it
          | is fast on parallel workloads. If you want to delete a large
          | directory subtree fast on ZFS, do the rm operations in
          | parallel. For example, this will run faster on ZFS than a
          | naive rm operation:
          | 
          |     find /path/to/subtree -type f | parallel -j250 rm --
          |     rm -r /path/to/subtree
         | 
         | A friend had this issue on spinning disks the other day. I
         | suggested he do this and the remaining files were gone in
         | seconds when at the rate his naive rm was running, it should
         | have taken minutes. It is a shame that rm does not implement a
         | parallel unlink option internally (e.g. -j), which would be
         | even faster, since it would eliminate the execve overhead and
         | likely would eliminate some directory lookup overhead too,
         | versus using find and parallel to run many rm processes.
         | 
          | For something like Fastmail that has many users, unlinking
         | should be parallel already, so unlinking on ZFS will not be
         | slow for them.
         | 
         | By the way, that 80% figure has not been true for more than a
         | decade. You are referring to the best fit allocator being used
         | to minimize external fragmentation under low space conditions.
         | The new figure is 96%. It is controlled by metaslab_df_free_pct
         | in metaslab.c:
         | 
         | https://github.com/openzfs/zfs/blob/zfs-2.2.0/module/zfs/met...
         | 
         | Modification operations become slow when you are at/above 96%
         | space filled, but that is to prevent even worse problems from
         | happening. Note that my friend's pool was below the 96%
         | threshold when he was suffering from a slow rm -r. He just had
         | a directory subtree with a large amount of directory entries he
         | wanted to remove.
         | 
         | For what it is worth, I am the ryao listed here and I was
         | around when the 80% to 96% change was made:
         | 
         | https://github.com/openzfs/zfs/graphs/contributors
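          | 
          | (If you want to check the threshold on a given system --
          | assuming a Linux box with OpenZFS, where module parameters
          | are exposed under /sys -- something like this should work:
          | 
          |     cat /sys/module/zfs/parameters/metaslab_df_free_pct
          | 
          | A value of 4 means the best-fit allocator only kicks in once
          | free space drops below 4%, i.e. at 96% full.)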
        
           | switch007 wrote:
           | I discovered this yesterday! Blew my mind. I had to check 3
           | times that the files were actually gone and that I specified
           | the correct directory as I couldn't believe how quick it ran.
           | Super cool
        
           | jmakov wrote:
           | Thank you very much for sharing this, very insightful.
        
             | ryao wrote:
             | Thank you for posting your original comment. The process of
             | writing my reply gave me a flash of inspiration:
             | 
             | https://github.com/openzfs/zfs/pull/16896
             | 
             | I doubt that this will make us as fast as ext4 at unlinking
             | files in a single thread, but it should narrow the gap
             | somewhat. It also should make many other common operations
             | slightly faster.
             | 
             | I had looked into range lock overhead years ago, but when I
             | saw the majority of time entering range locks was spent in
             | an "unavoidable" memory allocation, I did not feel that
             | making the operations outside the memory allocation faster
             | would make much difference, so I put this down. I imagine
             | many others profiling the code came to the same conclusion.
             | Now that the memory allocation overhead will soon be gone,
             | additional profiling might yield further improvements. :)
        
       | caidan wrote:
       | I absolutely love Fastmail. I moved off of Gmail years ago with
       | zero regrets. Better UI, better apps, better company, and need I
       | say better service? I still maintain and fetch from a Gmail
       | account so it all just works seamlessly for receiving and sending
       | Gmail, so you don't have to give anything up either.
        
         | pawelduda wrote:
         | Their android app has always been much snappier than Gmail,
         | it's the little things that drew me to it years ago
        
         | jb1991 wrote:
          | Their UI is definitely faster, but I do prefer the Gmail UI;
          | for example, how new messages are displayed in threads is
          | quite useless in Fastmail.
        
         | petesergeant wrote:
         | I use Fastmail for my personal mail, and I don't regret it, but
         | I'm not quite as sold as you are, I guess maybe because I still
         | have a few Google work accounts I need to use. Spam filtering
         | in Fastmail is a little worse, and the search is _terrible_.
         | The iOS app is usable but buggy. The easy masked emails are a
         | big win though, and setting up new domains feels like less of a
         | hassle with FM. I don't regret using Fastmail, and I'd use them
         | again for my personal email, but it doesn't feel like a slam
         | dunk.
        
         | mlfreeman wrote:
         | I moved from my own colocated 1U running Mailcow to Fastmail
         | and don't regret it one bit. This was an interesting read, glad
         | to see they think things through nice and carefully.
         | 
         | The only things I wish FM had are all software:
         | 
         | 1. A takeout-style API to let me grab a complete snapshot once
         | a week with one call
         | 
         | 2. The ability to be an IdP for Tailscale.
        
         | xerp2914 wrote:
         | 100% this. I migrated from Gmail to Fastmail about 5 years ago
         | and it has been rock solid. My only regret is that I didn't do
         | it sooner.
        
       | tucnak wrote:
       | Yeah, Cloud is a bit of a scam innit? Oxide is looking more and
       | more attractive every day as the industry corrects itself from
       | overspending on capabilities they would never need.
        
         | klysm wrote:
         | It's trading time for money
        
           | jgb1984 wrote:
           | Fake news. I've got my bare metal server deployed and
           | installed with my ansible playbook even before you manage to
           | log into the bazillion layers of abstraction that is AWS.
        
             | acedTrex wrote:
              | But can you do that on demand in minutes for 1000
              | application teams that have unique snowflake needs?
              | Because Terraform or Bicep can.
        
             | klysm wrote:
             | In multiple regions?
        
           | rob_c wrote:
            | Yes, welcome to business. But frankly an email provider
            | needs to have their own metal; if they don't, they're not
            | worth doing business with.
        
       | mgaunard wrote:
       | Why is it surprising? It's well known cloud is 3 times the price.
        
         | diggan wrote:
         | Because the default for companies today is cloud, even though
         | it almost never makes sense. Sure, if you have really spikey
         | load, need to dynamically scale at any point and don't care
         | about your spend, it might make sense.
         | 
          | I've even worked in companies where the engineering team spent
         | effort and time on building "scalable infrastructure" before
         | the product itself even found product-market fit...
        
         | dewey wrote:
         | Nobody said it's surprising though, they are well aware of it
         | having done it for more than two decades. Many newcomers are
         | not aware of it though, as their default is "cloud" and they
         | never even shopped for servers, colocation or looked around on
         | the dedicated server market.
        
           | aimanbenbaha wrote:
            | I don't think it's just that they're not aware. Purely from
            | a scaling and distribution perspective, it'd be wiser to
            | start on cloud while you're still in the product-market fit
            | phase. Also, 'bare metal' requires more on the capex end,
            | and with how our corporate tax system is set up it's just
            | discouraging to go down this lane first; you'd be better off
            | spending on acquiring clients.
            | 
            | Also, I'd guess a lot of technical founders are more
            | familiar with cloud/server-side work than with handling or
            | delegating sysadmin tasks that might require adding members
            | to the team.
        
             | dewey wrote:
             | I agree, the cloud definitely has a lot of use cases and
             | when you are building more complicated systems it makes
             | sense to just have to do a few clicks to get a new stack
             | setup vs. having someone evaluate solutions and getting
             | familiar with operating them on a deep level (backups
             | etc.).
        
       | rrgok wrote:
       | I would like to know the tech stack behind it.
        
       | antihero wrote:
        | I've started to host my own sites and stuff on an old MacBook in
        | a cupboard with a shit old external hard drive and microk8s, and
        | it's great!
        
         | theoreticalmal wrote:
         | Another homelabber joins the ranks!!
        
       | tndibona wrote:
       | But what about the cost and complexity of a room with the racks
       | and the cooling needs of running these machines? And the
       | uninterrupted power setup? The wiring mess behind the racks.
        
         | hyhconito wrote:
         | I'm not fastmail but this is not rocket science. Has everyone
         | forgotten how datacentre services work in 2024?
        
           | rob_c wrote:
            | Yes they have, and they feel they deserve credit for
            | discovering a WiFi cable is more reliable than the new shiny
            | kit that was sold to them by a vendor...
        
         | jonatron wrote:
         | Even for cloud providers, these are mostly other people's
         | problems, eg: Equinix
        
         | 7952 wrote:
         | Do colocation facilities solve that?
        
         | bradfa wrote:
         | There is a very competitive market for colo providers in
         | basically every major metropolitan area in the US, Europe, and
         | Asia. The racks, power, cooling, and network to your machines
          | are generally very robust and clearly documented on how to
         | connect. Deploying servers in house or in a colo is a well
         | understood process with many experts who can help if you don't
         | have these skills.
        
           | rob_c wrote:
           | Colo offers the ability to ship and deploy and keep latencies
            | down if you're global, but if you're local, yes, you should
            | just get someone on site and the modern equivalent of a T1
            | line set up to your premises if you're running "online"
           | services.
        
         | grishka wrote:
         | Own hardware doesn't mean own data center. Many data centers
         | offer colocation.
        
       | lokimedes wrote:
       | A mail-cloud provider uses its own hardware? Well, that's to be
        | expected; it would be a refreshing article if it were written
        | by one of their customers.
        
       | tuananh wrote:
        | gmail does spam filtering very well for me. fastmail, on the
        | other hand, puts lots of legit emails into the spam folder.
        | manually marking "not spam" doesn't help.
       | 
       | other than that, i'm happy with fastmail.
        
         | jacobdejean wrote:
         | iCloud is just as bad, sends important things to spam
         | constantly and marking as "not spam" has never done anything
         | perceivable.
        
         | ghaff wrote:
          | If I look at my Gmail SPAM folder, there is very rarely
          | something genuinely important in it. What there is a fair
          | bit of, though, is random newsletters and announcements that
          | I may have signed up for in some way, shape, or form and
          | that I don't really care about or generally look at. I
          | assume they've been reported as SPAM by enough people
          | (rather than simply unsubscribed from) that Google now
          | labels them as such.
        
       | xsc wrote:
       | Are those backups geographically distributed?
        
         | christophilus wrote:
         | Yes.
        
       | _bare_metal wrote:
       | Plugging https://BareMetalSavings.com
       | 
       | in case you want to ballpark-estimate your move off of the cloud
       | 
       | Bonus points: I'm a Fastmail customer, so it tangentially tracks
       | 
       | ----
       | 
        | Quick note about the article: ZFS encryption can be flaky, so be
        | sure you know what you're doing before deploying it in your
       | infrastructure.
       | 
       | Relevant Reddit discussion:
       | https://www.reddit.com/r/zfs/comments/1f59zp6/is_zfs_encrypt...
       | 
        | A spreadsheet of related issues (I can't remember who made it):
       | 
       | https://docs.google.com/spreadsheets/d/1OfRSXibZ2nIE9DGK6sww...
        
         | brongondwana wrote:
         | Yeah, we know about the ZFS encryption with send/receive bug,
         | it's frustrating our attempts to get really nice HA support on
         | our logging system... but so far it appears that just deleting
          | the offending snapshot and creating a new one works, and we're
         | funding some research into the issue as well.
         | 
         | This is the current script - it runs every minute for each pool
         | synced between the two log servers:
         | https://gist.github.com/brong/6a23fee1480f2d62b8a18ade5aea66...
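          | 
          | (For anyone unfamiliar with the pattern being discussed: the
          | general shape of replicating an encrypted dataset is a raw
          | incremental send, roughly like the sketch below. This is a
          | generic illustration, not the script linked above; the pool,
          | snapshot names, and target host are made up.
          | 
          |     zfs snapshot tank/logs@new
          |     zfs send -w -i tank/logs@prev tank/logs@new |
          |         ssh log2 zfs receive -F tank/logs
          | 
          | The -w (raw) flag sends the blocks still encrypted, which is
          | what interacts badly with the known native-encryption bugs.)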
        
       | ackshi wrote:
        | I'm a little surprised they apparently didn't have some existing
        | compression solution before moving to ZFS. With so much
       | repetitive text across emails I would think there would be a LOT
       | to gain, such as from dictionaries, compressing many emails into
       | bigger blobs, and fine-tuning compression options.
        
         | silvestrov wrote:
          | They use ZFS with zstd, which likely compresses well enough.
          | 
          | Custom compression code can introduce bugs that could kill
          | Fastmail's reputation for reliability.
          | 
          | It's better to use a well-tested solution that costs a bit
          | more.
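          | 
          | (A minimal sketch of what that looks like in practice -- the
          | dataset name is made up:
          | 
          |     zfs set compression=zstd tank/mail
          |     zfs get compression,compressratio tank/mail
          | 
          | New writes are compressed transparently, and compressratio
          | shows how much space is actually being saved.)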
        
       | rob_c wrote:
        | A host of an online service seems to think it deserves a medal
        | for discovering that S3 buckets from a cloud provider are crap
        | and cost a fortune.
        | 
        | The headline in this space makes you think they're running
        | custom FPGAs such as with Gmail, not just running on metal...
        | As for drive failures, welcome to storage at scale. Build your
        | solution so replacing 10 disks at a time is a weekly task, not
        | a critical incident at 2am when a single disk dies...
        | 
        | Storing/accessing tonnes of <4kB files is difficult, but other
        | providers are doing this on their own metal with Ceph at the PB
        | scale.
        | 
        | I love ZFS, and it's great with per-disk redundancy, but Ceph
        | is really the only game in town for inter-rack/DC resilience,
        | which I would hope my email provider has.
        
       | johnklos wrote:
       | The whole push to the cloud has always fascinated me. I get it -
       | most people aren't interested in babysitting their own hardware.
       | On the other hand, a business of just about any size that has any
       | reasonable amount of hosting is better off with their own systems
       | when it comes purely to cost.
       | 
       | All the pro-cloud talking points are just that - talking points
       | that don't persuade anyone with any real technical understanding,
       | but serve to introduce doubt to non-technical people and to trick
       | people who don't examine what they're told.
       | 
       | What's particularly fascinating to me, though, is how some people
       | are so pro-cloud that they'd argue with a writeup like this with
       | silly cloud talking points. They don't seem to care much about
       | data or facts, just that they love cloud and want everyone else
       | to be in cloud, too. This happens much more often on sites like
       | Reddit (r/sysadmin, even), but I wouldn't be surprised to see a
       | little of it here.
       | 
       | It makes me wonder: how do people get so sold on a thing that
       | they'll go online and fight about it, even when they lack facts
       | or often even basic understanding?
       | 
       | I can clearly state why I advocate for avoiding cloud: cost,
       | privacy, security, a desire to not centralize the Internet. The
       | reason people advocate for cloud for others? It puzzles me.
       | "You'll save money," "you can't secure your own machines," "it's
       | simpler" all have worlds of assumptions that those people can't
       | possibly know are correct.
       | 
       | So when I read something like this from Fastmail which was
       | written without taking an emotional stance, I respect it. If I
       | didn't already self-host email, I'd consider using Fastmail.
       | 
       | There used to be so much push for cloud everything that an
       | article like this would get fanatical responses. I hope that it's
       | a sign of progress that that fanaticism is waning and people
       | aren't afraid to openly discuss how cloud isn't right for many
       | things.
        
         | mjburgess wrote:
         | 1. People are credulous
         | 
         | 2. People therefore repeat talking points which seem in their
         | interest
         | 
         | 3. With enough repetition these become their beliefs
         | 
         | 4. People will defend their beliefs as _theirs_ against attack
         | 
         | 5. Goto 1
        
         | anotherhue wrote:
         | They spent time and career points learning cloud things and
         | dammit it's going to matter!
         | 
         | You can't even blame them too much, the amount of cash poured
         | into cloud marketing is astonishing.
        
           | sgarland wrote:
           | The thing that frustrates me is it's possible to know how to
           | do both. I have worked with multiple people who are quite
           | proficient in both areas.
           | 
           | Cloud has definite advantages in some circumstances, but so
           | does self-hosting; moreover, understanding the latter makes
           | the former much, much easier to reason about. It's silly to
           | limit your career options.
        
             | noworriesnate wrote:
             | Being good at both is twice the work, because even if some
             | concepts translate well, IME people won't hire someone
             | based on that. "Oh you have experience with deploying
             | RabbitMQ but not AWS SQS? Sorry, we're looking for someone
             | more qualified."
        
               | sgarland wrote:
               | That's a great filter for places I don't want to work at,
               | then.
        
         | cpursley wrote:
         | The fact is, managing your own hardware is a pita and a
         | distraction from focusing on the core product. I loathe messing
         | with servers and even opt for "overpriced" paas like fly,
         | render, vercel. Because every minute messing with and
         | monitoring servers is time not spent on product. My tune might
          | change past a certain size, when there's a massive cloud bill
          | and room for full-time ops people, but to offset their salary
          | it would have to be huge.
        
           | cpursley wrote:
           | Anecdotal - but I once worked for a company where the product
           | line I built for them after acquisition was delayed by 5
           | months because that's how long it took to get the hardware
           | ordered and installed in the datacenter. Getting it up on AWS
            | would have been a day's work, maybe two.
        
             | stubish wrote:
             | Yes, it is death by 1000 cuts. Speccing, negotiating with
             | hardware vendors, data center selection and negotiating, DC
             | engineer/remote hands, managing security cage access,
             | designing your network, network gear, IP address ranges,
             | BGP, secure remote console access, cables, shipping,
             | negotiating with bandwidth providers (multiple, for
             | redundancy), redundant hardware, redundant power sources,
             | UPS. And then you get to plug your server in. Now duplicate
             | other stuff your cloud might provide, like offsite backups,
             | recovery procedures, HA storage, geographic redundancy. And
                | do it again when you outgrow your initial DC. Or build
             | your own DC (power, climate, fire protection, security,
             | fiber, flooring, racks)
        
               | sgarland wrote:
               | Much of this is still required in cloud. Also, I think
               | you're missing the middle ground where 99.99% of
               | companies could happily exist indefinitely: colo. It
               | makes little to no financial or practical sense for most
               | to run their own data centers.
        
             | sroussey wrote:
             | Oh, absolutely, with your own hardware you need planning.
             | Time to deployment is definitely a thing.
             | 
              | Really, the one major thing that bites with cloud providers
              | is their 99.9% margin on egress. The markup is insane.
        
           | fhd2 wrote:
           | I'm with you there, with stuff like fly.io, there's really no
           | reason to worry about infrastructure.
           | 
           | AWS, on the other hand, seems about as time consuming and
           | hard as using root servers. You're at a higher level of
           | abstraction, but the complexity is about the same I'd say. At
           | least that's my experience.
        
             | cpursley wrote:
             | I agree with this position and actively avoid AWS
             | complexity.
        
           | noprocrasted wrote:
           | That argument makes sense for PaaS services like the ones you
           | mention. But for bare "cloud" like AWS, I'm not convinced it
           | is saving any effort, it's merely swapping one kind of
           | complexity with another. Every place I've been in had full-
           | time people messing with YAML files or doing "something" with
           | the infrastructure - generally trying to work around the
           | (self-inflicted) problems introduced by their cloud provider
           | - whether it's the fact you get 2010s-era hardware or that
           | you get nickel & dimed on absolutely arbitrary actions that
           | have no relationship to real-world costs.
        
             | jeffbee wrote:
             | In what sense is AWS "bare cloud"? S3, DynamoDB, Lambda,
             | ECS?
        
               | inemesitaffia wrote:
               | EC2
        
               | bsder wrote:
               | I would actually argue that EC2 is a "cloud smell"--if
               | you're using EC2 you're doing it wrong.
        
               | noprocrasted wrote:
               | How do you configure S3 access control? You need to learn
               | & understand how their IAM works.
               | 
               | How do you even point a pretty URL to a lambda? Last time
               | I looked you need to stick an "API gateway" in front
               | (which I'm sure you also get nickel & dimed for).
               | 
               | How do you go from "here's my git repo, deploy this on
               | Fargate" with AWS? You need a CI pipeline which will run
               | a bunch of awscli commands.
               | 
               | And I'm not even talking about VPCs, security groups,
               | etc.
               | 
               | Somewhat different skillsets than old-school sysadmin
               | (although once you know sysadmin basics, you realize a
               | lot of these are just the same concepts under a branded
               | name and arbitrary nickel & diming sprinkled on top), but
               | equivalent in complexity.
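                | 
                | (To make the S3 example concrete -- a minimal sketch,
                | with a made-up bucket name and role, of what "configure
                | S3 access control" ends up looking like:
                | 
                |     cat > policy.json <<'EOF'
                |     {"Version": "2012-10-17",
                |      "Statement": [{"Effect": "Allow",
                |        "Principal": {"AWS":
                |          "arn:aws:iam::123456789012:role/app"},
                |        "Action": "s3:GetObject",
                |        "Resource": "arn:aws:s3:::example-bucket/*"}]}
                |     EOF
                |     aws s3api put-bucket-policy --bucket example-bucket \
                |         --policy file://policy.json
                | 
                | None of which maps onto anything an old-school sysadmin
                | already knows, which is the point being made above.)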
        
           | sgarland wrote:
           | Counterpoint: if you're never "messing with servers," you
           | probably don't have a great understanding of how their
            | metrics map to those of your application, and so if you
           | bottleneck on something, it can be difficult to figure out
           | what to fix. The result is usually that you just pay more
           | money to vertically scale.
           | 
           | To be fair, you did say "my tune might change past a certain
           | size." At small scale, nothing you do within reason really
           | matters. World's worst schema, but your DB is only seeing 100
           | QPS? Yeah, it doesn't care.
        
             | tokioyoyo wrote:
             | I don't think you're correct. I've watched junior/mid-level
             | engineers figure things out solely by working on the cloud
              | and scaling things to a dramatic degree. It's really not
              | rocket science.
        
               | sgarland wrote:
               | I didn't say it's rocket science, nor that it's
               | impossible to do without having practical server
               | experience, only that it's more difficult.
               | 
               | Take disks, for example. Most cloud-native devs I've
               | worked with have no clue what IOPS are. If you saturate
               | your disk, that's likely to cause knock-on effects like
               | increased CPU utilization from IOWAIT, and since "CPU is
               | high" is pretty easy to understand for anyone, the
               | seemingly obvious solution is to get a bigger instance,
               | which depending on the application, may inadvertently
               | solve the problem. For RDBMS, a larger instance means a
               | bigger buffer pool / shared buffers, which means fewer
               | disk reads. Problem solved, even though actually solving
               | the root cause would've cost 1/10th or less the cost of
               | bumping up the entire instance.
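                | 
                | (On a box you control, spotting this is a one-liner --
                | a sketch, assuming the sysstat package is installed:
                | 
                |     iostat -dxm 5
                | 
                | A device sitting near 100 %util with high await while
                | the app slows down points at IOPS saturation rather
                | than a need for more CPU.)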
        
               | tokioyoyo wrote:
               | > Most cloud-native devs
               | 
               | You might be making some generalizations from your
               | personal experience. Since 2015, at all of my jobs,
               | everything has been running on some sort of a cloud. I'm
               | yet to meet a person who doesn't understand IOPS. If I
               | was a junior (and from my experience, that's what they
               | tend to do), I'd just google "slow X potential reasons".
               | You'll most likely see some references to IOPS and
               | continue your research from there.
               | 
               | We've learned all these things one way or another. My
               | experience started around 2007ish when I was renting out
               | cheap servers from some hosting providers. Others might
               | be dipping their feet into readily available cloud-
               | infrastructure, and learning it from that end. Both
                | work.
        
           | icedchai wrote:
           | Writing piles of IaC code like Terraform and CloudFormation
           | is also a PITA and a distraction from focusing on your core
           | product.
           | 
           | PaaS is probably the way to go for small apps.
        
             | UltraSane wrote:
              | But that effort has a huge payoff in that it can be used
              | for disaster recovery in a new region and to spin up
              | testing environments.
        
             | sgarland wrote:
             | A small app (or a larger one, for that matter) can quite
             | easily run on infra that's instantiated from canned IaC,
             | like TF AWS Modules [0]. If you can read docs, you should
             | be able to quite trivially get some basic infra up in a
             | day, even with zero prior experience managing it.
             | 
             | [0]: https://github.com/terraform-aws-modules
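              | 
              | (To illustrate the "canned IaC" point -- a hypothetical
              | minimal example using the community VPC module from that
              | collection; the name and CIDR are made up, and AWS
              | credentials plus a region are assumed to already be
              | configured in the environment:
              | 
              |     cat > main.tf <<'EOF'
              |     module "vpc" {
              |       source = "terraform-aws-modules/vpc/aws"
              |       name   = "demo"
              |       cidr   = "10.0.0.0/16"
              |     }
              |     EOF
              |     terraform init && terraform plan
              | 
              | The module wires up sane defaults, so you mostly fill in
              | names and address ranges.)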
        
               | icedchai wrote:
               | Yes, I've used several of these modules myself. They save
               | tons of time! Unfortunately, for legacy projects, I
               | inherited a bunch of code from individuals that built
               | everything "by hand" then copy-pasted everything. No re-
               | usability.
        
           | xorcist wrote:
           | > every minute messing with and monitoring servers
           | 
           | You're not monitoring your deployments because "cloud"?
        
         | jeffbee wrote:
         | The problem with your claims here is they can only be right if
         | the entire industry is experiencing mass psychosis. I reject a
         | theory that requires that, because my ego just isn't that
         | large.
         | 
         | I once worked for several years at a publicly traded firm well-
         | known for their return-to-on-prem stance, and honestly it was a
         | complete disaster. The first-party hardware designs didn't work
          | right because they didn't have the hardware design staffing
          | levels to have de-risked the possibility that AMD would fumble
         | the performance of Zen 1, leaving them with a generation of
         | useless hardware they nonetheless paid for. The OEM hardware
         | didn't work right because they didn't have the chops to qualify
         | it either, leaving them scratching their heads for months over
         | a cohort of servers they eventually discovered were
         | contaminated with metal chips. And, most crucially, for all the
         | years I worked there, the only thing they wanted to accomplish
         | was failover from West Coast to East Coast, which never worked,
         | not even once. When I left that company they were negotiating
         | with the data center owner who wanted to triple the rent.
         | 
         | These experiences tell me that cloud skeptics are sometimes
         | missing a few terms in their equations.
        
           | johnklos wrote:
           | > The problem with your claims here is they can only be right
           | if the entire industry is experiencing mass psychosis.
           | 
           | What's the market share of Windows again? ;)
        
             | mardifoufs wrote:
             | You're proving their point though. Considering that there
             | are tons of reasons to use windows, some people just don't
             | see them and think that everyone else is crazy :^) (I know
             | you're joking but some people actually unironically have
             | the same sentiment)
        
           | noprocrasted wrote:
           | There's however a middle-ground between run your own
           | colocated hardware and cloud. It's called "dedicated" servers
           | and many hosting providers (from budget bottom-of-the-barrel
           | to "contact us" pricing) offer it.
           | 
           | Those take on the liability of sourcing, managing and
           | maintaining the hardware for a flat monthly fee, and would
           | take on such risk. If they make a bad bet purchasing
           | hardware, you won't be on the hook for it.
           | 
           | This seems like a point many pro-cloud people
           | (intentionally?) overlook.
        
           | floating-io wrote:
           | "Vendor problems" is a red herring, IMO; you can have those
           | in the cloud, too.
           | 
           | It's been my experience that those who can build good,
           | reliable, high-quality systems, can do so either in the cloud
           | or on-prem, generally with equal ability. It's just another
           | platform to such people, and they will use it appropriately
           | and as needed.
           | 
           | Those who can only make it work in the cloud are either
           | building very simple systems (which is one place where the
           | cloud can be appropriate), or are building a house of cards
           | that will eventually collapse (or just cost them obscene
           | amounts of money to keep on life support).
           | 
           | Engineering is engineering. Not everyone in the business does
           | it, unfortunately.
           | 
           | Like everything, the cloud has its place -- but don't
           | underestimate the number of decisions that get taken out of
           | the hands of technical people by the business people who went
           | golfing with their buddy yesterday. He just switched to
           | Azure, and it made his accountants really happy!
           | 
           | The whole CapEx vs. OpEx issue drives me batty; it's the
           | number one cause of cloud migrations in my career. For
           | someone who feels like spent money should count as spent
           | money regardless of the bucket it comes out of, this twists
           | my brain in knots.
           | 
           | I'm clearly not a finance guy...
        
             | sgarland wrote:
             | > or are building a house of cards that will eventually
             | collapse (or just cost them obscene amounts of money to
             | keep on life support)
             | 
             | Ding ding ding. It's this.
             | 
             | > The whole CapEx vs. OpEx issue drives me batty
             | 
             | Seconded. I can't help but feel like it's not just a "I
             | don't understand money" thing, but more of a "the way Wall
             | Street assigns value is fundamentally broken." Spending
             | $100K now, once, vs. spending $25K/month indefinitely does
             | not take a genius to figure out.
        
             | krsgjerahj wrote:
              | You forgot COGS.
              | 
              | It's all about painting the right picture for your
              | investors, so you make up shit and classify it as COGS or
              | OpEx depending on what is most beneficial for you in the
              | moment.
        
           | marcosdumay wrote:
           | > The problem with your claims here is they can only be right
           | if the entire industry is experiencing mass psychosis.
           | 
           | Yes. Mass psychosis explains an incredible number of
           | different and apparently unrelated problems with the
           | industry.
        
         | onli wrote:
          | The one convincing argument I've seen from technical people,
          | which would be the reply to your comment, is that by now you
          | don't find enough experienced engineers to reliably set up
          | some really big systems. Because so much went to the cloud, a
          | lot of the knowledge is buried there.
         | 
         | That came from technical people who I didn't perceive as being
         | dogmatically pro-cloud.
        
         | zosima wrote:
         | Cloud expands the capabilities of what one team can manage by
         | themselves, enabling them to avoid a huge amount of internal
         | politics.
         | 
         | This is worth astronomical amounts of money in big corps.
        
           | acedTrex wrote:
            | I have said for years the value of cloud is mainly its API;
            | that's the selling point in large enterprises.
        
             | sgarland wrote:
             | Self-hosted software also has APIs, and Terraform
             | libraries, and Ansible playbooks, etc. It's just that you
             | have to know what it is you're trying to do, instead of
             | asking AWS what collection of XaaS you should use.
        
           | sgarland wrote:
           | I'm not convinced this is entirely true. The upfront cost if
           | you don't have the skills, sure - it takes time to learn
           | Linux administration, not to mention management tooling like
           | Ansible, Puppet, etc.
           | 
           | But once those are set up, how is it different? AWS is quite
           | clear with their responsibility model that you still have to
           | tune your DB, for example. And for the setup, just as there
           | are Terraform modules to do everything under the sun, there
           | are Ansible (or Chef, or Salt...) playbooks to do the same.
           | For both, you _should_ know what all of the options are
           | doing.
           | 
           | The only way I see this sentiment being true is that a dev
           | team, with no infrastructure experience, can more easily spin
           | up a lot of infra - likely in a sub-optimal fashion - to run
           | their application. When it inevitably breaks, they can then
           | throw money at the problem via vertical scaling, rather than
           | addressing the root cause.
        
             | the__alchemist wrote:
             | Do you need those tools? It seems that for fundamental web
             | hosting, you need your application server, nginx or
             | similar, postgres or similar, and a CLI. (And an
             | interpreter etc if your application is in an interpreted
             | lang)
        
               | sgarland wrote:
               | I suppose that depends on your RTO. With cloud providers,
               | even on a bare VM, you can to some extent get away with
               | having no IaC, since your data (and therefore config) is
               | almost certainly on networked storage which is redundant
               | by design. If an EC2 instance fails, or even if one of
               | the drives backing your EBS volume fails, it'll probably
               | come back up as it was.
               | 
               | If it's your own hardware, if you don't have IaC of some
               | kind - even something as crude as a shell script - then a
               | failure may well mean you need to manually set everything
               | up again.
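               | 
               | (For the flavor of "crude": a minimal Python sketch of
               | such a rebuild script, with a made-up package list and
               | config layout, might be no more than this:
               | 
               |     import subprocess
               | 
               |     PKGS = ["nginx", "postgresql"]  # hypothetical stack
               | 
               |     def run(*cmd):
               |         # Fail loudly if any step breaks.
               |         subprocess.run(cmd, check=True)
               | 
               |     run("apt-get", "update")
               |     run("apt-get", "install", "-y", *PKGS)
               |     # Assumes the repo has an /etc overlay directory.
               |     run("rsync", "-a", "configs/etc/", "/etc/")
               |     run("systemctl", "restart", "nginx")
               | 
               | Not pretty, but enough to get a replacement box back to a
               | known state.)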
        
               | noprocrasted wrote:
               | Get two servers (or three, etc)?
        
               | sgarland wrote:
               | Well, sure - I was trying to do a comparison in favor of
               | cloud, because the fact that EBS Volumes can magically
               | detach and attach is admittedly a neat trick. You can of
               | course accomplish the same (to a certain scale) with
               | distributed storage systems like Ceph, Longhorn, etc. but
               | then you have to have multiple servers, and if you have
               | multiple servers, you probably also have your application
               | load balanced with failover.
        
               | zbentley wrote:
               | For fundamentals, that list is missing:
               | 
               | - Some sort of firewall or network access control. Being
               | able to say "allow http/s from the world (optionally
               | minus some abuser IPs that cause problems), and allow SSH
               | from developers (by IP, key, or both)" at a separate
               | layer from nginx is prudent. Can be ip/tables config on
               | servers or a separate firewall appliance.
               | 
               | - Some mechanism of managing storage persistence for the
               | database, e.g. backups, RAID, data files stored on fast
               | network-attached storage, db-level replication. Not
               | losing all user data if you lose the DB server is table
               | stakes.
               | 
               | - Something watching external logging or telemetry to
               | let administrators know when errors (e.g. server
               | failures, overload events, spikes in 500s returned)
               | occur. This could be as simple as Pingdom or as involved
               | as automated alerting based on load balancer metrics.
               | Relying on users to report downtime events is not a good
               | approach. (See the sketch at the end of this comment.)
               | 
               | - Some sort of CDN, for applications with a frontend
               | component. This isn't required for fundamental web
               | hosting, but for sites with a frontend and even moderate
               | (10s/sec) hit rates, it can become required for
               | cost/performance; CDNs help with egress congestion (and
               | fees, if you're paying for metered bandwidth).
               | 
               | - Some means of replacing infrastructure from nothing. If
               | the server catches fire or the hosting provider nukes it,
               | having a way to get back to where you were is important.
               | Written procedures are fine if you can handle long
               | downtime while replacing things, but even for a handful
               | of application components those procedures get pretty
               | lengthy, so you start wishing for automation.
               | 
               | - Some mechanism for deploying new code, replacing
               | infrastructure, or migrating data. Again, written
               | procedures are OK, but start to become unwieldy very
               | early on ('stop app, stop postgres, upgrade the postgres
               | version, start postgres, then apply application
               | migrations to ensure compatibility with new version of
               | postgres, then start app--oops, forgot to take a postgres
               | backup/forgot that upgrading postgres would break the
               | replication stream, gotta write that down for net
               | time...').
               | 
               | ...and that's just for a very, very basic web hosting
               | application--one that doesn't need caches, blob stores,
               | the ability to quickly scale out application server or
               | database capacity.
               | 
               | Each of those things can be accomplished the traditional
               | way--and you're right, that sometimes that way is easier
               | for a given item in the list (especially if your
               | maintainers have expertise in that item)! But in
               | aggregate, having a cloud provider handle each of those
               | concerns tends to be easier overall and not require
               | nearly as much in-house expertise.
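               | 
               | To make the monitoring item concrete: a minimal sketch,
               | assuming a hypothetical /healthz endpoint and run from
               | cron on a box outside your own infrastructure, is about
               | this much Python:
               | 
               |     import sys
               |     import urllib.request
               | 
               |     URL = "https://example.com/healthz"  # placeholder
               | 
               |     try:
               |         with urllib.request.urlopen(URL, timeout=10) as r:
               |             ok = r.status == 200
               |     except Exception:
               |         ok = False
               | 
               |     if not ok:
               |         # Swap in real alerting here: email, a Slack
               |         # webhook, PagerDuty, etc.
               |         print(f"ALERT: {URL} is down", file=sys.stderr)
               |         sys.exit(1)
               | 
               | The hard part isn't the check; it's making sure it runs
               | somewhere that doesn't fail along with the thing it
               | watches.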
        
             | zosima wrote:
             | You are focusing on technology. And sure, of course you
             | can get most of the benefits of AWS a lot cheaper when
             | self-hosting.
             | 
             | But when you start factoring in internal processes and
             | incompetent IT departments, suddenly that's not actually a
             | viable option in many real-world scenarios.
        
               | jeffbee wrote:
               | Exactly. With the cloud you can suddenly do all the
               | things your tyrannical Windows IT admin has been saying
               | are impossible for the last 30 years.
        
               | the_arun wrote:
               | It is similar to cooking at home vs ordering cooked food
               | every day. If someone guarantees the taste & quality,
               | people would be happy to outsource it.
        
             | tylerchurch wrote:
             | I think this is only true for teams and apps of a certain
             | size.
             | 
             | I've worked on plenty of teams with relatively small apps,
             | and the difference between:
             | 
             | 1. Cloud: "open up the cloud console and start a VM"
             | 
             | 2. Owned hardware: "price out a server, order it, find a
             | suitable datacenter, sign a contract, get it racked, etc."
             | 
             | Is quite large.
             | 
             | #1 is 15 minutes for a single team lead.
             | 
             | #2 requires the team to agree on hardware specs, get
             | management approval, finance approval, executives signing
             | contracts. And through all this you don't have anything
             | online yet for... weeks?
             | 
             | If your team or your app is large, this probably all
             | averages out in favor of #2. But small teams often don't
             | have the bandwidth or the budget.
        
               | noprocrasted wrote:
               | 3. "Dedicated server" at any hosting provider
               | 
               | Open their management console, press order now, 15 mins
               | later get your server's IP address.
        
               | zbentley wrote:
               | For purposes of this discussion, isn't AWS just a very
               | large hosting provider?
               | 
               | I.e. most hosting providers give you the option for
               | virtual or dedicated hardware. So does Amazon (metal
               | instances).
               | 
               | Like, "cloud" was always an ill-defined term, but in the
               | case of "how do I provision full servers" I think there's
               | no qualitative difference between Amazon and other
               | hosting providers. Quantitative, sure.
        
               | noprocrasted wrote:
               | > Amazon (metal instances)
               | 
               | But you still get nickel & dimed and pay insane costs,
               | including on bandwidth (which is free in most
               | conventional hosting providers, and overages are 90x
               | cheaper than AWS' costs).
        
               | irunmyownemail wrote:
               | Qualitatively, AWS is greedy and nickel-and-dimes you to
               | death. Their Route53 service doesn't even have all the
               | standard DNS options I need, which I can get everywhere
               | else or even on my own bind9 server. I do not use IPv6
               | for several reasons, so when AWS decided to charge for
               | IPv4, I went looking elsewhere to get my VMs.
               | 
               | I can't even imagine how much the US Federal Government
               | is charging American taxpayers to pay AWS for hosting
               | there, it has to be astronomical.
        
               | everfrustrated wrote:
               | Out of curiosity, which DNS record types do you need that
               | Route53 doesn't support?
        
               | goodpoint wrote:
               | More like 15 seconds.
        
               | AnthonyMouse wrote:
               | You're assuming that hosting something in-house implies
               | that each application gets its own physical server.
               | 
               | You buy a couple of beastly things with dozens of cores.
               | You can buy twice as much capacity as you actually use
               | and still be well under the cost of cloud VMs. Then it's
               | still VMs and adding one is just as fast. When the load
               | gets above 80% someone goes through the running VMs and
               | decides if it's time to do some house cleaning or it's
               | time to buy another host, but no one is ever waiting on
               | approval because you can use the reserve capacity
               | immediately while sorting it out.
        
               | necovek wrote:
               | Before the cloud, you could get a VM provisioned (virtual
               | servers) or a couple of apps set up (LAMP stack on a
               | shared host ;)) in a few minutes over a web interface
               | already.
               | 
               | "Cloud" has changed that by providing an API to do this,
               | thus enabling IaC approach to building combined hardware
               | and software architectures.
        
               | layer8 wrote:
               | The SMB I work for runs a small on-premise data center
               | that is shared between teams and projects, with maybe 3-4
               | FTEs managing it (the respective employees also do dev
               | and other work). This includes self-hosting email,
               | storage, databases, authentication, source control, CI,
               | ticketing, company wiki, chat, and other services. The
               | current infrastructure didn't start out that way and
               | developed over many years, so it's not necessarily
               | something a small startup can start out with, but beyond
               | a certain company size (a couple dozen employees or more)
               | it shouldn't really be a problem to develop that, if
               | management shares the philosophy. I certainly find it
               | preferable culturally, if not technically, to maximize
               | independence in that way, have the local expertise and
               | much better control over everything.
               | 
               | One (the only?) indisputable benefit of cloud is the
               | ability to scale up faster (elasticity), but most
               | companies don't really need that. And if you do end up
               | needing it after all, then it's a good problem to have,
               | as they say.
        
               | SoftTalker wrote:
               | Your last paragraph identifies the reason that running
               | their own hardware makes sense for Fastmail. The demand
               | for email is pretty constant. Everyone does roughly the
               | same amount of emailing every day. Daily load is
               | predictable, and growth is predictable.
               | 
               | If your load is very spiky, it might make more sense to
               | use cloud. You pay more for the baseline, but if your
               | spikes are big enough it can still be cheaper than
               | provisioning your own hardware to handle the highest
               | loads.
               | 
               | Of course there's also possibly a hybrid approach, you
               | run your own hardware for base load and augment with
               | cloud for spikes. But that's more complicated.
        
               | maccard wrote:
               | I work for a 50 person subsidiary of a 30k person
               | organisation. I needed a domain name. I put in the
               | purchase request and 6 months later eventually gave up,
               | bought it myself and expensed it.
               | 
               | Our AWS account is managed by an SRE team. It's a 3 day
               | turnaround process to get any resources provisioned, and
               | if you don't get the exact spec right (you forgot to
               | specify the iops on the volume? Oops) 3 day turnaround.
               | Already started work when you request an adjustment?
               | Better hope as part of your initial request you specified
               | backups correctly or you're starting again.
               | 
               | The overhead is absolutely enormous, and I actually don't
               | even have billing access to the AWS account that I'm
               | responsible for.
        
               | j45 wrote:
               | Managing cloud without a dedicated resource is a form of
               | resource creep, with shadow labour costs that aren't
               | factored in.
               | 
               | How many things don't end up happening because of this,
               | when all they needed was a sliver of resources at the
               | start?
        
               | cyberax wrote:
               | > Our AWS account is managed by an SRE team.
               | 
               | That's an anti-pattern (we call it "the account") in the
               | AWS architecture.
               | 
               | AWS internally just uses multiple accounts, so a team can
               | get their own account with centrally-enforced guardrails.
               | It also greatly simplifies billing.
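               | 
               | Roughly, "an account per team with guardrails" is just
               | the Organizations API (names and IDs below are made up;
               | in practice you'd drive this through Control Tower or
               | IaC rather than raw calls):
               | 
               |     import boto3
               | 
               |     org = boto3.client("organizations")
               | 
               |     # One member account per team.
               |     resp = org.create_account(
               |         Email="aws+payments@example.com",
               |         AccountName="payments-prod",
               |     )
               |     print(resp["CreateAccountStatus"]["State"])
               | 
               |     # Attach a centrally managed guardrail (an SCP) to
               |     # the new account once it exists.
               |     org.attach_policy(
               |         PolicyId="p-examplescp1",
               |         TargetId="123456789012",  # the new account's id
               |     )
               | 
               | Billing then falls out per account instead of requiring
               | cost-allocation-tag archaeology.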
        
               | maccard wrote:
               | That's not something that I have control over or
               | influence over.
        
               | mbesto wrote:
               | > 3 day turnaround process to get any resources
               | provisioned
               | 
               | Now imagine having to deal with procurement to purchase
               | hardware for your needs. 6 months later you have a
               | server. Oh you need a SAN for object storage? There goes
               | another 6 months.
        
               | maccard wrote:
               | At a previous job we had some decent on prem resources
               | for internal services. The SRE guys had a bunch of extra
               | compute and you would put in a ticket for a certain
               | amount of resources (2 cpu, SSD, 8GB memory x2 on
               | different hosts). There wasn't a massive amount of
               | variability between the hardware, and you just requested
               | resources to be allocated from a bunch of hypervisors.
               | Turnaround time was about 3 days too. Except you weren't
               | required to be self-sufficient in AWS terminology to
               | request exactly what you needed.
        
               | xorcist wrote:
               | There is a large gap between "own the hardware" and "use
               | cloud hosting". Many people rent the hardware, for
               | example, and you can use managed databases, which is one
               | step up from "starting a vm".
               | 
               | But your comparison isn't fair. The difference between
               | running your own hardware and using the cloud (which is
               | perhaps not even the relevant comparison but let's run
               | with it) is the difference between:
               | 
               | 1. Open up the cloud console, and
               | 
               | 2. You already have the hardware so you just run "virsh"
               | or, more likely, do nothing at all because you own the
               | API so you have already included this in your Ansible or
               | Salt or whatever you use for setting up a server.
               | 
               | Because ordering a new physical box isn't really
               | comparable to starting a new VM, is it?
        
               | sanderjd wrote:
               | I've always liked the theory of #2, I just haven't worked
               | anywhere yet that has executed it well.
        
               | Symbiote wrote:
               | You have omitted the option between the two, which is
               | renting a server. No hardware to purchase, maintain or
               | set up. Easily available in 15 minutes.
        
               | tylerchurch wrote:
               | While I did say "VM" in my original comment, to me this
               | counts as "cloud" because the UI is functionally the
               | same.
        
               | amluto wrote:
               | I've never worked at a company with these particular
               | problems, but:
               | 
               | #1: A cloud VM comes with an obligation for someone at
               | the company to maintain it. The cloud does not excuse
               | anyone from doing this.
               | 
               | #2: Sounds like a dysfunctional system. Sure, it may be
               | common, but a medium sized org could easily have some
               | datacenter space and allow any team to rent a server or
               | an instance, or to buy a server and pay some nominal
               | price for the IT team to keep it working. This isn't
               | actually rocket science.
               | 
               | Sure, keeping a fifteen-year-old server working safely
               | is a chore, but so is maintaining a fifteen-year-old VM
               | instance!
        
               | icedchai wrote:
               | Obligation? Far from it. I've worked at some poorly
               | staffed companies. Nobody is maintaining old VMs or
               | container images. If it works, nobody touches it.
               | 
               | I worked at a supposedly properly staffed company that
               | had raised hundreds of millions in investment, and it was
               | the same thing. VMs running 5 year old distros that
               | hadn't been updated in years. 600 day uptimes, no kernel
               | patches, ancient versions of Postgres, Python 2.7 code
               | everywhere, etc. This wasn't 10 years ago. This was 2
               | years ago!
        
               | j45 wrote:
               | The cloud is someone else's computer.
               | 
               | Renting from a VM provider or installing a hypervisor on
               | your own equipment is another thing.
        
               | j45 wrote:
               | There is a middle ground between the extremes of that
               | pendulum, all cloud or physical metal.
               | 
               | You can start by using a cloud only for VMs, and only
               | run services on it using IaaS or PaaS. Very serviceable.
        
               | warner25 wrote:
               | You gave me flashbacks to a far worse bureaucratic
               | nightmare with #2 in my last job.
               | 
               | I supported an application with a team of about three
               | people for a regional headquarters in the DoD. We had one
               | stack of aging hardware that was racked, on a handshake
               | agreement with another team, in a nearby facility under
               | that other team's control. We had to periodically request
               | physical access for maintenance tasks and the facility
               | routinely lost power, suffered local network outages,
               | etc. So we decided that we needed new hardware and more
               | of it spread across the region to avoid the shaky single-
               | point-of-failure.
               | 
               | That began a three _year_ process of: waiting for budget
               | to be available for the hardware  / license / support
               | purchases; pitching PowerPoints to senior management to
               | argue for that budget (and getting updated quotes every
               | time from the vendors); working out agreements with other
               | teams at new facilities to rack the hardware; traveling
               | to those sites to install stuff; and working through the
               | cybersecurity compliance stuff for each site. I left
               | before everything was finished, so I don't know how they
               | ultimately dealt with needing, say, someone to physically
               | reseat a cable in Japan (an international flight away).
        
             | bonoboTP wrote:
             | You can get pretty far without any of that fancy stuff. You
             | can get plenty done by using parallel-ssh and then focusing
             | on the actual thing you develop instead of endless tooling
             | and docker and terraform and kubernetes and salt and puppet
             | and ansible. Sure, if you know why you need them and know
             | what value you get from them, OK. But many people just do
             | it because it's the thing to do...
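             | 
             | (For reference, what parallel-ssh does is roughly this -
             | the host list and command are made up, and it just shells
             | out to the system ssh client:
             | 
             |     import subprocess
             |     from concurrent.futures import ThreadPoolExecutor
             | 
             |     HOSTS = ["web1", "web2", "db1"]       # inventory
             |     CMD = "sudo systemctl restart myapp"  # the task
             | 
             |     def run(host):
             |         r = subprocess.run(
             |             ["ssh", host, CMD],
             |             capture_output=True, text=True)
             |         return host, r.returncode
             | 
             |     with ThreadPoolExecutor(max_workers=10) as pool:
             |         for host, rc in pool.map(run, HOSTS):
             |             print(host, "ok" if rc == 0 else f"exit {rc}")
             | 
             | Everything past that is where the tooling rabbit hole
             | starts.)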
        
             | marcosdumay wrote:
             | All of that is... completely unrelated to the GP's post.
             | 
             | Did you reply to the right comment? Do you think "politics"
             | is something you solve with Ansible?
        
               | sgarland wrote:
               | > Cloud expands the capabilities of what one team can
               | manage by themselves, enabling them to avoid a huge
               | amount of internal politics.
               | 
               | It's related to the first part. Re: the second, IME if
               | you let dev teams run wild with "managing their own
               | infra," the org as a whole eventually pays for that when
               | the dozen bespoke stacks all hit various bottlenecks, and
               | no one actually understands how they work, or how to
               | troubleshoot them.
               | 
               | I keep being told that "reducing friction" and
               | "increasing velocity" are good things; I vehemently
               | disagree. It might be good for short-term profits, but it
               | is poison for long-term success.
        
             | sanderjd wrote:
             | I have never ever worked somewhere with one of these
             | "cloud-like but custom on our own infrastructure" setups
             | that didn't leak infrastructure concerns through the
             | abstraction, to a significantly larger degree than AWS.
             | 
             | I believe it can work, so maybe there are really successful
             | implementations of this out there, I just haven't seen it
             | myself yet!
        
           | daemonologist wrote:
           | Our big company locked all cloud resources behind a
           | floating/company-wide DevOps team (git and CI too). We have
           | an old on-prem server that we jealously guard because it
           | allows us to create remotes for new git repos and deploy
           | prototypes without consulting anyone.
           | 
           | (To be fair, I can see why they did it - a lot of deployments
           | were an absolute mess before.)
        
           | mark242 wrote:
           | This is absolutely spot on.
           | 
           | What do you mean, I can't scale up because I've used my
           | hardware capex budget for the year?
        
         | glitchc wrote:
         | Cloud solves one problem quite well: Geographic redundancy.
         | It's extremely costly with on-prem.
        
           | sgarland wrote:
           | Only if you're literally running your own datacenters, which
           | is in no way required for the majority of companies. Colo
           | giants like Equinix already have the infrastructure in place,
           | with a proven track record.
           | 
           | If you enable Multi-AZ for RDS, your bill doubles until you
           | cancel. If you set up two servers in two DCs, your initial
           | bill doubles from the CapEx, and then a very small percentage
           | of your OpEx goes up every month for the hosting. You very,
           | very quickly make this back compared to cloud.
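           | 
           | For context, Multi-AZ is literally one flag on the instance
           | (identifiers and credentials below are placeholders), and
           | flipping it is what doubles the bill:
           | 
           |     import boto3
           | 
           |     rds = boto3.client("rds")
           | 
           |     rds.create_db_instance(
           |         DBInstanceIdentifier="app-db",
           |         Engine="postgres",
           |         DBInstanceClass="db.m6g.large",
           |         AllocatedStorage=100,
           |         MasterUsername="app",
           |         MasterUserPassword="change-me",
           |         # The synchronous standby in a second AZ is what
           |         # you pay double for.
           |         MultiAZ=True,
           |     )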
        
             | Cyph0n wrote:
             | But reliable connectivity between regions/datacenters
             | remains a challenge, right? Compute is only one part of the
             | equation.
             | 
             | Disclaimer: I work on a cloud networking product.
        
               | sgarland wrote:
               | It depends on how deep you want to go. Equinix for one
               | (I'm sure others as well, but I'm most familiar with
               | them) offers managed cross-DC fiber. You will probably
               | need to manage the networking, to be fair, and I will
               | readily admit that's not trivial.
        
               | irunmyownemail wrote:
               | I use Wireguard, pretty simple, where's the challenge?
        
               | Cyph0n wrote:
               | I am referring to the layer 3 connectivity that Wireguard
               | is running on top of. Depending on your use case and
               | reliability and bandwidth requirements, routing
               | everything over the "public" internet won't cut it.
               | 
               | Not to mention setting up and maintaining your physical
               | network as the number of physical hosts you're running
               | scales.
        
           | dietr1ch wrote:
           | Does it? I've seen outages around "Sorry, us-west_carolina-3
           | is down". AWS is particularly good at keeping you aware of
           | their datacenters.
        
             | bdangubic wrote:
             | if you see that you are doing it wrong :)
        
               | sgarland wrote:
               | AWS has had multiple outages which were caused by a
               | single AZ failing.
        
               | dietr1ch wrote:
               | Yup, I was referring to, I guess, one of these,
               | 
               | - https://news.ycombinator.com/item?id=29473630:
               | (2021-12-07) AWS us-east-1 outage
               | 
               | - https://news.ycombinator.com/item?id=29648286:
               | (2021-12-22) Tell HN: AWS appears to be down again
               | 
               | Maybe things are better now, but it became apparent that
               | people might be misusing cloud providers or betting that
               | things work flawlessly even if they completely ignore
               | AZs.
        
             | toast0 wrote:
             | It can be useful. I run a latency sensitive service with
             | global users. A cloud lets me run it in 35 locations
             | dealing with one company only. Most of those locations only
             | have traffic to justify a single, smallish, instance.
             | 
             | In the locations where there's more traffic, and we need
             | more servers, there are more cost effective providers, but
             | there's value in consistency.
             | 
             | Elasticity is nice too, we doubled our instance count for
             | the holidays, and will return to normal in January. And our
             | deployment style starts a whole new cluster, moves traffic,
             | then shuts down the old cluster. If we were on owned
             | hardware, adding extra capacity for the holidays would be
             | trickier, and we'd have to have a more sensible deployment
             | method. And the minimum service deployment size would
             | probably not be a little quad processor box with 2GB ram.
             | 
             | Using cloud for the lower traffic locations and a cost
             | effective service for the high traffic locations would
             | probably save a bunch of money, but add a lot of deployment
             | pain. And a) it's not my decision and b) the cost
             | difference doesn't seem to be quite enough to justify the
             | pain at our traffic levels. But if someone wants to make a
             | much lower margin, much simpler service with lots of
             | locations and good connectivity, be sure to post about it.
             | But, I think the big clouds have an advantage in geographic
             | expansion, because their other businesses can provide
             | capital and justification to build out, and high margins at
             | other locations help cross subsidize new locations when
             | they start.
        
               | dietr1ch wrote:
               | I agree it can be useful (latency, availability, using
               | off-peak resources), but running globally should be a
               | default and people should opt in to fine-grained control
               | and responsibility.
               | 
               | From outside it seems that either AWS picked the wrong
               | default to present their customers, or that it's
               | unreasonably expensive and it drives everyone into the
               | in-depth handling to try to keep cloud costs down.
        
           | icedchai wrote:
           | Except, almost nobody, outside of very large players, does
           | cross region redundancy. us-east-1 is like a SPOF for the
           | entire Internet.
        
           | liontwist wrote:
           | Cloud noob here. But if I have a central database what can I
           | distribute across geographic regions? Static assets? Maybe a
           | cache?
        
             | sgarland wrote:
             | Yep. Cross-region RDBMS is a hard problem, even when you're
             | using a managed service - you practically always have to
             | deal with eventual consistency, or increased latency for
             | writes.
        
           | ayuhito wrote:
           | My company used to do everything on-prem. Until a literal
           | earthquake and tsunami took down a bunch of systems.
           | 
           | After that, yeah we'll let AWS do the hard work of enabling
           | redundancy for us.
        
         | sgarland wrote:
         | > What's particularly fascinating to me, though, is how some
         | people are so pro-cloud that they'd argue with a writeup like
         | this with silly cloud talking points.
         | 
         | I'm sure I'll be downvoted to hell for this, but I'm convinced
         | that it's largely their insecurities being projected.
         | 
         | Running your own hardware isn't tremendously difficult, as
         | anyone who's done it can attest, but it does require a much
         | deeper understanding of Linux (and of course, any services
         | which previously would have been XaaS), and that's a vanishing
         | trait these days. So for someone who may well be quite skilled
         | at K8s administration, serverless (lol) architectures, etc. it
         | probably is seen as an affront to suggest that their skill set
         | is lacking something fundamental.
        
           | TacticalCoder wrote:
           | > So for someone who may well be quite skilled at K8s
           | administration ...
           | 
           | And running your own hardware is not incompatible with
           | Kubernetes: on the contrary. You can fully well have your
           | infra spin up VMs and then do container orchestration if
           | that's your thing.
           | 
           | And part of your hardware monitoring and reporting tooling
           | can work perfectly fine from containers.
           | 
           | Bare metal -> Hypervisor -> VM -> container orchestration ->
           | a container running a "stateless" hardware monitoring
           | service. And VMs themselves are "orchestrated" too.
           | Everything can be automated.
           | 
           | Anyway, say a hard disk begins to show errors? Notifications
           | get sent (email/SMS/Telegram/whatever) by another service in
           | another container, and the dashboard shows it too
           | (dashboards are cool).
           | 
           | Go to the machine once the spare disk has already been
           | resilvered, move it to where the failed disk was, plug in a
           | new disk that becomes the new spare.
           | 
           | Boom, done.
           | 
           | I'm not saying all self-hosted hardware should do container
           | orchestration: there are valid use cases for bare metal too.
           | 
           | But something has to be said about controlling _everything_
           | on your own infra: from the bare metal to the VMs to
           | container orchestration, to even, potentially, your own IP
           | address space.
           | 
           | This is all within reach of an _individual_, both skill-wise
           | and price-wise (including obtaining your own IP address
           | space). People who drank the cloud kool-aid should ponder
           | this and wonder how good their skills truly are if they
           | cannot get this up and working.
        
             | sgarland wrote:
             | Fully agree. And if you want to take it to the next level
             | (and have a large budget), Oxide [0] seems to have neatly
             | packaged this into a single coherent product. They don't
             | quite have K8s fully running, last I checked, but there are
             | of course other container orchestration systems.
             | 
             | > Go to the machine once the spare disk as already been
             | resilvered
             | 
             | Hi, fellow ZFS enthusiast :-)
             | 
             | [0]: https://oxide.computer
        
             | noprocrasted wrote:
             | > And running your own hardware is not incompatible with
             | Kubernetes: on the contrary
             | 
             | Kubernetes actually makes so much more sense on bare-metal
             | hardware.
             | 
             | On the cloud, I think the value prop is dubious - your
             | cloud provider is already giving you VMs, why would you
             | need to subdivide them further and add yet another layer of
             | orchestration?
             | 
             | Not to mention that you're getting 2010s-era performance on
             | those VMs, so subdividing them is terrible from a
             | performance point of view too.
        
               | sgarland wrote:
               | > Not to mention that you're getting 2010s-era
               | performance on those VMs, so subdividing them is terrible
               | from a performance point of view too.
               | 
               | I was trying in vain to explain to our infra team a
               | couple of weeks ago why giving my team a dedicated node
               | of a newer instance family with DDR5 RAM would be
               | beneficial for an application which is heavily
               | constrained by RAM speed. People seem to assume that
               | compute is homogenous.
        
               | theideaofcoffee wrote:
               | I would wager that the same kind of people that were
               | arguing against your request for a specific hardware
               | config are the same ones in this comment section railing
               | against any sort of self-sufficiency by hosting it
               | yourself on hardware. All they know is cloud, all they
               | know how to do is "ScAlE Up thE InStanCE!" when shit hits
               | the fan. It's difficult to argue against that and make
               | real progress. I understand your frustration completely.
        
             | irunmyownemail wrote:
             | I agree, I run PROD, TEST and DEV kube clusters all in
             | VMs, works great.
        
         | luplex wrote:
         | In the public sector, cloud solves the procurement problem. You
         | just need to go through the yearlong process once to use a
         | cloud service, instead of for each purchase > 1000EUR.
        
         | moltar wrote:
         | Cloud is more than instances. If all you need is a bunch of
         | boxes, then cloud is a terrible fit.
         | 
         | I use AWS cloud a lot, and almost never use any VMs or
         | instances. Most instances I use are along the lines of a simple
         | anemic box for a bastion host or some such.
         | 
         | I use higher level abstractions (services) to simplify
         | solutions and outsource maintenance of these services to AWS.
        
         | TacticalCoder wrote:
         | > All the pro-cloud talking points are just that - talking
         | points that don't persuade anyone with any real technical
         | understanding ...
         | 
         | And moreover most of the actually interesting things, like
         | having VM templates and stateless containers, orchestration,
         | etc., are very easy to run yourself and get you 99.9% of the
         | benefits of the cloud.
         | 
         | Just about any and every service is available as a container
         | file already written for you. And if it doesn't exist, it's
         | not hard to plumb up.
         | 
         | A friend of mine runs more than 700 containers (yup, seven
         | hundred), split between his own rack at home (half of them)
         | and dedicated servers for the other half (he runs stuff like
         | FlightRadar, AI models, etc.). He'll soon get his own IP
         | address space. Complete "chaos monkey"-ready infra where you
         | can cut any cable and the thing keeps working: everything is
         | duplicated, can be spun up on demand, etc. Someone could steal
         | his entire rack and all his dedicated servers, and he'd still
         | be back operational in no time.
         | 
         | If an individual can do that, a company, no matter its size,
         | can do it too. And arguably 99.9% of all the companies out
         | there don't have the need for an infra as powerful as the one
         | most homelab enthusiasts have.
         | 
         | And another thing: there are even two in-betweens between
         | "cloud" and "our own hardware located at our company". First
         | is colocating your own hardware, but in a datacenter. Second
         | is renting dedicated servers from a datacenter.
         | 
         | They're often ready to accept cloud-init directly.
         | 
         | And it's not hard. I'd say learning to configure hypervisors on
         | bare metal, then spin VMs from templates, then running
         | containers inside the VMs is actually much easier than learning
         | all the idiosyncrasies of all the different cloud vendors APIs
         | and whatnots.
         | 
         | Funnily enough when the pendulum swung way too far on the
         | "cloud all the things" side, those saying at some point we'd
         | read story about repatriation were being made fun of.
        
           | sgarland wrote:
           | > If an individual can do that, a company, no matter its
           | size, can do it too.
           | 
           | Fully agreed. I don't have physical HA - if someone stole my
           | rack, I would be SOL - but I can easily ride out a power
           | outage for as long as I want to be hauling cans of gasoline
           | to my house. The rack's UPS can keep it up at full load for
           | at least 30 minutes, and I can get my generator running and
           | hooked up in under 10. I've done it multiple times. I can
           | lose a single server without issue. My only SPOF is internet,
           | and that's only by choice, since I can get both AT&T and
           | Spectrum here, and my router supports dual-WAN with auto-
           | failover.
           | 
           | > And arguably 99.9% of all the companies out there don't
           | have the need for an infra as powerful as the one most
           | homelab enthusiast have.
           | 
           | THIS. So many people have no idea how tremendously fast
           | computers are, and how much of an impact latency has on
           | speed. I've benchmarked my 12-year old Dells against the
           | newest and shiniest RDS and Aurora instances on both MySQL
           | and Postgres, and the only ones that kept up were the ones
           | with local NVMe disks. Mine don't even technically have
           | _local_ disks; they're NVMe via Ceph over Infiniband.
           | 
           | Does that scale? Of course not; as soon as you want geo-
           | redundant, consistent writes, you _will_ have additional
           | latency. But most smaller and medium companies don't _need_
           | that.
        
         | dan-robertson wrote:
         | Well cloud providers often give more than just VMs in a data
         | center somewhere. You may not be able to find good equivalents
         | if you aren't using the cloud. Some third-party products are
         | also only available on clouds. How much of a difference those
         | things make will depend on what you're trying to do.
         | 
         | I think there are accounting reasons for companies to prefer
         | paying opex to run things on the cloud instead of more capex-
         | intensive self-hosting, but I don't understand the dynamics
         | well.
         | 
         | It's certainly the case that clouds tend to be more expensive
         | than self-hosting, even when taking account of the discounts
         | that moderately sized customers can get, and some of the
         | promises around elastic scaling don't really apply when you are
         | bigger.
         | 
         | To some of your other points: the main customers of companies
         | like AWS are businesses. Businesses generally don't care about
         | the centralisation of the internet. Businesses are capable of
         | reading the contracts they are signing and not signing them if
         | privacy (or, typically more relevant to businesses, their IP)
         | cannot be sufficiently protected. It's not really clear to me
         | that using a cloud is going to be less secure than doing things
         | on-prem.
        
         | tyingq wrote:
         | I think part of it was a way for dev teams to get an infra team
         | that was not empowered to say no. Plus organizational theory,
         | empire building, etc.
        
           | sgarland wrote:
           | Yep. I had someone tell me last week that they didn't want a
           | more rigid schema because other teams rely on it, and
           | anything adding "friction" to using it would be poorly
           | received.
           | 
           | As an industry, we are largely trading correctness and
           | performance for convenience, and this is not seen as a
           | negative by most. What kills me is that at every cloud-native
           | place I've worked at, the infra teams were both responsible
           | for maintaining and fixing the infra that product teams
           | demanded, but were not empowered to push back on unreasonable
           | requests or usage patterns. It's usually not until either
           | the limits of vertical scaling are reached, or a SEV0 with
           | these decisions as the root cause occurs, that leadership
           | even begins to consider changes.
        
         | tzs wrote:
         | There was a time when cloud was significantly cheaper than
         | owning.
         | 
         | I'd expect that there are people who moved to the cloud then,
         | and over time started using services offered by their cloud
         | provider (e.g., load balancers, secret management, databases,
         | storage, backup) instead of running those services themselves
         | on virtual machines, and now even if it would be cheaper to run
         | everything on owned servers they find it would be too much
         | effort to add all those services back to their own servers.
        
           | toomuchtodo wrote:
           | The cloud wasn't about cheap, it was about _fast_. If you're
           | VC funded, time is everything, and developer velocity above
           | all else to hyperscale and exit. That time has passed (ZIRP),
           | and the public cloud margin just doesn't make sense when you
           | can own and operate (their margin is your opportunity) on
           | prem with similar cloud primitives around storage and
           | compute.
           | 
           | Elasticity is a component, but has always been from a batch
           | job bin packing scheduling perspective, not much new there.
           | Before k8s and Nomad, there was Globus.org.
           | 
           | (Infra/DevOps in a previous life at a unicorn, large worker
           | cluster for a physics experiment prior, etc; what is old is a
           | new again, you're just riding hype cycle waves from junior to
           | retirement [mainframe->COTS on prem->cloud->on prem cloud,
           | and so on])
        
           | dboreham wrote:
           | That was never true except in the case that the required
           | hardware resources were significantly smaller than a typical
           | physical machine.
        
         | tomrod wrote:
         | <ctoHatTime> Dunno man, it's really really easy to set up an S3
         | and use it to share datasets for users authorized with IAM....
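         | 
         | A minimal sketch of that "easy" path (bucket and key names are
         | made up; it assumes whatever IAM identity runs it may write to
         | the bucket):
         | 
         |     import boto3
         | 
         |     s3 = boto3.client("s3")
         | 
         |     # Publish a dataset version.
         |     s3.upload_file(
         |         "train.parquet", "team-datasets", "v1/train.parquet")
         | 
         |     # Hand a colleague a time-limited link instead of
         |     # wrangling credentials.
         |     url = s3.generate_presigned_url(
         |         "get_object",
         |         Params={"Bucket": "team-datasets",
         |                 "Key": "v1/train.parquet"},
         |         ExpiresIn=3600,
         |     )
         |     print(url)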
         | 
         | And IAM and other cloud security and management considerations
         | are where the opex/capex and capability argument can start to
         | break down. Turns out, the "cloud" savings come from not having
         | the in-house capability to manage hardware. Sometimes, for most
         | businesses, you want some of that lovely reliability.
         | 
         | (In short, I agree with you, substantially).
         | 
         | Like code. It is easy to get something basic up, but
         | substantially more resources are needed for non-trivial things.
        
           | hamandcheese wrote:
           | I feel like IAM may be the sleeper killer-app of cloud.
           | 
           | I self-host a lot of things, but boy oh boy if I were running
           | a company it would be a helluvalotta work to get IAM properly
           | set up.
        
             | sanderjd wrote:
             | I strongly agree with this and also strongly lament it.
             | 
             | I find IAM to be a terrible implementation of a
             | foundationally necessary system. It feels tacked on to me,
             | except now it's tacked onto thousands of other things and
             | there's no way out.
        
               | andrewfromx wrote:
               | like Terraform! Isn't Pulumi 100% better? But there's no
               | way out of Terraform.
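               | 
               | (For anyone who hasn't tried it, a toy Pulumi program is
               | just Python - the resource name below is made up:
               | 
               |     import pulumi
               |     import pulumi_aws as aws
               | 
               |     # Same idea as a Terraform aws_s3_bucket resource,
               |     # but in a general-purpose language.
               |     bucket = aws.s3.Bucket("datasets")
               | 
               |     pulumi.export("bucket_name", bucket.id)
               | 
               | Whether that's "100% better" than HCL is the holy war.)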
        
             | pphysch wrote:
             | That's essentially why "platform engineering" is a hot
             | topic. There are great FOSS tools for this, largely in the
             | Kubernetes ecosystem.
             | 
             | To be clear, authentication could still be outsourced, but
             | authorizing access to (on-prem) resources in a multi-tenant
             | environment is something that "platforms" are frequently
             | designed for.
        
         | necovek wrote:
         | But isn't _using_ Fastmail akin to using a cloud provider
         | (managed email vs managed everything else)? They are similarly
         | a service provider, and as a customer, you don't really care
         | "who their ISP is?"
         | 
         | The discussion matters when we are talking about _building_
         | things: whether you self-host or use managed services is a set
         | of interesting trade-offs.
        
           | citrin_ru wrote:
           | Yes, FastMail is a SaaS. But there are adepts of a religion
           | who would tell you that companies like FastMail should be
           | built on top of AWS and that it is the only true way. It is
           | good to have some counter-narrative to this.
        
             | j45 wrote:
             | Being cloud compatible (packaged well) can be as important
             | as being cloud-agnostic (work on any cloud).
             | 
             | Too many projects become beholden to one cloud.
        
         | UltraSane wrote:
         | "All the pro-cloud talking points are just that - talking
         | points that don't persuade anyone with any real technical
         | understanding,"
         | 
         | This is false. AWS infrastructure is vastly more secure than
         | almost all company data centers. AWS has a rule that the same
         | person cannot have logical access and physical access to the
         | same storage device. Very few companies have enough IT people
         | to have this rule. The AWS KMS is vastly more secure than what
         | almost all companies are doing. The AWS network is vastly
         | better designed and operated than almost all corporate
         | networks. AWS S3 is more reliable and scalable than anything
         | almost any company could create on their own. To create
         | something even close to it you would need to implement
         | something like MinIO using 3 separate data centers.
        
           | gooosle wrote:
           | <citations needed>
        
           | j45 wrote:
           | The cloud is someone else's computer.
           | 
           | It's like putting something in someone's desk drawer under
           | the guise of convenience at the expense of security.
           | 
           | Why?
           | 
           | Too often, someone other than the data owner has or can get
           | access to the drawer directly or indirectly.
           | 
           | Also, Cloud vs self hosted to me is a pendulum that has swung
           | back and forth for a number of reasons.
           | 
           | The benefits of the cloud outlined here are often a lot of
           | open source tech packaged up and sold as manageable from a
           | web browser, or a command line.
           | 
           | One of the major reasons the cloud became popular was the
           | difficulty of managing networking in Linux at scale. At the
           | time the cloud became very attractive for that reason, plus
           | being able to virtualize bare metal servers to put into any
           | combination of local and cloud hosting.
           | 
           | Self-hosting has become easier by an order of magnitude or
           | two for anyone who knows how to do it, except it's not
           | something people who haven't done both self-hosting and
           | cloud can really discuss.
           | 
           | Cloud has abstracted away the cost of horsepower, and
           | converted it to transactions. People are discovering that a
           | fraction of the horsepower they thought they needed is
           | enough to service their workloads.
           | 
           | At some point the horsepower got way beyond what they needed
           | and it wasn't noticed. But paying for a cloud is convenient
           | and standardized.
           | 
           | Company data centres can be reasonably secured using a number
           | of PaaS or IaaS solutions readily available off the shelf.
           | Tools from VMware, Proxmox and others are tremendous.
           | 
           | It may seem like there's a lot to learn, except most
           | problems that are new to someone have often been thought
           | through extensively by people whose experience goes beyond
           | cloud only.
        
             | the_arun wrote:
             | > The cloud is someone else's computer
             | 
             | Isn't it more like leasing public property? Meaning it is
             | yours as long as you are paying the lease? Analogous to
             | renting an apartment instead of owning a condo?
        
               | adamtulinius wrote:
               | Not at all. You can inspect the apartment you rent. The
               | cloud is totally opaque in that regard.
        
               | j45 wrote:
               | Totally opaque is a really nice way to describe it.
        
               | j45 wrote:
               | Nope. It's literally putting private data in a shared
               | drawer in someone else's desk where you have your area of
               | the drawer.
        
               | jameshart wrote:
               | Literally?
               | 
               | I would just like to point out that most of us who have
               | ever had a job at an office, attended an academic
               | institution, or lived in rented accommodation have kept
               | stuff in someone else's desk drawer from time to time.
               | Often a leased desk in a building rented from a random
               | landlord.
               | 
               | Keeping things in someone else's desk drawer can be
               | convenient and offer a sufficient level of privacy for
               | many purposes.
               | 
               | And your proposed alternative to using 'someone else's
               | desk drawer' is, what, make your own desk?
               | 
               | I guess, since I'm not a carpenter, I can buy a flatpack
               | desk from ikea and assemble it and keep my stuff in that.
               | I'm not sure that's an improvement to my privacy posture
               | in any meaningful sense though.
        
               | j45 wrote:
               | It doesn't have to be entirely literal, or not literal at
               | all.
               | 
               | A single point of managed/shared access to a drawer
               | doesn't fit all levels of data sensitivity and security.
               | 
               | I understand this kind of wording and analogy might be
               | triggering for the drive-by downvoters.
               | 
               | A comment like the above though allows both people to
               | openly consider viewpoints that may not be theirs.
               | 
               | For me it shed light on something simpler.
               | 
               | Shared access to shared infrastructure is not always
               | secure as we want to tell ourselves. It's important to be
               | aware when it might be security through abstraction.
               | 
               | The dual security and convenience of self-hosting IaaS
               | and PaaS, even for dev, staging or small-scale
               | production, has improved dramatically, and allows things
               | to be built in a cloud-agnostic way that makes switching
               | clouds much easier. It can also easily build a business
               | case to lower cloud costs. Still, it doesn't have to be
               | for everyone either, nor does the cloud have to be
               | everything.
               | 
               | A small example? For a stable homeland - their a couple
               | of usff small servers running proxmox or something
               | residential fibre behind a tailscale or cloudflare funnel
               | and compare the cost for uptime. It's surprising how much
               | time servers and apps spend idling.
               | 
               | Life and the real world is more than binary. Be it all
               | cloud or no cloud.
        
               | MadnessASAP wrote:
               | > Keeping things in someone else's desk drawer can be
               | convenient and offer a sufficient level of privacy for
               | many purposes.
               | 
               | To torture a metaphor to death, are you going to keep
               | your bank passwords in somebody else's desk drawer? Are
               | you going to keep 100 million people's bank passwords in
               | that drawer?
               | 
               | > I guess, since I'm not a carpenter, I can buy a
               | flatpack desk from ikea and assemble it and keep my stuff
               | in that. I'm not sure that's an improvement to my privacy
               | posture in any meaningful sense though.
               | 
               | If you're not a carpenter I would recommend you stay out
               | of the business of building safe desk drawers altogether.
               | Although you should probably still be able to recognize
               | that the desk drawer you own, inside your own locked
               | house, is a safer option than the one at the office
               | accessible by any number of people.
        
             | UltraSane wrote:
             | > The cloud is someone else's computer.
             | 
             | And in the case of AWS it is someone else's extremely well
             | designed and managed computer and network.
        
               | j45 wrote:
               | Generally I look to people who could build an AWS
               | themselves for a view on its value versus doing it
               | yourself, because they can judge both.
               | 
               | Happy to hear more.
        
             | AtlasBarfed wrote:
             | One of the ways the NSA and security services get so much
             | intelligence on targets isn't by direct decryption of what
             | they store or by listening in. A great deal of their
             | intelligence is simply metadata intelligence. They watch
             | what you do. They watch the amount of data you transport.
             | They watch your patterns of movement.
             |
             | So even if AWS is providing direct security and encryption
             | in the sense of what most security professionals are
             | concerned with (key strength, etc.), AWS still has a great
             | deal of information about what you do, because they get to
             | watch how much data moves from where to where and other
             | information about what those machines are.
        
           | fulafel wrote:
           | OTOH:
           | 
             | 1. big clouds are very lucrative targets for spooks; your
             | data seems pretty likely to be hoovered up as "bycatch" (or
             | maybe main catch depending on your luck) by various agencies
             | and then traded around as currency
             |
             | 2. you never hear about security problems (incidents or
             | exposure) in the platforms, there's no transparency
             |
             | 3. better than most corporate stuff is a low bar
        
             | sfilmeyer wrote:
             | >3. better than most corporate stuff is a low bar
             | 
             | I think it's a very relevant bar, though. The top level
             | commenter made points about "a business of just about any
             | size", which seems pretty exactly aligned with "most
             | corporate stuff".
        
             | likeabatterycar wrote:
             | > you never hear about security probems (incidents or
             | exposure) in the platforms
             | 
             | Except that one time...
             | 
             | https://www.seattlemet.com/news-and-city-
             | life/2023/04/how-a-...
        
               | noprocrasted wrote:
               | If I remember right, the attacker's AWS employment is
               | irrelevant - no privileged AWS access was used in that
               | case. The attacker working for AWS was a pure
               | coincidence, it could've been anyone.
        
             | stefan_ wrote:
             | 4. we keep hitting hypervisor bugs and having to work
             | around the fact that your software coexists on the same
             | machine with untrusted third-party software that might in
             | fact be actively trying to attack you - hence all the
             | silliness with encrypted memory buses and the various
             | debilitating workarounds for silicon bugs.
             |
             | So yes, the cloud is very secure, except for the very thing
             | that makes it the cloud, which is not secure at all and has
             | just been papered over because questioning it means the
             | business model is bust.
        
             | mardifoufs wrote:
             | Most corporations (which make up the vast majority of cloud
             | users) absolutely don't care about spooks, sadly enough. If
             | that's the threat model, it's a very, very rare case to
             | care about. Most datacenters/corporations won't even
             | fight or care about sharing data with local
             | spooks/cops/three letter agencies. The actual threat is
             | data leaks, security breaches, etc.
        
             | nine_k wrote:
             | If you don't want your data to be accessible to "various
             | agencies", don't share it with corporations, full stop.
             | Corporations are obliged by law to make it available to the
             | agencies, and the agencies often overreach, while the
             | corporations almost never mind the overreach. There are
             | limitations for stuff like health or financial data, but
             | these are not impenetrable barriers.
             | 
             | I would just consider all your hosted data to be easily
             | available to any security-related state agency; consider
             | them already having a copy.
        
               | immibis wrote:
               | That depends where it's hosted and how it's encrypted.
               | Cloud hosts can just reach into your RAM, but dedicated
               | server hosts would need to provision that before
               | deploying the server, and colocation providers would need
               | to take your server offline to install it.
        
               | nine_k wrote:
               | Colocated / Dedicated is not Cloud, AFAICT. It's the
               | "traditional hosting", not elastic / auto-scalable. You
               | of course may put your own, highly tamper-proof boxes in
               | a colocation rack, and be reasonably certain that any
               | attempt to exfiltrate data from them won't be invisible
               | to you.
               | 
               | By doing so, you share nothing with your hosting
               | provider, you only rent rack space / power /
               | connectivity.
        
           | noprocrasted wrote:
           | > AWS infrastructure is vastly more secure than almost all
           | company data centers
           | 
           | Secure in what terms? Security is always about a threat model
           | and trade-offs. There's no absolute, objective term of
           | "security".
           | 
           | > AWS has a rule that the same person cannot have logical
           | access and physical access to the same storage device.
           | 
           | Any promises they make aren't worth anything unless there's
           | contractually-stipulated damages that AWS should pay in case
           | of breach, those damages actually corresponding to the costs
           | of said breach for the customer, and a history of actually
           | paying out said damages without shenanigans. They've already
           | got a track record of lying on their status pages, so it
           | doesn't bode well.
           | 
           | But I'm actually wondering what this specific rule even tries
           | to defend against? You presumably care about data protection,
           | so logical access is what matters. Physical access seems
           | completely irrelevant no?
           | 
           | > Very few companies have enough IT people to have this rule
           | 
           | Maybe, but that doesn't actually mitigate anything from the
           | company's perspective? The company itself would still be in
           | the same position, aka not enough people to reliably separate
           | responsibilities. Just that instead of those responsibilities
           | being physical, they now happen inside the AWS console.
           | 
           | > The AWS KMS is vastly more secure than what almost all
           | companies are doing.
           | 
           | See first point about security. Secure against what - what's
           | the threat model you're trying to protect against by using
           | KMS?
           | 
           | But I'm not necessarily denying that (at least some) AWS
           | services are very good. Question is, is that "goodness"
           | required for your use-case, is it enough to overcome its
           | associated downsides, and is the overall cost worth it?
           | 
           | A pragmatic approach would be to evaluate every component on
           | its merits and fitness to the problem at hand instead of
           | going all in, one way or another.
        
             | cyberax wrote:
             | > They've already got a track record of lying on their
             | status pages, so it doesn't bode well.
             | 
             | ???
        
             | nine_k wrote:
             | Physical access is pretty relevant if you can bribe an
             | engineer to locate the physical location of some valuable
             | data, then go service the particular machine, copy the disk
             | (while servicing "degraded hardware"), and thus exfiltrate
             | the data without any trace of a breach.
        
             | Brian_K_White wrote:
             | Physical access and logical root access can't hide things
             | from each other. It takes both to hide an activity. If you
             | only have one, then the other can always be used to uncover
             | or detect it in the first place, or at least diagnose it
             | after the fact.
        
           | Aachen wrote:
           | AWS is so complicated that we usually find more impactful
           | permission problems there than in any company using its own
           | hardware
        
           | rmbyrro wrote:
           | about security, most businesses using AWS invest little to
           | nothing in securing their software, or even adopt basic
           | security practices for their employees
           | 
           | having the most secure data center doesn't matter if you load
           | your secrets as env vars in a system that can be easily
           | compromised by a motivated attacker
           | 
           | so i don't buy this argument as a general reason pro-cloud
        
             | dajonker wrote:
             | This exactly, most leaks don't involve any physical access.
             | Why bother with something hard when you can just get in
             | through an unmaintained Wordpress/SharePoint/other legacy
             | product that some department can't live without.
        
           | evantbyrne wrote:
           | Making API calls from a VM on shared hardware to KMS is
           | vastly more secure than doing AES locally? I'm skeptical to
           | say the least.
        
             | UltraSane wrote:
             | Encrypting data is easy, securely managing keys is the hard
             | part. KMS is the Key Management Service. And AWS put a lot
             | of thought and work into it.
             | 
             | https://docs.aws.amazon.com/kms/latest/cryptographic-
             | details...
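             |
             | To make that concrete, here is a minimal envelope-encryption
             | sketch (Python with boto3 and the "cryptography" package;
             | the key alias "alias/app-data" is made up). KMS only hands
             | out a wrapped per-object data key - the bulk encryption
             | still happens locally:
             |
             |   import os
             |   import boto3
             |   from cryptography.hazmat.primitives.ciphers.aead import (
             |       AESGCM,
             |   )
             |
             |   kms = boto3.client("kms")
             |
             |   def encrypt_blob(plaintext: bytes) -> dict:
             |       # Fresh data key per object; only the encrypted
             |       # copy of the key is stored alongside the data.
             |       dk = kms.generate_data_key(KeyId="alias/app-data",
             |                                  KeySpec="AES_256")
             |       nonce = os.urandom(12)
             |       data = AESGCM(dk["Plaintext"]).encrypt(
             |           nonce, plaintext, None)
             |       return {"key": dk["CiphertextBlob"],
             |               "nonce": nonce, "data": data}
             |
             |   def decrypt_blob(rec: dict) -> bytes:
             |       # Unwrapping the data key is the call that IAM and
             |       # KMS key policies actually gate.
             |       key = kms.decrypt(
             |           CiphertextBlob=rec["key"])["Plaintext"]
             |       return AESGCM(key).decrypt(
             |           rec["nonce"], rec["data"], None)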
        
               | evantbyrne wrote:
               | KMS access is granted by either environment variables or
               | by authorizing the instance itself. Either way, if the
               | instance is compromised, then so is access to KMS. So
               | unless your threat model involves preventing the
               | government from looking at your data through some
               | theoretical sophisticated physical attack, then your
               | primary concerns are likely the same as running a box in
               | another physically secure location. So the same rules of
               | needing to design your encryption scheme to minimize
               | blowout from a complete hostile takeover still apply.
        
               | Xylakant wrote:
               | An attacker gaining temporary capability to
               | encrypt/decrypt data through a compromised instance is
               | painful. An attacker gaining a copy of a private key is
               | still an entirely different world of pain.
        
               | evantbyrne wrote:
               | Painful is an understatement. Keys for sensitive customer
               | data should be derived from customer secrets either way.
               | Almost nobody does that though, because it requires
               | actual forethought. Instead they just slap secrets in KMS
               | and pretend it's better than encrypted environment
               | variables or other secrets services. If an attacker can
               | read your secrets with the same level of penetration into
               | your system, then it's all the same security wise.
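               |
               | For illustration, a minimal sketch (Python standard
               | library) of what "derived from customer secrets" can
               | mean - the customer supplies a passphrase, the server
               | stores only a salt, and the data key exists only in
               | memory while the customer's session is active:
               |
               |   import hashlib, os
               |
               |   def derive_customer_key(passphrase: str, salt=None):
               |       # Persist the salt next to the data, never the key.
               |       salt = salt or os.urandom(16)
               |       key = hashlib.pbkdf2_hmac("sha256",
               |                                 passphrase.encode(),
               |                                 salt, 600_000, dklen=32)
               |       return key, salt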
        
               | Xylakant wrote:
               | There are many kinds of secrets that are used for
               | purposes where they cannot be derived from customer
               | secrets, and those still need to be secured. TLS private
               | keys for example.
               | 
               | I do disagree on the second part - there's a world of
               | difference between an attacker obtaining a copy of your
               | certificate's private key and impersonating you quietly,
               | and an attacker gaining the capability to perform signing
               | operations on your behalf temporarily while they maintain
               | access to a compromised instance.
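               |
               | A rough sketch (Python with boto3; the key alias is made
               | up) of why that distinction matters - with an asymmetric
               | KMS signing key, a compromised instance can request
               | signatures while it has access, but it never obtains the
               | private key itself:
               |
               |   import hashlib
               |   import boto3
               |
               |   kms = boto3.client("kms")
               |
               |   def sign(payload: bytes) -> bytes:
               |       digest = hashlib.sha256(payload).digest()
               |       # The private key never leaves KMS; we only get
               |       # a signature back.
               |       resp = kms.sign(
               |           KeyId="alias/signing-key",   # assumed alias
               |           Message=digest,
               |           MessageType="DIGEST",
               |           SigningAlgorithm="ECDSA_SHA_256",
               |       )
               |       return resp["Signature"]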
        
               | AtlasBarfed wrote:
               | It's now been two years since I used KMS, but at the time
               | it seemed little more than an S3-style API with
               | Twitter-sized payload limits.
               |
               | Fundamentally, why would KMS be more secure than S3
               | anyway? Both ultimately have the same fundamental
               | security requirements and do the same thing.
               |
               | So the big whirlydoo is that KMS has hardware keygen. I'm
               | sorry, that sounds like something almost guaranteed to
               | have an NSA backdoor, or to have had so much NSA
               | attention that it has been compromised.
        
               | scrose wrote:
               | If your threat model is the NSA and you're worried about
               | backdoors then don't use any cloud provider?
               | 
               | Maybe I'm just jaded from years doing this, but two
               | things have never failed to bring me peace of mind
               | in the infrastructure/ops world:
               | 
               | 1. Use whatever your company has already committed to.
               | Compare options and bring up tradeoffs when committing to
               | a cloud-specific service (i.e. AWS Lambdas) versus more
               | generic solutions around cost, security and maintenance.
               | 
               | 2. Use whatever feels right to you for anything else.
               | 
               | Preventing the NSA from cracking into your system is a
               | fun thought exercise, but life is too short to make that
               | the focus of all your hosting concerns
        
           | gauravphoenix wrote:
           | one of my greatest learnings in life is to differentiate
           | between facts and opinions- sometimes opinions are presented
           | as facts and vice-versa. if you think about it- the statement
           | "this is false" is a response to an opinion (presented as a
           | fact) but not a fact. there is no way one can objectively
           | define and defend what "real technical understanding"
           | means. the cloud space is vast with millions of people having
           | varied understanding and thus opinions.
           | 
           | so let's not fight the battle that will never be won. there
           | is no point in convincing pro-cloud people that cloud isn't
           | the right choice and vice-versa. let people share stories
           | where it made sense and where it didn't.
           | 
           | as someone who has lived in cloud security space since 2009
           | (and was founder of redlock - one of the first CSPMs), in my
           | opinion, there is no doubt that AWS is indeed better designed
           | than most corp. networks - but is that what you really need?
           | if you run entire corp and LOB apps on aws but have poor
           | security practices, will it be the right decision? what if
           | you have the best security engineers in the world but they
           | are best at Cisco-type security - configuring VLANs and
           | managing endpoints - but are not good at detecting someone
           | using IMDSv1 in ec2 exposed to the internet and running a
           | vulnerable (to ssrf) app?
           | 
           | when the scope of discussion is as vast as cloud vs on-prem,
           | imo, it is a bad idea to make absolute statements.
        
             | fulafel wrote:
             | Great points. Also, if you end up building your apps as
             | Rube Goldberg machines living up to "AWS Well Architected"
             | criteria (pushed by staff with lots of AWS certifications,
             | whose paycheck now depends on following AWS recommended
             | practices), the complexity will kill your security, as
             | nobody will understand the systems anymore.
        
           | dehrmann wrote:
           | The other part is that when us-east-1 goes down, you can
           | blame AWS, and a third of your customers' vendors will be
           | doing the same. When you unplug the power to your colo rack
           | while installing a new server, that's on you.
        
             | throwawaysxcd0 wrote:
             | OTOH, when your company's web site is down you can do
             | something about it. When the CEO asks about it, you can
             | explain _why_ it's offline and more importantly _what is
             | being done_ to bring it back.
             |
             | The equivalent situation for those who took a cloud-based
             | approach is often... ¯\\_(ツ)_/¯
        
               | szundi wrote:
               | Hey boss, I go to sleep now, site should be up anytime.
               | Cheers
        
               | Xylakant wrote:
               | The more relevant question is whether my efforts to do
               | something lead to a better and faster result than my
               | cloud provider's efforts to do something. I get it - it
               | feels powerless to do nothing, but for a lot of
               | organizations I've seen the average downtime would still
               | be higher.
        
               | lukevp wrote:
               | With the cloud, in a lot of cases you can have additional
               | regions that incur very little cost as they scale
               | dynamically with traffic. It's hard to do that with on-
               | prem. Also many AWS services come cross-AZ (AZ is a data
               | center), so their arch is more robust than a single Colo
               | server even if you're in a single region.
        
             | brandon272 wrote:
             | It's not always a full availability zone going down that is
             | the problem. Also, despite the "no one ever got fired for
             | buying Microsoft" logic, in practice I've never actually
             | found stakeholders to be reassured by "it's AWS and everyone
             | is affected" when things are down. People want things back
             | up and they want some informed answers about when that
             | might happen, not "ehh, it's AWS, out of our control".
        
           | wslh wrote:
           | From a critical perspective, your comment made me think about
           | the risks posed by rogue IT personnel, especially at scale in
           | the cloud. For example, Fastmail is a single point of failure
           | as a DoS target, whereas attacking an entire datacenter can
           | impact multiple clients simultaneously. It all comes down to
           | understanding the attack vectors.
        
             | UltraSane wrote:
             | Cloud providers are very big targets but have enormous
             | economic incentive to be secure and thus have very large
             | teams of very competent security experts.
        
               | wslh wrote:
               | You can have full security competence but be a rogue
               | actor at the same time.
        
               | portaouflop wrote:
               | You can also have rogue actors in your company, you don't
               | need 3rd parties for that
        
               | wslh wrote:
               | That doesn't sum up my comments in the thread. A rogue
               | actor in a datacenter could attack zillions of companies
               | at the same time, while a rogue actor in a single company
               | can only attack that one.
        
           | likeabatterycar wrote:
           | AWS hires the same cretins that inhabit every other IT
           | department, they just usually happen to be more technically
           | capable. That doesn't make them any more or less trustworthy
           | or reliable.
        
         | sanderjd wrote:
         | > _All the pro-cloud talking points are just that - talking
          | points that don't persuade anyone with any real technical
         | understanding, but serve to introduce doubt to non-technical
         | people and to trick people who don't examine what they're
         | told._
         | 
         | This feels like "no true scotsman" to me. I've been building
         | software for close to two decades, but I guess I don't have
         | "any real technical understanding" because I think there's a
         | compelling case for using "cloud" services for many (honestly I
         | would say most) businesses.
         | 
         | Nobody is "afraid to openly discuss how cloud isn't right for
         | many things". This is extremely commonly discussed. We're
         | discussing it right now! I truly cannot stand this modern
         | innovation in discourse of yelling "nobody can talk about XYZ
         | thing!" while noisily talking about XYZ thing on the lowest-
         | friction publishing platforms ever devised by humanity. Nobody
         | is afraid to talk about your thing! People just disagree with
         | you about it! That's ok, differing opinions are normal!
         | 
         | Your comment focuses a lot on cost. But that's just not really
         | what this is all about. Everyone knows that on a long enough
         | timescale with a relatively stable business, the total cost of
         | having your own infrastructure is usually lower than cloud
         | hosting.
         | 
         | But cost is simply not the only thing businesses care about.
         | Many businesses, especially new ones, care more about time to
         | market and flexibility. Questions like "how many servers do we
         | need? with what specs? and where should we put them?" are a
         | giant distraction for a startup, or even for a new product
         | inside a mature firm.
         | 
         | Cloud providers provide the service of "don't worry about all
         | that, figure it out after you have customers and know what you
         | actually need".
         | 
         | It is also true that this (purposefully) creates lock-in that
         | is expensive either to leave in place or unwind later, and it
         | definitely behooves every company to keep that in mind when
         | making architecture decisions, but lots of products never make
         | it to that point, and very few of those teams regret the time
         | they didn't spend building up their own infrastructure in order
         | to save money later.
        
         | mmcwilliams wrote:
          | It seems that the preference is less about understanding or
          | misunderstanding the technical requirements and more about the
          | fact that it moves a capital expenditure with some recurring
          | operational expenditure entirely into the opex column.
        
         | sanderjd wrote:
         | Also, by the way, I found it interesting that you framed your
         | side of this disagreement as the technically correct one, but
         | then included this:
         | 
         | > _a desire to not centralize the Internet_
         | 
         | This is an ideological stance! I happen to share this desire.
         | But you should be aware of your own non-technical - "emotional"
         | - biases when dismissing the arguments of others on the grounds
          | that they are "emotional" and "fanatical".
        
           | johnklos wrote:
           | I never said that my own reasons were neither personal nor
           | emotional. I was just pointing out that my reasons are easy
           | to articulate.
           | 
           | I do think it's more than just emotional, though, but most
           | people, even technical people, haven't taken the time to
           | truly consider the problems that will likely come with
           | centralization. That's a whole separate discussion, though.
        
         | JOnAgain wrote:
          | As someone who ran a startup with 100's of hosts: as soon as I
          | start to count the salaries, hiring, desk space, etc. of the
          | people needed to manage the hosts, AWS would look cheap again.
          | Yeah, on hardware costs alone they are aggressively expensive.
          | But TCO-wise, they're cheap for any decent-sized company.
          |
          | Add in compliance, auditing, etc. - all things that you can
          | set up out of the box (PCI, HIPAA, lawsuit retention) - and it
          | gets even cheaper.
        
         | browningstreet wrote:
         | Most companies severely understaff ops, infra, and security.
         | Your talking points might be good but, in practice, won't apply
         | in many cases because of the intractability of that management
         | mindset. Even when they should know better.
         | 
         | I've worked at _tech_ companies with hundreds of developers and
         | single digit ops staff. Those people will struggle to build and
         | maintain mature infra. By going cloud, you get access to mature
         | infra just by including it in build scripts. Devops is an
         | effective way to move infra back to project teams and cut out
         | infra orgs (this isn't great but I see it happen everywhere).
         | Companies will pay cloud bills but not staffing salaries.
        
           | j45 wrote:
           | Using a commercial cloud provider only cements understaffing
           | in, in too many cases.
        
         | lelanthran wrote:
         | > On the other hand, a business of just about any size that has
         | any reasonable amount of hosting is better off with their own
         | systems when it comes purely to cost
         | 
         | From a cost PoV, sure, but when you're taking money out of
         | capex it represents a big hit to the cash flow, while taking
         | out twice that amount from opex has a lower impact on the
         | company finances.
        
         | swiftcoder wrote:
         | > All the pro-cloud talking points... don't persuade anyone
         | with any real technical understanding
         | 
         | This is a very engineer-centric take. The cloud has some big
         | advantages that are entirely non-technical:
         | 
         | - You don't need to pay for hardware upfront. This is critical
         | for many early-stage startups, who have no real ability to
         | predict CapEx until they find product/market fit.
         | 
         | - You have someone else to point the SOC2/HIPAA/etc auditors
         | at. For anyone launching a company in a regulated space, being
         | able to checkbox your entire infrastructure based on
         | AWS/Azure/etc existing certifications is huge.
        
           | shortsunblack wrote:
           | You can over-provision your own baremetal resources 20x and
           | it will be still cheaper than cloud. The capex talking point
           | is just that, a talking point.
        
             | swiftcoder wrote:
             | As an early-stage startup?
             | 
             | Your spend in the first year on AWS is going to be very
             | close to zero for something like a SaaS shop.
             | 
             | Nor can you possibly scale in-house baremetal fast enough
             | if you hit the fabled hockey stick growth. By the time you
             | sign a colocation contract and order hardware, your day in
             | the sun may be over.
        
           | rakoo wrote:
           | > You have someone else to point the SOC2/HIPAA/etc auditors
           | at.
           | 
           | I would assume you still need to point auditors to your
           | software in any case
        
         | bluedino wrote:
         | I want to see an article like this, but written from a Fortune
         | 500 CTO perspective
         | 
         | It seems like they all abandoned their VMware farms or physical
         | server farms for Azure (they love Microsoft).
         | 
         | Are they actually saving money? Are things faster? How's
         | performance? What was the re-training/hiring like?
         | 
         | In one case I know we got rid of our old database greybeards
         | and replaced them with "DevOps" people that knew nothing about
         | performance etc
         | 
         | And the developers (and many of the admins) we had knew nothing
         | about hardware or anything so keeping the physical hardware
         | around probably wouldn't have made sense anyways
        
           | ndriscoll wrote:
           | Complicating this analysis is that computers have still been
           | making exponential improvements in capability as clouds
           | became popular (e.g. disks are 1000-10000x faster than they
           | were 15 years ago), so you'd naturally expect things to
           | become easier to manage over time as you need fewer machines,
           | assuming of course that your developers focus on e.g.
           | learning how to use a database well instead of how to scale
           | to use massive clusters.
           | 
           | That is, even if things became cheaper/faster, they might
           | have been even better without cloud infrastructure.
        
           | jrs235 wrote:
           | >we got rid of our old database greybeards and replaced them
           | with "DevOps" people that knew nothing about performance etc
           | 
           | Seems a lot of those DevOps people just see Azure's
           | recommendations for adding indexes and either allow
           | auto-applying them or add them without actually reviewing or
           | understanding what workloads require them and why. This
           | also lands a bit on developers/product that don't critically
           | think about and communicate what queries are common and
           | should have some forethought on what indexes should be
           | beneficial and created. (Yes followup monitoring of actual
           | index usage and possible missing indexes is still needed.)
           | Too many times I've seen dozens of indexes on tables in the
           | cloud where one could cover all of them. Yes, there still
           | might be worthwhile reasons to keep some narrower/smaller
           | indexes but again DBA and critical query analysis seems to be
           | a forgotten and neglected skill. No one owns monitoring and
           | analysing db queries and it only comes up after a fire has
           | already broken out.
        
         | awholescammy wrote:
         | There is a whole ecosystem that pushes cloud to ignorant/fresh
         | graduates/developers. Just take a look at the sponsors for all
         | the most popular frameworks. When your system is super complex
         | and depends on the cloud they make more money. Just look at the
          | PHP ecosystem: Laravel needs 4 times the servers to serve
         | something that a pure PHP system would need. Most projects
         | don't need the cloud. Only around 10% of projects actually need
         | what the cloud provides. But they were able to brainwash a
         | whole generation of developers/managers to think that they do.
         | And so it goes.
        
           | gjsman-1000 wrote:
           | Having worked with Laravel, this is absolutely bull.
        
         | irunmyownemail wrote:
         | > If I didn't already self-host email, I'd consider using
         | Fastmail.
         | 
          | Same sentiment on all of what you said.
        
         | slothtrop wrote:
         | The bottom line > babysitting hardware. Businesses are
         | transitioning to cloud because it's better for business.
        
           | irunmyownemail wrote:
           | Actually, there's been a reversal trend going on; for many
           | companies, "better" now often means on-premises or hybrid.
        
         | twoparachute45 wrote:
         | >What's particularly fascinating to me, though, is how some
         | people are so pro-cloud that they'd argue with a writeup like
         | this with silly cloud talking points. They don't seem to care
         | much about data or facts, just that they love cloud and want
         | everyone else to be in cloud, too.
         | 
         | The irony is absolutely dripping off this comment, wow.
         | 
          | The commenter makes an emotionally charged comment with no
          | data or facts and derides anyone who disagrees with them as
          | making "silly talking points" and not caring about data and
          | facts.
         | 
         | Your comment is entirely talking about itself.
        
         | dehrmann wrote:
         | The real cost wins of self-hosted are that anything using new
         | hardware becomes an ordeal, and engineers won't use high-cost,
         | value-added services. I agree that there's often too little
         | restraint in cloud architectures, but if a business truly
         | believes in a project, it shouldn't be held up for six months
          | waiting for server budget, with engineers spending time doing ops
         | work to get three nines of DB reliability.
         | 
         | There is a size where self-hosting makes sense, but it's much
         | larger than you think.
        
         | mark242 wrote:
         | I'm curious about what "reasonable amount of hosting" means to
         | you, because from my experience, as your internal network's
          | complexity goes up, it's far better for you to move systems to
         | a hyperscaler. The current estimate is >90% of Fortune 500
         | companies are cloud-based. What is it that you know that they
         | don't?
        
         | motorest wrote:
         | > All the pro-cloud talking points are just that - talking
         | points that don't persuade anyone with any real technical
         | understanding,(...)
         | 
         | This is where you lose all credibility.
         | 
         | I'm going to focus on a single aspect: performance. If you're
         | serving a global user base and your business, like practically
         | all online businesses, is greatly impacted by performance
         | problems, the only solution to a physics problem is to deploy
         | your application closer to your users.
         | 
         | With any cloud provider that's done with a few clicks and an
         | invoice of a few hundred bucks a month. If you're running your
          | hardware... What solution do you have to show for it? Do you
          | hope to create a corporate structure to rent a place to host
          | your hardware, manned by a dedicated team? What options do you
          | have?
        
           | stefan_ wrote:
           | Is everyone running online FPS gaming servers now? If you
           | want your page to load faster, tell your shitty frontend
           | engineers to use less of the latest frameworks. You are not
           | limited by physics, 99% aren't.
           | 
           | I ping HN, it's 150ms away, it still renders in the same time
           | that the Google frontpage does and that one has a 130ms
           | advantage.
        
             | pixelesque wrote:
             | Erm, 99%'s clearly wrong and I think you know it, even if
             | you are falling into the typical trap of "only Americans
             | matter"...
             | 
             | As someone in New Zealand, latency does really matter
             | sometimes, and is painfully obvious at times.
             | 
             | HN's ping for me is around: 330 ms.
             | 
             | Anyway, ping doesn't really capture the latency of the
             | full DNS lookup, TCP connection establishment and TLS
             | handshake: full responses for HN are around 900 ms for me
             | to the last byte.
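             |
             | A rough way to see where that time goes (a Python sketch
             | timing the DNS, TCP and TLS stages against
             | news.ycombinator.com; numbers will vary by network):
             |
             |   import socket, ssl, time
             |
             |   host = "news.ycombinator.com"
             |   t0 = time.perf_counter()
             |   addr = socket.getaddrinfo(host, 443)[0][4][0]      # DNS
             |   t1 = time.perf_counter()
             |   sock = socket.create_connection((addr, 443), 5)    # TCP
             |   t2 = time.perf_counter()
             |   ctx = ssl.create_default_context()
             |   tls = ctx.wrap_socket(sock, server_hostname=host)  # TLS
             |   t3 = time.perf_counter()
             |   print(f"dns={t1 - t0:.3f}s  tcp={t2 - t1:.3f}s"
             |         f"  tls={t3 - t2:.3f}s")
             |   tls.close()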
        
               | justsomehnguy wrote:
               | > latency does really matter sometimes
               | 
               | Yes, _sometimes_.
               | 
               | You know what matters way more?
               | 
               | If you throw 12 MBytes at the client over multiple
               | connections on multiple domains to display 1 KByte of
               | information. E.g.: 'new' Reddit.
        
           | johnklos wrote:
           | > This is where you lose all credibility.
           | 
           | People who write that, well...
           | 
           | If you're greatly impacted by performance problems, how does
           | that become a physics problem whose only solution is being
           | closer to your users?
           | 
           | I think you're mixing up your sales points. One, how do you
           | scale hardware? Simple: you buy some more, and/or you plan
           | for more from the beginning.
           | 
           | How do you deal with network latency for users on the other
           | side of the planet? Either you plan for and design for long
           | tail networking, and/or you colocate in multiple places,
           | and/or you host in multiple places. Being aware of cloud
           | costs, problems and limitations doesn't mean you can't or
           | shouldn't use cloud at all - it just means to do it where it
           | makes sense.
           | 
           | You're making my point for me - you've got emotional
           | generalizations ("you lose all credibility"), you're using
           | examples that people use often but that don't even go
           | together, plus you seem to forget that hardly anyone
           | advocates for all one or all the other, without some kind of
           | sensible mix. Thank you for making a good example of exactly
           | what I'm talking about.
        
           | noprocrasted wrote:
           | The complexity of scaling out an application to be closer to
           | the users has never been about getting the hardware closer.
           | It's always been about how you get the data there and dealing
           | with the CAP theorem, which requires hard tradeoffs to be
           | decided on when designing the application and can't be just
           | tacked on - there is no magic button to do this, in the AWS
           | console or otherwise.
           | 
           | Getting the _hardware_ closer to the users has always been
           | trivial - call up any of the many hosting providers out there
           | and get a dedicated server, or a colo and ship them some
           | hardware (directly from the vendor if needed).
        
           | jread wrote:
           | If you have a global user base, depending on your workload, a
           | simple CDN in front of your hardware can often go a long ways
           | with minimal cost and complexity.
        
             | motorest wrote:
             | > If you have a global user base, depending on your workload, a
             | simple CDN in front of your hardware can often go a long
             | ways with minimal cost and complexity.
             | 
             | Let's squint hard enough to pretend a CDN does not qualify
             | as "the cloud". That alone requires a lot of goodwill.
             | 
             | A CDN distributes read-only content. Any usecase that
             | requires interacting with a service is automatically
             | excluded.
             | 
             | So, no.
        
               | jread wrote:
               | > Any usecase that requires interacting with a service is
               | automatically excluded
               | 
               | This isn't correct. Many applications consist of a mix of
               | static and dynamic content. Even dynamic content is often
               | cacheable for a time. All of this can be served by a CDN
               | (using TTLs) which is a much simpler and more cost
               | effective solution than multi-region cloud infra, with
               | the same performance benefits.
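               |
               | As a rough illustration (a Flask sketch; the endpoint and
               | the leaderboard helper are made up), even a dynamic
               | response can be made CDN-cacheable for a short TTL with
               | standard HTTP headers:
               |
               |   from flask import Flask, jsonify
               |
               |   app = Flask(__name__)
               |
               |   def compute_leaderboard():
               |       # Placeholder for the real (dynamic) query.
               |       return {"top": ["alice", "bob"]}
               |
               |   @app.route("/api/leaderboard")
               |   def leaderboard():
               |       resp = jsonify(compute_leaderboard())
               |       # Let the CDN serve this for 30s and refresh it in
               |       # the background: origin load and user latency both
               |       # drop without any multi-region infrastructure.
               |       resp.headers["Cache-Control"] = (
               |           "public, s-maxage=30, stale-while-revalidate=60")
               |       return resp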
        
         | kevin_thibedeau wrote:
         | Capital expenditures are kryptonite to financial engineers. The
         | cloud selling point was to trade those costs for operational
         | expenses and profit in phase 3.
        
         | jandrewrogers wrote:
         | This trivializes some real issues.
         | 
         | The biggest problem the cloud solves is hardware supply chain
         | management. To realize the full benefits of doing your own
         | build at any kind of non-trivial scale you will need to become
         | an expert in designing, sourcing, and assembling your hardware.
         | Getting hardware delivered when and where you need it is not
         | entirely trivial -- components are delayed, bigger customers
         | are given priority allocation, etc. The technical parts are
         | relatively straightforward; managing hardware vendors,
         | logistics, and delivery dates on an ongoing basis is a giant
         | time suck. When you use the cloud, you are outsourcing this
         | part of the work.
         | 
         | If you do this well and correctly then yes, you will reduce
         | costs several-fold. But most people that build their own data
         | infrastructure do a half-ass job of it because they
         | (understandably) don't want to be bothered with any of these
         | details and much of the nominal cost savings evaporate.
         | 
         | Very few companies do security as well as the major cloud
         | vendors. This isn't even arguable.
         | 
         | On the other hand, you will need roughly the same number of
         | people for operations support whether it is private data
         | infrastructure or the cloud, there is little or no savings to
         | be had here. The fixed operations people overhead scales to
         | such a huge number of servers that it is inconsequential as a
         | practical matter.
         | 
         | It also depends on your workload. The types of workloads that
         | benefit most from private data infrastructure are large-scale
          | data-intensive workloads. If your day-to-day is slinging tens
          | or hundreds of PB of data for analytics, the economics of
          | private data infrastructure are extremely compelling.
        
         | swozey wrote:
         | I have about 30 years as a linux eng, starting with openbsd and
         | have spent a LOT of time with hardware building webhosts and
          | CDNs until about 2020; since then my roles have been 100%
          | aws/gcloud/heroku.
         | 
         | I love building the cool edge network stuff with expensive
          | bleeding edge hardware, smartnics, nvmeOF, etc, but it's
         | infinitely more complicated and stressful than terraforming an
         | AWS infra. Every cluster I set up I had to interact with
         | multiple teams like networking, security, storage sometimes
         | maintenance/electrical, etc. You've got some random tech you
         | have to rely on across the country in one of your POPs with a
         | blown server. Every single hardware infra person has had a NOC
         | tech kick/unplug a server at least once if they've been in long
         | enough.
         | 
         | And then when I get the hardware sometimes you have different
         | people doing different parts of setup, like NOC does the boot,
          | maybe bootstraps the hardware with something that works over ssh
         | before an agent is installed (ansible, etc), then your linux
         | eng invokes their magic with a ton of bash or perl, then your
         | k8s person sets up the k8s clusters with usually something like
         | terraform/puppet/chef/salt probably calling helm charts. Then
         | your monitoring person gets it into OTEL/grafana, etc. This all
         | organically becomes more automated as time goes on, but I've
         | seen it from a brand new infra where you've got no automation
         | many times.
         | 
         | Now you're automating 90% of this via scripts and IAC, etc, but
         | you're still doing a lot of tedious work.
         | 
         | You also have a much more difficult time hiring good engineers.
          | The market's gone so heavily AWS (I'm no help) that it's rare
         | that I come across an ops resume that's ever touched hardware,
         | especially not at the CDN distributed systems level.
         | 
         | So.. aws is the chill infra that stays online and you can
         | basically rely on 99.99something%. Get some terraform
         | blueprints going and your own developers can self serve. Don't
         | need hardware or ops involved.
         | 
         | And none of this is even getting into supporting the clusters.
         | Failing clusters. Dealing with maintenance, zero downtime
         | kernel upgrades, rollbacks, yaddayadda.
        
         | jhwhite wrote:
         | > It makes me wonder: how do people get so sold on a thing that
         | they'll go online and fight about it, even when they lack facts
         | or often even basic understanding?
         | 
         | I feel like this can be applied to anything.
         | 
         | I had a manager take one SAFe for Leaders class then came back
         | wanting to implement it. They had no previous AGILE classes or
         | experience. And the Enterprise Agile Office was saying DON'T
         | USE SAFe!!
         | 
         | But they had one class and that was the only way they would
         | agree to structure their group.
        
         | cookiengineer wrote:
         | My take on this whole cloud fatigue is that system maintenance
         | got overly complex over the last couple years/decades. So much
         | that management people now think that it's too expensive in
         | terms of hiring people that can do it compared to the higher
         | managed hosting costs.
         | 
         | DevOps and kubernetes come to mind. A lot of people using
         | kubernetes don't know what they're getting into, and k0s or
         | another single machine solution would have been enough for 99%
         | of SMEs.
         | 
         | In terms of cyber security (my field) everything got so
         | ridiculously complex that even the folks that use 3 different
         | dashboards in parallel will guess the answers as to whether or
         | not they're affected by a bug/RCE/security flaw/weakness
         | because all of the data sources (even the expensively paid for
         | ones) are human-edited text databases. They're so buggy that
         | they even have Chinese idiom symbols instead of a dot character
         | in the version fields without anyone ever fixing it upstream in
         | the NVD/CVE process.
         | 
         | I started to build my EDR agent for POSIX systems specifically,
          | because I hope that at some point it can help companies to
          | ditch the cloud and allow them to self-host again - which in
          | turn would indirectly prevent 13-year-old kids like those from
          | LAPSUS from pwning major infrastructure via simple tech
          | support hotline calls.
         | 
         | When I think of it in terms of hosting, the vertical
         | scalability of EPYC machines is so high that most of the time
         | when you need its resources you are either doing something
         | completely wrong and you should refactor your code or you are a
         | video streaming service.
        
         | cyberax wrote:
         | > The whole push to the cloud has always fascinated me. I get
         | it - most people aren't interested in babysitting their own
         | hardware.
         | 
         | For businesses, it's a very typical lease-or-own decision.
         | There's really nothing too special about cloud.
         | 
         | > On the other hand, a business of just about any size that has
         | any reasonable amount of hosting is better off with their own
         | systems when it comes purely to cost.
         | 
          | Nope. Not if you factor in 24/7 support, geographic redundancy,
         | and uptime guarantees. With EC2 you can break even at about
         | $2-5m a year of cloud spending if you want your own hardware.
        
         | hnthrowaway6543 wrote:
         | > a desire to not centralize the Internet
         | 
         | > If I didn't already self-host email
         | 
         | this really says all that needs to be said about your
         | perspective. you have an engineer and OSS advocate's mindset.
         | which is fine, but most business leaders (including technical
         | leaders like CTOs) have a business mindset, and their goal is
         | to build a business that makes money, not avoid contributing to
         | the centralization of the internet
        
         | ants_everywhere wrote:
         | ...but your post reads like you _do_ have an emotional reaction
          | to this question and you're ready to believe someone who
         | shares your views.
         | 
         | There's not nearly enough in here to make a judgment about
         | things like security or privacy. They have the bare minimum
         | encryption enabled. That's better than nothing. But how is key
         | access handled? Can they recover your email if the entire
         | cluster goes down? If so, then someone has access to the
         | encryption keys. If not, then how do they meet reliability
         | guarantees?
         | 
         | Three letter agencies and cyber spies like to own switches and
         | firewalls with zero days. What hardware are they using, and how
         | do they mitigate against backdoors? If you really cared about
         | this you would have to roll your own networking hardware down
         | to the chips. Some companies do this, but you need to have a
         | whole lot of servers to make it economical.
         | 
         | It's really about trade-offs. I think the big trade-offs
         | favoring staying off cloud are cost (in some applications),
          | distrust of the cloud providers, and avoiding the US Government.
         | 
         | The last two are arguably judgment calls that have some
         | inherent emotional content. The first is calculable in
         | principle, but people may not be using the same metrics. For
         | example if you don't care that much about security breaches or
         | you don't have to provide top tier reliability, then you can
         | save a ton of money. But if you do have to provide those
         | guarantees, it would be hard to beat Cloud prices.
        
         | fnord77 wrote:
         | capex vs opex
        
         | ttul wrote:
         | My firm belief after building a service at scale (tens of
         | millions of end users, > 100K tps) is that AWS is unbeatable.
         | We don't even think about building our own infrastructure.
         | There's no way we could ever make it reliable enough, secure
         | enough, and future-proof enough to ever pay back the cost
         | difference.
         | 
         | Something people neglect to mention when they tout their home
         | grown cloud is that AWS spends significant cycles constantly
         | eliminating technical debt that would absolutely destroy most
         | companies - even ones with billion dollar services of their
         | own. The things you rely on are constantly evolving and
         | changing. It's hard enough to keep up at the high level of a
         | SaaS built on top of someone else's bulletproof cloud. But
         | imagine also having to keep up with the low level stuff like
         | networking and storage tech?
         | 
         | No thanks.
        
         | RainyDayTmrw wrote:
         | I hear this debate repeated often, and I think there's another
         | important factor. It took me some time to figure out how to
         | explain it, and the best I came up with was this: It is
         | extremely difficult to bootstrap from zero to baseline
         | competence, in general, and especially in an existing
         | organization.
         | 
         | In particular, there is a limit to paying for competence, and
         | paying more money doesn't automatically get you more
         | competence, which is especially perilous if your organization
         | lacks the competence to judge competence. In the limit case,
         | this gets you the Big N consultancies like PWC or EY. It's
         | entirely reasonable to hire PWC or EY to run your accounting or
         | compliance. Hiring PWC or EY to run your software development
         | lifecycle is almost guaranteed doom, and there is no shortage
         | of stories on this site to support that.
         | 
         | In comparison, if you're one of these organizations, who don't
         | yet have baseline competence in technology, then what the
         | public cloud is selling is nothing short of magical: You pay
         | money, and, in return, you receive a baseline set of tools,
         | which all do more or less what they say they will do. If no
         | amount of money would let you bootstrap this competence
         | internally, you'd be much more willing to pay a premium for it.
         | 
         | As an anecdote, my much younger self worked in a mid-sized tech
         | team at a large household brand in a legacy industry. We were
         | building out a web product that, for product reasons, had
         | surprisingly high uptime and scalability requirements, relative
         | to legacy industry standards. We leaned heavily on public cloud
         | and CDNs. We used a lot of S3 and SQS, which allowed us to
         | build systems with strong reliability characteristics, despite
         | none of us having that background at the time.
        
         | ksec wrote:
         | Even as an anti-cloud (or more accurately, anti-everything-
         | cloud) person I still think there are many benefits to the
         | cloud. Just most of them are oversold and people don't need
         | them.
         | 
         | Number one is company bureaucracy and politics. No one wants to
         | beg another person or department, or sit through endless
         | meetings, just to have extra hardware provisioned. For engineers
         | that alone is worth perhaps 99% of all current cloud margins.
         | 
         | Number two is also company bureaucracy and politics. CFOs don't
         | like capex. Turning it into opex makes things easier for them,
         | along with end-of-year company budget turning into cloud
         | credits for different departments. This is especially true for
         | companies with government funding.
         | 
         | Number three is really company bureaucracy and politics.
         | Dealing with Google, AWS or Microsoft means you no longer have
         | to deal with dozens of different vendors for servers,
         | networking hardware, software licenses, etc. Instead it is all
         | pre-approved into AWS, GCP or Azure. This is especially useful
         | for things that involve government contracts or funding.
         | 
         | There are also things like instant worldwide deployment. You
         | can have things up and running in any region within seconds.
         | It's also useful when you have a site that gets 10 to 1000x its
         | normal traffic from time to time.
         | 
         | But then a lot of small businesses don't have these sorts of
         | issues, especially non-consumer-facing services. B2B or SaaS
         | businesses are highly unlikely to get 10x more customers within
         | a short period of time.
         | 
         | I continue to wish there were a middle ground somewhere: you
         | rent a dedicated server for cheap as base load and use the
         | cloud for everything else.
        
       | tiffanyh wrote:
       | FYI - Fastmail web client has Offline support in beta right now.
       | 
       | https://www.fastmail.com/blog/offline-in-beta/
        
         | ForHackernews wrote:
         | Very confused by this. What is in beta? I've had "offline"
         | email access for 25 years. It's called an IMAP client.
        
         | mdaniel wrote:
         | And if anyone is curious, I actually live on their
         | https://betaapp.fastmail.com release and find it just as stable
         | as the "mainline" one but with the advantage of getting to play
         | with all the cool toys earlier. Bonus points (for me) in that
         | they will periodically conduct surveys to see how you like
         | things
        
       | DarkCrusader2 wrote:
       | I have seen a common sentiment that self hosting is almost always
       | better than cloud. What these discussions do not mention is how
       | to effectively run your business applications on this
       | infrastructure.
       | 
       | Things like identity management (AAD/IAM), provisioning and
       | running VMs, deployments. Network side of things like VNet, DNS,
       | securely opening ports, etc. Monitoring setup across the stack.
       | There is so much functionality that will be required to safely
       | expose an application externally that I can't even coherently
       | list it all out here. Are people just using SaaS for everything
       | (which I think would defeat the purpose of on-prem infra), or can
       | a competent sysadmin handle all this to give a cloud-like
       | experience for end developers?
       | 
       | Can someone share their experience or share any write ups on this
       | topic?
       | 
       | For more context, I briefly worked at a very large hedge fund
       | which had a small DC worth of VERY beefy machines but absolutely
       | no platform on top of it. Hosting an application was done by
       | copying the binaries onto a particular well-known machine,
       | running npm commands and restarting nginx. You would log a ticket
       | with the sysadmins to reserve and point an internal DNS entry at
       | this machine (no load balancer). Deployment was a shell script
       | which rcp'd new binaries and restarted nginx. No monitoring or
       | observability stack. There was a script which would log you into
       | a random machine for you to run your workloads (be ready to get
       | angry IMs from more senior quants running their workload on that
       | random machine if your development build takes up enough
       | resources to affect their work). I can go on and on but I think
       | you get the idea.
        
         | noprocrasted wrote:
         | > identity management (AAD/IAM)
         | 
         | Do you mean for administrative access to the machines (over
         | SSH, etc) or for "normal" access to the hosted applications?
         | 
         | Admin access: an Ansible-managed set of UNIX users & associated
         | SSH public keys, combined with remote logging so every access
         | is audited and a malicious operator wiping the machine can't
         | cover their tracks, will generally get you pretty far. Beyond
         | that, there are commercial solutions like Teleport which
         | provide integration with an IdP, management web UI, session
         | logging & replay, etc.
         | 
         | Normal line-of-business access: this would be managed by
         | whatever application you're running, not much different to the
         | cloud. But if your application isn't auth-aware or is unsafe to
         | expose to the wider internet, you can stick it behind various
         | auth proxies such as Pomerium - it will effectively handle auth
         | against an IdP and only pass through traffic to the underlying
         | app once the user is authenticated. This is also useful for
         | isolating potentially vulnerable apps.
         | 
         | > provisioning and running VMs
         | 
         | Provisioning: once a VM (or even a physical server) is up and
         | running enough to be SSH'd into, you should have a
         | configuration management tool (Ansible, etc) apply whatever
         | configuration you want. This would generally involve
         | provisioning users, disabling some stupid defaults (SSH
         | password authentication, etc), installing required packages,
         | etc.
         | 
         | To get a VM to an SSH'able state in the first place, you can
         | configure your hypervisor to pass through "user data" which
         | will be picked up by something like cloud-init (integrated by
         | most distros) and interpreted at first boot - this allows you
         | to do things like include an initial SSH key, create a user,
         | etc.
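         | 
         | As a rough illustration (the user name and key below are
         | placeholders), the user data can be tiny - something like this
         | sketch that renders a minimal cloud-config document:
         | 
         |     # Minimal sketch: render cloud-init user data.
         |     # The user name and SSH key are placeholders.
         |     import yaml  # pip install pyyaml
         | 
         |     doc = {
         |         "users": [{
         |             "name": "ops",
         |             "shell": "/bin/bash",
         |             "sudo": "ALL=(ALL) NOPASSWD:ALL",
         |             "ssh_authorized_keys": [
         |                 "ssh-ed25519 AAAA... ops@example.org",
         |             ],
         |         }],
         |         "ssh_pwauth": False,
         |     }
         | 
         |     with open("user-data", "w") as f:
         |         f.write("#cloud-config\n" + yaml.safe_dump(doc))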
         | 
         | To run VMs on self-managed hardware: libvirt, proxmox in the
         | Linux world. bhyve in the BSD world. Unfortunately most of
         | these have rough edges, so commercial solutions there are worth
         | exploring. Alternatively, consider if you actually _need_ VMs
         | or if things like containers (which have much nicer tooling and
         | a better performance profile) would fit your use-case.
         | 
         | > deployments
         | 
         | Depends on your application. But let's assume it can fit in a
         | container - there's nothing wrong with a systemd service that
         | just reads a container image reference in /etc/... and uses
         | `docker run` to run it. Your deployment task can just SSH into
         | the server, update that reference in /etc/ and bounce the
         | service. Evaluate Kamal, which is a slightly fancier version of
         | the above. Need more? Explore cluster managers like Hashicorp
         | Nomad or even Kubernetes.
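         | 
         | A minimal sketch of that deploy task (host, unit and file names
         | are made up; assumes the SSH user may write /etc/app and
         | restart units):
         | 
         |     # Push a new image reference, then bounce the
         |     # systemd unit whose ExecStart does `docker run`
         |     # with an EnvironmentFile pointing at image.env.
         |     import shlex
         |     import subprocess
         | 
         |     HOST = "app1.internal"
         |     IMAGE = "registry.example.org/app:v42"
         | 
         |     def ssh(remote_cmd: str) -> None:
         |         # ssh hands the string to the remote shell
         |         subprocess.run(["ssh", HOST, remote_cmd],
         |                        check=True)
         | 
         |     ssh("echo IMAGE=%s > /etc/app/image.env"
         |         % shlex.quote(IMAGE))
         |     ssh("systemctl restart app.service")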
         | 
         | > Network side of things like VNet
         | 
         | Wireguard tunnels set up (by your config management tool)
         | between your machines, which will appear as standard network
         | interfaces with their own (typically non-publicly-routable) IP
         | addresses, and anything sent over them will transparently be
         | encrypted.
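         | 
         | For illustration, the per-host config your config management
         | tool would template out is small (keys, addresses and
         | endpoints below are placeholders):
         | 
         |     # Sketch: render a wg-quick style wg0.conf.
         |     # Keys, IPs and endpoints are placeholders.
         |     peers = [
         |         {"pubkey": "PEER_PUBLIC_KEY",
         |          "ip": "10.0.0.2",
         |          "endpoint": "203.0.113.7:51820"},
         |     ]
         | 
         |     lines = [
         |         "[Interface]",
         |         "PrivateKey = THIS_HOST_PRIVATE_KEY",
         |         "Address = 10.0.0.1/24",
         |         "ListenPort = 51820",
         |     ]
         |     for p in peers:
         |         lines += [
         |             "",
         |             "[Peer]",
         |             "PublicKey = " + p["pubkey"],
         |             "AllowedIPs = " + p["ip"] + "/32",
         |             "Endpoint = " + p["endpoint"],
         |         ]
         | 
         |     with open("wg0.conf", "w") as f:
         |         f.write("\n".join(lines) + "\n")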
         | 
         | > DNS
         | 
         | Generally very little reason not to outsource that to a cloud
         | provider or even your (reputable!) domain registrar. DNS is
         | mostly static data though, which also means if you do need to
         | do it in-house for whatever reason, it's just a matter of
         | getting a CoreDNS/etc container running on multiple machines
         | (maybe even distributed across the world). But really, there's
         | no reason not to outsource that and hosted offerings are super
         | cheap - so go open an AWS account and configure Route53.
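         | 
         | For example, upserting an A record with boto3 is only a few
         | lines (the hosted zone ID and names here are placeholders):
         | 
         |     # Sketch: upsert an A record in Route53.
         |     import boto3  # pip install boto3
         | 
         |     r53 = boto3.client("route53")
         |     r53.change_resource_record_sets(
         |         HostedZoneId="Z0000000000EXAMPLE",
         |         ChangeBatch={"Changes": [{
         |             "Action": "UPSERT",
         |             "ResourceRecordSet": {
         |                 "Name": "app.example.org",
         |                 "Type": "A",
         |                 "TTL": 300,
         |                 "ResourceRecords": [
         |                     {"Value": "203.0.113.7"},
         |                 ],
         |             },
         |         }]},
         |     )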
         | 
         | > securely opening ports
         | 
         | To begin with, you shouldn't have anything listening that you
         | don't want to be accessible. Then it's not a matter of
         | "opening" or closing ports - the only ports that actually
         | listen are the ones you _want_ open by definition because it's
         | your application listening for outside traffic. But you can
         | configure iptables/nftables as a second layer of defense, in
         | case you accidentally start something that unexpectedly exposes
         | some control socket you're not aware of.
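         | 
         | A quick way to audit that, using the stock ss(8) tool (a rough
         | sketch; flag availability may vary by iproute2 version):
         | 
         |     # List listening sockets bound beyond loopback,
         |     # i.e. potentially reachable from the outside.
         |     import subprocess
         | 
         |     out = subprocess.run(
         |         ["ss", "-Htuln"],  # -H drops the header
         |         capture_output=True, text=True, check=True,
         |     ).stdout
         | 
         |     for line in out.splitlines():
         |         local = line.split()[4]   # e.g. 0.0.0.0:22
         |         addr = local.rsplit(":", 1)[0]
         |         if addr not in ("127.0.0.1", "[::1]"):
         |             print(line)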
         | 
         | > Monitoring setup across the stack
         | 
         | collectd running on each machine (deployed by your
         | configuration management tool) sending metrics to a central
         | machine. That machine runs Grafana/etc. You can also explore
         | "modern" stuff that the cool kids play with nowadays like
         | VictoriaMetrics, etc, but metrics is mostly a solved problem so
         | there's nothing wrong with using old tools if they work and fit
         | your needs.
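         | 
         | For the odd app-level metric, if the central box accepts the
         | Graphite plaintext protocol (Carbon, VictoriaMetrics and
         | friends do), shipping a data point is trivial (host and metric
         | names below are placeholders):
         | 
         |     # Sketch: one data point over the Graphite
         |     # plaintext protocol: "<path> <value> <ts>\n".
         |     import socket
         |     import time
         | 
         |     COLLECTOR = ("metrics.internal", 2003)
         | 
         |     def send_metric(path, value):
         |         line = "%s %s %d\n" % (path, value,
         |                                int(time.time()))
         |         with socket.create_connection(
         |                 COLLECTOR, timeout=5) as s:
         |             s.sendall(line.encode())
         | 
         |     send_metric("app1.mail.queue_length", 42)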
         | 
         | For logs, configure rsyslogd to log to a central machine - on
         | that one, you can have log rotation. Or look into an ELK stack.
         | Or use a hosted service - again nothing prevents you from
         | picking the best of cloud _and_ bare-metal, it's not one or
         | the other.
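         | 
         | And application logs can be pointed at that same central box
         | with nothing but the standard library (host name is a
         | placeholder):
         | 
         |     # Sketch: ship app logs to the central rsyslog
         |     # host over plain syslog (UDP/514).
         |     import logging
         |     import logging.handlers
         | 
         |     handler = logging.handlers.SysLogHandler(
         |         address=("loghost.internal", 514))
         |     logging.basicConfig(level=logging.INFO,
         |                         handlers=[handler])
         | 
         |     logging.info("deployed app:v42 on app1")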
         | 
         | > safely expose an application externally
         | 
         | There's a lot of snake oil and fear-mongering around this.
         | First off, you need to differentiate between vulnerabilities of
         | your application and vulnerabilities of the underlying
         | infrastructure/host system/etc.
         | 
         | App vulnerabilities, in your code or dependencies: cloud won't
         | save you. It runs your application just like it's been told. If
         | your app has an SQL injection vuln or one of your dependencies
         | has an RCE, you're screwed either way. To manage this you'd do
         | the same as you do in cloud - code reviews, pentesting,
         | monitoring & keeping dependencies up to date, etc.
         | 
         | Infrastructure-level vulnerabilities: cloud providers are
         | responsible for keeping the host OS and their provided services
         | (load balancers, etc) up to date and secure. You can do the
         | same. Some distros provide unattended updates (which your
         | config management tool can enable). Stuff that doesn't need to
         | be reachable from the internet shouldn't be (bind internal
         | stuff to your Wireguard interfaces). Put admin stuff behind
         | some strong auth - TLS client certificates are the gold
         | standard but have management overheads. Otherwise, use an IdP-
         | aware proxy (like mentioned above). Don't always trust app-
         | level auth. Beyond that, it's the usual - common sense,
         | monitoring for "spooky action at a distance", and luck. Not too
         | much different from your cloud provider, because they won't
         | compensate you either if they do get hacked.
         | 
         | > For more context, I worked at a very large hedge fund briefly
         | which had a small DC worth of VERY beefy machines but
         | absolutely no platform on top of it...
         | 
         | Nomad or Kubernetes.
        
           | rtfusgihkuj wrote:
           | No, using Ansible to distribute public keys does _not_ get
           | you very far. It's fine for a personal project or even a
           | team of 5-6 with a handful of servers, but beyond that you
           | really need a
           | better way to onboard, offboard, and modify accounts. If
           | you're doing anything but a toy project, you're better off
           | starting off with something like IPA for host access
           | controls.
        
             | noprocrasted wrote:
             | What's the risk you're trying to protect against, that a
             | "better" (which one?) way would mitigate that this one
             | wouldn't?
             | 
             | > IPA
             | 
             | Do you mean https://en.wikipedia.org/wiki/FreeIPA ? That
             | seems like a huge amalgamation of complexity in a non-
             | memory-safe language that I feel like would introduce a
             | much bigger security liability than the problem it's trying
             | to solve.
             | 
             | I'd rather pony up the money and use Teleport at that
             | point.
        
               | dpe82 wrote:
               | It's basically Kerberos and an LDAP server, which are
                | technologies as old and reliable as dirt.
               | 
               | This sort of FUD is why people needlessly spend so much
               | money on cloud.
        
               | noprocrasted wrote:
               | > which are technologies old and reliable as dirt.
               | 
               | Technologies, sure. Implementations? Not so much.
               | 
               | I can trust OpenSSH because it's deployed everywhere and
               | I can be confident all the low-hanging fruits are gone by
                | now, and if not, its ubiquity means I'm unlikely to
               | be the most interesting target, so I am more likely to
               | escape a potential zero-day unscathed.
               | 
                | What's the market share of IPA in comparison? Has it
                | seen any meaningful action in the last decade, and the
               | same attention, from both white-hats (audits, pentesting,
               | etc) as well as black-hats (trying to break into every
               | exposed service)? I very much doubt it, so the safe thing
               | to assume is that it's nowhere as bulletproof as OpenSSH
               | and that it's more likely for a dedicated attacker to
               | find a vuln there.
        
               | dpe82 wrote:
               | MIT's Kerberos 5 implementation is 30 years old and has
               | been _very_ widely deployed.
        
             | xorcist wrote:
              | Why do you think that? I did something similar at a
              | previous job, for something bordering on 1k employees.
             | 
             | User administration was done by modifying a yaml file in
             | git. Nothing bad to say about it really. It sure beats
             | point-and-click Active Directory any day of the week.
             | Commit log handy for audits.
             | 
             | If there are no externalities demanding anything else, I'd
             | happily do it again.
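              | 
              | The apply step stays tiny too - roughly something like
              | this sketch (file layout and names are made up), run by
              | CI or a cron job on each host:
              | 
              |     # Read users.yaml from the repo and make local
              |     # accounts and keys match it. Offboarding
              |     # (locking users absent from the file) is left
              |     # out for brevity.
              |     import subprocess
              |     from pathlib import Path
              | 
              |     import yaml  # pip install pyyaml
              | 
              |     users = yaml.safe_load(
              |         Path("users.yaml").read_text())
              |     # users.yaml looks like:
              |     #   alice:
              |     #     keys: ["ssh-ed25519 AAAA... alice@corp"]
              | 
              |     for name, info in users.items():
              |         exists = subprocess.run(
              |             ["id", name],
              |             capture_output=True).returncode == 0
              |         if not exists:
              |             subprocess.run(["useradd", "-m", name],
              |                            check=True)
              |         ssh_dir = Path("/home") / name / ".ssh"
              |         ssh_dir.mkdir(mode=0o700, exist_ok=True)
              |         (ssh_dir / "authorized_keys").write_text(
              |             "\n".join(info["keys"]) + "\n")
              |         subprocess.run(
              |             ["chown", "-R", name + ":" + name,
              |              str(ssh_dir)], check=True)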
        
               | kasey_junk wrote:
               | There is nothing _wrong_ with it, and so long as you can
               | prove that your offboarding is consistent and quick then
               | feel free to use it.
               | 
               | But a central system that uses the same identity/auth
               | everywhere is much easier to keep consistent and fast.
               | That's why auditors and security professionals will harp
               | on idp/sso solutions as some of the first things to
               | invest in.
        
               | xorcist wrote:
               | I found that the commit log made auditing on- and
               | offboarding easier, not harder. Of course it won't help
               | you if your process is dysfunctional. You still have to
               | trigger the process somehow, which can be a problem in
               | itself when growing from a startup, but once you do that
               | it's smooth.
               | 
                | However git _is_ a central system, a database if you
                | will, where you can keep identities globally consistent.
                | That's the whole point. In my experience, the reason
                | people leave it is that you grow the need to
                | interoperate with third-party stuff which only supports
                | AD or Okta or something. Should I grow past that phase
                | myself, I would feed my chosen IdM with that data
                | instead.
        
       | briHass wrote:
       | The biggest win with running your own infra is disk/IO speeds, as
       | noted here and in DHH's series on leaving cloud
       | (https://world.hey.com/dhh/we-have-left-the-cloud-251760fb)
       | 
       | The cloud providers really kill you on IO for your VMs. Even if
       | 'remote' SSDs are available with configurable ($$) IOPs/bandwidth
       | limits, the size of your VM usually dictates a pitiful max IO/BW
       | limit. In Azure, something like a 4-core 16GB RAM VM will be
       | limited to 150MB/s across all attached disks. For most hosting
       | tasks, you're going to hit that limit far before you max out '4
       | cores' of a modern CPU or 16GB of RAM.
       | 
       | On the other hand, if you buy a server from Dell and run your own
       | hypervisor, you get a massive reserve of IO, especially with
       | modern SSDs. Sure, you have to share it between your VMs, but you
       | own all of the IO of the hardware, not some pathetic slice of it
       | like in the cloud.
       | 
       | As is always said in these discussions, unless you're able to
       | move your workload to PaaS offerings in the cloud (serverless),
       | you're not taking advantage of what large public clouds are good
       | at.
        
         | noprocrasted wrote:
         | Biggest issue isn't even sequential speed but latency. In the
         | cloud all persistent storage is networked and has significantly
         | more latency than direct-attached disks. This is a physical
         | (speed of light) limit, you can't pay your way out of it, or
         | throw more CPU at it. This has a huge impact for certain
         | workloads like relational databases.
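         | 
         | Easy to measure for yourself - a dumb commit-latency probe
         | like this (file path is arbitrary) typically shows a large gap
         | between local NVMe and network-attached volumes:
         | 
         |     # Sketch: average write+fsync latency, which is
         |     # what relational databases care about.
         |     import os
         |     import time
         | 
         |     N = 1000
         |     fd = os.open("latency.probe",
         |                  os.O_CREAT | os.O_WRONLY, 0o600)
         |     buf = b"x" * 4096
         | 
         |     t0 = time.perf_counter()
         |     for _ in range(N):
         |         os.pwrite(fd, buf, 0)
         |         os.fsync(fd)
         |     dt = time.perf_counter() - t0
         | 
         |     os.close(fd)
         |     os.unlink("latency.probe")
         |     print("avg fsync: %.0f us" % (dt / N * 1e6))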
        
           | sgarland wrote:
            | Yep. This is why my 12-year-old Dell R620s with Ceph on NVMe
           | via Infiniband outperform the newest RDS and Aurora
           | instances: the disk latency is measured in microseconds.
           | Locally attached is of course even faster.
        
           | briHass wrote:
           | I ran into this directly trying to use Azure's SMB as a
           | service offering (Azure Files) for a file-based DB. It
           | currently runs on a network share on-prem, but moving it to
           | an Azure VM using that service killed performance. SMB is
           | chatty as it is, and the latency of tons of small file IO was
           | horrendous.
           | 
           | Interestingly, creating a file share VM deployed in the same
           | proximity group has acceptable latency.
        
       | Axsuul wrote:
       | Anyone know what are some good data centers or providers to host
       | your bare metal servers?
        
         | klysm wrote:
         | You're probably looking for the term "colo"
        
       | nisa wrote:
       | Love this article and I'm also running some stuff on old
       | enterprise servers in some racks somewhere. Now over the last
       | year I've had to dive into Azure Cloud as we have customers using
       | this (b2b company) and I finally understood why everyone is doing
       | cloud despite the price:
       | 
       | Global permissions, seamless organization and IaC. If you are
       | Fastmail or a small startup - go buy some used Dell PowerEdge
       | servers with Epycs in some colo rack with 10GbE transit and save
       | tons of money.
       | 
       | If you are a company with tons of customers and tons of
       | requirements, it's powerful to put each concern into a landing
       | zone, run some Bicep/Terraform, have a resource group to
       | control costs, get savings on overall core count and be done
       | with it.
       | 
       | Assign permissions into a namespace for your employee or customer
       | - have some back and forth about requirements and it's done. No
       | need to sysadmin across servers. No need to check for broken
       | disks.
       | 
       | I also blame the hell of VMware and virtual machines for much of
       | what is a PITA to maintain as a sysadmin but is loved because
       | it's common knowledge. I would only do k8s on bare metal today
       | and skip the whole virtualization thing completely. I guess
       | it's also these pains that are softened in the cloud.
        
       | akpa1 wrote:
       | The fact that Fastmail work like this, are transparent about what
       | they're up to and how they're storing my email and the fact that
       | they're making logical decisions and have been doing so for quite
       | a long time is exactly the reason I practically trip over myself
       | to pay them for my email. Big fan of Fastmail.
        
         | xyst wrote:
         | They are also active in contributing to cyrus-imap
        
       | pammf wrote:
       | Cost isn't always the most important metric. If that was the
       | case, people would always buy the cheapest option of everything.
        
       | veidr wrote:
       | "WHY we use our own hardware..."
       | 
       | The why is the interesting part of this article.
        
         | veidr wrote:
         | I take that back; _this_ is (to me) the most interesting part:
         | 
         | "Although we've only ever used datacenter class SSDs and HDDs
         | failures and replacements every few weeks were a regular
         | occurrence on the old fleet of servers. Over the last 3+ years,
         | we've only seen a couple of SSD failures in total across the
         | entire upgraded fleet of servers. This is easily less than one
         | tenth the failure rate we used to have with HDDs."
        
       | indulona wrote:
       | I am working on a personal project (some would call it a
       | startup, but I have no intention of getting external financing
       | and other Americanisms) where I have set up my own CDN and video
       | encoding, among other things. These days, whenever you have a
       | problem, everyone answers "just use cloud" and that results in
       | people really knowing nothing any more. It is saddening. But on
       | the other hand it ensures all my decades of knowledge will be
       | very well paid in the future, if I ever need to get a job.
        
       | Beijinger wrote:
       | I was told Fastmail is excellent, and I am not a big fan of
       | gmail. Once locked out of gmail for good, your email and the
       | apps associated with it are gone forever. Source? Personal
       | experience.
       | 
       | "A private inbox $60 for 12 months". I assume it is USD, not AU$
       | (AFAIK, Fastmail is based in Australia.) Still pricey.
       | 
       | At https://www.infomaniak.com/ I can buy email service for an (in
       | my case external) domain for 18 Euro a year and I get 5 inboxes.
       | And it is based in Switzerland, so no EU or US jurisdiction.
       | 
       | I have a few websites and Fastmail would just be prohibitively
       | expensive for me.
        
         | qingcharles wrote:
         | You can have as many domains as you want for free in your
         | Fastmail account. There are no extra fees.
         | 
         | I've used them for 20 years now. Highly recommended.
        
           | steve_adams_86 wrote:
           | Wait, really? I pay for two separate domains. What am I
           | missing?
           | 
           | I'm happy to pay them because I love the service (and it's
           | convenient for taxes), but I feel like I should know how to
           | configure multiple domains under one account.
        
             | xerp2914 wrote:
             | Under Settings => Domains you can add additional domains.
             | If you use Fastmail as domain registrar you have to pay for
             | each additional domain, of course.
        
         | mariusor wrote:
         | My suggestion would be to try Purelymail. They don't offer much
         | in the way of a web interface to email, but if you bring your
         | own client, it's a very good provider.
         | 
         | I'm paying something like $10 per year for multiple domains
         | with multiple email addresses (though with little traffic).
         | I've been using them for about 5 years and I had absolutely no
         | issues.
        
         | aquariusDue wrote:
         | Personally I prefer Migadu and tend to recommend them to tech
         | savvy people. Their admin panel is excellent and
         | straightforward to use, prices are based on usage limits
         | (amount of emails sent/received) instead of number of
         | mailboxes.
         | 
         | Migadu is just all around good; the only downsides I can find
         | are subjective: the fact that they're based in Switzerland, and
         | that unless you're "good with computers", something like
         | Fastmail will probably be better.
        
           | Amfy wrote:
           | Seems Migadu is hosted on OVH though? Huge red flag.. no
           | control over infrastructure (think of Hetzner shutting down
           | customers with little to no warning)
        
       | throw0101b wrote:
       | > _So after the success of our initial testing, we decided to go
       | all in on ZFS for all our large data storage needs. We've now
       | been using ZFS for all our email servers for over 3 years and
       | have been very happy with it. We've also moved over all our
       | database, log and backup servers to using ZFS on NVMe SSDs as
       | well with equally good results._
       | 
       | If you're looking at ZFS on NVMe you may want to look at Alan
       | Jude's talk on the topic, "Scaling ZFS for the future", from the
       | 2024 OpenZFS User and Developer Summit:
       | 
       | * https://www.youtube.com/watch?v=wA6hL4opG4I
       | 
       | * https://openzfs.org/wiki/OpenZFS_Developer_Summit_2024
       | 
       | There are some bottlenecks that get in the way of getting all the
       | performance that the hardware often is capable of.
        
       | rmbyrro wrote:
       | if you don't have high bandwidth requirements, like for
       | background / batch processing, the ovh eco family [1] of bare
       | metal servers is incredibly cheap
       | 
       | [1] https://eco.ovhcloud.com/en/
        
       | xiande04 wrote:
       | Aside: Fastmail was the best email provider I ever used. The
       | interface was intuitive and responsive, both on mobile and web.
       | They have extensive documentation for everything. I was able to
       | set up a custom domain and a catch-all email address in a few
       | minutes. Customer support is great, too. I emailed them about an
       | issue and they responded within the hour (turns out it was my
       | fault). I feel like it's a really mature product/company and they
       | really know what they're doing, and have a plan for where they're
       | going.
       | 
       | I ended up switching to Protonmail, because of privacy (Fastmail
       | is within the Five Eyes (Australia)), which is the only thing I
       | really like about Protonmail. But I'm considering switching back
       | to Fastmail, because I liked it so much.
        
         | gausswho wrote:
         | I also chose Proton for the same reason. It hurts that their
         | product development is glacial, but privacy outside the Five
         | Eyes is a crucial component that I don't understand why
         | Fastmail doesn't try to offer.
        
         | kevin_thibedeau wrote:
         | Their Android client has been less than stellar in the past but
         | recent releases are significantly improved. Uploading files, in
         | particular, was a crapshoot.
        
       | dorongrinstein wrote:
       | We at Control Plane (https://cpln.com) make it easy to repatriate
       | from the cloud, yet leverage the union of all the services
       | provided by AWS, GCP and Azure. Many of our customers moved from
       | cloud A to cloud B, and often to their own colocation cage, and
       | in one case their own home cluster. Check out
       | https://repatriate.cloud
        
       | 0xbadcafebee wrote:
       | I've been doing this job for almost as long as they have. I work
       | with companies that do on-prem, and I work with companies in the
       | cloud, and both. Here's the low down:
       | 
       | 1. The cost of the server is not the cost of on-prem. There are
       | so many different _kinds_ of costs that aren't just monetary.
       | ("we have to do more ourselves, including _planning, choosing,
       | buying, installing, etc,_ ") Those are tasks that require
       | expertise (which 99% of "engineers" do not possess at more than a
       | junior level), and time, and staff, and correct execution. They
       | are much more expensive than you will ever imagine. Doing any of
       | them wrong will cause issues that will eventually cost you
       | business (customers fleeing or avoiding you). That's much worse
       | than a line-item cost.
       | 
       | 2. You have to develop relationships for good on-prem. In order
       | to get good service in your rack (assuming you don't hire your
       | own cage monkey), in order to get good repair people for your
       | hardware service accounts, in order to ensure when you order a
       | server that it'll actually arrive, in order to ensure the DC
       | won't fuck up the power or cooling or network, etc. This is not
       | something you can just read reviews on. You have to actually
       | physically and over time develop these relationships, or you will
       | suffer.
       | 
       | 3. What kind of load you have and how you maintain your gear is
       | what makes a difference between being able to use one server for
       | 10 years, and needing to buy 1 server every year. For some use
       | cases it makes sense, for some it really doesn't.
       | 
       | 4. Look at all the complex details mentioned in this article.
       | These people go _deep_, building loads of technical expertise at
       | the OS level, hardware level, and DC level. It takes a long time
       | to build that expertise, and you usually cannot just hire for it,
       | because it's generally hard to find. This company is very unique
       | (hell, their stack is based on Perl). Your company won't be that
       | unique, and you won't have their expertise.
       | 
       | 5. If you hire someone who actually knows the cloud really well,
       | and they build out your cloud env based on published well-
       | architected standards, you gain not only the benefits of rock-
       | solid hardware management, but benefits in security, reliability,
       | software updates, automation, and tons of unique features like
       | added replication, consistency, availability. You get a lot more
       | for your money than just "managed hardware", things that you
       | literally could never do yourself without 100 million dollars and
       | five years, but you only pay a few bucks for it. The _value_ in
       | the cloud is insane.
       | 
       | 6. Everyone does cloud costs wrong the first time. If you hire
       | somebody who does have cloud expertise (who hopefully did the
       | well-architected buildout above), they can save you 75% off your
       | bill, by default, with nothing more complex than checking a box
       | and paying some money up front (the same way you would for your
       | on-prem server fleet). Or they can use spot instances, or
       | serverless. If you choose software developers who care about
       | efficiency, they too can help you save money by not needing to
       | over-allocate resources, and right-sizing existing ones.
       | (Remember: you'd be doing this cost and resource optimization
       | already with on-prem to make sure you don't waste those servers
       | you bought, and that you know how many to buy and when)
       | 
       | 7. The major takeaway at the end of the article is _"when you
       | have the experience and the knowledge"_. If you don't, then
       | attempting on-prem can end calamitously. I have seen it several
       | times. In fact, just one week ago, a business I work for had
       | _three days of downtime_, due to hardware failing, and not being
       | able to recover it, their backup hardware failing, and there
       | being no way to get new gear in quickly. Another business I
       | worked for literally hired and fired four separate teams to build
       | an on-prem OpenStack cluster, and it was the most unstable,
       | terrible computing platform I've used, that constantly caused
       | service outages for a large-scale distributed system.
       | 
       | If you're not 100% positive you have the expertise, just don't do
       | it.
        
       | herf wrote:
       | ZFS encryption is still corrupting datasets when using zfs
       | send/receive for backup (otherwise a huge win for mail
       | datasets); I would be cautious about using it in production:
       | 
       | https://github.com/openzfs/zfs/issues/12014
        
         | klysm wrote:
         | I'll never use ZFS in production after I was on a team that
         | used it at petabyte scale. It's too complex and tries to solve
         | problems that should be solved at higher layers.
        
       | TheFlyingFish wrote:
       | Lots of people here mentioning reasons to both use and avoid the
       | cloud. I'll just chip in one more on the pro-cloud side:
       | reliability at low scale.
       | 
       | To expand: At $dayjob we use AWS, and we have no plans to switch
       | because we're _tiny_, like ~5000 DAU last I checked. Our AWS
       | bill is <$600/mo. To get anything remotely resembling the
       | reliability that AWS gives us we would need to spend tens of
       | thousands up-front buying hardware, then something approximating
       | our current AWS bill for colocation services. Or we could host
       | fully on-prem, but then we're paying even more up-front for site-
       | level stuff like backup generators and network multihoming.
       | 
       | Meanwhile, RDS (for example) has given us something like one
       | unexplained 15-minute outage in the last six years.
       | 
       | Obviously every situation is unique, and what works for one won't
       | work for another. We have no expectation of ever having to
       | suddenly 10x our scale, for instance, because our growth is
       | limited by other factors. But at our scale, given our business
       | realities, I'm convinced that the cloud is the best option.
        
         | jjeaff wrote:
         | This is a common false dichotomy I see constantly: cloud vs.
         | buying and building your own hardware from scratch and
         | colocating or building your own datacenter.
         | 
         | Very few non-cloud users are buying their own hardware. You can
         | simply rent dedicated hardware in a datacenter, for
         | significantly cheaper than anything in the cloud. That being
         | said, certain things like object storage, if you don't need
         | very large amounts of data, are very handy and inexpensive from
         | cloud services considering the redundancy and uptime they
         | offer.
        
         | ttul wrote:
         | This works even at $1M/mo AWS spend. As you scale, the
         | discounts get better. You get into the range of special pricing
         | where they will make it work against your P&L. If you're
         | venture funded, they have a special arm that can do backflips
         | for you.
         | 
         | I should note that Microsoft also does this.
        
       | kayson wrote:
       | Any ideas how they manage the ZFS encryption key? I've always
       | wondered what you'd do in an enterprise production setting.
       | Typing the password in at a prompt hardly seems scalable (but
       | maybe they have few enough servers that it's manageable) and
       | keeping it in a file on disk or on removable storage would seem
       | to defeat the purpose...
        
       | ttul wrote:
       | I think mailbox hosting is a special use case. The primary cost
       | is storage and bandwidth and you can indeed do better on storage
       | and bandwidth than what Amazon offers. That being said, if
       | Fastmail asked Amazon for special pricing to make the move, they
       | would get it.
        
       | jph00 wrote:
       | The original answer to "why does FastMail use their own hardware"
       | is that when I started the company in 1999 there weren't many
       | options. I actually originally used a single bare metal server at
       | Rackspace, which at that time was a small scrappy startup. IIRC
       | it cost $70/month. There weren't really practical VPS or SaaS
       | alternatives back then for what I needed.
       | 
       | Rob (the author of the linked article) joined a few months later,
       | and when we got too big for our Rackspace server, we looked at
       | the cost of buying something and doing colo instead. The biggest
       | challenge was trying to convince a vendor to let me use my
       | Australian credit card but ship the server to a US address (we
       | decided to use NYI for colo, based in NY). It turned out that IBM
       | were able to do that, so they got our business. Both IBM and NYI
       | were great for handling remote hands and hardware issues, which
       | obviously we couldn't do from Australia.
       | 
       | A little bit later Bron joined us, and he automated absolutely
       | everything, so that we were able to just have NYI plug in a new
       | machine and it would set itself up from scratch. This all just
       | used regular Linux capabilities and simple open source tools,
       | plus of course a whole lot of Perl.
       | 
       | As the fortunes of AWS et al rose and rose and rose, I kept
       | looking at their pricing and features and kept wondering what I
       | was missing. They seemed orders of magnitude more expensive for
       | something that was more complex to manage and would have locked
       | us into a specific vendor's tooling. But everyone seemed to be
       | flocking to them.
       | 
       | To this day I still use bare metal servers for pretty much
       | everything, and still love having the ability to use simple
       | universally-applicable tools like plain Linux, Bash, Perl,
       | Python, and SSH, to handle everything cheaply and reliably.
       | 
       | I've been doing some planning over the last couple of years on
       | teaching a course on how to do all this, although I was worried
       | that folks are too locked in to SaaS stuff -- but perhaps things
       | are changing and there might be interest in that after all?...
        
         | basilgohar wrote:
         | Please do this course. It's still needed and a lot of people
         | would benefit from it. It's just that the loudest voices are
         | all-in on cloud, so it seems otherwise.
        
         | ksec wrote:
         | >But everyone seemed to be flocking to them.
         | 
         | To the point where we have young devs today who don't know
         | what VPS and colo (colocation) mean.
         | 
         | Back to the article, I am surprised it was only "a few years
         | ago" that Fastmail adopted SSDs, which certainly seems late in
         | the cycle given the benefits of what SSDs offer.
         | 
         | Price for colo is on the order of $3000/2U/year. That is
         | $125/U/month.
        
           | flemhans wrote:
           | HDDs are still the best option for many workloads, including
           | email.
        
           | matt-p wrote:
           | Colo is typically sold on power, not space. From your
           | example, you're either getting ripped off if it's for low
           | power servers, or massively undercharged for a 4x A100
           | machine.
        
           | justsomehnguy wrote:
           | > Which certainly seems late in the cycle for the benefits of
           | what SSD offers.
           | 
             | 90% of emails are never read, 9% are read once. What could
             | SSDs offer for this use case except at least 2x the cost?
        
             | bluGill wrote:
              | Don't forget that Fastmail is accessed over an internet
              | transport with enough latency to make HDD seek times
              | noise.
        
           | brongondwana wrote:
           | We adopted SSD for the current week's email and rust for the
           | deeper storage many years ago. A few years ago we switched to
           | everything on NVMe, so there's no longer two tiers of
           | storage. That's when the pricing switched to make it
           | worthwhile.
        
         | milesvp wrote:
         | As someone who lived through that era, I can tell you there are
         | legions of devs and dev adjacent people who have no idea what
         | it's like to automate mission critical hardware. Everyone had
         | to do it in the early 2000s. But it's been long enough that
         | there are people in the workforce who just have no idea about
         | running your own hardware since they never had to. I suspect
         | there is a lot of interest, especially since we're likely
         | approaching the bring-it-back-in-house cycle, as CTOs try to
         | rein in their cloud spend.
        
         | packtreefly wrote:
         | > although I was worried that folks are too locked in to SaaS
         | stuff
         | 
         | For some people the cloud is straight magic, but for many of
         | us, it just represents work we don't have to do. Let "the
         | cloud" manage the hardware and you can deliver a SaaS product
         | with all the nines you could ask for...
         | 
         | > teaching a course on how to do all this ... there might be
         | interest in that after all?
         | 
         | Idk about a course, but I'd be interested in a blog post or
         | something that addresses the pain points that I conveniently
         | outsource to AWS. We have to maintain SOC 2 compliance, and
         | there's a good chunk of stuff in those compliance requirements
         | around physical security and datacenter hygiene that I get to
         | just point at AWS for.
         | 
         | I've run physical servers for production resources in the past,
         | but they weren't exactly locked up in Fort Knox.
         | 
         | I would find some in-depth details on these aspects
         | interesting, but from a less-clinical viewpoint than the ones
         | presented in the cloud vendors' SOC reports.
        
           | dijit wrote:
           | I've never visited a datacenter that wasn't SOC2 compliant.
            | Bahnhof, SAVVIS, Telecity, Equinix, etc.
           | 
           | Of course, their SOC 2 compliance doesn't mean we are
           | absolved of securing our databases and services.
           | 
            | There's a big gap between throwing some compute in a closet
           | and having someone "run the closet" for you.
           | 
            | There is a significantly larger gap between having someone
           | "run the closet" and building your own datacenter from
           | scratch.
        
         | benterix wrote:
         | > As the fortunes of AWS et al rose and rose and rose, I kept
         | looking at their pricing at features and kept wondering what I
         | was missing.
         | 
         | You are not the only one. There are several factors at play but
         | I believe one of the strongest today is the generational
         | divide: people have lost the ability to manage their own
         | infra, or don't know it well enough to do it well, so it's
         | true when they say "It's too much hassle". I say this as an
         | AWS guy who occasionally works on on-prem infra.[0]
         | 
         | [0] As a side note, I don't believe the lack of skills is the
         | main reason organizations have problems - skills can be learned,
         | but if you mess up the initial architecture design, fixing that
         | can easily take years.
        
         | riezebos wrote:
         | As a customer of Fastmail and a fan of your work at FastAI and
         | FastHTML I feel a bit stupid now for not knowing you started
         | Fastmail.
         | 
         | Now I'm wondering how much you'd look like tiangolo if you wore
         | a moustache.
        
           | brongondwana wrote:
           | Jeremy is all the Fast things!
        
         | llm_trw wrote:
         | >As the fortunes of AWS et al rose and rose and rose, I kept
         | looking at their pricing at features and kept wondering what I
         | was missing. They seemed orders of magnitude more expensive for
         | something that was more complex to manage and would have locked
         | us into a specific vendor's tooling. But everyone seemed to be
         | flocking to them.
         | 
         | In 2006, when the first AWS instances showed up, it would take
         | you two years of on-demand bills to match the cost of buying
         | the hardware from a retail store and using it continuously.
         | 
         | Today it's between two weeks for ML workloads and three months
         | for the mid-sized instances.
         | 
         | AWS made sense in big corps when it would take you six months
         | to get approval for buying the hardware and another six for the
         | software. Today I'd only use it to do a prototype that I'd move
         | on-prem the second it looks like it will make it past one
         | quarter.
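         | 
         | The arithmetic is easy to sanity-check with your own quotes
         | (the numbers below are placeholders, not real prices):
         | 
         |     # Back-of-the-envelope break-even: months of
         |     # on-demand rental that equal buying outright.
         |     # Both prices are placeholders - plug in yours.
         |     hardware_cost = 15_000.0   # one-off, USD
         |     on_demand_per_hour = 2.50  # comparable instance
         | 
         |     monthly = on_demand_per_hour * 24 * 30
         |     months = hardware_cost / monthly
         |     print("break even after %.1f months" % months)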
        
           | bluGill wrote:
           | AWS is useful if you have uneven loads. Why pay for the
           | number of servers you need for Christmas for the rest of the
           | year? But if your load is more even it doesn't make as much
           | sense.
        
         | 0xbadcafebee wrote:
         | You know how to set up a rock-solid remote hands console to all
         | your servers, I take it? Dial-up modem to a serial console
         | server, serial cables to all the servers (or IPMI on a
         | segregated network and management ports). Then you deal with
         | varying hardware implementations, OSes, setting that up in all
         | your racks in all your colos.
         | 
         | Compare that to AWS, where there are 6 different kinds of
         | remote hands that work on all hardware and OSes, with no need
         | for expertise, no time taken. No planning, no purchases, no
         | shipment time, no waiting for remote hands to set it up, no
         | diagnosing failures, etc, etc, etc...
         | 
         | That's just _one thing_. There's a _thousand_ more things,
         | just for a plain old VM. And the cloud provides way more than
         | VMs.
         | 
         | The number of failures you can have on-prem is insane. Hardware
         | can fail for all kinds of reasons (you must know this), and you
         | have to have hot backup/spares, because otherwise you'll find
         | out your spares don't work. Getting new gear in can take weeks
         | (it "shouldn't" take that long, but there's little things like
         | pandemics and global shortages on chips and disks that you
         | can't predict). Power and cooling can go out. There's so many
         | things that can (and eventually will) go wrong.
         | 
         | Why expose your business to that much risk, and have to build
         | that much expertise? To save a few bucks on a server?
        
           | switch007 wrote:
            | This. All of this and more. I've got friends who worked for
           | hosting providers who over the years have echoed this
           | comment. It's endless.
        
           | jread wrote:
           | > Hardware can fail for all kinds of reasons
           | 
           | Complex cloud infra can also fail for all kinds of reasons,
           | and they are often harder to troubleshoot than a hardware
           | failure. My experience with server grade hardware in a
           | reliable colo with a good uplink is it's generally an
           | extremely reliable combination.
        
           | likeabatterycar wrote:
           | > The number of failures you can have on-prem is insane.
           | Hardware can fail for all kinds of reasons (you must know
           | this)
           | 
           | Cloud vendors are not immune from hardware failure. What do
           | you think their underlying infrastructure runs on, some
           | magical contraption made from Lego bricks, Swiss chocolate,
           | and positive vibes?
           | 
           | It's the same hardware, prone to the same failures. You've
           | just outsourced worrying about it.
        
         | jasode wrote:
         | _> As the fortunes of AWS et al rose and rose and rose, I kept
         | looking at their pricing at features and kept wondering what I
         | was missing. They seemed orders of magnitude more expensive
         | [...] To this day I still use bare metal servers for pretty
         | much everything, [...] plain Linux, Bash, Perl, Python, and
         | SSH, to handle everything cheaply _
         | 
         | Your FastMail use case of (relatively) predictable server
         | workload and product roadmap combined with agile Linux admins
         | who are motivated to use close-to-bare-metal tools isn't an
         | optimal cost fit for AWS. You're not missing anything and
         | FastMail would have been overpaying for cloud.
         | 
         | Where AWS/GCP/Azure shine is organizations that need _higher-
         | level PaaS_ like managed DynamoDB, RedShift, SQS, etc that run
         | on top of bare metal. Most _non-tech_ companies with internal
         | IT departments cannot create/operate "internal cloud services"
         | that are on par with AWS.[1] Some companies like Facebook and
         | Walmart can run internal IT departments with advanced
         | capabilities like AWS but most non-tech companies can't. This
         | means paying AWS' fat profit margins _can actually be cheaper_
         | than paying internal IT salaries to  "reinvent AWS badly" by
         | installing MySQL, Kafka, etc on bare metal Linux. E.g. Netflix
         | had their own datacenters in 2008 but a 3-day database outage
         | that stopped them from shipping DVDs was one of the reasons
         | they quit running their datacenters and migrated to AWS.[2]
         | Their complex workload isn't a good fit for bare-metal Linux
         | and bash scripts; Netflix uses a ton of high-level PaaS managed
         | services from AWS.
         | 
         | If bare metal is the layer of abstraction the IT & dev
         | departments are comfortable working at, then self-host on-
         | premise, or co-lo, or Hetzner are all cheaper than AWS.
         | 
         | [1]
         | https://web.archive.org/web/20160319022029/https://www.compu...
         | 
         | [2] https://media.netflix.com/en/company-blog/completing-the-
         | net...
        
         | e12e wrote:
         | I used to help manage a couple of racks' worth of on-premise
         | hw in the early to mid 2000s.
         | 
         | We had some old Compaq (?) servers, most of the newer stuff was
         | Dell. Mix of windows and Linux servers.
         | 
         | Even with the Dell boxes, things weren't really standard across
         | different server generations, and every upgrade was bespoke,
         | except in cases when we bought multiple boxes for
         | redundancy/scaling of a particular service.
         | 
         | What I'd like to see is something like Oxide Computer servers
         | that scale way _down_, at least down to a quarter rack. Like
         | some kind of Supermicro meets Backblaze storage pod - but
         | riffing on Joyent's idea of colocating storage and compute. A
         | sort of composable mainframe for small businesses in the 2020s.
         | 
         | I guess maybe that is part of what Triton is all about.
         | 
         | But anyway - somewhere to start, and grow into the future with
         | sensible redundancies and open source bios/firmware/etc.
         | 
         | Unlike the typical situation today, where you buy two (for
         | redundancy) "big enough" boxes - and then need to reinvent your
         | setup/deployment when you need two bigger boxes in three years.
        
       | lukevp wrote:
       | To me, Cloud is all about the shift left of DevOps. It's not a
       | cost play. I'm a Dev Lead / Manager and have worked in both types
       | of environments over the last 10 years. The difference in
       | provisioning velocity between the two approaches is immense. In
       | the hardware space, it took months to years to
       | provision new machines or upgrade OSes. In the cloud, it's a new
       | terraform script and a CI deploy away. Need more storage? It's
       | just there, available all the time. Need to add a new firewall
       | between machines or redo the network topology? Free. Need a warm
       | standby in 4 different regions that costs almost nothing but can
       | scale to full production capacity within a couple of minutes?
       | Done. Those types of things are difficult to do with physical
       | hardware. And if you have an engineering culture where the
       | operational work and the development work are at odds (think the
       | old style of Dev / QA / Networking / Servers / Security all being
       | separate teams), processes and handoffs eat your lunch and it
       | becomes crippling to your ability to innovate. Cloud and DevOps
       | are to me about reducing the differentiation between these roles
       | so that a single engineer can do any part of the stack, which
       | cuts out the communication overhead and the handoff time and the
       | processes significantly.
       | 
       | If you have predictable workloads, a competent engineering
       | culture that fights against process culture, and are willing to
       | spend the money to have good hardware and the people to man it
       | 24x7x365 then I don't think cloud makes sense at all. Seems like
       | that's what y'all have and you should keep up with it.
        
         | Jenk wrote:
         | Exactly this. It is culture and organisation (structure)
         | dependent. I'm in the throes of the same discussion with my
         | leadership team, some of whom have built themselves an
         | ops/qa/etc. empire and want to keep their moat.
         | 
         | Are you running a well-understood and predictable (as in,
         | little change, growth, or feature additions) system? Are your
         | developers handing over to central platform/infra/ops teams?
         | You'll probably save some cash by buying and owning the
         | hardware you need for your use case(s). Elasticity is
         | (probably) not part of your vocabulary, perhaps outside of "I
         | wish we had it" anyway.
         | 
         | Have you got teams and/or products that are scaling rapidly or
         | unpredictably? Have you still got a lot of learning and
         | experimenting to do with how your stack will work? Do you
         | need flexibility and can't afford to wait for it? Then cloud
         | is for you.
         | 
         | n.b. I don't think I've ever felt more validated by a
         | post/comment than yours.
        
         | comprev wrote:
         | Our CI pipelines can spin up some seriously meaty hardware, run
         | some very resource-intensive tests, and destroy the
         | infrastructure when finished.
         | 
         | Bonus points: they can do it with spot pricing to further lower
         | the bill.
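         | 
         | A rough sketch of that pattern in Python with boto3 (the
         | AMI ID and instance type are placeholders, not what we
         | actually run): request the capacity as spot, run the job,
         | then destroy it.
         | 
         |     import boto3
         | 
         |     ec2 = boto3.client("ec2", region_name="us-east-1")
         | 
         |     # Ask for spot capacity for the duration of the run.
         |     resp = ec2.run_instances(
         |         ImageId="ami-0123456789abcdef0",   # placeholder
         |         InstanceType="c5.9xlarge",         # placeholder
         |         MinCount=1,
         |         MaxCount=1,
         |         InstanceMarketOptions={"MarketType": "spot"},
         |     )
         |     instance_id = resp["Instances"][0]["InstanceId"]
         | 
         |     # ... run the resource-intensive tests here ...
         | 
         |     # Destroy the infrastructure when finished.
         |     ec2.terminate_instances(InstanceIds=[instance_id])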
         | 
         | The cloud offers immense flexibility and empowers _developers_
         | to easily manage their own infrastructure without depending on
         | other teams.
         | 
         | Speed of development is the primary reason $DayJob is moving
         | into the cloud, while maintaining bare-metal for platforms
         | which rarely change.
        
         | drdaeman wrote:
         | > In the hardware space, it took months to years to provision
         | new machines or upgrade OSes.
         | 
         | If it takes this long to manage a machine, I strongly suspect
         | that the engineers who initially designed the system failed,
         | for some reason, to account for those needs. Was that true in
         | your case?
         | 
         | Back in the late '00s to mid '10s, I worked for an ISP
         | startup as a SWE. We had a few core machines (database,
         | RADIUS server, self-service website, etc.) - an ugly mess
         | TBH - initially provisioned and managed entirely by hand, as
         | we didn't know any better back then. Naturally, maintaining
         | those was a major PITA, so they sat on the same dated distro
         | for years. That was before Ansible was a thing, and we
         | hadn't really heard of Salt or Chef until we started to feel
         | the pain and search for solutions. Virtualization (OpenVZ,
         | then Docker) helped soften a lot of the issues, making it
         | significantly easier to maintain the components, but the
         | pain from our original sins was felt for a long time.
         | 
         | But we also had a fleet of other machines, where we understood
         | our issues with the servers enough to design new nodes to be as
         | stateless as possible, with automatic rollout scripts for
         | whatever we were able to automate. Provisioning a new host took
         | only a few hours, with most time spent unpacking, driving,
         | accessing the server room, and physically connecting things.
         | Upgrades were pretty easy too - reroute customers to another
         | failover node, write a new system image to the old one, reboot,
         | test, re-route traffic back, done.
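         | 
         | That flow is simple enough to sketch in a few lines of
         | Python; every function here is a made-up stub (just prints)
         | to show the sequence, not our actual rollout scripts:
         | 
         |     def reroute(src, dst):
         |         print(f"routing traffic from {src} to {dst}")
         | 
         |     def write_image(node, image):
         |         print(f"writing {image} to {node}, rebooting")
         | 
         |     def smoke_test(node):
         |         print(f"smoke testing {node}")
         |         return True
         | 
         |     def upgrade(node, failover, image):
         |         reroute(node, failover)       # drain the node
         |         write_image(node, image)      # reimage + reboot
         |         if smoke_test(node):          # verify first
         |             reroute(failover, node)   # back in service
         | 
         |     upgrade("node-a", "node-b", "system-2024.img")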
         | 
         | So it's not that self-owned bare metal is harder to manage -
         | the lesson I learned is that you just have to think ahead
         | about what the future will require. Same as with the cloud,
         | I guess: one has to follow best practices or end up with a
         | crappy architecture that will be painful to rework. Just a
         | different set of practices, because of the different nature
         | of the systems.
        
         | eddsolves wrote:
         | My first job in tech was building servers for companies when
         | they needed more compute, physically building them from our
         | warehouse of components, driving them to their site, and
         | setting them up in their network.
         | 
         | You could get same-day builds deployed on-prem with the right
         | support bundle!
        
       | nprateem wrote:
       | Yeah and some people reckon web frameworks are bad too. Sometimes
       | it might make sense to host on your own hardware, but almost
       | certainly not for startups.
        
       | lakomen wrote:
       | You also terminate accounts at your sole discretion
        
       | awinter-py wrote:
       | everyone is 'cattle not pets' except the farm vet who is
       | shoulder-deep in a cow
       | 
       | (my experience with managed kubernetes)
        
       | EdJiang wrote:
       | I was a bit confused by the section on backups. How do they
       | manage moving the data offsite with the on-premises backup
       | servers? Wouldn't going with the cloud be a cost saving there?
        
       | kwakubiney wrote:
       | If I remember correctly, StackOverflow does something similar.
       | The then Director of Engineering talks about it here[1]
       | 
       | [1]https://hanselminutes.com/847/engineering-stack-overflow-
       | wit...
        
         | e12e wrote:
         | They also have a SaaS product that lives in the cloud:
         | 
         | https://stackoverflow.blog/2023/08/30/journey-to-the-cloud-p...
        
       ___________________________________________________________________
       (page generated 2024-12-22 23:00 UTC)