[HN Gopher] Why we use our own hardware
___________________________________________________________________
Why we use our own hardware
Author : nmjenkins
Score : 716 points
Date : 2024-12-22 08:36 UTC (14 hours ago)
(HTM) web link (www.fastmail.com)
(TXT) w3m dump (www.fastmail.com)
| oldpersonintx wrote:
| longtime FM user here
|
| good on them, understanding infrastructure and cost/benefit is
| essential in any business you hope to run for the long haul
| bartvk wrote:
| Such an awesome article. I like how they didn't just go with the
| Cloud wave but kept sysadmin'ing, like ol' Unix graybeards. Two
| interesting things they wrote about their SSDs:
|
| 1) "At this rate, we'll replace these [SSD] drives due to
| increased drive sizes, or entirely new physical drive formats
| (such E3.S which appears to finally be gaining traction) long
| before they get close to their rated write capacity."
|
| and
|
| 2) "We've also anecdotally found SSDs just to be much more
| reliable compared to HDDs (..) easily less than one tenth the
| failure rate we used to have with HDDs."
| tgv wrote:
| To avoid sysadmin tasks and keep costs down, you've got to go
| so deep into the cloud that it becomes just another arcane skill
| set. I run most of my stuff on virtual Linux servers, but some
| on AWS, and that's hard to learn, and doesn't transfer to GCP
| or Azure. Unless your needs are extreme, I think sysadmin'ing
| is the easier route in most cases.
| baxtr wrote:
| I predict a slow but unstoppable comeback of the sysadmin job
| over the next 5-10 years.
| homebrewer wrote:
| It never disappeared in some places. In my region there's
| been zero interest in "the cloud" because of physical
| remoteness from all major GCP/AWS/Azure datacenters
| (resulting in high latency), for compliance reasons, and
| because it's easier and faster to solve problems by dealing
| with a local company than pleading with a global giant that
| gives zero shits about you because you're less than a
| rounding error in its books.
| wongarsu wrote:
| For so many things the cloud isn't really easier or cheaper,
| and most cloud providers stopped advertising it as such. My
| assumption is that cloud adoption is mainly driven by 3
| forces:
|
| - for small companies: free credits
|
| - for large companies: moving prices as far away as possible
| from the deploy button, allowing dev and IT to just deploy
| stuff without purchase orders
|
| - self-perpetuating due to hype, cv-driven development, and
| ease of hiring
|
| All of these are decent reasons, but none of them may apply
| to a company like fastmail
| graemep wrote:
| Also CYA. If you run your own servers and something goes
| wrong, it's your fault. If it's an outage at AWS, it's their
| fault.
|
| There's also a huge element of following the crowd, branding
| that non-technical management is familiar with, and so on. I have
| also found some developers (front end devs, or back end
| devs who do not have sysadmin skills) feel cloud is the
| safe choice. This is very common for small companies as
| they may have limited sysadmin skills (people who know how
| to keep windows desktops running are not likely to be who
| you want to deploy servers) and a web GUI _looks_ a lot
| easier to learn.
| dietr1ch wrote:
| > If it's an outage at AWS, it's their fault.
|
| Well, still your fault, but it's easy to judo the risk onto
| clients by saying that supporting multi-cloud is expensive
| and not a priority.
| graemep wrote:
| Management in many places will not even know what multi-
| cloud is (or even multi-region).
|
| As Cloudstrike showed, if you follow the crowd and tick
| the right boxes you will not be blamed.
| bobnamob wrote:
| nit: Crowdstrike
|
| Unless the incident is now being referred to as
| "Cloudstrike", in which case, eww
| dietr1ch wrote:
| Yeah, he meant Crowdstrike. Cloudstrike is the name of a
| future security incident affecting multiple cloud
| providers. I can't disclose more details.
| ghaff wrote:
| There are other, if often at least tangentially related,
| reasons but more than I can give justice to in a comment.
|
| Many people largely got a lot of things wrong about cloud
| that I've been meaning to write about for a while. I'll get
| to it after the holidays. But probably none more than the
| idea that massive centralized computing (which was wrongly
| characterized as a utility like the electric grid) would
| have economics with which more local computing options
| could never compete.
| Winsaucerer wrote:
| I'm very interested in approaches that avoid cloud, so
| please don't read this as me saying cloud is superior. I
| can think of some other advantages of cloud:
|
| - easy to setup different permissions for users
| (authorisation considerations).
|
| - able to transfer assets to another owner (e.g., if
| there's a sale of a business) without needing to move
| physical hardware.
|
| - other outsiders (consultants, auditors, whatever) can
| come in and verify the security (or other) of your setup,
| because it's using a standard well known cloud platform.
| wongarsu wrote:
| Those are valid reasons, but not always as straightforward:
|
| > easy to setup different permissions for users
| (authorisation considerations)
|
| Centralized permission management is an advantage of the
| cloud. At the same time it's easy to do wrong. Without
| the cloud you usually have more piecemeal solutions
| depending on segmenting network access and using the
| permission systems of each service
|
| > able to transfer assets to another owner (e.g., if
| there's a sale of a business) without needing to move
| physical hardware
|
| The obvious solution here is to not own your hardware but
| to rent dedicated servers. Removes some of the
| maintenance burden, and the servers can be moved between
| entities as you like. The cloud does give you more
| granularity though
|
| > other outsiders (consultants, auditors, whatever) can
| come in and verify the security (or other) of your setup,
| because it's using a standard well known cloud platform
|
| There is a huge cottage industry of software trying to
| scan for security issues in your cloud setups. On the one
| hand that's an advantage of a unified interface, on the
| other hand a lot of those issues wouldn't occur outside
| the cloud. In any case, verifying security isn't easy in
| or out of the cloud. But if you have an auditor that is
| used to cloud deployments it will be easier to satisfy
| them there, that's certainly true
| oftenwrong wrote:
| In small companies, cloud also provides the ability to work
| around technical debt and to reduce risk.
|
| For example, I have seen several cases where poorly
| designed systems unexpectedly used too much memory and
| there was no time to fix it, so the company increased
| the memory on all instances with a few clicks. When you
| need to do this immediately to avoid a botched release that
| has already been called "successful" and announced as such
| to stakeholders, that is a capability that saves the day.
|
| An example of de-risking is using a cloud filesystem like
| EFS to provide a pseudo-infinite volume. No risk of an
| outage due to an unexpectedly full disk.
|
| Another example would be using a managed database system
| like RDS vs self-managing the same RDBMS: using the managed
| version saves on labor and reduces risk for things like
| upgrades. What would ordinarily be a significant effort for
| a small company becomes automatic, and RDS includes various
| sanity checks to help prevent you from making mistakes.
|
| The reality of the industry is that many companies are just
| trying to hit the next milestone of their business by a
| deadline, and the cloud can help despite the downsides.
| sgarland wrote:
| > For example, I have seen several cases where poorly
| designed systems unexpectedly used too much memory
|
| > using a managed database system like RDS vs self-
| managing the same RDBMS: using the managed version saves
| on labor
|
| As a DBRE / SRE, I can confidently assert that belief in
| the latter is often directly responsible for the former.
| AWS is quite clear in their shared responsibility model
| [0] that you are still responsible for making sound
| decisions, tuning various configurations, etc. Having
| staff that knows how to do these things often prevents
| the poor decisions from being made in the first place.
|
| [0]: https://aws.amazon.com/compliance/shared-
| responsibility-mode...
| graemep wrote:
| Not a DB admin, but I do install and manage DBs for small
| clients.
|
| My experience is that AWS makes the easy things easy and
| the difficult things difficult, and the knowledge is not
| transferable.
|
| With a CLI or non-cloud management tools I can create,
| admin and upgrade a database (or anything else) exactly
| the same way, locally, on a local VM, and on a cloud VM
| from any provider (including AWS). Doing it with a
| managed database means learning how the provider does it
| - which takes longer and I personally find it more
| difficult (and stressful).
|
| What I cannot do as well as a real DB admin could do is
| things like tuning. It's not really an issue for small
| clients (a few generic changes to scale settings to
| available resources is enough - and cheaper than paying
| someone to tune it). Come to think of it, I do not even
| know how to make those changes on AWS and just hope the
| defaults match the size of RDS you are paying for (and
| change when you scale up?).
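|
| (For context: on a self-managed PostgreSQL box, those generic
| changes are typically a handful of settings scaled to the
| machine's RAM. A minimal sketch, with illustrative values only:
|     psql -c "ALTER SYSTEM SET shared_buffers = '4GB'"
|     psql -c "ALTER SYSTEM SET effective_cache_size = '12GB'"
|     # shared_buffers only takes effect after a server restart
| On RDS the same knobs live in a parameter group instead.)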
|
| Having written the above, I am now doubting whether I have
| done the right thing in the past.
| nine_k wrote:
| A cloud is really easy to get started with.
|
| Free tiers, startup credits, easily available managed
| databases, queues, object storage, lambdas, load-balancing,
| DNS, TLS, specialist stuff like OCR. It's easy to prototype
| something, run for free or for peanuts, start getting some
| revenue.
|
| Then, as you grow, the costs become steeper, but migrating
| off of the cloud looks even more expensive, especially if
| you have accumulated a lot of data (egress costs you,
| especially from AWS). Congrats, you have become the
| desirable, typical cloud customer.
| graemep wrote:
| > it becomes just another arcane skill set
|
| It's an arcane skill set with a GUI. It makes it _look_ much
| easier to learn.
| edward28 wrote:
| The power of Moore's law.
| jeffbee wrote:
| I don't see how point 2 could have come as a surprise to
| anyone.
| kwillets wrote:
| SSDs are also a bit of an Achilles heel for AWS -- they have
| their own Nitro firmware for wear levelling and key rotation,
| due to the hazards of multitenancy. It's possible for one EC2
| tenant to use up all the write cycles and then pass it to
| another, and encryption with key rotation is required to keep
| data from leaking across tenant changes. It's also slower.
|
| We had one outage where key rotation had been enabled on
| reboot, so data partitions were lost after what should have
| been a routine crash. Overall, for data warehousing, our
| failure rate on on-prem (DC-hosted) hardware was lower IME.
| louwrentius wrote:
| I like this writeup, informative and to-the-point.
|
| Today, the cloud isn't about other people's hardware.
|
| It's about infrastructure being an API call away. Not just
| virtual machines but also databases, load-balancers, storage, and
| so on.
|
| The cost isn't the DC or the hardware, but the hours spent on
| operations.
|
| And you can abuse developers to do operations on the side :-)
| zelphirkalt wrote:
| And then come the weird aspects of bad cloud service providers
| like IONOS: broken OS images; a provisioning API that is a
| bottleneck, where how much other customers are provisioning can
| slow down your own, and creating a network interface can take
| minutes, with customer service saying "that's how it is, we
| cannot change it"; and a very shitty web user interface that
| desperately tries to be a single-page app yet has default
| browser functionality like the back button broken. Yet they
| still cost literally 10x what Hetzner Cloud costs, while
| Hetzner basically does everything better.
|
| And then it is still also about other people's hardware in
| addition to that.
| goldeneye13_ wrote:
| Didn't see this in the article: do they have multi-AZ redundancy?
| I.e. if the entire RAID goes up in flames, what's the recovery
| process?
| comboy wrote:
| Yeah, that makes me feel uneasy as a long time fastmail user.
| cyrnel wrote:
| Looks like they do mention that elsewhere:
| https://www.fastmail.com/features/reliability/
|
| > Fastmail has some of the best uptime in the business, plus a
| comprehensive multi data center backup system. It starts with
| real-time replication to geographically dispersed data centers,
| with additional daily backups and checksummed copies of
| everything. Redundant mirrors allow us to failover a server or
| even entire rack in the case of hardware failure, keeping your
| mail running.
| Amfy wrote:
| I believe they replicate from NJ to WA (Seattle). At least
| that's something they spoke about many years ago.
| sufehmi wrote:
| https://www.fastmail.com/blog/throwback-security-confidentia...
| jmakov wrote:
| Would be interesting to know how files get stored. They don't
| mention any distributed FS solutions like SeaweedFS so once a
| drive is full, does the file get sent to another one via some
| service? Also ZFS seems an odd choice, since deletions (esp. of
| small files) on a drive that is 80%+ full are crazy slow.
| shrubble wrote:
| The open-source Cyrus IMAP server, which they mention using, has
| replication built in. ZFS also has built-in replication
| available.
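|
| (A minimal sketch of ZFS-level replication with hypothetical
| pool/dataset names, using snapshots and incremental
| send/receive:
|     # one-off full copy to the standby host
|     zfs snapshot tank/mail@base
|     zfs send tank/mail@base | ssh standby zfs recv tank/mail
|     # afterwards, ship only the changes since the last snapshot
|     zfs snapshot tank/mail@t1
|     zfs send -i @base tank/mail@t1 | ssh standby zfs recv tank/mail
| Cyrus replication works at the mailbox level instead, via its
| sync_client/sync_server protocol.)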
|
| Deletion of files depends on how they have configured the
| message store - they may be storing a lot of data into a
| database, for example.
| mastax wrote:
| ZFS replication is quite unreliable when used with ZFS native
| encryption, in my experience. Didn't lose data but constant
| bugs.
| ackshi wrote:
| Keeping enough free space should be much less of a problem with
| SSDs. They can tune it so the array needs to be 95% full before
| the slower best-fit allocator kicks in.
| https://openzfs.readthedocs.io/en/latest/performance-tuning....
|
| I think that 80% figure is from when drives were much smaller
| and finding free space over that threshold with the first-fit
| allocator was harder.
| ryao wrote:
| Unlike ext4, which locks the directory when unlinking, ZFS is
| able to scale on parallel unlinking. Specifically, ZFS has range
| locks that permit directory entries to be removed in parallel
| from the extendible hash trees that store them. While this is
| relatively slow for sequential workloads, it is fast on
| parallel workloads. If you want to delete a large directory
| subtree fast on ZFS, do the rm operations in parallel. For
| example, this will run faster on ZFS than a naive rm -r:
|     find /path/to/subtree -type f | parallel -j250 rm --
|     rm -r /path/to/subtree
|
| A friend had this issue on spinning disks the other day. I
| suggested he do this and the remaining files were gone in
| seconds when at the rate his naive rm was running, it should
| have taken minutes. It is a shame that rm does not implement a
| parallel unlink option internally (e.g. -j), which would be
| even faster, since it would eliminate the execve overhead and
| likely would eliminate some directory lookup overhead too,
| versus using find and parallel to run many rm processes.
|
| For something like Fastmail that has many users, unlinking
| should be parallel already, so unlinking on ZFS will not be
| slow for them.
|
| By the way, that 80% figure has not been true for more than a
| decade. You are referring to the best fit allocator being used
| to minimize external fragmentation under low space conditions.
| The new figure is 96%. It is controlled by metaslab_df_free_pct
| in metaslab.c:
|
| https://github.com/openzfs/zfs/blob/zfs-2.2.0/module/zfs/met...
|
| Modification operations become slow when you are at/above 96%
| space filled, but that is to prevent even worse problems from
| happening. Note that my friend's pool was below the 96%
| threshold when he was suffering from a slow rm -r. He just had
| a directory subtree with a large amount of directory entries he
| wanted to remove.
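|
| On Linux the cutoff is exposed as an OpenZFS module parameter;
| the value is percent free, so the default of 4 means "switch
| allocators at 96% full". A sketch of inspecting and, where the
| parameter is writable on your system, adjusting it at runtime:
|     cat /sys/module/zfs/parameters/metaslab_df_free_pct
|     # lower it only if you understand the fragmentation trade-off
|     echo 2 > /sys/module/zfs/parameters/metaslab_df_free_pct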
|
| For what it is worth, I am the ryao listed here and I was
| around when the 80% to 96% change was made:
|
| https://github.com/openzfs/zfs/graphs/contributors
| switch007 wrote:
| I discovered this yesterday! Blew my mind. I had to check 3
| times that the files were actually gone and that I specified
| the correct directory as I couldn't believe how quick it ran.
| Super cool
| jmakov wrote:
| Thank you very much for sharing this, very insightful.
| ryao wrote:
| Thank you for posting your original comment. The process of
| writing my reply gave me a flash of inspiration:
|
| https://github.com/openzfs/zfs/pull/16896
|
| I doubt that this will make us as fast as ext4 at unlinking
| files in a single thread, but it should narrow the gap
| somewhat. It also should make many other common operations
| slightly faster.
|
| I had looked into range lock overhead years ago, but when I
| saw the majority of time entering range locks was spent in
| an "unavoidable" memory allocation, I did not feel that
| making the operations outside the memory allocation faster
| would make much difference, so I put this down. I imagine
| many others profiling the code came to the same conclusion.
| Now that the memory allocation overhead will soon be gone,
| additional profiling might yield further improvements. :)
| caidan wrote:
| I absolutely love Fastmail. I moved off of Gmail years ago with
| zero regrets. Better UI, better apps, better company, and need I
| say better service? I still maintain and fetch from a Gmail
| account so it all just works seamlessly for receiving and sending
| Gmail, so you don't have to give anything up either.
| pawelduda wrote:
| Their android app has always been much snappier than Gmail,
| it's the little things that drew me to it years ago
| jb1991 wrote:
| Their UI is definitely faster but I do prefer the gmail UI, for
| example how new messages are displayed in threads is quite
| useless in fastmail.
| petesergeant wrote:
| I use Fastmail for my personal mail, and I don't regret it, but
| I'm not quite as sold as you are, I guess maybe because I still
| have a few Google work accounts I need to use. Spam filtering
| in Fastmail is a little worse, and the search is _terrible_.
| The iOS app is usable but buggy. The easy masked emails are a
| big win though, and setting up new domains feels like less of a
| hassle with FM. I don't regret using Fastmail, and I'd use them
| again for my personal email, but it doesn't feel like a slam
| dunk.
| mlfreeman wrote:
| I moved from my own colocated 1U running Mailcow to Fastmail
| and don't regret it one bit. This was an interesting read, glad
| to see they think things through nice and carefully.
|
| The only things I wish FM had are all software:
|
| 1. A takeout-style API to let me grab a complete snapshot once
| a week with one call
|
| 2. The ability to be an IdP for Tailscale.
| xerp2914 wrote:
| 100% this. I migrated from Gmail to Fastmail about 5 years ago
| and it has been rock solid. My only regret is that I didn't do
| it sooner.
| tucnak wrote:
| Yeah, Cloud is a bit of a scam innit? Oxide is looking more and
| more attractive every day as the industry corrects itself from
| overspending on capabilities they would never need.
| klysm wrote:
| It's trading time for money
| jgb1984 wrote:
| Fake news. I've got my bare metal server deployed and
| installed with my ansible playbook even before you manage to
| log into the bazillion layers of abstraction that is AWS.
| acedTrex wrote:
| But can you do that on demand in minutes for 1000
| application teams that have unique snowflake needs? Because
| Terraform or Bicep can.
| klysm wrote:
| In multiple regions?
| rob_c wrote:
| Yes, welcome to business. But frankly an email provider needs
| to have their own metal; if they don't, they're not worth
| doing business with.
| mgaunard wrote:
| Why is it surprising? It's well known cloud is 3 times the price.
| diggan wrote:
| Because the default for companies today is cloud, even though
| it almost never makes sense. Sure, if you have really spiky
| load, need to dynamically scale at any point and don't care
| about your spend, it might make sense.
|
| I've even worked in companies where the engineering team spent
| effort and time on building "scalable infrastructure" before
| the product itself even found product-market fit...
| dewey wrote:
| Nobody said it's surprising, though; they are well aware of it,
| having done it for more than two decades. Many newcomers are
| not aware of it though, as their default is "cloud" and they
| never even shopped for servers, colocation or looked around on
| the dedicated server market.
| aimanbenbaha wrote:
| I don't think it's just that they're not aware. But purely from
| a scaling and distribution perspective it'd be wiser to start on
| cloud while you're still in the product-market fit phase. Also,
| 'bare metal' requires more on the capex end, and with how our
| corporate tax system is set up it's just discouraging to go down
| this lane first; the money is better spent on acquiring
| clients.
|
| Also I'd guess a lot of technical founders are more familiar
| with cloud/server-side than with dealing or delegating
| sysadmin tasks that might require adding members to the team.
| dewey wrote:
| I agree, the cloud definitely has a lot of use cases and
| when you are building more complicated systems it makes
| sense to just have to do a few clicks to get a new stack
| setup vs. having someone evaluate solutions and getting
| familiar with operating them on a deep level (backups
| etc.).
| rrgok wrote:
| I would like to know the tech stack behind it.
| antihero wrote:
| I've started to host my own sites and stuff on an old MacBook in
| a cupboard with a shit old external hard drive, running microk8s,
| and it's great!
| theoreticalmal wrote:
| Another homelabber joins the ranks!!
| tndibona wrote:
| But what about the cost and complexity of a room with the racks
| and the cooling needs of running these machines? And the
| uninterrupted power setup? The wiring mess behind the racks.
| hyhconito wrote:
| I'm not fastmail but this is not rocket science. Has everyone
| forgotten how datacentre services work in 2024?
| rob_c wrote:
| Yes they have, and they feel they deserve credit for
| discovering a WiFi cable is more reliable than the new shiny
| kit that was sold to them by a vendor...
| jonatron wrote:
| Even for cloud providers, these are mostly other people's
| problems, eg: Equinix
| 7952 wrote:
| Do colocation facilities solve that?
| bradfa wrote:
| There is a very competitive market for colo providers in
| basically every major metropolitan area in the US, Europe, and
| Asia. The racks, power, cooling, and network to your machines
| is generally very robust and clearly documented on how to
| connect. Deploying servers in house or in a colo is a well
| understood process with many experts who can help if you don't
| have these skills.
| rob_c wrote:
| Colo offers the ability to ship and deploy and keep latencies
| down if you're global, but if you're local yes you should
| just get someone on site and the modern equivalent of a T1
| line setup to your premises if you're running "online"
| services.
| grishka wrote:
| Own hardware doesn't mean own data center. Many data centers
| offer colocation.
| lokimedes wrote:
| A mail-cloud provider uses its own hardware? Well, that's to be
| expected, it would be a refreshing article if it was written by
| one of their customers.
| tuananh wrote:
| gmail does spam filtering very well for me. fastmail on the other
| hand puts lots of legit emails into the spam folder. manually
| marking "not spam" doesn't help
|
| other than that, i'm happy with fastmail.
| jacobdejean wrote:
| iCloud is just as bad, sends important things to spam
| constantly and marking as "not spam" has never done anything
| perceivable.
| ghaff wrote:
| If I look at my Gmail SPAM folder, there is very rarely
| something genuinely important in it. What there is a fair bit
| of, though, is random newsletters and announcements that I may
| have signed up for in some way, shape, or form and don't
| really care about or generally look at. I assume they've been
| reported as spam by enough people (rather than simply
| unsubscribed from) that Google now labels them as such.
| xsc wrote:
| Are those backups geographically distributed?
| christophilus wrote:
| Yes.
| _bare_metal wrote:
| Plugging https://BareMetalSavings.com
|
| in case you want to ballpark-estimate your move off of the cloud
|
| Bonus points: I'm a Fastmail customer, so it tangentially tracks
|
| ----
|
| Quick note about the article: ZFS encryption can be flaky, be
| sure you know what you're doing before deploying for your
| infrastructure.
|
| Relevant Reddit discussion:
| https://www.reddit.com/r/zfs/comments/1f59zp6/is_zfs_encrypt...
|
| A spreadsheet of related issues (I can't remember who made it):
|
| https://docs.google.com/spreadsheets/d/1OfRSXibZ2nIE9DGK6sww...
| brongondwana wrote:
| Yeah, we know about the ZFS encryption with send/receive bug,
| it's frustrating our attempts to get really nice HA support on
| our logging system... but so far it appears that just deleting
| the offending snapshot and creating a new one works, and we're
| funding some research into the issue as well.
|
| This is the current script - it runs every minute for each pool
| synced between the two log servers:
| https://gist.github.com/brong/6a23fee1480f2d62b8a18ade5aea66...
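|
| (For readers who can't open the gist: the general shape of such
| a sync is roughly the following sketch, with made-up dataset and
| host names rather than the actual script:
|     #!/bin/sh
|     # fresh snapshot on the primary log server
|     now="sync-$(date +%s)"
|     zfs snapshot logpool/logs@"$now"
|     # find the newest snapshot the standby already has
|     last=$(ssh standby zfs list -H -t snapshot -o name \
|            -s creation -d 1 logpool/logs | tail -1 | cut -d@ -f2)
|     # raw (-w) incremental send preserves native encryption
|     zfs send -w -i "@$last" logpool/logs@"$now" \
|            | ssh standby zfs receive logpool/logs
| The raw (-w) path is where the encrypted send/receive issues
| tend to show up.)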
| ackshi wrote:
| I'm a little surprised they apparently didn't have some existing
| compression solution before moving to ZFS. With so much
| repetitive text across emails I would think there would be a LOT
| to gain, such as from dictionaries, compressing many emails into
| bigger blobs, and fine-tuning compression options.
| silvestrov wrote:
| They use ZFS with zstd which likely compresses well enough.
|
| Custom compression code can introduce bugs that can kill
| Fastmail's reputation of reliability.
|
| It's better to use a well-tested solution that costs a bit more.
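|
| (For reference, that is just a dataset property plus a read-only
| ratio you can check; the dataset name here is hypothetical:
|     zfs set compression=zstd tank/mail
|     zfs get compressratio tank/mail   # achieved ratio so far
| Only newly written blocks get compressed with the new setting.)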
| rob_c wrote:
| Hosts an online service; seems to think it's deserving of a
| medal for discovering that S3 buckets from a cloud provider are
| crap and cost a fortune.
|
| The heading in this space makes you think they're running custom
| FPGAs such as with Gmail, not just running on metal... As for
| drive failures, welcome to storage at scale. Build your solution
| so it's a weekly task to replace 10 disks at a time, not critical
| at 2am when a single disk dies...
|
| Storing/Accessing tonnes of <4kB files is difficult, but other
| providers are doing this on their own metal with CEPH at the PB
| scale.
|
| I love ZFS, it's great with per-disk redundancy, but CEPH is
| really the only game in town for inter-rack/DC resilience, which
| I would hope my email provider has.
| johnklos wrote:
| The whole push to the cloud has always fascinated me. I get it -
| most people aren't interested in babysitting their own hardware.
| On the other hand, a business of just about any size that has any
| reasonable amount of hosting is better off with their own systems
| when it comes purely to cost.
|
| All the pro-cloud talking points are just that - talking points
| that don't persuade anyone with any real technical understanding,
| but serve to introduce doubt to non-technical people and to trick
| people who don't examine what they're told.
|
| What's particularly fascinating to me, though, is how some people
| are so pro-cloud that they'd argue with a writeup like this with
| silly cloud talking points. They don't seem to care much about
| data or facts, just that they love cloud and want everyone else
| to be in cloud, too. This happens much more often on sites like
| Reddit (r/sysadmin, even), but I wouldn't be surprised to see a
| little of it here.
|
| It makes me wonder: how do people get so sold on a thing that
| they'll go online and fight about it, even when they lack facts
| or often even basic understanding?
|
| I can clearly state why I advocate for avoiding cloud: cost,
| privacy, security, a desire to not centralize the Internet. The
| reason people advocate for cloud for others? It puzzles me.
| "You'll save money," "you can't secure your own machines," "it's
| simpler" all have worlds of assumptions that those people can't
| possibly know are correct.
|
| So when I read something like this from Fastmail which was
| written without taking an emotional stance, I respect it. If I
| didn't already self-host email, I'd consider using Fastmail.
|
| There used to be so much push for cloud everything that an
| article like this would get fanatical responses. I hope that it's
| a sign of progress that that fanaticism is waning and people
| aren't afraid to openly discuss how cloud isn't right for many
| things.
| mjburgess wrote:
| 1. People are credulous
|
| 2. People therefore repeat talking points which seem in their
| interest
|
| 3. With enough repetition these become their beliefs
|
| 4. People will defend their beliefs as _theirs_ against attack
|
| 5. Goto 1
| anotherhue wrote:
| They spent time and career points learning cloud things and
| dammit it's going to matter!
|
| You can't even blame them too much, the amount of cash poured
| into cloud marketing is astonishing.
| sgarland wrote:
| The thing that frustrates me is it's possible to know how to
| do both. I have worked with multiple people who are quite
| proficient in both areas.
|
| Cloud has definite advantages in some circumstances, but so
| does self-hosting; moreover, understanding the latter makes
| the former much, much easier to reason about. It's silly to
| limit your career options.
| noworriesnate wrote:
| Being good at both is twice the work, because even if some
| concepts translate well, IME people won't hire someone
| based on that. "Oh you have experience with deploying
| RabbitMQ but not AWS SQS? Sorry, we're looking for someone
| more qualified."
| sgarland wrote:
| That's a great filter for places I don't want to work at,
| then.
| cpursley wrote:
| The fact is, managing your own hardware is a pita and a
| distraction from focusing on the core product. I loathe messing
| with servers and even opt for "overpriced" paas like fly,
| render, vercel. Because every minute messing with and
| monitoring servers is time not spent on product. My tune might
| change past a certain size, when there's a massive cloud bill and
| room for full-time ops people, but to offset their salary, the
| bill would have to be huge.
| cpursley wrote:
| Anecdotal - but I once worked for a company where the product
| line I built for them after acquisition was delayed by 5
| months because that's how long it took to get the hardware
| ordered and installed in the datacenter. Getting it up on AWS
| would have been a day's work, maybe two.
| stubish wrote:
| Yes, it is death by 1000 cuts. Speccing, negotiating with
| hardware vendors, data center selection and negotiating, DC
| engineer/remote hands, managing security cage access,
| designing your network, network gear, IP address ranges,
| BGP, secure remote console access, cables, shipping,
| negotiating with bandwidth providers (multiple, for
| redundancy), redundant hardware, redundant power sources,
| UPS. And then you get to plug your server in. Now duplicate
| other stuff your cloud might provide, like offsite backups,
| recovery procedures, HA storage, geographic redundancy. And
| do it again when you outgrown your initial DC. Or build
| your own DC (power, climate, fire protection, security,
| fiber, flooring, racks)
| sgarland wrote:
| Much of this is still required in cloud. Also, I think
| you're missing the middle ground where 99.99% of
| companies could happily exist indefinitely: colo. It
| makes little to no financial or practical sense for most
| to run their own data centers.
| sroussey wrote:
| Oh, absolutely, with your own hardware you need planning.
| Time to deployment is definitely a thing.
|
| Really, the one major thing that bites with cloud providers
| is their 99.9% margin on egress. The markup is insane.
| fhd2 wrote:
| I'm with you there, with stuff like fly.io, there's really no
| reason to worry about infrastructure.
|
| AWS, on the other hand, seems about as time consuming and
| hard as using root servers. You're at a higher level of
| abstraction, but the complexity is about the same I'd say. At
| least that's my experience.
| cpursley wrote:
| I agree with this position and actively avoid AWS
| complexity.
| noprocrasted wrote:
| That argument makes sense for PaaS services like the ones you
| mention. But for bare "cloud" like AWS, I'm not convinced it
| is saving any effort, it's merely swapping one kind of
| complexity with another. Every place I've been in had full-
| time people messing with YAML files or doing "something" with
| the infrastructure - generally trying to work around the
| (self-inflicted) problems introduced by their cloud provider
| - whether it's the fact you get 2010s-era hardware or that
| you get nickel & dimed on absolutely arbitrary actions that
| have no relationship to real-world costs.
| jeffbee wrote:
| In what sense is AWS "bare cloud"? S3, DynamoDB, Lambda,
| ECS?
| inemesitaffia wrote:
| EC2
| bsder wrote:
| I would actually argue that EC2 is a "cloud smell"--if
| you're using EC2 you're doing it wrong.
| noprocrasted wrote:
| How do you configure S3 access control? You need to learn
| & understand how their IAM works.
|
| How do you even point a pretty URL to a lambda? Last time
| I looked you need to stick an "API gateway" in front
| (which I'm sure you also get nickel & dimed for).
|
| How do you go from "here's my git repo, deploy this on
| Fargate" with AWS? You need a CI pipeline which will run
| a bunch of awscli commands.
|
| And I'm not even talking about VPCs, security groups,
| etc.
|
| Somewhat different skillsets than old-school sysadmin
| (although once you know sysadmin basics, you realize a
| lot of these are just the same concepts under a branded
| name and arbitrary nickel & diming sprinkled on top), but
| equivalent in complexity.
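|
| (To make the Fargate case concrete, the CI step tends to boil
| down to something like this sketch; the account, region, repo,
| cluster and service names are made up:
|     REPO=123456789012.dkr.ecr.us-east-1.amazonaws.com/web
|     aws ecr get-login-password | docker login --username AWS \
|         --password-stdin "${REPO%/*}"
|     docker build -t "$REPO:latest" . && docker push "$REPO:latest"
|     # the task definition points at :latest, so a forced
|     # redeployment pulls the new image
|     aws ecs update-service --cluster prod --service web \
|         --force-new-deployment
| Each of those steps has an IAM policy behind it that someone
| has to get right.)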
| sgarland wrote:
| Counterpoint: if you're never "messing with servers," you
| probably don't have a great understanding of how their
| metrics map to those of your application's, and so if you
| bottleneck on something, it can be difficult to figure out
| what to fix. The result is usually that you just pay more
| money to vertically scale.
|
| To be fair, you did say "my tune might change past a certain
| size." At small scale, nothing you do within reason really
| matters. World's worst schema, but your DB is only seeing 100
| QPS? Yeah, it doesn't care.
| tokioyoyo wrote:
| I don't think you're correct. I've watched junior/mid-level
| engineers figure things out solely by working on the cloud
| and scaling things to a dramatic degree. It's really not a
| rocket science.
| sgarland wrote:
| I didn't say it's rocket science, nor that it's
| impossible to do without having practical server
| experience, only that it's more difficult.
|
| Take disks, for example. Most cloud-native devs I've
| worked with have no clue what IOPS are. If you saturate
| your disk, that's likely to cause knock-on effects like
| increased CPU utilization from IOWAIT, and since "CPU is
| high" is pretty easy to understand for anyone, the
| seemingly obvious solution is to get a bigger instance,
| which depending on the application, may inadvertently
| solve the problem. For RDBMS, a larger instance means a
| bigger buffer pool / shared buffers, which means fewer
| disk reads. Problem solved, even though actually solving
| the root cause would've cost 1/10th or less the cost of
| bumping up the entire instance.
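|
| (A quick way to see this on a Linux box is sysstat's iostat;
| the device name below is illustrative.)
|     iostat -x 1 nvme0n1
|     # watch r/s and w/s (IOPS), aqu-sz and %util: high %util
|     # with rising iowait means the disk, not the CPU, is the
|     # real bottleneck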
| tokioyoyo wrote:
| > Most cloud-native devs
|
| You might be making some generalizations from your
| personal experience. Since 2015, at all of my jobs,
| everything has been running on some sort of a cloud. I'm
| yet to meet a person who doesn't understand IOPS. If I
| was a junior (and from my experience, that's what they
| tend to do), I'd just google "slow X potential reasons".
| You'll most likely see some references to IOPS and
| continue your research from there.
|
| We've learned all these things one way or another. My
| experience started around 2007ish when I was renting out
| cheap servers from some hosting providers. Others might
| be dipping their feet into readily available cloud-
| infrastructure, and learning it from that end. Both
| work.
| icedchai wrote:
| Writing piles of IaC code like Terraform and CloudFormation
| is also a PITA and a distraction from focusing on your core
| product.
|
| PaaS is probably the way to go for small apps.
| UltraSane wrote:
| But that effort has a huge payoff in that it can be used for
| disaster recovery in a new region and to spin up testing
| environments.
| sgarland wrote:
| A small app (or a larger one, for that matter) can quite
| easily run on infra that's instantiated from canned IaC,
| like TF AWS Modules [0]. If you can read docs, you should
| be able to quite trivially get some basic infra up in a
| day, even with zero prior experience managing it.
|
| [0]: https://github.com/terraform-aws-modules
| icedchai wrote:
| Yes, I've used several of these modules myself. They save
| tons of time! Unfortunately, for legacy projects, I
| inherited a bunch of code from individuals that built
| everything "by hand" then copy-pasted everything. No re-
| usability.
| xorcist wrote:
| > every minute messing with and monitoring servers
|
| You're not monitoring your deployments because "cloud"?
| jeffbee wrote:
| The problem with your claims here is they can only be right if
| the entire industry is experiencing mass psychosis. I reject a
| theory that requires that, because my ego just isn't that
| large.
|
| I once worked for several years at a publicly traded firm well-
| known for their return-to-on-prem stance, and honestly it was a
| complete disaster. The first-party hardware designs didn't work
| right because they didn't have the hardware-design staffing
| levels to have de-risked the possibility that AMD would fumble
| the performance of Zen 1, leaving them with a generation of
| useless hardware they nonetheless paid for. The OEM hardware
| didn't work right because they didn't have the chops to qualify
| it either, leaving them scratching their heads for months over
| a cohort of servers they eventually discovered were
| contaminated with metal chips. And, most crucially, for all the
| years I worked there, the only thing they wanted to accomplish
| was failover from West Coast to East Coast, which never worked,
| not even once. When I left that company they were negotiating
| with the data center owner who wanted to triple the rent.
|
| These experiences tell me that cloud skeptics are sometimes
| missing a few terms in their equations.
| johnklos wrote:
| > The problem with your claims here is they can only be right
| if the entire industry is experiencing mass psychosis.
|
| What's the market share of Windows again? ;)
| mardifoufs wrote:
| You're proving their point though. Considering that there
| are tons of reasons to use windows, some people just don't
| see them and think that everyone else is crazy :^) (I know
| you're joking but some people actually unironically have
| the same sentiment)
| noprocrasted wrote:
| There's however a middle-ground between run your own
| colocated hardware and cloud. It's called "dedicated" servers
| and many hosting providers (from budget bottom-of-the-barrel
| to "contact us" pricing) offer it.
|
| Those take on the liability of sourcing, managing and
| maintaining the hardware for a flat monthly fee, and would
| take on such risk. If they make a bad bet purchasing
| hardware, you won't be on the hook for it.
|
| This seems like a point many pro-cloud people
| (intentionally?) overlook.
| floating-io wrote:
| "Vendor problems" is a red herring, IMO; you can have those
| in the cloud, too.
|
| It's been my experience that those who can build good,
| reliable, high-quality systems, can do so either in the cloud
| or on-prem, generally with equal ability. It's just another
| platform to such people, and they will use it appropriately
| and as needed.
|
| Those who can only make it work in the cloud are either
| building very simple systems (which is one place where the
| cloud can be appropriate), or are building a house of cards
| that will eventually collapse (or just cost them obscene
| amounts of money to keep on life support).
|
| Engineering is engineering. Not everyone in the business does
| it, unfortunately.
|
| Like everything, the cloud has its place -- but don't
| underestimate the number of decisions that get taken out of
| the hands of technical people by the business people who went
| golfing with their buddy yesterday. He just switched to
| Azure, and it made his accountants really happy!
|
| The whole CapEx vs. OpEx issue drives me batty; it's the
| number one cause of cloud migrations in my career. For
| someone who feels like spent money should count as spent
| money regardless of the bucket it comes out of, this twists
| my brain in knots.
|
| I'm clearly not a finance guy...
| sgarland wrote:
| > or are building a house of cards that will eventually
| collapse (or just cost them obscene amounts of money to
| keep on life support)
|
| Ding ding ding. It's this.
|
| > The whole CapEx vs. OpEx issue drives me batty
|
| Seconded. I can't help but feel like it's not just a "I
| don't understand money" thing, but more of a "the way Wall
| Street assigns value is fundamentally broken." Spending
| $100K now, once, vs. spending $25K/month indefinitely does
| not take a genius to figure out.
| krsgjerahj wrote:
| You forgot COGS.
|
| It's all about painting the right picture for your
| investors, so you make up shit and classify it as COGS or OpEx
| depending on what is most beneficial for you in the moment.
| marcosdumay wrote:
| > The problem with your claims here is they can only be right
| if the entire industry is experiencing mass psychosis.
|
| Yes. Mass psychosis explains an incredible number of
| different and apparently unrelated problems with the
| industry.
| onli wrote:
| The one convincing argument I have seen from technical people,
| one that could be made in reply to your comment, is that by now
| you don't find enough experienced engineers to reliably set up
| some really
| big systems. Because so much went to the cloud, a lot of the
| knowledge is buried there.
|
| That came from technical people who I didn't perceive as being
| dogmatically pro-cloud.
| zosima wrote:
| Cloud expands the capabilities of what one team can manage by
| themselves, enabling them to avoid a huge amount of internal
| politics.
|
| This is worth astronomical amounts of money in big corps.
| acedTrex wrote:
| I have said for years the value of cloud is mainly its API;
| that's the selling point in large enterprises.
| sgarland wrote:
| Self-hosted software also has APIs, and Terraform
| libraries, and Ansible playbooks, etc. It's just that you
| have to know what it is you're trying to do, instead of
| asking AWS what collection of XaaS you should use.
| sgarland wrote:
| I'm not convinced this is entirely true. The upfront cost is
| real if you don't have the skills, sure - it takes time to learn
| Linux administration, not to mention management tooling like
| Ansible, Puppet, etc.
|
| But once those are set up, how is it different? AWS is quite
| clear with their responsibility model that you still have to
| tune your DB, for example. And for the setup, just as there
| are Terraform modules to do everything under the sun, there
| are Ansible (or Chef, or Salt...) playbooks to do the same.
| For both, you _should_ know what all of the options are
| doing.
|
| The only way I see this sentiment being true is that a dev
| team, with no infrastructure experience, can more easily spin
| up a lot of infra - likely in a sub-optimal fashion - to run
| their application. When it inevitably breaks, they can then
| throw money at the problem via vertical scaling, rather than
| addressing the root cause.
| the__alchemist wrote:
| Do you need those tools? It seems that for fundamental web
| hosting, you need your application server, nginx or
| similar, postgres or similar, and a CLI. (And an
| interpreter etc if your application is in an interpreted
| lang)
| sgarland wrote:
| I suppose that depends on your RTO. With cloud providers,
| even on a bare VM, you can to some extent get away with
| having no IaC, since your data (and therefore config) is
| almost certainly on networked storage which is redundant
| by design. If an EC2 fails, or even if one of the drives
| in your EBS drive fails, it'll probably come back up as
| it was.
|
| If it's your own hardware, if you don't have IaC of some
| kind - even something as crude as a shell script - then a
| failure may well mean you need to manually set everything
| up again.
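|
| (Even the crude-shell-script level buys a lot; something this
| small, kept in version control, turns a dead box into a
| half-hour rebuild. Package names and paths are illustrative:
|     #!/bin/sh
|     set -eu
|     apt-get update
|     apt-get install -y nginx postgresql
|     cp -r etc/nginx/. /etc/nginx/     # versioned config files
|     systemctl enable --now nginx postgresql
| Restoring data is then the separate backup/restore problem.)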
| noprocrasted wrote:
| Get two servers (or three, etc)?
| sgarland wrote:
| Well, sure - I was trying to do a comparison in favor of
| cloud, because the fact that EBS Volumes can magically
| detach and attach is admittedly a neat trick. You can of
| course accomplish the same (to a certain scale) with
| distributed storage systems like Ceph, Longhorn, etc. but
| then you have to have multiple servers, and if you have
| multiple servers, you probably also have your application
| load balanced with failover.
| zbentley wrote:
| For fundamentals, that list is missing:
|
| - Some sort of firewall or network access control. Being
| able to say "allow http/s from the world (optionally
| minus some abuser IPs that cause problems), and allow SSH
| from developers (by IP, key, or both)" at a separate
| layer from nginx is prudent. Can be ip/tables config on
| servers or a separate firewall appliance.
|
| - Some mechanism of managing storage persistence for the
| database, e.g. backups, RAID, data files stored on fast
| network-attached storage, db-level replication. Not
| losing all user data if you lose the DB server is table
| stakes.
|
| - Something watching external logging or telemetry to let
| administrators know when errors (e.g. server failures,
| overload events, spikes in 500s returned) occur. This
| could be as simple as Pingdom or as involved as automated
| alerting based on load balancer metrics. Relying on users
| to report downtime events is not a good approach.
|
| - Some sort of CDN, for applications with a frontend
| component. This isn't required for fundamental web
| hosting, but for sites with a frontend and even moderate
| (10s/sec) hit rates, it can become required for
| cost/performance; CDNs help with egress congestion (and
| fees, if you're paying for metered bandwidth).
|
| - Some means of replacing infrastructure from nothing. If
| the server catches fire or the hosting provider nukes it,
| having a way to get back to where you were is important.
| Written procedures are fine if you can handle long
| downtime while replacing things, but even for a handful
| of application components those procedures get pretty
| lengthy, so you start wishing for automation.
|
| - Some mechanism for deploying new code, replacing
| infrastructure, or migrating data. Again, written
| procedures are OK, but start to become unwieldy very
| early on ('stop app, stop postgres, upgrade the postgres
| version, start postgres, then apply application
| migrations to ensure compatibility with new version of
| postgres, then start app--oops, forgot to take a postgres
| backup/forgot that upgrading postgres would break the
| replication stream, gotta write that down for net
| time...').
|
| ...and that's just for a very, very basic web hosting
| application--one that doesn't need caches, blob stores,
| the ability to quickly scale out application server or
| database capacity.
|
| Each of those things can be accomplished the traditional
| way--and you're right, that sometimes that way is easier
| for a given item in the list (especially if your
| maintainers have expertise in that item)! But in
| aggregate, having a cloud provider handle each of those
| concerns tends to be easier overall and not require
| nearly as much in-house expertise.
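|
| (To make a couple of those items concrete on a single Linux
| host, a minimal sketch; the tool choices, office IP range and
| paths are illustrative:
|     # firewall: web from anywhere, SSH only from the office range
|     ufw allow 80/tcp && ufw allow 443/tcp
|     ufw allow from 203.0.113.0/24 to any port 22 proto tcp
|     ufw enable
|     # storage persistence: nightly off-box database dump (cron)
|     0 3 * * * pg_dump -Fc appdb | ssh backuphost \
|         "cat > /backups/appdb-$(date +\%F).dump"
| Each further item on the list adds another tool like this to
| own and keep working.)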
| zosima wrote:
| You are focusing on technology. And sure of course you can
| get most of the benefits of AWS a lot cheaper when self-
| hosting.
|
| But when you start factoring internal processes and
| incompetent IT departments, suddenly that's not actually a
| viable option in many real-world scenarios.
| jeffbee wrote:
| Exactly. With the cloud you can suddenly do all the
| things your tyrannical Windows IT admin has been saying
| are impossible for the last 30 years.
| the_arun wrote:
| It is similar to cooking at home vs ordering cooked food
| every day. If someone guarantees the taste & quality, people
| would be happy to outsource it.
| tylerchurch wrote:
| I think this is only true for teams and apps of a certain
| size.
|
| I've worked on plenty of teams with relatively small apps,
| and the difference between:
|
| 1. Cloud: "open up the cloud console and start a VM"
|
| 2. Owned hardware: "price out a server, order it, find a
| suitable datacenter, sign a contract, get it racked, etc."
|
| Is quite large.
|
| #1 is 15 minutes for a single team lead.
|
| #2 requires the team to agree on hardware specs, get
| management approval, finance approval, executives signing
| contracts. And through all this you don't have anything
| online yet for... weeks?
|
| If your team or your app is large, this probably all
| averages out in favor of #2. But small teams often don't
| have the bandwidth or the budget.
| noprocrasted wrote:
| 3. "Dedicated server" at any hosting provider
|
| Open their management console, press order now, 15 mins
| later get your server's IP address.
| zbentley wrote:
| For purposes of this discussion, isn't AWS just a very
| large hosting provider?
|
| I.e. most hosting providers give you the option for
| virtual or dedicated hardware. So does Amazon (metal
| instances).
|
| Like, "cloud" was always an ill-defined term, but in the
| case of "how do I provision full servers" I think there's
| no qualitative difference between Amazon and other
| hosting providers. Quantitative, sure.
| noprocrasted wrote:
| > Amazon (metal instances)
|
| But you still get nickel & dimed and pay insane costs,
| including on bandwidth (which is free in most
| conventional hosting providers, and overages are 90x
| cheaper than AWS' costs).
| irunmyownemail wrote:
| Qualitatively, AWS is greedy and nickel-and-dimes you to
| death. Their Route53 service doesn't even have all the
| standard DNS options I need and can get everywhere else,
| or even on my own running bind9. I do not use IPv6 for
| several reasons, so when AWS decided to charge for IPv4, I
| went looking elsewhere to get my VMs.
|
| I can't even imagine how much the US Federal Government
| is charging American taxpayers to pay AWS for hosting
| there, it has to be astronomical.
| everfrustrated wrote:
| Out of curiosity, which DNS record types do you need that
| Route53 doesn't support?
| goodpoint wrote:
| More like 15 seconds.
| AnthonyMouse wrote:
| You're assuming that hosting something in-house implies
| that each application gets its own physical server.
|
| You buy a couple of beastly things with dozens of cores.
| You can buy twice as much capacity as you actually use
| and still be well under the cost of cloud VMs. Then it's
| still VMs and adding one is just as fast. When the load
| gets above 80% someone goes through the running VMs and
| decides if it's time to do some house cleaning or it's
| time to buy another host, but no one is ever waiting on
| approval because you can use the reserve capacity
| immediately while sorting it out.
| necovek wrote:
| Before the cloud, you could get a VM provisioned (virtual
| servers) or a couple of apps set up (LAMP stack on a
| shared host ;)) in a few minutes over a web interface
| already.
|
| "Cloud" has changed that by providing an API to do this,
| thus enabling IaC approach to building combined hardware
| and software architectures.
| layer8 wrote:
| The SMB I work for runs a small on-premise data center
| that is shared between teams and projects, with maybe 3-4
| FTEs managing it (the respective employees also do dev
| and other work). This includes self-hosting email,
| storage, databases, authentication, source control, CI,
| ticketing, company wiki, chat, and other services. The
| current infrastructure didn't start out that way and
| developed over many years, so it's not necessarily
| something a small startup can start out with, but beyond
| a certain company size (a couple dozen employees or more)
| it shouldn't really be a problem to develop that, if
| management shares the philosophy. I certainly find it
| preferable culturally, if not technically, to maximize
| independence in that way, have the local expertise and
| much better control over everything.
|
| One (the only?) indisputable benefit of cloud is the
| ability to scale up faster (elasticity), but most
| companies don't really need that. And if you do end up
| needing it after all, then it's a good problem to have,
| as they say.
| SoftTalker wrote:
| Your last paragraph identifies the reason that running
| their own hardware makes sense for Fastmail. The demand
| for email is pretty constant. Everyone does roughly the
| same amount of emailing every day. Daily load is
| predictable, and growth is predictable.
|
| If your load is very spiky, it might make more sense to
| use cloud. You pay more for the baseline, but if your
| spikes are big enough it can still be cheaper than
| provisioning your own hardware to handle the highest
| loads.
|
| Of course there's also possibly a hybrid approach, you
| run your own hardware for base load and augment with
| cloud for spikes. But that's more complicated.
| maccard wrote:
| I work for a 50 person subsidiary of a 30k person
| organisation. I needed a domain name. I put in the
| purchase request and 6 months later eventually gave up,
| bought it myself and expensed it.
|
| Our AWS account is managed by an SRE team. It's a 3 day
| turnaround process to get any resources provisioned, and
| if you don't get the exact spec right (you forgot to
| specify the iops on the volume? Oops) 3 day turnaround.
| Already started work when you request an adjustment?
| Better hope as part of your initial request you specified
| backups correctly or you're starting again.
|
| The overhead is absolutely enormous, and I actually don't
| even have billing access to the AWS account that I'm
| responsible for.
| j45 wrote:
| Managing cloud without a dedicated resource is a form of
| resource creep, with shadow labour costs that aren't
| factored in.
|
| How many things don't end up happening because of this,
| when they only need a sliver of resources at the start?
| cyberax wrote:
| > Our AWS account is managed by an SRE team.
|
| That's an anti-pattern (we call it "the account") in the
| AWS architecture.
|
| AWS internally just uses multiple accounts, so a team can
| get their own account with centrally-enforced guardrails.
| It also greatly simplifies billing.
| maccard wrote:
| That's not something that I have control over or
| influence over.
| mbesto wrote:
| > 3 day turnaround process to get any resources
| provisioned
|
| Now imagine having to deal with procurement to purchase
| hardware for your needs. 6 months later you have a
| server. Oh you need a SAN for object storage? There goes
| another 6 months.
| maccard wrote:
| At a previous job we had some decent on prem resources
| for internal services. The SRE guys had a bunch of extra
| compute and you would put in a ticket for a certain
| amount of resources (2 cpu, SSD, 8GB memory x2 on
| different hosts). There wasn't a massive amount of
| variability between the hardware, and you just requested
| resources to be allocated from a bunch of hypervisors.
| Turnaround time was about 3 days too. Except you weren't
| required to be self-sufficient in AWS terminology to
| request exactly what you needed.
| xorcist wrote:
| There is a large gap between "own the hardware" and "use
| cloud hosting". Many people rent the hardware, for
| example, and you can use managed databases, which is one
| step up from "starting a VM".
|
| But your comparison isn't fair. The difference between
| running your own hardware and using the cloud (which is
| perhaps not even the relevant comparison but let's run
| with it) is the difference between:
|
| 1. Open up the cloud console, and
|
| 2. You already have the hardware so you just run "virsh"
| or, more likely, do nothing at all because you own the
| API so you have already included this in your Ansible or
| Salt or whatever you use for setting up a server.
|
| Because ordering a new physical box isn't really
| comparable to starting a new VM, is it?
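|
| As a rough sketch of option 2, using the libvirt Python
| bindings (the template path and VM name are hypothetical; in
| practice this would live inside your Ansible/Salt roles):
|
|     import libvirt  # pip install libvirt-python
|
|     # Talk to the hypervisor you already own
|     conn = libvirt.open("qemu:///system")
|
|     # A domain XML template kept next to your config management
|     with open("/etc/vm-templates/web.xml") as f:
|         xml = f.read().replace("{{ name }}", "web-03")
|
|     # Define and start the new VM -- no cloud console involved
|     dom = conn.defineXML(xml)
|     dom.create()
|     print(dom.name(), "active:", bool(dom.isActive()))
|     conn.close()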
| sanderjd wrote:
| I've always liked the theory of #2, I just haven't worked
| anywhere yet that has executed it well.
| Symbiote wrote:
| You have omitted the option between the two, which is
| renting a server. No hardware to purchase, maintain or
| set up. Easily available in 15 minutes.
| tylerchurch wrote:
| While I did say "VM" in my original comment, to me this
| counts as "cloud" because the UI is functionally the
| same.
| amluto wrote:
| I've never worked at a company with these particular
| problems, but:
|
| #1: A cloud VM comes with an obligation for someone at
| the company to maintain it. The cloud does not excuse
| anyone from doing this.
|
| #2: Sounds like a dysfunctional system. Sure, it may be
| common, but a medium sized org could easily have some
| datacenter space and allow any team to rent a server or
| an instance, or to buy a server and pay some nominal
| price for the IT team to keep it working. This isn't
| actually rocket science.
|
| Sure, keeping a fifteen year old server working safely is
| a chore, but so is maintaining a fifteen-year-old VM
| instance!
| icedchai wrote:
| Obligation? Far from it. I've worked at some poorly
| staffed companies. Nobody is maintaining old VMs or
| container images. If it works, nobody touches it.
|
| I worked at a supposedly properly staffed company that
| had raised 100's of millions in investment, and it was
| the same thing. VMs running 5 year old distros that
| hadn't been updated in years. 600 day uptimes, no kernel
| patches, ancient versions of Postgres, Python 2.7 code
| everywhere, etc. This wasn't 10 years ago. This was 2
| years ago!
| j45 wrote:
| The cloud is someone else's computer.
|
| Renting a VM from a provider or installing a hypervisor
| on your own equipment is another thing.
| j45 wrote:
| There is a middle ground between the extremes of that
| pendulum of all cloud or all physical metal.
|
| You can start with using a cloud only for VMs and only
| run services on it using IaaS or PaaS. Very serviceable.
| warner25 wrote:
| You gave me flashbacks to a far worse bureaucratic
| nightmare with #2 in my last job.
|
| I supported an application with a team of about three
| people for a regional headquarters in the DoD. We had one
| stack of aging hardware that was racked, on a handshake
| agreement with another team, in a nearby facility under
| that other team's control. We had to periodically request
| physical access for maintenance tasks and the facility
| routinely lost power, suffered local network outages,
| etc. So we decided that we needed new hardware and more
| of it spread across the region to avoid the shaky single-
| point-of-failure.
|
| That began a three _year_ process of: waiting for budget
| to be available for the hardware / license / support
| purchases; pitching PowerPoints to senior management to
| argue for that budget (and getting updated quotes every
| time from the vendors); working out agreements with other
| teams at new facilities to rack the hardware; traveling
| to those sites to install stuff; and working through the
| cybersecurity compliance stuff for each site. I left
| before everything was finished, so I don't know how they
| ultimately dealt with needing, say, someone to physically
| reseat a cable in Japan (an international flight away).
| bonoboTP wrote:
| You can get pretty far without any of that fancy stuff. You
| can get plenty done by using parallel-ssh and then focusing
| on the actual thing you develop instead of endless tooling
| and docker and terraform and kubernetes and salt and puppet
| and ansible. Sure, if you know why you need them and know
| what value you get from them OK. But many people just do it
| because it's the thing to do...
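|
| For what it's worth, a minimal sketch of that style with the
| parallel-ssh Python library (hostnames are made up):
|
|     from pssh.clients import ParallelSSHClient  # pip install parallel-ssh
|
|     # The whole "fleet": a plain list of hosts, no extra tooling
|     hosts = ["app1.internal", "app2.internal", "db1.internal"]
|     client = ParallelSSHClient(hosts, user="deploy")
|
|     # Run the same command everywhere, print results per host
|     output = client.run_command("uptime")
|     client.join(output)
|     for host_out in output:
|         print(host_out.host, ":", list(host_out.stdout))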
| marcosdumay wrote:
| All of that is... completely unrelated to the GP's post.
|
| Did you reply to the right comment? Do you think "politics"
| is something you solve with Ansible?
| sgarland wrote:
| > Cloud expands the capabilities of what one team can
| manage by themselves, enabling them to avoid a huge
| amount of internal politics.
|
| It's related to the first part. Re: the second, IME if
| you let dev teams run wild with "managing their own
| infra," the org as a whole eventually pays for that when
| the dozen bespoke stacks all hit various bottlenecks, and
| no one actually understands how they work, or how to
| troubleshoot them.
|
| I keep being told that "reducing friction" and
| "increasing velocity" are good things; I vehemently
| disagree. It might be good for short-term profits, but it
| is poison for long-term success.
| sanderjd wrote:
| I have never ever worked somewhere with one of these
| "cloud-like but custom on our own infrastructure" setups
| that didn't leak infrastructure concerns through the
| abstraction, to a significantly larger degree than AWS.
|
| I believe it can work, so maybe there are really successful
| implementations of this out there, I just haven't seen it
| myself yet!
| daemonologist wrote:
| Our big company locked all cloud resources behind a
| floating/company-wide DevOps team (git and CI too). We have
| an old on-prem server that we jealously guard because it
| allows us to create remotes for new git repos and deploy
| prototypes without consulting anyone.
|
| (To be fair, I can see why they did it - a lot of deployments
| were an absolute mess before.)
| mark242 wrote:
| This is absolutely spot on.
|
| What do you mean, I can't scale up because I've used my
| hardware capex budget for the year?
| glitchc wrote:
| Cloud solves one problem quite well: Geographic redundancy.
| It's extremely costly with on-prem.
| sgarland wrote:
| Only if you're literally running your own datacenters, which
| is in no way required for the majority of companies. Colo
| giants like Equinix already have the infrastructure in place,
| with a proven track record.
|
| If you enable Multi-AZ for RDS, your bill doubles until you
| cancel. If you set up two servers in two DCs, your initial
| bill doubles from the CapEx, and then a very small percentage
| of your OpEx goes up every month for the hosting. You very,
| very quickly make this back compared to cloud.
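|
| As a back-of-the-envelope sketch (every number below is a
| made-up placeholder, not a quote from any vendor):
|
|     # Hypothetical figures -- substitute your own quotes
|     cloud_db_monthly = 1200.0   # Multi-AZ managed DB, per month
|     server_capex = 8000.0       # one server, bought outright
|     colo_monthly = 250.0        # rack space/power/bandwidth each
|
|     # Two servers in two DCs vs. the doubled managed-DB bill
|     own_upfront = 2 * server_capex
|     own_monthly = 2 * colo_monthly
|
|     # Months until the owned hardware has paid for itself
|     breakeven = own_upfront / (cloud_db_monthly - own_monthly)
|     print(f"break-even after ~{breakeven:.1f} months")  # ~22.9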
| Cyph0n wrote:
| But reliable connectivity between regions/datacenters
| remains a challenge, right? Compute is only one part of the
| equation.
|
| Disclaimer: I work on a cloud networking product.
| sgarland wrote:
| It depends on how deep you want to go. Equinix for one
| (I'm sure others as well, but I'm most familiar with
| them) offers managed cross-DC fiber. You will probably
| need to manage the networking, to be fair, and I will
| readily admit that's not trivial.
| irunmyownemail wrote:
| I use Wireguard, pretty simple, where's the challenge?
| Cyph0n wrote:
| I am referring to the layer 3 connectivity that Wireguard
| is running on top of. Depending on your use case and
| reliability and bandwidth requirements, routing
| everything over the "public" internet won't cut it.
|
| Not to mention setting up and maintaining your physical
| network as the number of physical hosts you're running
| scales.
| dietr1ch wrote:
| Does it? I've seen outages around "Sorry, us-west_carolina-3
| is down". AWS is particularly good at keeping you aware of
| their datacenters.
| bdangubic wrote:
| if you see that you are doing it wrong :)
| sgarland wrote:
| AWS has had multiple outages which were caused by a
| single AZ failing.
| dietr1ch wrote:
| Yup, I was referring to, I guess, one of these,
|
| - https://news.ycombinator.com/item?id=29473630:
| (2021-12-07) AWS us-east-1 outage
|
| - https://news.ycombinator.com/item?id=29648286:
| (2021-12-22) Tell HN: AWS appears to be down again
|
| Maybe things are better now, but it became apparent that
| people might be misusing cloud providers or betting that
| things work flawlessly even if they completely ignore
| AZs.
| toast0 wrote:
| It can be useful. I run a latency sensitive service with
| global users. A cloud lets me run it in 35 locations
| dealing with one company only. Most of those locations only
| have traffic to justify a single, smallish, instance.
|
| In the locations where there's more traffic, and we need
| more servers, there are more cost effective providers, but
| there's value in consistency.
|
| Elasticity is nice too, we doubled our instance count for
| the holidays, and will return to normal in January. And our
| deployment style starts a whole new cluster, moves traffic,
| then shuts down the old cluster. If we were on owned
| hardware, adding extra capacity for the holidays would be
| trickier, and we'd have to have a more sensible deployment
| method. And the minimum service deployment size would
| probably not be a little quad processor box with 2GB ram.
|
| Using cloud for the lower traffic locations and a cost
| effective service for the high traffic locations would
| probably save a bunch of money, but add a lot of deployment
| pain. And a) it's not my decision and b) the cost
| difference doesn't seem to be quite enough to justify the
| pain at our traffic levels. But if someone wants to make a
| much lower margin, much simpler service with lots of
| locations and good connectivity, be sure to post about it.
| But, I think the big clouds have an advantage in geographic
| expansion, because their other businesses can provide
| capital and justification to build out, and high margins at
| other locations help cross subsidize new locations when
| they start.
| dietr1ch wrote:
| I agree it can be useful (latency, availability, using
| off-peak resources), but running globally should be a
| default and people should opt-in into fine-grained
| control and responsibility.
|
| From outside it seems that either AWS picked the wrong
| default to present their customers, or that it's
| unreasonably expensive and it drives everyone into the
| in-depth handling to try to keep cloud costs down.
| icedchai wrote:
| Except, almost nobody, outside of very large players, does
| cross region redundancy. us-east-1 is like a SPOF for the
| entire Internet.
| liontwist wrote:
| Cloud noob here. But if I have a central database what can I
| distribute across geographic regions? Static assets? Maybe a
| cache?
| sgarland wrote:
| Yep. Cross-region RDBMS is a hard problem, even when you're
| using a managed service - you practically always have to
| deal with eventual consistency, or increased latency for
| writes.
| ayuhito wrote:
| My company used to do everything on-prem. Until a literal
| earthquake and tsunami took down a bunch of systems.
|
| After that, yeah we'll let AWS do the hard work of enabling
| redundancy for us.
| sgarland wrote:
| > What's particularly fascinating to me, though, is how some
| people are so pro-cloud that they'd argue with a writeup like
| this with silly cloud talking points.
|
| I'm sure I'll be downvoted to hell for this, but I'm convinced
| that it's largely their insecurities being projected.
|
| Running your own hardware isn't tremendously difficult, as
| anyone who's done it can attest, but it does require a much
| deeper understanding of Linux (and of course, any services
| which previously would have been XaaS), and that's a vanishing
| trait these days. So for someone who may well be quite skilled
| at K8s administration, serverless (lol) architectures, etc. it
| probably is seen as an affront to suggest that their skill set
| is lacking something fundamental.
| TacticalCoder wrote:
| > So for someone who may well be quite skilled at K8s
| administration ...
|
| And running your own hardware is not incompatible with
| Kubernetes: on the contrary. You can fully well have your
| infra spin up VMs and then do container orchestration if
| that's your thing.
|
| And part of your hardware monitoring and reporting tooling
| can work perfectly fine from containers.
|
| Bare metal -> Hypervisor -> VM -> container orchestration ->
| a container running a "stateless" hardware monitoring
| service. And VMs themselves are "orchestrated" too.
| Everything can be automated.
|
| Anyway, say a hard disk begins to show errors. Notifications
| are sent (email/SMS/Telegram/whatever) by another service
| in another container, and the dashboard shows it too
| (dashboards are cool).
|
| Go to the machine once the spare disk has already been
| resilvered, move it to where the failed disk was, plug in a
| new disk that becomes the new spare.
|
| Boom, done.
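|
| A minimal sketch of the kind of containerized check that could
| drive those notifications, assuming ZFS as above (the Telegram
| token and chat id are placeholders):
|
|     import subprocess, time, urllib.parse, urllib.request
|
|     TOKEN = "<bot-token>"    # placeholder
|     CHAT_ID = "<chat-id>"    # placeholder
|
|     def pool_problem():
|         # `zpool status -x` prints "all pools are healthy"
|         # when nothing is wrong
|         out = subprocess.run(["zpool", "status", "-x"],
|                              capture_output=True, text=True).stdout
|         return None if "all pools are healthy" in out else out
|
|     while True:
|         problem = pool_problem()
|         if problem:
|             data = urllib.parse.urlencode(
|                 {"chat_id": CHAT_ID, "text": problem[:4000]}).encode()
|             urllib.request.urlopen(
|                 f"https://api.telegram.org/bot{TOKEN}/sendMessage",
|                 data=data)
|         time.sleep(300)  # check every five minutes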
|
| I'm not saying all self-hosted hardware should do container
| orchestration: there are valid use cases for bare metal too.
|
| But something has to be said about controlling _everything_
| on your own infra: from the bare metal to the VMs to container
| orchestration, to even potentially your own IP address space.
|
| This is all within reach of an _individual_, both skill-wise
| and price-wise (including obtaining your own IP address
| space). People who drank the cloud kool-aid should ponder
| this and wonder how good their skills truly are if they
| cannot get this up and working.
| sgarland wrote:
| Fully agree. And if you want to take it to the next level
| (and have a large budget), Oxide [0] seems to have neatly
| packaged this into a single coherent product. They don't
| quite have K8s fully running, last I checked, but there are
| of course other container orchestration systems.
|
| > Go to the machine once the spare disk as already been
| resilvered
|
| Hi, fellow ZFS enthusiast :-)
|
| [0]: https://oxide.computer
| noprocrasted wrote:
| > And running your own hardware is not incompatible with
| Kubernetes: on the contrary
|
| Kubernetes actually makes so much more sense on bare-metal
| hardware.
|
| On the cloud, I think the value prop is dubious - your
| cloud provider is already giving you VMs, why would you
| need to subdivide them further and add yet another layer of
| orchestration?
|
| Not to mention that you're getting 2010s-era performance on
| those VMs, so subdividing them is terrible from a
| performance point of view too.
| sgarland wrote:
| > Not to mention that you're getting 2010s-era
| performance on those VMs, so subdividing them is terrible
| from a performance point of view too.
|
| I was trying in vain to explain to our infra team a
| couple of weeks ago why giving my team a dedicated node
| of a newer instance family with DDR5 RAM would be
| beneficial for an application which is heavily
| constrained by RAM speed. People seem to assume that
| compute is homogenous.
| theideaofcoffee wrote:
| I would wager that the same kind of people that were
| arguing against your request for a specific hardware
| config are the same ones in this comment section railing
| against any sort of self-sufficiency by hosting it
| yourself on hardware. All they know is cloud, all they
| know how to do is "ScAlE Up thE InStanCE!" when shit hits
| the fan. It's difficult to argue against that and make
| real progress. I understand your frustration completely.
| irunmyownemail wrote:
| I agree, I run PROD, TEST and DEV kube clusters all in
| VM's, works great.
| luplex wrote:
| In the public sector, cloud solves the procurement problem. You
| just need to go through the yearlong process once to use a
| cloud service, instead of for each purchase > 1000EUR.
| moltar wrote:
| Cloud is more than instances. If all you need is a bunch of
| boxes, then cloud is a terrible fit.
|
| I use AWS cloud a lot, and almost never use any VMs or
| instances. Most instances I use are along the lines of a simple
| anemic box for a bastion host or some such.
|
| I use higher level abstractions (services) to simplify
| solutions and outsource maintenance of these services to AWS.
| TacticalCoder wrote:
| > All the pro-cloud talking points are just that - talking
| points that don't persuade anyone with any real technical
| understanding ...
|
| And moreover most of the actual interesting things, like having
| VM templates and stateless containers, orchestration, etc. is
| very easy to run yourself and gets you 99.9% of the benefits of
| the cloud.
|
| Just about any and every service is available as a container
| file already written for you. And if it doesn't exist, it's
| not hard to plumb up.
|
| A friend of mine runs more than 700 containers (yup, seven
| hundred), split over his own rack at home (half of them) and
| the other half on dedicated servers (he runs stuff like
| FlightRadar, AI models, etc.). He'll soon get his own IP
| address space. Complete "chaos monkey" ready infra where you
| can cut any cable and the thing shall keep working: everything
| is duplicated, can be spun up on demand, etc. Someone could
| steal his entire rack and all his dedicated servers, and he'd
| still be back operational in no time.
|
| If an individual can do that, a company, no matter its size,
| can do it too. And arguably 99.9% of all the companies out
| there don't have the need for an infra as powerful as the one
| most homelab enthusiast have.
|
| And another thing: there are even two in-betweens between
| "cloud" and "our own hardware located at our company". First is
| colocating your own hardware but in a datacenter. Second is
| renting dedicated servers from a datacenter.
|
| They're often ready to accept cloud-init directly.
|
| And it's not hard. I'd say learning to configure hypervisors on
| bare metal, then spin VMs from templates, then running
| containers inside the VMs is actually much easier than learning
| all the idiosyncrasies of all the different cloud vendors APIs
| and whatnots.
|
| Funnily enough when the pendulum swung way too far on the
| "cloud all the things" side, those saying at some point we'd
| read story about repatriation were being made fun of.
| sgarland wrote:
| > If an individual can do that, a company, no matter its
| size, can do it too.
|
| Fully agreed. I don't have physical HA - if someone stole my
| rack, I would be SOL - but I can easily ride out a power
| outage for as long as I want to be hauling cans of gasoline
| to my house. The rack's UPS can keep it up at full load for
| at least 30 minutes, and I can get my generator running and
| hooked up in under 10. I've done it multiple times. I can
| lose a single server without issue. My only SPOF is internet,
| and that's only by choice, since I can get both AT&T and
| Spectrum here, and my router supports dual-WAN with auto-
| failover.
|
| > And arguably 99.9% of all the companies out there don't
| have the need for an infra as powerful as the one most
| homelab enthusiast have.
|
| THIS. So many people have no idea how tremendously fast
| computers are, and how much of an impact latency has on
| speed. I've benchmarked my 12-year old Dells against the
| newest and shiniest RDS and Aurora instances on both MySQL
| and Postgres, and the only ones that kept up were the ones
| with local NVMe disks. Mine don't even technically have
| _local_ disks; they're NVMe via Ceph over Infiniband.
|
| Does that scale? Of course not; as soon as you want geo-
| redundant, consistent writes, you _will_ have additional
| latency. But most smaller and medium companies don't _need_
| that.
| dan-robertson wrote:
| Well cloud providers often give more than just VMs in a data
| enter somewhere. You may not be able to find good equivalents
| if you aren't using the cloud. Some third-party products are
| also only available on clouds. How much of a difference those
| things make will depend on what you're trying to do.
|
| I think there are accounting reasons for companies to prefer
| paying opex to run things on the cloud instead of more capex-
| intensive self-hosting, but I don't understand the dynamics
| well.
|
| It's certainly the case that clouds tend to be more expensive
| than self-hosting, even when taking account of the discounts
| that moderately sized customers can get, and some of the
| promises around elastic scaling don't really apply when you are
| bigger.
|
| To some of your other points: the main customers of companies
| like AWS are businesses. Businesses generally don't care about
| the centralisation of the internet. Businesses are capable of
| reading the contracts they are signing and not signing them if
| privacy (or, typically more relevant to businesses, their IP)
| cannot be sufficiently protected. It's not really clear to me
| that using a cloud is going to be less secure than doing things
| on-prem.
| tyingq wrote:
| I think part of it was a way for dev teams to get an infra team
| that was not empowered to say no. Plus organizational theory,
| empire building, etc.
| sgarland wrote:
| Yep. I had someone tell me last week that they didn't want a
| more rigid schema because other teams rely on it, and
| anything adding "friction" to using it would be poorly
| received.
|
| As an industry, we are largely trading correctness and
| performance for convenience, and this is not seen as a
| negative by most. What kills me is that at every cloud-native
| place I've worked at, the infra teams were both responsible
| for maintaining and fixing the infra that product teams
| demanded, but were not empowered to push back on unreasonable
| requests or usage patterns. It's usually not until either the
| limits of vertical scaling are reached, or a SEV0 occurs
| where these decisions were the root cause, that leadership
| even begins to consider changes.
| tzs wrote:
| There was a time when cloud was significantly cheaper than
| owning.
|
| I'd expect that there are people who moved to the cloud then,
| and over time started using services offered by their cloud
| provider (e.g., load balancers, secret management, databases,
| storage, backup) instead of running those services themselves
| on virtual machines, and now even if it would be cheaper to run
| everything on owned servers they find it would be too much
| effort to add all those services back to their own servers.
| toomuchtodo wrote:
| The cloud wasn't about cheap, it was about _fast_. If you're
| VC funded, time is everything, and developer velocity above
| all else to hyperscale and exit. That time has passed (ZIRP),
| and the public cloud margin just doesn't make sense when you
| can own and operate (their margin is your opportunity) on
| prem with similar cloud primitives around storage and
| compute.
|
| Elasticity is a component, but has always been from a batch
| job bin packing scheduling perspective, not much new there.
| Before k8s and Nomad, there was Globus.org.
|
| (Infra/DevOps in a previous life at a unicorn, large worker
| cluster for a physics experiment prior, etc; what is old is a
| new again, you're just riding hype cycle waves from junior to
| retirement [mainframe->COTS on prem->cloud->on prem cloud,
| and so on])
| dboreham wrote:
| That was never true except in the case that the required
| hardware resources were significantly smaller than a typical
| physical machine.
| tomrod wrote:
| <ctoHatTime> Dunno man, it's really really easy to set up an S3
| bucket and use it to share datasets with users authorized via
| IAM....
|
| And IAM and other cloud security and management considerations
| are where the opex/capex and capability argument can start to
| break down. Turns out, the "cloud" savings come from not
| having the capabilities in house to manage hardware. Sometimes,
| for most businesses, you want some of that lovely reliability.
|
| (In short, I agree with you, substantially).
|
| Like code. It is easy to get something basic up, but
| substantially more resources are needed for non-trivial things.
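|
| To make the "easy" part concrete, a minimal sketch of sharing a
| dataset from S3 with a time-limited presigned URL (bucket and
| key names are hypothetical):
|
|     import boto3
|
|     s3 = boto3.client("s3")
|
|     # Upload a dataset; IAM only needs to grant this caller
|     # s3:PutObject / s3:GetObject on the bucket.
|     s3.upload_file("users.parquet", "example-datasets",
|                    "shared/users.parquet")
|
|     # Hand out a link that expires in an hour
|     url = s3.generate_presigned_url(
|         "get_object",
|         Params={"Bucket": "example-datasets",
|                 "Key": "shared/users.parquet"},
|         ExpiresIn=3600)
|     print(url)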
| hamandcheese wrote:
| I feel like IAM may be the sleeper killer-app of cloud.
|
| I self-host a lot of things, but boy oh boy if I were running
| a company it would be a helluvalotta work to get IAM properly
| set up.
| sanderjd wrote:
| I strongly agree with this and also strongly lament it.
|
| I find IAM to be a terrible implementation of a
| foundationally necessary system. It feels tacked on to me,
| except now it's tacked onto thousands of other things and
| there's no way out.
| andrewfromx wrote:
| like terraform! isn't pulumi 100% better but there's no
| way out of terraform.
| pphysch wrote:
| That's essentially why "platform engineering" is a hot
| topic. There are great FOSS tools for this, largely in the
| Kubernetes ecosystem.
|
| To be clear, authentication could still be outsourced, but
| authorizing access to (on-prem) resources in a multi-tenant
| environment is something that "platforms" are frequently
| designed for.
| necovek wrote:
| But isn't _using_ Fastmail akin to using a cloud provider
| (managed email vs managed everything else)? They are similarly
| a service provider, and as a customer, you don't really care
| "who their ISP is?"
|
| The discussion matters when we are talking about _building_
| things: whether you self-host or use managed services is a set
| of interesting trade-offs.
| citrin_ru wrote:
| Yes, FastMail is a SaaS. But there are adepts of a religion who
| would tell you that companies like FastMail should be built
| on top of AWS and that it is the only true way. It is good to
| have some counter-narrative to this.
| j45 wrote:
| Being cloud compatible (packaged well) can be as important
| as being cloud-agnostic (work on any cloud).
|
| Too many projects become beholden to one cloud.
| UltraSane wrote:
| "All the pro-cloud talking points are just that - talking
| points that don't persuade anyone with any real technical
| understanding,"
|
| This is false. AWS infrastructure is vastly more secure than
| almost all company data centers. AWS has a rule that the same
| person cannot have logical access and physical access to the
| same storage device. Very few companies have enough IT people
| to have this rule. The AWS KMS is vastly more secure than what
| almost all companies are doing. The AWS network is vastly
| better designed and operated than almost all corporate
| networks. AWS S3 is more reliable and scalable than anything
| almost any company could create on their own. To create
| something even close to it you would need to implement
| something like MinIO using 3 separate data centers.
| gooosle wrote:
| <citations needed>
| j45 wrote:
| The cloud is someone else's computer.
|
| It's like putting something in someone's desk drawer under
| the guise of convenience at the expense of security.
|
| Why?
|
| Too often, someone other than the data owner has or can get
| access to the drawer directly or indirectly.
|
| Also, Cloud vs self hosted to me is a pendulum that has swung
| back and forth for a number of reasons.
|
| The benefits of the cloud outlined here are often a lot of
| open source tech packaged up and sold as manageable from a
| web browser, or a command line.
|
| One of the major reasons the cloud became popular was
| networking issues in Linux to manage volume at scale. At the
| time the cloud became very attractive for that reason, plus
| being able to virtualize bare metal servers to put into any
| combination of local to cloud hosting.
|
| Self-hosting has become easier by an order of magnitude or
| two for anyone who knew how to do it, except it's something
| only people who have done both self-hosting and cloud can
| really discuss.
|
| Cloud has abstracted away the cost of horsepower, and
| converted it to transactions. People are discovering a
| fraction of the horsepower is needed to service their
| workloads than they thought.
|
| At some point the horsepower got way beyond what they needed
| and it wasn't noticed. But paying for a cloud is convenient
| and standardized.
|
| Company data centres can be reasonably secured using a number
| of PaaS or IaaS solutions readily available off the shelf.
| Tools from VMware, Proxmox and others are tremendous.
|
| It may seem like there's a lot to learn, except most problems
| that are new to someone have often been thought through a ton
| already, by people whose experience goes beyond cloud only.
| the_arun wrote:
| > The cloud is someone else's computer
|
| Isn't it more like leasing in a public property? Meaning it
| is yours as long as you are paying the lease? Analogous to
| renting an apartment instead of owning a condo?
| adamtulinius wrote:
| Not at all. You can inspect the apartment you rent. The
| cloud is totally opaque in that regard.
| j45 wrote:
| Totally opaque is a really nice way to describe it.
| j45 wrote:
| Nope. It's literally putting private data in a shared
| drawer in someone else's desk where you have your area of
| the drawer.
| jameshart wrote:
| Literally?
|
| I would just like to point out that most of us who have
| ever had a job at an office, attended an academic
| institution, or lived in rented accommodation have kept
| stuff in someone else's desk drawer from time to time.
| Often a leased desk in a building rented from a random
| landlord.
|
| Keeping things in someone else's desk drawer can be
| convenient and offer a sufficient level of privacy for
| many purposes.
|
| And your proposed alternative to using 'someone else's
| desk drawer' is, what, make your own desk?
|
| I guess, since I'm not a carpenter, I can buy a flatpack
| desk from ikea and assemble it and keep my stuff in that.
| I'm not sure that's an improvement to my privacy posture
| in any meaningful sense though.
| j45 wrote:
| It doesn't have to be entirely literal, or not literal at
| all.
|
| A single point of managed/shared access to a drawer
| doesn't fit all levels of data sensitivity and security.
|
| I understand this kind of wording and analogy might be
| triggering for the drive by down voters.
|
| A comment like the above though allows both people to
| openly consider viewpoints that may not be theirs.
|
| For me it shed light on something simpler.
|
| Shared access to shared infrastructure is not always as
| secure as we want to tell ourselves. It's important to be
| aware when it might be security through abstraction.
|
| The dual security and convenience of self-hosting IaaS
| and PaaS even at a dev, staging or small scale production
| has improved dramatically, and allows for things to be
| built in a cloud-agnostic way so that switching clouds
| is much easier. It can also easily build a business
| case to lower cloud costs. Still, it doesn't have to be
| for everyone either, nor does the cloud have to be
| everything.
|
| A small example? For a stable homelab: a couple of USFF
| small servers running Proxmox on residential fibre behind
| a Tailscale funnel or Cloudflare tunnel; compare the cost
| for the uptime you get. It's surprising how much time
| servers and apps spend idling.
|
| Life and the real world is more than binary. Be it all
| cloud or no cloud.
| MadnessASAP wrote:
| > Keeping things in someone else's desk drawer can be
| convenient and offer a sufficient level of privacy for
| many purposes.
|
| To torture a metaphor to death, are you going to keep
| your bank passwords in somebody else's desk drawer? Are
| you going to keep 100 million people's bank passwords in
| that drawer?
|
| > I guess, since I'm not a carpenter, I can buy a
| flatpack desk from ikea and assemble it and keep my stuff
| in that. I'm not sure that's an improvement to my privacy
| posture in any meaningful sense though.
|
| If you're not a carpenter I would recommend you stay out
| of the business of building safe desk drawers
| altogether. Although you should probably still be able to
| recognize that the desk drawer you own, that is inside
| your own locked house, is a safer option than the one at
| the office accessible by any number of people.
| UltraSane wrote:
| > The cloud is someone else's computer.
|
| And in the case of AWS it is someone else's extremely well
| designed and managed computer and network.
| j45 wrote:
| Generally I look to people who could build an AWS themselves
| for opinions on the value of using it versus doing it
| yourself, because they can do both.
|
| Happy to hear more.
| AtlasBarfed wrote:
| One of the ways the NSA and security services get so much
| intelligence on targets isn't by directly decrypting what
| they are storing or listening in. A great deal of
| their intelligence is simply metadata intelligence. They
| watch what you do. They watch the amount of data you
| transport. They watch your patterns of movement.
|
| So even if AWS is providing direct security and
| encryption in the sense of what most security professionals
| are concerned with (key strength, etc.), AWS
| still has a great deal of information about what you
| do, because they get to watch how much data moves from
| where to where and other information about what those
| machines are.
| fulafel wrote:
| OTOH:
|
| 1. big clouds are very lucrative targets for spooks, your
| data seem pretty likely to be hoovered up as "bycatch" (or
| maybe main catch depending on your luck) by various agencies
| and then traded around as currency
|
| 2. you never hear about security problems (incidents or
| exposure) in the platforms, there's no transparency
|
| 3. better than most corporate stuff is a low bar
| sfilmeyer wrote:
| >3. better than most corporate stuff is a low bar
|
| I think it's a very relevant bar, though. The top level
| commenter made points about "a business of just about any
| size", which seems pretty exactly aligned with "most
| corporate stuff".
| likeabatterycar wrote:
| > you never hear about security problems (incidents or
| exposure) in the platforms
|
| Except that one time...
|
| https://www.seattlemet.com/news-and-city-
| life/2023/04/how-a-...
| noprocrasted wrote:
| If I remember right, the attacker's AWS employment is
| irrelevant - no privileged AWS access was used in that
| case. The attacker working for AWS was a pure
| coincidence, it could've been anyone.
| stefan_ wrote:
| 4. we keep hitting hypervisor bugs and having to work
| around the fact that your software coexists on the same
| machine with 3rdparty untrusted software who might in fact
| be actively trying to attack you. All this silliness with
| encrypted memory buses and the various debilitating
| workarounds for silicon bugs.
|
| So yes, the cloud is very secure, except for the very thing
| that makes it the cloud that is not secure at all and has
| just been papered over because questioning it means the
| business model is bust.
| mardifoufs wrote:
| Most corporations (which is the vast majority of cloud
| users) absolutely don't care about spooks, sadly enough. If
| that's the threat model, then it's a very very rare case to
| care about it. Most datacenters/corporations won't even
| fight or care about sharing data with local
| spooks/cops/three letter agencies. The actual threat is
| data leaks, security breaches, etc.
| nine_k wrote:
| If you don't want your data to be accessible to "various
| agencies", don't share it with corporations, full stop.
| Corporations are obliged by law to make it available to the
| agencies, and the agencies often overreach, while the
| corporations almost never mind the overreach. There are
| limitations for stuff like health or financial data, but
| these are not impenetrable barriers.
|
| I would just consider all your hosted data to be easily
| available to any security-related state agency; consider
| them already having a copy.
| immibis wrote:
| That depends where it's hosted and how it's encrypted.
| Cloud hosts can just reach into your RAM, but dedicated
| server hosts would need to provision that before
| deploying the server, and colocation providers would need
| to take your server offline to install it.
| nine_k wrote:
| Colocated / Dedicated is not Cloud, AFAICT. It's the
| "traditional hosting", not elastic / auto-scalable. You
| of course may put your own, highly tamper-proof boxes in
| a colocation rack, and be reasonably certain that any
| attempt to exfiltrate data from them won't be invisible
| to you.
|
| By doing so, you share nothing with your hosting
| provider, you only rent rack space / power /
| connectivity.
| noprocrasted wrote:
| > AWS infrastructure is vastly more secure than almost all
| company data centers
|
| Secure in what terms? Security is always about a threat model
| and trade-offs. There's no absolute, objective term of
| "security".
|
| > AWS has a rule that the same person cannot have logical
| access and physical access to the same storage device.
|
| Any promises they make aren't worth anything unless there's
| contractually-stipulated damages that AWS should pay in case
| of breach, those damages actually corresponding to the costs
| of said breach for the customer, and a history of actually
| paying out said damages without shenanigans. They've already
| got a track record of lying on their status pages, so it
| doesn't bode well.
|
| But I'm actually wondering what this specific rule even tries
| to defend against? You presumably care about data protection,
| so logical access is what matters. Physical access seems
| completely irrelevant no?
|
| > Very few companies have enough IT people to have this rule
|
| Maybe, but that doesn't actually mitigate anything from the
| company's perspective? The company itself would still be in
| the same position, aka not enough people to reliably separate
| responsibilities. Just that instead of those responsibilities
| being physical, they now happen inside the AWS console.
|
| > The AWS KMS is vastly more secure than what almost all
| companies are doing.
|
| See first point about security. Secure against what - what's
| the threat model you're trying to protect against by using
| KMS?
|
| But I'm not necessarily denying that (at least some) AWS
| services are very good. Question is, is that "goodness"
| required for your use-case, is it enough to overcome its
| associated downsides, and is the overall cost worth it?
|
| A pragmatic approach would be to evaluate every component on
| its merits and fitness to the problem at hand instead of
| going all in, one way or another.
| cyberax wrote:
| > They've already got a track record of lying on their
| status pages, so it doesn't bode well.
|
| ???
| nine_k wrote:
| Physical access is pretty relevant if you could bribe an
| engineer to locate some valuable data's physical location,
| then go service the particular machine, copy the disk
| (during servicing "degraded hardware"), and thus exfiltrate
| the data without any traces of a breach.
| Brian_K_White wrote:
| Physical access and logical root access can't hide things
| from each other. It takes both to hide an activity. If you
| only have one, then the other can always be used to uncover
| or detect in the first place, or at least diagnose after.
| Aachen wrote:
| AWS is so complicated, we usually find more impactful
| permission problems than in any company using their own
| hardware
| rmbyrro wrote:
| about security, most businesses using AWS invest little to
| nothing in securing their software, or even adopt basic
| security practices for their employees
|
| having the most secure data center doesn't matter if you load
| your secrets as env vars in a system that can be easily
| compromised by a motivated attacker
|
| so i don't buy this argument as a general reason pro-cloud
| dajonker wrote:
| This exactly, most leaks don't involve any physical access.
| Why bother with something hard when you can just get in
| through an unmaintained Wordpress/SharePoint/other legacy
| product that some department can't live without.
| evantbyrne wrote:
| Making API calls from a VM on shared hardware to KMS is
| vastly more secure than doing AES locally? I'm skeptical to
| say the least.
| UltraSane wrote:
| Encrypting data is easy, securely managing keys is the hard
| part. KMS is the Key Management Service. And AWS put a lot
| of thought and work into it.
|
| https://docs.aws.amazon.com/kms/latest/cryptographic-
| details...
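|
| For context, the usual pattern is envelope encryption: KMS
| hands out a data key and the actual encryption happens
| locally. A minimal sketch with boto3 and the cryptography
| package (the key alias is a placeholder):
|
|     import os
|     import boto3
|     from cryptography.hazmat.primitives.ciphers.aead import AESGCM
|
|     kms = boto3.client("kms")
|
|     # Fresh data key; only the encrypted copy is stored at rest
|     dk = kms.generate_data_key(KeyId="alias/app-data",
|                                KeySpec="AES_256")
|     data_key, wrapped_key = dk["Plaintext"], dk["CiphertextBlob"]
|
|     # Encrypt locally; store nonce + ciphertext + wrapped_key
|     nonce = os.urandom(12)
|     ct = AESGCM(data_key).encrypt(nonce, b"customer record", None)
|
|     # Later: unwrap the data key via KMS, decrypt locally
|     key = kms.decrypt(CiphertextBlob=wrapped_key)["Plaintext"]
|     print(AESGCM(key).decrypt(nonce, ct, None))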
| evantbyrne wrote:
| KMS access is granted by either environment variables or
| by authorizing the instance itself. Either way, if the
| instance is compromised, then so is access to KMS. So
| unless your threat model involves preventing the
| government from looking at your data through some
| theoretical sophisticated physical attack, then your
| primary concerns are likely the same as running a box in
| another physically secure location. So the same rules of
| needing to design your encryption scheme to minimize
| blowout from a complete hostile takeover still apply.
| Xylakant wrote:
| An attacker gaining temporary capability to
| encrypt/decrypt data through a compromised instance is
| painful. An attacker gaining a copy of a private key is
| still an entirely different world of pain.
| evantbyrne wrote:
| Painful is an understatement. Keys for sensitive customer
| data should be derived from customer secrets either way.
| Almost nobody does that though, because it requires
| actual forethought. Instead they just slap secrets in KMS
| and pretend it's better than encrypted environment
| variables or other secrets services. If an attacker can
| read your secrets with the same level of penetration into
| your system, then it's all the same security wise.
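|
| A minimal sketch of what "derived from customer secrets" can
| look like (the parameters are illustrative, not a vetted
| scheme):
|
|     import os
|     from cryptography.hazmat.primitives import hashes
|     from cryptography.hazmat.primitives.kdf.pbkdf2 import (
|         PBKDF2HMAC)
|     from cryptography.hazmat.primitives.ciphers.aead import AESGCM
|
|     def derive_key(customer_secret: bytes, salt: bytes) -> bytes:
|         # Per-customer key; the server never stores the secret
|         # or the derived key itself
|         kdf = PBKDF2HMAC(algorithm=hashes.SHA256(), length=32,
|                          salt=salt, iterations=600_000)
|         return kdf.derive(customer_secret)
|
|     salt = os.urandom(16)       # stored next to the ciphertext
|     key = derive_key(b"correct horse battery", salt)
|     nonce = os.urandom(12)
|     ct = AESGCM(key).encrypt(nonce, b"sensitive customer data",
|                              None)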
| Xylakant wrote:
| There are many kinds of secrets that are used for
| purposes where they cannot be derived from customer
| secrets, and those still need to be secured. TLS private
| keys for example.
|
| I do disagree on the second part - there's a world of a
| difference whether an attacker obtains a copy of your
| certificates private key and can impersonate you quietly
| or whether they gain the capability to perform signing
| operations on your behalf temporarily while they maintain
| access to a compromised instance.
| AtlasBarfed wrote:
| It's now been two years since I used KMS, but at the time
| it seemed little more than an S3-style API with Twitter-
| size limitations.
|
| Fundamentally why would KMS be more secure than S3
| anyway? Both ultimately have the same fundamental
| security requirements and do the same thing.
|
| So the big whirlydoo is KMS has hardware keygen. I'm
| sorry, that sounds like something almost guaranteed to
| have an NSA backdoor, or to have so much NSA attention it
| has been compromised.
| scrose wrote:
| If your threat model is the NSA and you're worried about
| backdoors then don't use any cloud provider?
|
| Maybe I'm just jaded from years doing this, but two
| things have never failed me for bringing me peace of mind
| in the infrastructure/ops world:
|
| 1. Use whatever your company has already committed to.
| Compare options and bring up tradeoffs when committing to
| a cloud-specific service(ie. AWS Lambdas) versus more
| generic solutions around cost, security and maintenance.
|
| 2. Use whatever feels right to you for anything else.
|
| Preventing the NSA from cracking into your system is a
| fun thought exercise, but life is too short to make that
| the focus of all your hosting concerns
| gauravphoenix wrote:
| one of my greatest learnings in life is to differentiate
| between facts and opinions- sometimes opinions are presented
| as facts and vice-versa. if you think about it- the statement
| "this is false" is a response to an opinion (presented as a
| fact) but not a fact. there is no way one can objectively
| define and defend what does "real technical understanding"
| means. the cloud space is vast with millions of people having
| varied understanding and thus opinions.
|
| so let's not fight the battle that will never be won. there
| is no point in convincing pro-cloud people that cloud isn't
| the right choice and vice-versa. let people share stories
| where it made sense and where it didn't.
|
| as someone who has lived in cloud security space since 2009
| (and was founder of redlock - one of the first CSPMs), in my
| opinion, there is no doubt that AWS is indeed superiorly
| designed than most corp. networks - but is that what you
| really need? if you run entire corp and LOB apps on aws but have
| poor security practices, will it be right decision? what if
| you have the best security engineers in the world but they
| are best at Cisco type of security - configuring VLANS and
| managing endpoints but are not good at detecting someone
| using IMDSv1 in ec2 exposed to the internet and running a
| vulnerable (to csrf) app?
|
| when the scope of discussion is as vast as cloud vs on-prem,
| imo, it is a bad idea to make absolute statements.
| fulafel wrote:
| Great points. Also if you end up building your apps as Rube
| Goldberg machines living up to "AWS Well Architected"
| criteria (indoctrinated by getting staff lots of AWS
| certifications, leading to a lot of AWS-certified staff
| whose paycheck now depends on following AWS recommended
| practices), the complexity will kill your security, as
| nobody will understand the systems anymore.
| dehrmann wrote:
| The other part is that when us-east-1 goes down, you can
| blame AWS, and a third of your customer's vendors will be
| doing the same. When you unplug the power to your colo rack
| while installing a new server, that's on you.
| throwawaysxcd0 wrote:
| OTOH, when your company's web site is down you can do
| something about it. When the CEO asks about it, you can
| explain _why_ its offline and more importantly _what is
| being done_ to bring it back.
|
| The equivalent situation for those who took a cloud-based
| approach is often... ¯\_(ツ)_/¯
| szundi wrote:
| Hey boss, I go to sleep now, site should be up anytime.
| Cheers
| Xylakant wrote:
| The more relevant question is whether my efforts to do
| something lead to a better and faster result than my
| cloud provider's efforts to do something. I get it - it
| feels powerless to do nothing, but for a lot of
| organizations I've seen the average downtime would still
| be higher.
| lukevp wrote:
| With the cloud, in a lot of cases you can have additional
| regions that incur very little cost as they scale
| dynamically with traffic. It's hard to do that with on-
| prem. Also many AWS services come cross-AZ (AZ is a data
| center), so their arch is more robust than a single Colo
| server even if you're in a single region.
| brandon272 wrote:
| It's not always a full availability zone going down that is
| the problem. Also, despite the "no one ever got fired for
| buying Microsoft" logic, in practice I've never actually
| found stakeholders to be reassured by "its AWS and everyone
| is affected" when things are down. People want things back
| up and they want some informed answers about when that
| might happen, not "ehh its AWS, out of our control".
| wslh wrote:
| From a critical perspective, your comment made me think about
| the risks posed by rogue IT personnel, especially at scale in
| the cloud. For example, Fastmail is a single point of failure
| as a DoS target, whereas attacking an entire datacenter can
| impact multiple clients simultaneously. It all comes down to
| understanding the attack vectors.
| UltraSane wrote:
| Cloud providers are very big targets but have enormous
| economic incentive to be secure and thus have very large
| teams of very competent security experts.
| wslh wrote:
| You can have full security competence but be a rogue
| actor at the same time.
| portaouflop wrote:
| You can also have rogue actors in your company, you don't
| need 3rd parties for that
| wslh wrote:
| That doesn't sum up my comments in the thread. A rogue
| actor in a datacenter could attack zillions of companies
| at the same time while rogue actors in a single company
| only once.
| likeabatterycar wrote:
| AWS hires the same cretins that inhabit every other IT
| department, they just usually happen to be more technically
| capable. That doesn't make them any more or less trustworthy
| or reliable.
| sanderjd wrote:
| > _All the pro-cloud talking points are just that - talking
| points that don't persuade anyone with any real technical
| understanding, but serve to introduce doubt to non-technical
| people and to trick people who don't examine what they're
| told._
|
| This feels like "no true scotsman" to me. I've been building
| software for close to two decades, but I guess I don't have
| "any real technical understanding" because I think there's a
| compelling case for using "cloud" services for many (honestly I
| would say most) businesses.
|
| Nobody is "afraid to openly discuss how cloud isn't right for
| many things". This is extremely commonly discussed. We're
| discussing it right now! I truly cannot stand this modern
| innovation in discourse of yelling "nobody can talk about XYZ
| thing!" while noisily talking about XYZ thing on the lowest-
| friction publishing platforms ever devised by humanity. Nobody
| is afraid to talk about your thing! People just disagree with
| you about it! That's ok, differing opinions are normal!
|
| Your comment focuses a lot on cost. But that's just not really
| what this is all about. Everyone knows that on a long enough
| timescale with a relatively stable business, the total cost of
| having your own infrastructure is usually lower than cloud
| hosting.
|
| But cost is simply not the only thing businesses care about.
| Many businesses, especially new ones, care more about time to
| market and flexibility. Questions like "how many servers do we
| need? with what specs? and where should we put them?" are a
| giant distraction for a startup, or even for a new product
| inside a mature firm.
|
| Cloud providers provide the service of "don't worry about all
| that, figure it out after you have customers and know what you
| actually need".
|
| It is also true that this (purposefully) creates lock-in that
| is expensive either to leave in place or unwind later, and it
| definitely behooves every company to keep that in mind when
| making architecture decisions, but lots of products never make
| it to that point, and very few of those teams regret the time
| they didn't spend building up their own infrastructure in order
| to save money later.
| mmcwilliams wrote:
| It seems that the preference is less about understanding or
| misunderstanding the technical requirements and more about
| the fact that it moves a capital expenditure (plus some
| recurring operational expenditure) entirely into the opex
| column.
| sanderjd wrote:
| Also, by the way, I found it interesting that you framed your
| side of this disagreement as the technically correct one, but
| then included this:
|
| > _a desire to not centralize the Internet_
|
| This is an ideological stance! I happen to share this desire.
| But you should be aware of your own non-technical - "emotional"
| - biases when dismissing the arguments of others on the grounds
| that they are "emotional" and "fanatical".
| johnklos wrote:
| I never said that my own reasons were neither personal nor
| emotional. I was just pointing out that my reasons are easy
| to articulate.
|
| I do think it's more than just emotional, though, but most
| people, even technical people, haven't taken the time to
| truly consider the problems that will likely come with
| centralization. That's a whole separate discussion, though.
| JOnAgain wrote:
| As someone who ran a startup with hundreds of hosts: as soon
| as I start to count the salaries, hiring, desk space, etc. of
| the people needed to manage the hosts, AWS looks cheap again.
| Yeah, on hardware costs they are aggressively expensive. But
| TCO-wise, they're cheap for any decent-sized company.
|
| Add in compliance, auditing, etc., all things that you can set
| up out of the box (PCI, HIPAA, lawsuit retention). Gets even
| cheaper.
| browningstreet wrote:
| Most companies severely understaff ops, infra, and security.
| Your talking points might be good but, in practice, won't apply
| in many cases because of the intractability of that management
| mindset. Even when they should know better.
|
| I've worked at _tech_ companies with hundreds of developers and
| single digit ops staff. Those people will struggle to build and
| maintain mature infra. By going cloud, you get access to mature
| infra just by including it in build scripts. Devops is an
| effective way to move infra back to project teams and cut out
| infra orgs (this isn't great but I see it happen everywhere).
| Companies will pay cloud bills but not staffing salaries.
| j45 wrote:
| Using a commercial cloud provider only cements understaffing
| in, in too many cases.
| lelanthran wrote:
| > On the other hand, a business of just about any size that has
| any reasonable amount of hosting is better off with their own
| systems when it comes purely to cost
|
| From a cost PoV, sure, but when you're taking money out of
| capex it represents a big hit to the cash flow, while taking
| out twice that amount from opex has a lower impact on the
| company finances.
| swiftcoder wrote:
| > All the pro-cloud talking points... don't persuade anyone
| with any real technical understanding
|
| This is a very engineer-centric take. The cloud has some big
| advantages that are entirely non-technical:
|
| - You don't need to pay for hardware upfront. This is critical
| for many early-stage startups, who have no real ability to
| predict CapEx until they find product/market fit.
|
| - You have someone else to point the SOC2/HIPAA/etc auditors
| at. For anyone launching a company in a regulated space, being
| able to checkbox your entire infrastructure based on
| AWS/Azure/etc existing certifications is huge.
| shortsunblack wrote:
| You can over-provision your own baremetal resources 20x and
| it will be still cheaper than cloud. The capex talking point
| is just that, a talking point.
| swiftcoder wrote:
| As an early-stage startup?
|
| Your spend in the first year on AWS is going to be very
| close to zero for something like a SaaS shop.
|
| Nor can you possibly scale in-house baremetal fast enough
| if you hit the fabled hockey stick growth. By the time you
| sign a colocation contract and order hardware, your day in
| the sun may be over.
| rakoo wrote:
| > You have someone else to point the SOC2/HIPAA/etc auditors
| at.
|
| I would assume you still need to point auditors to your
| software in any case
| bluedino wrote:
| I want to see an article like this, but written from a Fortune
| 500 CTO perspective
|
| It seems like they all abandoned their VMware farms or physical
| server farms for Azure (they love Microsoft).
|
| Are they actually saving money? Are things faster? How's
| performance? What was the re-training/hiring like?
|
| In one case I know we got rid of our old database greybeards
| and replaced them with "DevOps" people that knew nothing about
| performance etc
|
| And the developers (and many of the admins) we had knew nothing
| about hardware or anything so keeping the physical hardware
| around probably wouldn't have made sense anyways
| ndriscoll wrote:
| Complicating this analysis is that computers have still been
| making exponential improvements in capability as clouds
| became popular (e.g. disks are 1000-10000x faster than they
| were 15 years ago), so you'd naturally expect things to
| become easier to manage over time as you need fewer machines,
| assuming of course that your developers focus on e.g.
| learning how to use a database well instead of how to scale
| to use massive clusters.
|
| That is, even if things became cheaper/faster, they might
| have been even better without cloud infrastructure.
| jrs235 wrote:
| >we got rid of our old database greybeards and replaced them
| with "DevOps" people that knew nothing about performance etc
|
| Seems a lot of those DevOps people just see Azure's
| recommendations for adding indexes and either just allow auto-
| applying them or add them without actually reviewing or
| understanding what workloads require them and why. This
| also lands a bit on developers/product that don't critically
| think about and communicate what queries are common and
| should have some forethought on what indexes should be
| beneficial and created. (Yes followup monitoring of actual
| index usage and possible missing indexes is still needed.)
| Too many times I've seen dozens of indexes on tables in the
| cloud where one could cover all of them. Yes, there still
| might be worthwhile reasons to keep some narrower/smaller
| indexes but again DBA and critical query analysis seems to be
| a forgotten and neglected skill. No one owns monitoring and
| analysing db queries and it only comes up after a fire has
| already broken out.
| awholescammy wrote:
| There is a whole ecosystem that pushes cloud to ignorant/fresh
| graduates/developers. Just take a look at the sponsors for all
| the most popular frameworks. When your system is super complex
| and depends on the cloud they make more money. Just look at the
| PHP ecosystem, where Laravel needs 4 times the servers to serve
| something that a pure PHP system would need. Most projects
| don't need the cloud. Only around 10% of projects actually need
| what the cloud provides. But they were able to brainwash a
| whole generation of developers/managers to think that they do.
| And so it goes.
| gjsman-1000 wrote:
| Having worked with Laravel, this is absolutely bull.
| irunmyownemail wrote:
| > If I didn't already self-host email, I'd consider using
| Fastmail.
|
| Same sentiment on all of what you said.
| slothtrop wrote:
| The bottom line > babysitting hardware. Businesses are
| transitioning to cloud because it's better for business.
| irunmyownemail wrote:
| Actually, there's been a reversal trend going on, for many
| companies, better is often on premises or hybrid now.
| twoparachute45 wrote:
| >What's particularly fascinating to me, though, is how some
| people are so pro-cloud that they'd argue with a writeup like
| this with silly cloud talking points. They don't seem to care
| much about data or facts, just that they love cloud and want
| everyone else to be in cloud, too.
|
| The irony is absolutely dripping off this comment, wow.
|
| Commenter makes an emotionally charged comment with no data or
| facts and dismisses anyone who disagrees with them as using
| "silly talking points" for not caring about data and facts.
|
| Your comment is entirely talking about itself.
| dehrmann wrote:
| The real cost wins of self-hosted are that anything using new
| hardware becomes an ordeal, and engineers won't use high-cost,
| value-added services. I agree that there's often too little
| restraint in cloud architectures, but if a business truly
| believes in a project, it shouldn't be held up for six months
| waiting for server budget, with engineers spending time doing
| ops work to get three nines of DB reliability.
|
| There is a size where self-hosting makes sense, but it's much
| larger than you think.
| mark242 wrote:
| I'm curious about what "reasonable amount of hosting" means to
| you, because from my experience, as your internal network's
| complexity goes up, it's far better for you to move systems to
| a hyperscaler. The current estimate is >90% of Fortune 500
| companies are cloud-based. What is it that you know that they
| don't?
| motorest wrote:
| > All the pro-cloud talking points are just that - talking
| points that don't persuade anyone with any real technical
| understanding,(...)
|
| This is where you lose all credibility.
|
| I'm going to focus on a single aspect: performance. If you're
| serving a global user base and your business, like practically
| all online businesses, is greatly impacted by performance
| problems, the only solution to a physics problem is to deploy
| your application closer to your users.
|
| With any cloud provider that's done with a few clicks and an
| invoice of a few hundred bucks a month. If you're running your
| hardware... What solution do you have to show for it? Do you hope
| to create a corporate structure to rent a place to host your
| hardware, manned by a dedicated team? What options do you have?
| stefan_ wrote:
| Is everyone running online FPS gaming servers now? If you
| want your page to load faster, tell your shitty frontend
| engineers to use less of the latest frameworks. You are not
| limited by physics, 99% aren't.
|
| I ping HN, it's 150ms away, it still renders in the same time
| that the Google frontpage does and that one has a 130ms
| advantage.
| pixelesque wrote:
| Erm, 99%'s clearly wrong and I think you know it, even if
| you are falling into the typical trap of "only Americans
| matter"...
|
| As someone in New Zealand, latency does really matter
| sometimes, and is painfully obvious at times.
|
| HN's ping for me is around: 330 ms.
|
| Anyway, ping doesn't really describe the latency of the
| full DNS lookup propagation, TCP connection establishment
| and TLS handshake: full responses for HN are around 900 ms
| for me till last byte.
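|
| A rough back-of-envelope of where that goes (the round-trip
| counts below are assumptions - TLS 1.3, DNS already cached - not
| measurements):
|
|     rtt_ms = 330
|     round_trips = {"TCP handshake": 1, "TLS 1.3 handshake": 1,
|                    "HTTP request/response": 1}
|     total = sum(round_trips.values()) * rtt_ms
|     print(f"~{total} ms to last byte")  # ~990 ms, same ballpark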
| justsomehnguy wrote:
| > latency does really matter sometimes
|
| Yes, _sometimes_.
|
| You know what matters way more?
|
| If you throw 12MBytes to the client in a multiple
| connections on multiple domains to display 1KByte of
| information. Eg: 'new' Reddit.
| johnklos wrote:
| > This is where you lose all credibility.
|
| People who write that, well...
|
| If you're greatly impacted by performance problems, how does
| that become a physics problem whose only solution is being
| closer to your users?
|
| I think you're mixing up your sales points. One, how do you
| scale hardware? Simple: you buy some more, and/or you plan
| for more from the beginning.
|
| How do you deal with network latency for users on the other
| side of the planet? Either you plan for and design for long
| tail networking, and/or you colocate in multiple places,
| and/or you host in multiple places. Being aware of cloud
| costs, problems and limitations doesn't mean you can't or
| shouldn't use cloud at all - it just means to do it where it
| makes sense.
|
| You're making my point for me - you've got emotional
| generalizations ("you lose all credibility"), you're using
| examples that people use often but that don't even go
| together, plus you seem to forget that hardly anyone
| advocates for all one or all the other, without some kind of
| sensible mix. Thank you for making a good example of exactly
| what I'm talking about.
| noprocrasted wrote:
| The complexity of scaling out an application to be closer to
| the users has never been about getting the hardware closer.
| It's always about how you get the data there and dealing
| with the CAP theorem, which requires hard tradeoffs to be
| decided on when designing the application and can't be just
| tacked on - there is no magic button to do this, in the AWS
| console or otherwise.
|
| Getting the _hardware_ closer to the users has always been
| trivial - call up any of the many hosting providers out there
| and get a dedicated server, or a colo and ship them some
| hardware (directly from the vendor if needed).
| jread wrote:
| If you have a global user base, depending on your workload, a
| simple CDN in front of your hardware can often go a long ways
| with minimal cost and complexity.
| motorest wrote:
| > If you have a global user base, depending on your workload, a
| simple CDN in front of your hardware can often go a long
| ways with minimal cost and complexity.
|
| Let's squint hard enough to pretend a CDN does not qualify
| as "the cloud". That alone requires a lot of goodwill.
|
| A CDN distributes read-only content. Any usecase that
| requires interacting with a service is automatically
| excluded.
|
| So, no.
| jread wrote:
| > Any usecase that requires interacting with a service is
| automatically excluded
|
| This isn't correct. Many applications consist of a mix of
| static and dynamic content. Even dynamic content is often
| cacheable for a time. All of this can be served by a CDN
| (using TTLs) which is a much simpler and more cost
| effective solution than multi-region cloud infra, with
| the same performance benefits.
| kevin_thibedeau wrote:
| Capital expenditures are kryptonite to financial engineers. The
| cloud selling point was to trade those costs for operational
| expenses and profit in phase 3.
| jandrewrogers wrote:
| This trivializes some real issues.
|
| The biggest problem the cloud solves is hardware supply chain
| management. To realize the full benefits of doing your own
| build at any kind of non-trivial scale you will need to become
| an expert in designing, sourcing, and assembling your hardware.
| Getting hardware delivered when and where you need it is not
| entirely trivial -- components are delayed, bigger customers
| are given priority allocation, etc. The technical parts are
| relatively straightforward; managing hardware vendors,
| logistics, and delivery dates on an ongoing basis is a giant
| time suck. When you use the cloud, you are outsourcing this
| part of the work.
|
| If you do this well and correctly then yes, you will reduce
| costs several-fold. But most people that build their own data
| infrastructure do a half-ass job of it because they
| (understandably) don't want to be bothered with any of these
| details and much of the nominal cost savings evaporate.
|
| Very few companies do security as well as the major cloud
| vendors. This isn't even arguable.
|
| On the other hand, you will need roughly the same number of
| people for operations support whether it is private data
| infrastructure or the cloud, there is little or no savings to
| be had here. The fixed operations people overhead scales to
| such a huge number of servers that it is inconsequential as a
| practical matter.
|
| It also depends on your workload. The types of workloads that
| benefit most from private data infrastructure are large-scale
| data-intensive workloads. If your day-to-day is slinging tens or
| hundreds of PB of data for analytics, the economics of private
| data infrastructure are extremely compelling.
| swozey wrote:
| I have about 30 years as a linux eng, starting with openbsd and
| have spent a LOT of time with hardware building webhosts and
| CDNs until about 2020 where my last few roles have been 100%
| aws/gcloud/heroku.
|
| I love building the cool edge network stuff with expensive
| bleeding edge hardware, smartnics, nvmeOF, etc but it's
| infinitely more complicated and stressful than terraforming an
| AWS infra. Every cluster I set up I had to interact with
| multiple teams like networking, security, storage sometimes
| maintenance/electrical, etc. You've got some random tech you
| have to rely on across the country in one of your POPs with a
| blown server. Every single hardware infra person has had a NOC
| tech kick/unplug a server at least once if they've been in long
| enough.
|
| And then when I get the hardware sometimes you have different
| people doing different parts of setup, like NOC does the boot,
| maybe bootstraps the hardware with something that works over ssh
| before an agent is installed (ansible, etc), then your linux
| eng invokes their magic with a ton of bash or perl, then your
| k8s person sets up the k8s clusters with usually something like
| terraform/puppet/chef/salt probably calling helm charts. Then
| your monitoring person gets it into OTEL/grafana, etc. This all
| organically becomes more automated as time goes on, but I've
| seen it from a brand new infra where you've got no automation
| many times.
|
| Now you're automating 90% of this via scripts and IAC, etc, but
| you're still doing a lot of tedious work.
|
| You also have a much more difficult time hiring good engineers.
| The market's gone so heavily AWS (I'm no help) that it's rare
| that I come across an ops resume that's ever touched hardware,
| especially not at the CDN distributed systems level.
|
| So.. aws is the chill infra that stays online and you can
| basically rely on 99.99something%. Get some terraform
| blueprints going and your own developers can self serve. Don't
| need hardware or ops involved.
|
| And none of this is even getting into supporting the clusters.
| Failing clusters. Dealing with maintenance, zero downtime
| kernel upgrades, rollbacks, yaddayadda.
| jhwhite wrote:
| > It makes me wonder: how do people get so sold on a thing that
| they'll go online and fight about it, even when they lack facts
| or often even basic understanding?
|
| I feel like this can be applied to anything.
|
| I had a manager take one SAFe for Leaders class then came back
| wanting to implement it. They had no previous AGILE classes or
| experience. And the Enterprise Agile Office was saying DON'T
| USE SAFe!!
|
| But they had one class and that was the only way they would
| agree to structure their group.
| cookiengineer wrote:
| My take on this whole cloud fatigue is that system maintenance
| got overly complex over the last couple years/decades. So much
| that management people now think that it's too expensive in
| terms of hiring people that can do it compared to the higher
| managed hosting costs.
|
| DevOps and kubernetes come to mind. A lot of people using
| kubernetes don't know what they're getting into, and k0s or
| another single machine solution would have been enough for 99%
| of SMEs.
|
| In terms of cyber security (my field) everything got so
| ridiculously complex that even the folks that use 3 different
| dashboards in parallel will guess the answers as to whether or
| not they're affected by a bug/RCE/security flaw/weakness
| because all of the data sources (even the expensive paid-for
| ones) are human-edited text databases. They're so buggy that
| they even have Chinese ideographic characters instead of a dot
| character in the version fields, without anyone ever fixing it
| upstream in
| the NVD/CVE process.
|
| I started to build my EDR agent for POSIX systems specifically,
| because I hope that at some point this can help companies to
| ditch the cloud and allow them to self-host again - which in
| return would indirectly prevent 13-year-old kids like those from
| LAPSUS from pwning major infrastructure via simple tech support
| hotline calls.
|
| When I think of it in terms of hosting, the vertical
| scalability of EPYC machines is so high that most of the time,
| if you actually need those resources, you are either doing
| something completely wrong (and should refactor your code) or
| you are a video streaming service.
| cyberax wrote:
| > The whole push to the cloud has always fascinated me. I get
| it - most people aren't interested in babysitting their own
| hardware.
|
| For businesses, it's a very typical lease-or-own decision.
| There's really nothing too special about cloud.
|
| > On the other hand, a business of just about any size that has
| any reasonable amount of hosting is better off with their own
| systems when it comes purely to cost.
|
| Nope. Not if you factor in 24/7 support, geographic redundancy,
| and uptime guarantees. With EC2 you can break even at about
| $2-5m a year of cloud spending if you want your own hardware.
| hnthrowaway6543 wrote:
| > a desire to not centralize the Internet
|
| > If I didn't already self-host email
|
| this really says all that needs to be said about your
| perspective. you have an engineer and OSS advocate's mindset.
| which is fine, but most business leaders (including technical
| leaders like CTOs) have a business mindset, and their goal is
| to build a business that makes money, not avoid contributing to
| the centralization of the internet
| ants_everywhere wrote:
| ...but your post reads like you _do_ have an emotional reaction
| to this question and you're ready to believe someone who
| shares your views.
|
| There's not nearly enough in here to make a judgment about
| things like security or privacy. They have the bare minimum
| encryption enabled. That's better than nothing. But how is key
| access handled? Can they recover your email if the entire
| cluster goes down? If so, then someone has access to the
| encryption keys. If not, then how do they meet reliability
| guarantees?
|
| Three letter agencies and cyber spies like to own switches and
| firewalls with zero days. What hardware are they using, and how
| do they mitigate against backdoors? If you really cared about
| this you would have to roll your own networking hardware down
| to the chips. Some companies do this, but you need to have a
| whole lot of servers to make it economical.
|
| It's really about trade-offs. I think the big trade-offs
| favoring staying off cloud are cost (in some applications),
| distrust of the cloud providers, and avoiding the US Government.
|
| The last two are arguably judgment calls that have some
| inherent emotional content. The first is calculable in
| principle, but people may not be using the same metrics. For
| example if you don't care that much about security breaches or
| you don't have to provide top tier reliability, then you can
| save a ton of money. But if you do have to provide those
| guarantees, it would be hard to beat Cloud prices.
| fnord77 wrote:
| capex vs opex
| ttul wrote:
| My firm belief after building a service at scale (tens of
| millions of end users, > 100K tps) is that AWS is unbeatable.
| We don't even think about building our own infrastructure.
| There's no way we could ever make it reliable enough, secure
| enough, and future-proof enough to ever pay back the cost
| difference.
|
| Something people neglect to mention when they tout their home
| grown cloud is that AWS spends significant cycles constantly
| eliminating technical debt that would absolutely destroy most
| companies - even ones with billion dollar services of their
| own. The things you rely on are constantly evolving and
| changing. It's hard enough to keep up at the high level of a
| SaaS built on top of someone else's bulletproof cloud. But
| imagine also having to keep up with the low level stuff like
| networking and storage tech?
|
| No thanks.
| RainyDayTmrw wrote:
| I hear this debate repeated often, and I think there's another
| important factor. It took me some time to figure out how to
| explain it, and the best I came up with was this: It is
| extremely difficult to bootstrap from zero to baseline
| competence, in general, and especially in an existing
| organization.
|
| In particular, there is a limit to paying for competence, and
| paying more money doesn't automatically get you more
| competence, which is especially perilous if your organization
| lacks the competence to judge competence. In the limit case,
| this gets you the Big N consultancies like PWC or EY. It's
| entirely reasonable to hire PWC or EY to run your accounting or
| compliance. Hiring PWC or EY to run your software development
| lifecycle is almost guaranteed doom, and there is no shortage
| of stories on this site to support that.
|
| In comparison, if you're one of these organizations, who don't
| yet have baseline competence in technology, then what the
| public cloud is selling is nothing short of magical: You pay
| money, and, in return, you receive a baseline set of tools,
| which all do more or less what they say they will do. If no
| amount of money would let you bootstrap this competence
| internally, you'd be much more willing to pay a premium for it.
|
| As an anecdote, my much younger self worked in mid-sized tech
| team in a large household brand in a legacy industry. We were
| building out a web product that, for product reasons, had
| surprisingly high uptime and scalability requirements, relative
| to legacy industry standards. We leaned heavily on public cloud
| and CDNs. We used a lot of S3 and SQS, which allowed us to
| build systems with strong reliability characteristics, despite
| none of us having that background at the time.
| ksec wrote:
| Even as an Anti-Cloud ( Or more accurately Anti-everything
| Cloud ) person I still think there are many benefits to cloud.
| Just most of them are oversold and people don't need them.
|
| Number one is company bureaucracy and politics. No one wants to
| beg another person or department, or sit in endless meetings just
| to have extra hardware provisioned. For engineers that alone is
| worth perhaps 99% of all current cloud margins.
|
| Number two is also company bureaucracy and politics. CFOs don't
| like CapEx. Turning it into OpEx makes things easier for them,
| along with end-of-year company budgets turning into cloud
| credits for different departments - especially for companies
| with government funding.
|
| Number three is really company bureaucracy and politics.
| Dealing with Google, AWS or Microsoft means you no longer have
| to deal with dozens of different vendors for servers, networking
| hardware, software licenses, etc. Instead it is all pre-approved
| under AWS, GCP or Azure. This is especially useful for things
| that involve government contracts or funding.
|
| There are also things like instant worldwide deployment. You
| can have things up and running in any region within seconds.
| And it's useful when you have a site that gets 10 to 1000x the
| normal traffic from time to time.
|
| But then a lot of small businesses don't have these sorts of
| issues, especially non-consumer-facing services. A business or
| SaaS is highly unlikely to get 10x more customers within a short
| period of time.
|
| I continue to wish there were a middle ground somewhere: you
| rent dedicated servers for cheap as base load and use cloud for
| everything else.
| tiffanyh wrote:
| FYI - Fastmail web client has Offline support in beta right now.
|
| https://www.fastmail.com/blog/offline-in-beta/
| ForHackernews wrote:
| Very confused by this. What is in beta? I've had "offline"
| email access for 25 years. It's called an IMAP client.
| mdaniel wrote:
| And if anyone is curious, I actually live on their
| https://betaapp.fastmail.com release and find it just as stable
| as the "mainline" one but with the advantage of getting to play
| with all the cool toys earlier. Bonus points (for me) in that
| they will periodically conduct surveys to see how you like
| things
| DarkCrusader2 wrote:
| I have seen a common sentiment that self hosting is almost always
| better than cloud. What these discussions do not mention is how
| to effectively run your business applications on this
| infrastructure.
|
| Things like identity management (AAD/IAM), provisioning and
| running VMs, deployments. Network side of things like VNet, DNS,
| securely opening ports etc. Monitoring setup across the stack.
| There is so much functionalities that will be required to safely
| expose an application externally that I can't even coherently
| list them out here. Are people just using Saas for everything
| (which I think will defeat the purpose of on-prem infra) or a
| competent Sys admin can handle all this to give a cloud like
| experience for end developers?
|
| Can someone share their experience or share any write ups on this
| topic?
|
| For more context, I worked at a very large hedge fund briefly
| which had a small DC worth of VERY beefy machines but absolutely
| no platform on top of it. Hosting an application was done by
| copying the binaries onto a particular well-known machine, running
| npm commands and restarting nginx. You logged a ticket with the
| sysadmin to reserve an internal DNS entry and point it at this
| machine (no load balancer). Deployment was a shell script which
| rcp'd new binaries and restarted nginx. No monitoring or
| observability stack. There was a script which would log you into a
| random machine to run your workloads (be ready to get
| angry IMs from more senior quants running their workloads on that
| machine if your development build takes up enough
| resources to affect their work). I can go on and on but I think
| you get the idea.
| noprocrasted wrote:
| > identity management (AAD/IAM)
|
| Do you mean for administrative access to the machines (over
| SSH, etc) or for "normal" access to the hosted applications?
|
| Admin access: an Ansible-managed set of UNIX users & associated
| SSH public keys, combined with remote logging (so every access
| is audited and a malicious operator wiping the machine can't
| cover their tracks), will generally get you pretty far. Beyond
| that, there are commercial solutions like Teleport which
| provide integration with an IdP, a management web UI, session
| logging & replay, etc.
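|
| As a minimal sketch of that idea in plain Python (host names,
| user names and keys below are placeholders; Ansible's user and
| authorized_key modules do the same thing declaratively and
| idempotently):
|
|     #!/usr/bin/env python3
|     """Push a declared set of admin users and SSH keys to hosts."""
|     import subprocess
|
|     HOSTS = ["web1.internal", "db1.internal"]       # placeholders
|     USERS = {"alice": "ssh-ed25519 AAAA...alice",   # placeholder keys
|              "bob": "ssh-ed25519 AAAA...bob"}
|
|     def run(host, cmd):
|         # Assumes an existing admin account with sudo on each host.
|         subprocess.run(["ssh", host, cmd], check=True)
|
|     for host in HOSTS:
|         for user, key in USERS.items():
|             run(host, f"sudo useradd -m -s /bin/bash {user} || true")
|             run(host, f"sudo install -d -m 700 -o {user} -g {user}"
|                       f" /home/{user}/.ssh")
|             run(host, f"echo '{key}' | sudo tee"
|                       f" /home/{user}/.ssh/authorized_keys >/dev/null")
|             run(host, f"sudo chown -R {user}: /home/{user}/.ssh"
|                       f" && sudo chmod 600"
|                       f" /home/{user}/.ssh/authorized_keys")
|
| Run it from a trusted admin box with remote syslog enabled;
| anything fancier (expiry, groups, session replay) is where the
| commercial tools earn their keep.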
|
| Normal line-of-business access: this would be managed by
| whatever application you're running, not much different to the
| cloud. But if your application isn't auth-aware or is unsafe to
| expose to the wider internet, you can stick it behind various
| auth proxies such as Pomerium - it will effectively handle auth
| against an IdP and only pass through traffic to the underlying
| app once the user is authenticated. This is also useful for
| isolating potentially vulnerable apps.
|
| > provisioning and running VMs
|
| Provisioning: once a VM (or even a physical server) is up and
| running enough to be SSH'd into, you should have a
| configuration management tool (Ansible, etc) apply whatever
| configuration you want. This would generally involve
| provisioning users, disabling some stupid defaults (SSH
| password authentication, etc), installing required packages,
| etc.
|
| To get a VM to an SSH'able state in the first place, you can
| configure your hypervisor to pass through "user data" which
| will be picked up by something like cloud-init (integrated by
| most distros) and interpreted at first boot - this allows you
| to do things like include an initial SSH key, create a user,
| etc.
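|
| For illustration, a sketch of the kind of user-data payload
| cloud-init consumes (the user name and key are placeholders; how
| you hand the file to the hypervisor depends on your tooling):
|
|     # Minimal "user data": create a user and install an SSH key on
|     # first boot. Requires PyYAML (pip install pyyaml).
|     import yaml
|
|     user_data = "#cloud-config\n" + yaml.safe_dump({
|         "users": [{
|             "name": "ops",
|             "shell": "/bin/bash",
|             "sudo": "ALL=(ALL) NOPASSWD:ALL",
|             "ssh_authorized_keys": ["ssh-ed25519 AAAA...ops-laptop"],
|         }]
|     })
|     with open("user-data", "w") as f:
|         f.write(user_data)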
|
| To run VMs on self-managed hardware: libvirt, proxmox in the
| Linux world. bhyve in the BSD world. Unfortunately most of
| these have rough edges, so commercial solutions there are worth
| exploring. Alternatively, consider if you actually _need_ VMs
| or if things like containers (which have much nicer tooling and
| a better performance profile) would fit your use-case.
|
| > deployments
|
| Depends on your application. But let's assume it can fit in a
| container - there's nothing wrong with a systemd service that
| just reads a container image reference in /etc/... and uses
| `docker run` to run it. Your deployment task can just SSH into
| the server, update that reference in /etc/ and bounce the
| service. Evaluate Kamal which is a slightly fancier version of
| the above. Need more? Explore cluster managers like Hashicorp
| Nomad or even Kubernetes.
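|
| As a toy illustration of that deploy step (the service name, the
| path in /etc and the image reference are hypothetical; this is
| not Kamal itself, just the bare idea):
|
|     #!/usr/bin/env python3
|     """Toy deploy: point a host at a new image, bounce the unit.
|     Assumes a systemd unit 'myapp.service' whose ExecStart reads
|     the image reference from /etc/myapp/image and `docker run`s it.
|     """
|     import subprocess, sys
|
|     host, image = sys.argv[1], sys.argv[2]  # e.g. web1 repo/app:v42
|
|     def ssh(cmd):
|         subprocess.run(["ssh", host, cmd], check=True)
|
|     ssh(f"echo '{image}' | sudo tee /etc/myapp/image >/dev/null")
|     ssh("sudo systemctl restart myapp.service")
|     ssh("systemctl is-active myapp.service")  # crude health check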
|
| > Network side of things like VNet
|
| Wireguard tunnels set up (by your config management tool)
| between your machines, which will appear as standard network
| interfaces with their own (typically non-publicly-routable) IP
| addresses, and anything sent over them will transparently be
| encrypted.
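|
| A sketch of the per-host piece (addresses, port and peer key are
| placeholders; in practice your config management tool templates
| this out per host and then runs `wg-quick up wg0`):
|
|     # Render a minimal /etc/wireguard/wg0.conf for one node.
|     # Requires wireguard-tools for the `wg` key generator.
|     import subprocess
|
|     priv = subprocess.run(["wg", "genkey"], capture_output=True,
|                           text=True, check=True).stdout.strip()
|     peer_pub = "<peer-public-key>"  # from the peer's `wg pubkey`
|
|     conf = "\n".join([
|         "[Interface]",
|         f"PrivateKey = {priv}",
|         "Address = 10.0.0.1/24",             # this node's tunnel IP
|         "ListenPort = 51820",
|         "",
|         "[Peer]",
|         f"PublicKey = {peer_pub}",
|         "AllowedIPs = 10.0.0.2/32",          # the other node's tunnel IP
|         "Endpoint = peer.example.net:51820",
|         "PersistentKeepalive = 25",
|     ])
|     print(conf)  # write to /etc/wireguard/wg0.conf, wg-quick up wg0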
|
| > DNS
|
| Generally very little reason not to outsource that to a cloud
| provider or even your (reputable!) domain registrar. DNS is
| mostly static data though, which also means if you do need to
| do it in-house for whatever reason, it's just a matter of
| getting a CoreDNS/etc container running on multiple machines
| (maybe even distributed across the world). But really, there's
| no reason not to outsource that and hosted offerings are super
| cheap - so go open an AWS account and configure Route53.
|
| > securely opening ports
|
| To begin with, you shouldn't have anything listening that you
| don't want to be accessible. Then it's not a matter of
| "opening" or closing ports - the only ports that actually
| listen are the ones you _want_ open by definition because it's
| your application listening for outside traffic. But you can
| configure iptables/nftables as a second layer of defense, in
| case you accidentally start something that unexpectedly exposes
| some control socket you're not aware of.
|
| > Monitoring setup across the stack
|
| collectd running on each machine (deployed by your
| configuration management tool) sending metrics to a central
| machine. That machine runs Grafana/etc. You can also explore
| "modern" stuff that the cool kids play with nowadays like
| VictoriaMetrics, etc, but metrics is mostly a solved problem so
| there's nothing wrong with using old tools if they work and fit
| your needs.
|
| For logs, configure rsyslogd to log to a central machine - on
| that one, you can have log rotation. Or look into an ELK stack.
| Or use a hosted service - again nothing prevents you from
| picking the best of cloud _and_ bare-metal, it's not one or
| the other.
|
| > safely expose an application externally
|
| There's a lot of snake oil and fear-mongering around this.
| First off, you need to differentiate between vulnerabilities of
| your application and vulnerabilities of the underlying
| infrastructure/host system/etc.
|
| App vulnerabilities, in your code or dependencies: cloud won't
| save you. It runs your application just like it's been told. If
| your app has an SQL injection vuln or one of your dependencies
| has an RCE, you're screwed either way. To manage this you'd do
| the same as you do in cloud - code reviews, pentesting,
| monitoring & keeping dependencies up to date, etc.
|
| Infrastructure-level vulnerabilities: cloud providers are
| responsible for keeping the host OS and their provided services
| (load balancers, etc) up to date and secure. You can do the
| same. Some distros provide unattended updates (which your
| config management tool can enable). Stuff that doesn't need to
| be reachable from the internet shouldn't be (bind internal
| stuff to your Wireguard interfaces). Put admin stuff behind
| some strong auth - TLS client certificates are the gold
| standard but have management overheads. Otherwise, use an IdP-
| aware proxy (like mentioned above). Don't always trust app-
| level auth. Beyond that, it's the usual - common sense,
| monitoring for "spooky action at a distance", and luck. Not too
| much different from your cloud provider, because they won't
| compensate you either if they do get hacked.
|
| > For more context, I worked at a very large hedge fund briefly
| which had a small DC worth of VERY beefy machines but
| absolutely no platform on top of it...
|
| Nomad or Kubernetes.
| rtfusgihkuj wrote:
| No, using Ansible to distribute public keys does _not_ get
| you very far. It's fine for a personal project or even a team of
| 5-6 with a handful of servers, but beyond that you really need a
| better way to onboard, offboard, and modify accounts. If
| you're doing anything but a toy project, you're better off
| starting off with something like IPA for host access
| controls.
| noprocrasted wrote:
| What's the risk you're trying to protect against, that a
| "better" (which one?) way would mitigate that this one
| wouldn't?
|
| > IPA
|
| Do you mean https://en.wikipedia.org/wiki/FreeIPA ? That
| seems like a huge amalgamation of complexity in a non-
| memory-safe language that I feel like would introduce a
| much bigger security liability than the problem it's trying
| to solve.
|
| I'd rather pony up the money and use Teleport at that
| point.
| dpe82 wrote:
| It's basically Kerberos and an LDAP server, which are
| technologies old and reliable as dirt.
|
| This sort of FUD is why people needlessly spend so much
| money on cloud.
| noprocrasted wrote:
| > which are technologies old and reliable as dirt.
|
| Technologies, sure. Implementations? Not so much.
|
| I can trust OpenSSH because it's deployed everywhere and
| I can be confident all the low-hanging fruit is gone by
| now, and if not, its widespread deployment means I'm unlikely to
| be the most interesting target, so I am more likely to
| escape a potential zero-day unscathed.
|
| What's the market share of IPA in comparison? Has it seen
| any meaningful action in the last decade, and the
| same attention, from both white-hats (audits, pentesting,
| etc) as well as black-hats (trying to break into every
| exposed service)? I very much doubt it, so the safe thing
| to assume is that it's nowhere as bulletproof as OpenSSH
| and that it's more likely for a dedicated attacker to
| find a vuln there.
| dpe82 wrote:
| MIT's Kerberos 5 implementation is 30 years old and has
| been _very_ widely deployed.
| xorcist wrote:
| Why do you think that? I did something similar at a previous
| job for something bordering on 1k employees.
|
| User administration was done by modifying a yaml file in
| git. Nothing bad to say about it really. It sure beats
| point-and-click Active Directory any day of the week.
| Commit log handy for audits.
|
| If there are no externalities demanding anything else, I'd
| happily do it again.
| kasey_junk wrote:
| There is nothing _wrong_ with it, and so long as you can
| prove that your offboarding is consistent and quick then
| feel free to use it.
|
| But a central system that uses the same identity/auth
| everywhere is much easier to keep consistent and fast.
| That's why auditors and security professionals will harp
| on idp/sso solutions as some of the first things to
| invest in.
| xorcist wrote:
| I found that the commit log made auditing on- and
| offboarding easier, not harder. Of course it won't help
| you if your process is dysfunctional. You still have to
| trigger the process somehow, which can be a problem in
| itself when growing from a startup, but once you do that
| it's smooth.
|
| However git _is_ a central system, a database if you
| will, where you can keep identities globally consistent.
| That's the whole point. In my experience, the reason
| people leave it is because you grow the need to
| interoperate with third party stuff which only supports
| AD or Okta or something. Should I get to grow past that
| phase myself I would feed my chosen IdM with that data
| instead.
| briHass wrote:
| The biggest win with running your own infra is disk/IO speeds, as
| noted here and in DHH's series on leaving cloud
| (https://world.hey.com/dhh/we-have-left-the-cloud-251760fb)
|
| The cloud providers really kill you on IO for your VMs. Even if
| 'remote' SSDs are available with configurable ($$) IOPs/bandwidth
| limits, the size of your VM usually dictates a pitiful max IO/BW
| limit. In Azure, something like a 4-core 16GB RAM VM will be
| limited to 150MB/s across all attached disks. For most hosting
| tasks, you're going to hit that limit far before you max out '4
| cores' of a modern CPU or 16GB of RAM.
|
| On the other hand, if you buy a server from Dell and run your own
| hypervisor, you get a massive reserve of IO, especially with
| modern SSDs. Sure, you have to share it between your VMs, but you
| own all of the IO of the hardware, not some pathetic slice of it
| like in the cloud.
|
| As is always said in these discussions, unless you're able to
| move your workload to PaaS offerings in the cloud (serverless),
| you're not taking advantage of what large public clouds are good
| at.
| noprocrasted wrote:
| Biggest issue isn't even sequential speed but latency. In the
| cloud all persistent storage is networked and has significantly
| more latency than direct-attached disks. This is a physical
| (speed of light) limit, you can't pay your way out of it, or
| throw more CPU at it. This has a huge impact for certain
| workloads like relational databases.
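|
| Back-of-envelope illustration (the figures below are assumed
| round numbers, not benchmarks):
|
|     # Why networked block storage hurts chatty relational
|     # workloads: the per-read penalty multiplies across
|     # dependent reads.
|     local_nvme_ms = 0.1    # ~100 us per random read (assumed)
|     networked_ms = 1.0     # ~1 ms per networked read (assumed)
|     dependent_reads = 200  # index + row lookups one query chains
|
|     print(f"local:     {dependent_reads * local_nvme_ms:.0f} ms")
|     print(f"networked: {dependent_reads * networked_ms:.0f} ms")
|     # 20 ms vs 200 ms for the same query plan; extra CPU can't help.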
| sgarland wrote:
| Yep. This is why my 12-year old Dell R620s with Ceph on NVMe
| via Infiniband outperform the newest RDS and Aurora
| instances: the disk latency is measured in microseconds.
| Locally attached is of course even faster.
| briHass wrote:
| I ran into this directly trying to use Azure's SMB as a
| service offering (Azure Files) for a file-based DB. It
| currently runs on a network share on-prem, but moving it to
| an Azure VM using that service killed performance. SMB is
| chatty as it is, and the latency of tons of small file IO was
| horrendous.
|
| Interestingly, creating a file share VM deployed in the same
| proximity group has acceptable latency.
| Axsuul wrote:
| Anyone know what are some good data centers or providers to host
| your bare metal servers?
| klysm wrote:
| You're probably looking for the term "colo"
| nisa wrote:
| Love this article and I'm also running some stuff on old
| enterprise servers in some racks somewhere. Now over the last
| year I've had to dive into Azure Cloud as we have customers using
| this (b2b company) and I finally understood why everyone is doing
| cloud despite the price:
|
| Global permissions, seamless organization and IaC. If you are
| Fastmail or a small startup - go buy some used dell poweredge
| with epycs in some Colo rack with 10Gbe transit and save tons of
| money.
|
| If you are a company with tons of customers and tons of
| requirements, it's powerful to put each concern into a landing
| zone, run some Bicep/Terraform, have a resource group to
| control costs, get savings on overall core count, and be done
| with it.
|
| Assign permissions into a namespace for your employee or customer
| - have some back and forth about requirements and it's done. No
| need to sysadmin across servers. No need to check for broken
| disks.
|
| I'm also blaming the hell of vmware and virtual machines for
| everything that is a PITA to maintain as a sysadmin but is loved
| because it's common knowledge. I would only do k8s on bare-metal
| today and skip the whole virtualization thing completely. I guess
| it's also these pains that are softened in the cloud.
| akpa1 wrote:
| The fact that Fastmail work like this, are transparent about what
| they're up to and how they're storing my email and the fact that
| they're making logical decisions and have been doing so for quite
| a long time is exactly the reason I practically trip over myself
| to pay them for my email. Big fan of Fastmail.
| xyst wrote:
| They are also active in contributing to cyrus-imap
| pammf wrote:
| Cost isn't always the most important metric. If that was the
| case, people would always buy the cheapest option of everything.
| veidr wrote:
| "WHY we use our own hardware..."
|
| The why is the interesting part of this article.
| veidr wrote:
| I take that back; _this_ is (to me) the most interesting part:
|
| "Although we've only ever used datacenter class SSDs and HDDs
| failures and replacements every few weeks were a regular
| occurrence on the old fleet of servers. Over the last 3+ years,
| we've only seen a couple of SSD failures in total across the
| entire upgraded fleet of servers. This is easily less than one
| tenth the failure rate we used to have with HDDs."
| indulona wrote:
| I am working on a personal project (some would call it a startup,
| but I have no intention of getting external financing and other
| americanisms) where I have set up my own CDN and video encoding,
| among other things. These days, whenever you have a problem,
| everyone answers "just use cloud", and that results in people
| really knowing nothing any more. It is saddening. But on the
| other hand it ensures all my decades of knowledge will be very
| well paid in the future, if I ever need to get a job.
| Beijinger wrote:
| I was told Fastmail is excellent, and I am not a big fan of
| gmail. Once locked out of Gmail for good, your email and the apps
| associated with it are gone forever. Source? Personal
| experience.
|
| "A private inbox $60 for 12 months". I assume it is USD, not AU$
| (AFAIK, Fastmail is based in Australia.) Still pricey.
|
| At https://www.infomaniak.com/ I can buy email service for an (in
| my case external) domain for 18 Euro a year and I get 5 inboxes.
| And it is based in Switzerland, so no EU or US jurisdiction.
|
| I have a few websites and Fastmail would just be prohibitively
| expensive for me.
| qingcharles wrote:
| You can have as many domains as you want for free in your
| Fastmail account. There are no extra fees.
|
| I've used them for 20 years now. Highly recommended.
| steve_adams_86 wrote:
| Wait, really? I pay for two separate domains. What am I
| missing?
|
| I'm happy to pay them because I love the service (and it's
| convenient for taxes), but I feel like I should know how to
| configure multiple domains under one account.
| xerp2914 wrote:
| Under Settings => Domains you can add additional domains.
| If you use Fastmail as domain registrar you have to pay for
| each additional domain, of course.
| mariusor wrote:
| My suggestion would be to try Purelymail. They don't offer much
| in the way of a web interface to email, but if you bring your
| own client, it's a very good provider.
|
| I'm paying something like $10 per year for multiple domains
| with multiple email addresses (though with little traffic).
| I've been using them for about 5 years and I had absolutely no
| issues.
| aquariusDue wrote:
| Personally I prefer Migadu and tend to recommend them to tech
| savvy people. Their admin panel is excellent and
| straightforward to use, prices are based on usage limits
| (amount of emails sent/received) instead of number of
| mailboxes.
|
| Migadu is just all around good; the only downsides I can find
| are subjective: the fact that they're based in Switzerland, and
| that unless you're "good with computers" something like Fastmail
| will probably be better.
| Amfy wrote:
| Seems Migadu is hosted on OVH though? Huge red flag.. no
| control over infrastructure (think of Hetzner shutting down
| customers with little to no warning)
| throw0101b wrote:
| > _So after the success of our initial testing, we decided to go
| all in on ZFS for all our large data storage needs. We've now
| been using ZFS for all our email servers for over 3 years and
| have been very happy with it. We've also moved over all our
| database, log and backup servers to using ZFS on NVMe SSDs as
| well with equally good results._
|
| If you're looking at ZFS on NVMe you may want to look at Alan
| Jude's talk on the topic, "Scaling ZFS for the future", from the
| 2024 OpenZFS User and Developer Summit:
|
| * https://www.youtube.com/watch?v=wA6hL4opG4I
|
| * https://openzfs.org/wiki/OpenZFS_Developer_Summit_2024
|
| There are some bottlenecks that get in the way of getting all the
| performance that the hardware is often capable of.
| rmbyrro wrote:
| if you don't have high bandwidth requirements, like for
| background / batch processing, the ovh eco family [1] of bare
| metal servers is incredibly cheap
|
| [1] https://eco.ovhcloud.com/en/
| xiande04 wrote:
| Aside: Fastmail was the best email provider I ever used. The
| interface was intuitive and responsive, both on mobile and web.
| They have extensive documentation for everything. I was able to
| set up a custom domain and a catch-all email address in a few
| minutes. Customer support is great, too. I emailed them about an
| issue and they responded within the hour (turns out it was my
| fault). I feel like it's a really mature product/company and they
| really know what they're doing, and have a plan for where they're
| going.
|
| I ended up switching to Protonmail, because of privacy (Fastmail
| is within the Five Eyes (Australia)), which is the only thing I
| really like about Protonmail. But I'm considering switching back
| to Fastmail, because I liked it so much.
| gausswho wrote:
| I also chose Proton for the same reason. It hurts that their
| product development is glacial, but privacy is a crucial component
| that I don't understand why Fastmail doesn't try to offer.
| kevin_thibedeau wrote:
| Their Android client has been less than stellar in the past but
| recent releases are significantly improved. Uploading files, in
| particular, was a crapshoot.
| dorongrinstein wrote:
| We at Control Plane (https://cpln.com) make it easy to repatriate
| from the cloud, yet leverage the union of all the services
| provided by AWS, GCP and Azure. Many of our customers moved from
| cloud A to cloud B, and often to their own colocation cage, and
| in one case their own home cluster. Check out
| https://repatriate.cloud
| 0xbadcafebee wrote:
| I've been doing this job for almost as long as they have. I work
| with companies that do on-prem, and I work with companies in the
| cloud, and both. Here's the low down:
|
| 1. The cost of the server is not the cost of on-prem. There are
| so many different _kinds_ of costs that aren't just monetary.
| ("we have to do more ourselves, including _planning, choosing,
| buying, installing, etc,_ ") Those are tasks that require
| expertise (which 99% of "engineers" do not possess at more than a
| junior level), and time, and staff, and correct execution. They
| are much more expensive than you will ever imagine. Doing any of
| them wrong will cause issues that will eventually cost you
| business (customers fleeing or avoiding you). That's much worse than a
| line-item cost.
|
| 2. You have to develop relationships for good on-prem. In order
| to get good service in your rack (assuming you don't hire your
| own cage monkey), in order to get good repair people for your
| hardware service accounts, in order to ensure when you order a
| server that it'll actually arrive, in order to ensure the DC
| won't fuck up the power or cooling or network, etc. This is not
| something you can just read reviews on. You have to actually
| physically and over time develop these relationships, or you will
| suffer.
|
| 3. What kind of load you have and how you maintain your gear is
| what makes a difference between being able to use one server for
| 10 years, and needing to buy 1 server every year. For some use
| cases it makes sense, for some it really doesn't.
|
| 4. Look at all the complex details mentioned in this article.
| These people go _deep_, building loads of technical expertise at
| the OS level, hardware level, and DC level. It takes a long time
| to build that expertise, and you usually cannot just hire for it,
| because it's generally hard to find. This company is very unique
| (hell, their stack is based on Perl). Your company won't be that
| unique, and you won't have their expertise.
|
| 5. If you hire someone who actually knows the cloud really well,
| and they build out your cloud env based on published well-
| architected standards, you gain not only the benefits of rock-
| solid hardware management, but benefits in security, reliability,
| software updates, automation, and tons of unique features like
| added replication, consistency, availability. You get a lot more
| for your money than just "managed hardware", things that you
| literally could never do yourself without 100 million dollars and
| five years, but you only pay a few bucks for it. The _value_ in
| the cloud is insane.
|
| 6. Everyone does cloud costs wrong the first time. If you hire
| somebody who does have cloud expertise (who hopefully did the
| well-architected buildout above), they can save you 75% off your
| bill, by default, with nothing more complex than checking a box
| and paying some money up front (the same way you would for your
| on-prem server fleet). Or they can use spot instances, or
| serverless. If you choose software developers who care about
| efficiency, they too can help you save money by not needing to
| over-allocate resources, and right-sizing existing ones.
| (Remember: you'd be doing this cost and resource optimization
| already with on-prem to make sure you don't waste those servers
| you bought, and that you know how many to buy and when)
|
| 7. The major takeaway at the end of the article is _"when you
| have the experience and the knowledge"_. If you don't, then
| attempting on-prem can end calamitously. I have seen it several
| times. In fact, just one week ago, a business I work for had
| _three days of downtime_, due to hardware failing, and not being
| able to recover it, their backup hardware failing, and there
| being no way to get new gear in quickly. Another business I
| worked for literally hired and fired four separate teams to build
| an on-prem OpenStack cluster, and it was the most unstable,
| terrible computing platform I've used, that constantly caused
| service outages for a large-scale distributed system.
|
| If you're not 100% positive you have the expertise, just don't do
| it.
| herf wrote:
| zfs encryption is still corrupting datasets when using zfs
| send/receive for backup (otherwise a huge win for mail datasets),
| so I would be cautious about using it in production:
|
| https://github.com/openzfs/zfs/issues/12014
| klysm wrote:
| I'll never use ZFS in production after I was on a team that
| used it at petabyte scale. It's too complex and tries to solve
| problems that should be solved at higher layers.
| TheFlyingFish wrote:
| Lots of people here mentioning reasons to both use and avoid the
| cloud. I'll just chip in one more on the pro-cloud side:
| reliability at low scale.
|
| To expand: At $dayjob we use AWS, and we have no plans to switch
| because we're _tiny_ , like ~5000 DAU last I checked. Our AWS
| bill is <$600/mo. To get anything remotely resembling the
| reliability that AWS gives us we would need to spend tens of
| thousands up-front buying hardware, then something approximating
| our current AWS bill for colocation services. Or we could host
| fully on-prem, but then we're paying even more up-front for site-
| level stuff like backup generators and network multihoming.
|
| Meanwhile, RDS (for example) has given us something like one
| unexplained 15-minute outage in the last six years.
|
| Obviously every situation is unique, and what works for one won't
| work for another. We have no expectation of ever having to
| suddenly 10x our scale, for instance, because our growth is
| limited by other factors. But at our scale, given our business
| realities, I'm convinced that the cloud is the best option.
| jjeaff wrote:
| This is a common false dichotomy I see constantly: cloud vs.
| buying and building your own hardware from scratch and colocating
| or building your own datacenter.
|
| Very few non-cloud users are buying their own hardware. You can
| simply rent dedicated hardware in a datacenter. For
| significantly cheaper than anything in the cloud. That being
| said, certain things like object storage, if you don't need
| very large amounts of data, are very handy and inexpensive from
| cloud services considering the redundancy and uptime they
| offer.
| ttul wrote:
| This works even at $1M/mo AWS spend. As you scale, the
| discounts get better. You get into the range of special pricing
| where they will make it work against your P&L. If you're
| venture funded, they have a special arm that can do backflips
| for you.
|
| I should note that Microsoft also does this.
| kayson wrote:
| Any ideas how they manage the ZFS encryption key? I've always
| wondered what you'd do in an enterprise production setting.
| Typing the password in at a prompt as any seem scalable (but
| maybe they have few enough servers that it's manageable) and
| keeping it in a file on disk or on removable storage would seem
| to defeat the purpose...
| ttul wrote:
| I think mailbox hosting is a special use case. The primary cost
| is storage and bandwidth and you can indeed do better on storage
| and bandwidth than what Amazon offers. That being said, if
| Fastmail asked Amazon for special pricing to make the move, they
| would get it.
| jph00 wrote:
| The original answer to "why does FastMail use their own hardware"
| is that when I started the company in 1999 there weren't many
| options. I actually originally used a single bare metal server at
| Rackspace, which at that time was a small scrappy startup. IIRC
| it cost $70/month. There weren't really practical VPS or SaaS
| alternatives back then for what I needed.
|
| Rob (the author of the linked article) joined a few months later,
| and when we got too big for our Rackspace server, we looked at
| the cost of buying something and doing colo instead. The biggest
| challenge was trying to convince a vendor to let me use my
| Australian credit card but ship the server to a US address (we
| decided to use NYI for colo, based in NY). It turned out that IBM
| were able to do that, so they got our business. Both IBM and NYI
| were great for handling remote hands and hardware issues, which
| obviously we couldn't do from Australia.
|
| A little bit later Bron joined us, and he automated absolutely
| everything, so that we were able to just have NYI plug in a new
| machine and it would set itself up from scratch. This all just
| used regular Linux capabilities and simple open source tools,
| plus of course a whole lot of Perl.
|
| As the fortunes of AWS et al rose and rose and rose, I kept
| looking at their pricing and features and kept wondering what I
| was missing. They seemed orders of magnitude more expensive for
| something that was more complex to manage and would have locked
| us into a specific vendor's tooling. But everyone seemed to be
| flocking to them.
|
| To this day I still use bare metal servers for pretty much
| everything, and still love having the ability to use simple
| universally-applicable tools like plain Linux, Bash, Perl,
| Python, and SSH, to handle everything cheaply and reliably.
|
| I've been doing some planning over the last couple of years on
| teaching a course on how to do all this, although I was worried
| that folks are too locked in to SaaS stuff -- but perhaps things
| are changing and there might be interest in that after all?...
| basilgohar wrote:
| Please do this course. It's still needed and a lot of people
| would benefit from it. It's just that the loudest voices are
| all in on Cloud that it seems otherwise.
| ksec wrote:
| >But everyone seemed to be flocking to them.
|
| To the point where we have young devs today who don't know what
| VPS and colo (colocation) mean.
|
| Back to the article, I am surprised it was only "a few years
| ago" that Fastmail adopted SSDs, which certainly seems late in
| the cycle given the benefits SSDs offer.
|
| Price for colo is on the order of $3000/2U/year. That is
| $125/U/month.
| flemhans wrote:
| HDDs are still the best option for many workloads, including
| email.
| matt-p wrote:
| Colo is typically sold on power, not space. From your example
| you're either getting ripped off if it's for low-power
| servers or massively undercharged for a 4xA100 machine
| justsomehnguy wrote:
| > Which certainly seems late in the cycle for the benefits of
| what SSD offers.
|
| 90% of emails are never read, 9% are read once. What could SSDs
| offer for this use case except at least 2x the cost?
| bluGill wrote:
| Don't forget that Fastmail is accessed over an internet
| transport with enough latency to make HDD seek times noise
| brongondwana wrote:
| We adopted SSD for the current week's email and rust for the
| deeper storage many years ago. A few years ago we switched to
| everything on NVMe, so there's no longer two tiers of
| storage. That's when the pricing switched to make it
| worthwhile.
| milesvp wrote:
| As someone who lived through that era, I can tell you there are
| legions of devs and dev adjacent people who have no idea what
| it's like to automate mission critical hardware. Everyone had
| to do it in the early 2000s. But it's been long enough that
| there are people in the workforce who just have no idea about
| running your own hardware since they never had to. I suspect
| there is a lot of interest, especially since we're likely
| approaching the bring-it-back-in-house cycle, as CTOs try to
| rein in their cloud spend.
| packtreefly wrote:
| > although I was worried that folks are too locked in to SaaS
| stuff
|
| For some people the cloud is straight magic, but for many of
| us, it just represents work we don't have to do. Let "the
| cloud" manage the hardware and you can deliver a SaaS product
| with all the nines you could ask for...
|
| > teaching a course on how to do all this ... there might be
| interest in that after all?
|
| Idk about a course, but I'd be interested in a blog post or
| something that addresses the pain points that I conveniently
| outsource to AWS. We have to maintain SOC 2 compliance, and
| there's a good chunk of stuff in those compliance requirements
| around physical security and datacenter hygiene that I get to
| just point at AWS for.
|
| I've run physical servers for production resources in the past,
| but they weren't exactly locked up in Fort Knox.
|
| I would find some in-depth details on these aspects
| interesting, but from a less-clinical viewpoint than the ones
| presented in the cloud vendors' SOC reports.
| dijit wrote:
| I've never visited a datacenter that wasn't SOC2 compliant.
| Bahnhof, SAVVIS, Telecity, Equinix etc.
|
| Of course, their SOC 2 compliance doesn't mean we are
| absolved of securing our databases and services.
|
| There's a big gap between throwing some compute in a closet
| and having someone "run the closet" for you.
|
| There is a significantly larger gap between having someone
| "run the closet" and building your own datacenter from
| scratch.
| benterix wrote:
| > As the fortunes of AWS et al rose and rose and rose, I kept
| looking at their pricing and features and kept wondering what I
| was missing.
|
| You are not the only one. There are several factors at play but
| I believe one of the strongest today is the generational
| divide: people have lost the ability to manage their own infra,
| or don't know it well enough to do it well, so it's true when
| they say "It's too much hassle". I say this as an AWS guy who
| occasionally works on on-prem infra.[0]
|
| [0] As a side note, I don't believe the lack of skills is the
| main reason organizations have problems - skills can be learned,
| but if you mess up the initial architecture design, fixing that
| can easily take years.
| riezebos wrote:
| As a customer of Fastmail and a fan of your work at FastAI and
| FastHTML I feel a bit stupid now for not knowing you started
| Fastmail.
|
| Now I'm wondering how much you'd look like tiangolo if you wore
| a moustache.
| brongondwana wrote:
| Jeremy is all the Fast things!
| llm_trw wrote:
| >As the fortunes of AWS et al rose and rose and rose, I kept
| looking at their pricing at features and kept wondering what I
| was missing. They seemed orders of magnitude more expensive for
| something that was more complex to manage and would have locked
| us into a specific vendor's tooling. But everyone seemed to be
| flocking to them.
|
| In 2006, when the first AWS instances showed up, it would take
| two years of on-demand bills to match the cost of buying the
| hardware from a retail store and running it continuously.
|
| Today it's anywhere from two weeks for ML workloads to three
| months for the mid-sized instances.
|
| AWS made sense in big corps where it would take six months to
| get approval to buy the hardware and another six for the
| software. Today I'd only use it for a prototype, and I'd move
| it on-prem the second it looks like it will make it past one
| quarter.
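|
| (Back-of-the-envelope, the break-even is easy to sketch. The
| prices below are illustrative assumptions, not quotes from any
| provider; the point is just how quickly continuous rental
| overtakes a one-off purchase.)
|
|     # Months of continuous on-demand rental that add up to the
|     # purchase price. All figures below are hypothetical.
|     HOURS_PER_MONTH = 730
|
|     def breakeven_months(hardware_cost, hourly_rate):
|         monthly_rent = hourly_rate * HOURS_PER_MONTH
|         return hardware_cost / monthly_rent
|
|     # Hypothetical GPU box vs. rented GPU instance: ~2.7 months.
|     print(breakeven_months(hardware_cost=8000, hourly_rate=4.00))
|     # Hypothetical mid-sized server vs. instance: ~10 months.
|     print(breakeven_months(hardware_cost=3000, hourly_rate=0.40))
|     # Colo space, power and admin time aren't counted here, so
|     # real break-evens land somewhat later than this.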
| bluGill wrote:
| AWS is useful if you have uneven loads. Why pay all year for
| the number of servers you need at Christmas? But if your load
| is more even, it doesn't make as much sense.
| 0xbadcafebee wrote:
| You know how to set up a rock-solid remote hands console to all
| your servers, I take it? Dial-up modem to a serial console
| server, serial cables to all the servers (or IPMI on a
| segregated network and management ports). Then you deal with
| varying hardware implementations, OSes, setting that up in all
| your racks in all your colos.
|
| Compare that to AWS, where there are six different kinds of
| remote hands that work on all hardware and OSes, with no need
| for expertise and no time taken. No planning, no purchases, no
| shipment time, no waiting for remote hands to set it up, no
| diagnosing failures, etc, etc, etc...
|
| That's just _one thing_. There's a _thousand_ more things,
| just for a plain old VM. And the cloud provides way more than
| VMs.
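|
| (For the curious: the bare-metal flavour of remote hands is
| mostly IPMI/BMC plumbing. A minimal sketch, with placeholder
| addresses and credentials, assuming ipmitool is installed and
| the BMCs sit on a segregated management network:)
|
|     import subprocess
|
|     def ipmi(bmc_host, user, password, *args):
|         # Out-of-band call to a server's BMC via ipmitool.
|         cmd = ["ipmitool", "-I", "lanplus", "-H", bmc_host,
|                "-U", user, "-P", password, *args]
|         return subprocess.run(cmd, capture_output=True, text=True)
|
|     # Check power state and hard-cycle a wedged box remotely.
|     print(ipmi("10.0.99.12", "admin", "secret",
|                "chassis", "power", "status").stdout)
|     ipmi("10.0.99.12", "admin", "secret",
|          "chassis", "power", "cycle")
|
| (Multiply that across every vendor's BMC quirks and firmware
| bugs and you get the kind of expertise being described above.)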
|
| The number of failures you can have on-prem is insane. Hardware
| can fail for all kinds of reasons (you must know this), and you
| have to have hot backups/spares, because otherwise you'll find
| out your spares don't work. Getting new gear in can take weeks
| (it "shouldn't" take that long, but there are little things like
| pandemics and global shortages of chips and disks that you
| can't predict). Power and cooling can go out. There are so many
| things that can (and eventually will) go wrong.
|
| Why expose your business to that much risk, and have to build
| that much expertise? To save a few bucks on a server?
| switch007 wrote:
| This. All of this and more. I've got friends who worked for
| hosting providers and who have echoed this comment over the
| years. It's endless.
| jread wrote:
| > Hardware can fail for all kinds of reasons
|
| Complex cloud infra can also fail for all kinds of reasons,
| and those failures are often harder to troubleshoot than a
| hardware failure. My experience with server-grade hardware in
| a reliable colo with a good uplink is that it's generally an
| extremely reliable combination.
| likeabatterycar wrote:
| > The number of failures you can have on-prem is insane.
| Hardware can fail for all kinds of reasons (you must know
| this)
|
| Cloud vendors are not immune from hardware failure. What do
| you think their underlying infrastructure runs on, some
| magical contraption made from Lego bricks, Swiss chocolate,
| and positive vibes?
|
| It's the same hardware, prone to the same failures. You've
| just outsourced worrying about it.
| jasode wrote:
| _> As the fortunes of AWS et al rose and rose and rose, I kept
| looking at their pricing at features and kept wondering what I
| was missing. They seemed orders of magnitude more expensive
| [...] To this day I still use bare metal servers for pretty
| much everything, [...] plain Linux, Bash, Perl, Python, and
| SSH, to handle everything cheaply _
|
| Your FastMail use case of (relatively) predictable server
| workload and product roadmap combined with agile Linux admins
| who are motivated to use close-to-bare-metal tools isn't an
| optimal cost fit for AWS. You're not missing anything and
| FastMail would have been overpaying for cloud.
|
| Where AWS/GCP/Azure shine is organizations that need _higher-
| level PaaS_ like managed DynamoDB, RedShift, SQS, etc that run
| on top of bare metal. Most _non-tech_ companies with internal
| IT departments cannot create/operate "internal cloud services"
| that are on par with AWS.[1] Some companies like Facebook and
| Walmart can run internal IT departments with AWS-like advanced
| capabilities, but most non-tech companies can't. This
| means paying AWS' fat profit margins _can actually be cheaper_
| than paying internal IT salaries to "reinvent AWS badly" by
| installing MySQL, Kafka, etc on bare metal Linux. E.g. Netflix
| had their own datacenters in 2008 but a 3-day database outage
| that stopped them from shipping DVDs was one of the reasons
| they quit running their datacenters and migrated to AWS.[2]
| Their complex workload isn't a good fit for bare-metal Linux
| and bash scripts; Netflix uses a ton of high-level PaaS managed
| services from AWS.
|
| If bare metal is the layer of abstraction your IT and dev
| departments are comfortable working at, then self-hosting on
| premises, colo, or Hetzner are all cheaper than AWS.
|
| [1]
| https://web.archive.org/web/20160319022029/https://www.compu...
|
| [2] https://media.netflix.com/en/company-blog/completing-the-
| net...
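|
| (To make the "higher-level PaaS" point concrete: a managed
| queue is a handful of API calls, versus provisioning,
| replicating and patching a Kafka or RabbitMQ cluster yourself.
| A minimal boto3 sketch; the queue name and region are
| placeholders:)
|
|     import boto3
|
|     sqs = boto3.client("sqs", region_name="us-east-1")
|     queue_url = sqs.create_queue(QueueName="orders")["QueueUrl"]
|
|     # The provider runs the brokers, storage and replication.
|     sqs.send_message(QueueUrl=queue_url, MessageBody="order-1234")
|
|     resp = sqs.receive_message(QueueUrl=queue_url,
|                                MaxNumberOfMessages=1,
|                                WaitTimeSeconds=10)
|     for msg in resp.get("Messages", []):
|         print(msg["Body"])
|         sqs.delete_message(QueueUrl=queue_url,
|                            ReceiptHandle=msg["ReceiptHandle"])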
| e12e wrote:
| I used to help manage a couple of racks' worth of on-premises
| hardware in the early to mid 2000s.
|
| We had some old Compaq (?) servers; most of the newer stuff
| was Dell. A mix of Windows and Linux servers.
|
| Even with the Dell boxes, things weren't really standard across
| different server generations, and every upgrade was bespoke,
| except in cases when we bought multiple boxes for
| redundancy/scaling of a particular service.
|
| What I'd like to see is something like Oxide Computer servers
| that scale way _down_, at least to a quarter rack. Like some
| kind of Supermicro-meets-Backblaze storage pod, but riffing on
| Joyent's idea of colocating storage and compute. A sort of
| composable mainframe for small businesses in the 2020s.
|
| I guess maybe that is part of what Triton is all about.
|
| But anyway - somewhere to start, and grow into the future with
| sensible redundancies and open-source BIOS/firmware/etc.
|
| Not the typical situation today, where you buy two (for
| redundancy) "big enough" boxes and then need to reinvent your
| setup/deployment when you need two bigger boxes in three years.
| lukevp wrote:
| To me, Cloud is all about the shift left of DevOps. It's not a
| cost play. I'm a Dev Lead / Manager and have worked in both types
| of environments over the last 10 years. The difference in
| provisioning velocity between the two approaches is immense.
| In the hardware space, it took months to years to
| provision new machines or upgrade OSes. In the cloud, it's a new
| terraform script and a CI deploy away. Need more storage? It's
| just there, available all the time. Need to add a new firewall
| between machines or redo the network topology? Free. Need a warm
| standby in 4 different regions that costs almost nothing but can
| scale to full production capacity within a couple of minutes?
| Done. Those types of things are difficult to do with physical
| hardware. And if you have an engineering culture where the
| operational work and the development work are at odds (think the
| old style of Dev / QA / Networking / Servers / Security all being
| separate teams), processes and handoffs eat your lunch and it
| becomes crippling to your ability to innovate. Cloud and DevOps
| are to me about reducing the differentiation between these roles
| so that a single engineer can do any part of the stack, which
| cuts out the communication overhead and the handoff time and the
| processes significantly.
|
| If you have predictable workloads, a competent engineering
| culture that fights against process culture, and are willing to
| spend the money to have good hardware and the people to man it
| 24x7x365, then I don't think cloud makes sense at all. Seems like
| that's what y'all have and you should keep up with it.
| Jenk wrote:
| Exactly this. It is culture and organisation (structure)
| dependent. I'm in the throes of the same discussion with my
| leadership team, some of whom have built themselves an
| ops/qa/etc. empire and want to keep their moat.
|
| Are you running a well-understood and predictable (as in,
| little change, growth, or feature additions) system? Are your
| developers handing over to central platform/infra/ops teams?
| You'll probably save some cash by buying and owning the
| hardware you need for your use case(s). Elasticity is
| (probably) not part of your vocabulary, perhaps outside of "I
| wish we had it" anyway.
|
| Have you got teams and/or products that are scaling rapidly or
| unpredictably? Have you still got a lot of learning and
| experimenting to do with how your stack will work? Do you need
| flexibility but can't wait for that flexibility? Then cloud is
| for you.
|
| n.b. I don't think I've ever felt more validated by a
| post/comment than yours.
| comprev wrote:
| Our CI pipelines can spin up some seriously meaty hardware, run
| some very resource-intensive tests, and destroy the
| infrastructure when finished.
|
| Bonus points: they can do it with spot pricing to further lower
| the bill.
|
| The cloud offers immense flexibility and empowers _developers_
| to easily manage their own infrastructure without depending on
| other teams.
|
| Speed of development is the primary reason $DayJob is moving
| into the cloud, while maintaining bare-metal for platforms
| which rarely change.
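|
| (A minimal sketch of that pattern with boto3; the AMI, region
| and instance type are placeholders, and the actual test run is
| elided:)
|
|     import boto3
|
|     ec2 = boto3.client("ec2", region_name="eu-west-1")
|
|     # Rent a large spot instance for the duration of the job.
|     resp = ec2.run_instances(
|         ImageId="ami-0123456789abcdef0",   # placeholder AMI
|         InstanceType="c5.9xlarge",
|         MinCount=1, MaxCount=1,
|         InstanceMarketOptions={
|             "MarketType": "spot",
|             "SpotOptions": {"SpotInstanceType": "one-time"},
|         },
|     )
|     instance_id = resp["Instances"][0]["InstanceId"]
|     ec2.get_waiter("instance_running").wait(
|         InstanceIds=[instance_id])
|
|     # ... run the resource-hungry tests over SSH/SSM here ...
|
|     ec2.terminate_instances(InstanceIds=[instance_id])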
| drdaeman wrote:
| > In the hardware space, it took months to years to provision
| new machines or upgrade OSes.
|
| If it takes that long to provision or upgrade a machine, I
| strongly suspect the engineers who initially designed the
| system failed to account for those tasks for some reason. Was
| that true in your case?
|
| Back in the late '00s through the mid '10s, I worked for an
| ISP startup as a SWE. We had a few core machines (database,
| RADIUS server,
| self-service website, etc) - ugly mess TBH - initially
| provisioned and originally managed entirely by hand as we
| didn't knew any better back then. Naturally, maintaining those
| was a major PITA, so they sat on the same dated distro for
| years. That was before Ansible was a thing, and we haven't
| really heard about Salt or Chef before we started to feel the
| pains and started to search for solutions. Virtualization
| (OpenVZ, then Docker) helped to soften a lot of issues, making
| it significantly easier to maintain the components, but the
| pains from our original sins were felt for a long time.
|
| But we also had a fleet of other machines, where we understood
| our issues with the servers enough to design new nodes to be as
| stateless as possible, with automatic rollout scripts for
| whatever we were able to automate. Provisioning a new host took
| only a few hours, with most time spent unpacking, driving,
| accessing the server room, and physically connecting things.
| Upgrades were pretty easy too - reroute customers to another
| failover node, write a new system image to the old one, reboot,
| test, re-route traffic back, done.
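|
| (That kind of rollout is only a few lines of glue. A rough
| sketch below; the drain/imaging commands are hypothetical
| placeholders for whatever tooling a given shop uses, not real
| utilities:)
|
|     import subprocess, time
|
|     def ssh(host, *cmd, check=True):
|         return subprocess.run(["ssh", host, *cmd], check=check)
|
|     def upgrade(node, failover, image_url):
|         # Hypothetical traffic switch to the failover node.
|         ssh(failover, "trafficctl", "takeover", node)
|         # Hypothetical imaging tool lays down the new image.
|         ssh(node, "imagectl", "write", image_url)
|         ssh(node, "reboot", check=False)  # connection drops here
|         time.sleep(120)                   # crude wait for boot
|         ssh(node, "run-smoke-tests")      # hypothetical checks
|         # Route customers back once the node looks healthy.
|         ssh(failover, "trafficctl", "giveback", node)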
|
| So it's not like self-owned bare metal is harder to manage -
| the lesson I learned is that you just have to think ahead of
| time about what the future will require. Same as the cloud, I
| guess: you have to follow best practices or you'll end up with
| crappy architectures that are painful to rework. Just a
| different set of practices, because of the different nature of
| the systems.
| eddsolves wrote:
| My first job in tech was building servers for companies when
| they needed more compute, physically building them from our
| warehouse of components, driving them to their site, and
| setting them up on their network.
|
| You could get same-day builds deployed on-prem with the right
| support bundle!
| nprateem wrote:
| Yeah, and some people reckon web frameworks are bad too.
| Sometimes it might make sense to host on your own hardware,
| but almost certainly not for startups.
| lakomen wrote:
| You also terminate accounts at your sole discretion
| awinter-py wrote:
| everyone is 'cattle not pets' except the farm vet who is
| shoulder-deep in a cow
|
| (my experience with managed kubernetes)
| EdJiang wrote:
| I was a bit confused by the section on backups. How do they
| manage moving the data offsite with on-premises backup
| servers? Wouldn't going cloud be a cost saving there?
| kwakubiney wrote:
| If I remember correctly, Stack Overflow does something
| similar. Their then Director of Engineering talks about it
| here[1]
|
| [1]https://hanselminutes.com/847/engineering-stack-overflow-
| wit...
| e12e wrote:
| They also have a SaaS product that lives in the cloud:
|
| https://stackoverflow.blog/2023/08/30/journey-to-the-cloud-p...
___________________________________________________________________
(page generated 2024-12-22 23:00 UTC)