[HN Gopher] Use one big server
___________________________________________________________________
Use one big server
Author : pclmulqdq
Score : 841 points
Date : 2022-08-02 14:43 UTC (8 hours ago)
(HTM) web link (specbranch.com)
(TXT) w3m dump (specbranch.com)
| londons_explore wrote:
| Hybrid!
|
| If you are at all cost sensitive, you should have some of your
| own infrastructure, some rented, and some cloud.
|
| You should design your stuff to be relatively easily moved and
| scaled between these. Build with docker and kubernetes and that's
| pretty easy to do.
|
| As your company grows, the infrastructure team can schedule which
| jobs run where, and get more computation done for less money than
| just running everything in AWS, and without the scaling headaches
| of on-site stuff.
| dekhn wrote:
| Science advances as RAM on a single machine increases.
|
| For many years, genomics software was non-parallel and depended
| on having a lot of RAM - often a terabyte or more - to store data
| in big hash tables. Converting that to distributed computing was
| a major effort, and to this day many people still just get a Big
| Server With Lots of Cores, RAM, and SSD.
|
| Personally, after many years of working with distributed
| systems, I absolutely enjoy working on a big fat server that I
| have all to myself.
| bee_rider wrote:
| On the other hand in science, it sure is annoying that the size
| of problems that fit in a single node is always increasing.
| PARDISO running on a single node will always be nipping at your
| heels if you are designing a distributed linear system
| solver...
| notacoward wrote:
| > Science advances as RAM on a single machine increases.
|
| Also as people learn that correlation does not equal causation.
| ;)
| rstephenson2 wrote:
| It seems like lots of companies start in the cloud due to low
| commitments, and then later when they have more stability and
| demand and want to save costs, making bigger cloud commitments
| (RIs, enterprise agreements, etc.) is a turnkey way to save money
| but always leaves you on the lower-efficiency cloud track. Has
| anyone had good experiences selectively offloading workloads from
| the cloud to bare metal servers nearby?
| reillyse wrote:
| Nope. Multiple small servers.
|
| 1) You need to get over the hump and build multiple servers into
| your architecture from the get-go (the author says you need two
| servers minimum), so really we are talking about two big
| servers.
|
| 2) having multiple small servers allows us to spread our service
| into different availability zones
|
| 3) multiple small servers allows us to do rolling deploys without
| bringing down our entire service
|
| 4) Once we use the multiple-small-servers approach, it's easy to
| scale our compute up and down by adding or removing machines.
| With one server it's difficult to scale up or down without
| buying more machines. Small servers we can add incrementally, but
| with the large-server approach scaling up requires downtime and
| buying a new server.
| zhte415 wrote:
| It completely depends on what you're doing. This was pointed out
| in the first paragraph of the article:
|
| > By thinking about the real operational considerations of our
| systems, we can get some insight into whether we actually need
| distributed systems for most things.
| Nextgrid wrote:
| > you need to get over the hump and build in multiple servers
| into your architecture from the get go (the author says you
| need two servers minimum), so really we are talking about two
| big servers.
|
| Managing a handful of big servers can be done manually if
| needed - it's not pretty but it works and people have been
| doing it just fine before the cloud came along. If you
| intentionally plan on having dozens/hundreds of small servers,
| manual management becomes unsustainable and now you need a
| control plane such as Kubernetes, and all the complexity and
| failure modes it brings.
|
| > having multiple small servers allows us to spread our service
| into different availability zones
|
| So will 2 big servers in different AZs (whether cloud AZs or
| old-school hosting providers such as OVH).
|
| > multiple small servers allows us to do rolling deploys
| without bringing down our entire service
|
| Nothing prevents you from starting multiple instances of your
| app on one big server, or doing rolling deploys with big bare
| metal, assuming one server can handle the peak load (so you take
| your first server out of the LB, upgrade it, put it back in the
| LB, then do the same for the second and so on).
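|
| A minimal sketch of that drain/upgrade/re-add loop in TypeScript
| (the load-balancer admin API, backend list, and /healthz endpoint
| here are hypothetical placeholders, not anything from the
| article):
|
|   // Rolling deploy across a small, fixed set of big servers.
|   const backends = ["10.0.0.1", "10.0.0.2"]; // e.g. two big servers
|   const lbAdmin = "http://lb.internal:8080";  // assumed LB admin API
|
|   async function healthy(host: string): Promise<boolean> {
|     try {
|       return (await fetch(`http://${host}/healthz`)).ok;
|     } catch {
|       return false;
|     }
|   }
|
|   async function rollingDeploy(deploy: (host: string) => Promise<void>) {
|     for (const host of backends) {
|       // 1. Drain: stop sending new traffic to this backend.
|       await fetch(`${lbAdmin}/backends/${host}/drain`, { method: "POST" });
|       // 2. Upgrade the app on the drained host (ssh/rsync/restart...).
|       await deploy(host);
|       // 3. Wait until the new version reports healthy.
|       while (!(await healthy(host))) {
|         await new Promise((r) => setTimeout(r, 2000));
|       }
|       // 4. Put it back in the LB before moving to the next host.
|       await fetch(`${lbAdmin}/backends/${host}/enable`, { method: "POST" });
|     }
|   }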
|
| > once we use the multiple small servers approach it's easy to
| scale up and down our compute by adding or removing machines.
| Having one server it's difficult to scale up or down without
| buying more machines. Small servers we can add incrementally
| but with the large server approach scaling up requires downtime
| and buying a new server.
|
| True but the cost premium of the cloud often offsets the
| savings of autoscaling. A bare-metal server capable of handling peak
| load is often cheaper than your autoscaling stack at low load,
| therefore you can just overprovision to always meet peak load
| and still come out ahead.
| SoftTalker wrote:
| I manage hundreds of servers, and use Ansible. It's simple
| and it gets the job done. I tried to install Kubernetes on a
| cluster and couldn't get it to work. I mean I know it works,
| obviously, but I could not figure it out and decided to stay
| with what works for me.
| eastbound wrote:
| But it's specific, and no-one will want to take over your
| job.
|
| The upside of a standard AWS CloudFormation file is that
| engineers are replaceable. They're cargo-cult engineers,
| but they're not worried for their career.
| Nextgrid wrote:
| > But it's specific, and no-one will want to take over
| your job.
|
| It really depends what's on the table. Offer just half of the
| cost savings vs an equivalent AWS setup as a bonus and I'm sure
| you'll find people who will happily do it (and you'll be happy to
| pocket the other half). For a lot of companies even just _half_
| of the cost savings would be a significant sum (reminds me of an
| old client who spent _thousands_ per month on an RDS cluster that
| not only was slower than my entry-level MacBook, but ended up
| crapping out, stuck in an inconsistent state for 12 hours, and
| required manual intervention from AWS to recover - so much for
| managed services - they ended up restoring a backup, but I wish I
| could've SSH'd in and recovered it in-place).
|
| As someone who uses tech as a means to an end and is more
| worried about the _output_ said tech produces than the tech
| itself (aka I'm not looking for a job nor resume clout nor
| invites to AWS/Hashicorp/etc conferences; instead I bank on the
| business problems my tech solves), I'm personally very happy to
| get my hands dirty with old-school sysadmin stuff if it means I
| don't spend 10-20x the money on infrastructure just to make Jeff
| Bezos richer - my end customers don't know nor care either way
| while my wallet appreciates the cost savings.
| [deleted]
| rubiquity wrote:
| The line of thinking you follow is what is plaguing this
| industry with too much complexity and simultaneously throwing
| away incredible CPU and PCIe performance gains in favor of
| using the network.
|
| Any technical decisions about how many instances to have and
| how they should be spread out need to start as a business
| decision and end in crisp numbers about recovery point/time
| objectives, and yet somehow that nearly never happens.
|
| To answer your points:
|
| 1) Not necessarily. You can stream data backups to remote
| storage and recover from that on a new single server as long as
| that recovery fits your Recovery Time Objective (RTO).
|
| 2) What's the benefit of multiple AZs if the SLA of a single AZ
| is greater than your intended availability goals? (Have you
| checked your provider's single AZ SLA?)
|
| 3) You can absolutely do rolling deploys on a single server.
|
| 4) Using one large server doesn't mean you can't complement it
| with smaller servers on an as-needed basis. AWS even has a
| service for doing this.
|
| Which is to say: there aren't any prescriptions when it comes
| to such decisions. Some businesses warrant your choices, the
| vast majority do not.
| reillyse wrote:
| Ok, so to your points.
|
| "It depends" is the correct answer to the question, but the
| least informative.
|
| One Big Server or multiple small servers? It depends.
|
| It always depends. There are many workloads where one big
| server is the perfect size. There are many workloads where
| many small servers are the perfect solution.
|
| My point is that the ideas put forward in the article are flawed
| for the vast majority of use cases.
|
| I'm saying that multiple small servers are a better solution on a
| number of different axes.
|
| For 1) "One Server (Plus a Backup) is Usually Plenty": now I need
| some kind of remote storage streaming system and some kind of
| manual recovery. Am I going to fail over to the backup (and so it
| needs to be as big as my "one server"), or will I need to
| manually recover from my backup?
|
| 2) Yes, it depends on your availability goals, but you get this
| as a side effect of having more than one small instance.
|
| 3) Maybe I was ambiguous here. I don't just mean rolling deploys
| of code. I also mean changing the server code, restarting,
| upgrading and changing out the server. What happens when you
| migrate to a new server (when you scale up by purchasing a
| different box)? Now we have a manual process that doesn't get
| executed very often and is bound to cause downtime.
|
| 4) Now we have "Use one Big Server - and a bunch of small
| ones"
|
| I'm going to add a final point on reliability. By far the
| biggest risk factor for reliability is me the engineer. I'm
| responsible for bringing down my own infra way more than any
| software bug or hardware issue. The probability of me messing
| up everything when there is one server that everything
| depends on is much much higher, speaking from experience.
|
| So, like I said, I could have said "It depends" but instead I
| tried to give a response that was somewhat illuminating and
| helpful, especially given the strong opinions expressed in the
| article.
|
| I'll give a little color with the current setup for a site I
| run.
|
| moustachecoffeeclub.com runs on ECS
|
| I have 2 on-demand instances and 3 spot instances
|
| One tiny instance running my caches (redis, memcache)
|
| One "permanent" small instance running my web server
|
| Two small spot instances running web server
|
| One small spot instance running background jobs
|
| "small" being about 3 GB and 1024 CPU units
|
| And an RDS instance with backup about $67 / month
|
| All in I'm well under $200 per month including database.
|
| So you can do multiple small servers inexpensively.
|
| Another aspect is that I appreciate being able to go on
| vacation for a couple of weeks, go camping or take a plane
| flight without worrying if my one server is going to fall
| over when I'm away and my site is going to be down for a
| week. In a big company maybe there is someone paid to monitor
| this, but with a small company I could come back to a smoking
| hulk of a company and that wouldn't be fun.
| bombcar wrote:
| > Any technical decisions about how many instances to have and
| how they should be spread out need to start as a business
| decision and end in crisp numbers about recovery point/time
| objectives, and yet somehow that nearly never happens.
|
| Nobody wants to admit that their business or their department
| actually has a SLA of "as soon as you can, maybe tomorrow, as
| long as it usually works". So everything is pretend-
| engineered to be fifteen nines of reliability (when in
| reality it sometimes explodes _because_ of the "attempts" to
| make it robust).
|
| Being honest about the _actual_ requirements can be extremely
| helpful.
| bob1029 wrote:
| > Nobody wants to admit that their business or their
| department actually has a SLA of "as soon as you can, maybe
| tomorrow, as long as it usually works". So everything is
| pretend-engineered to be fifteen nines of reliability (when
| in reality it sometimes explodes because of the "attempts"
| to make it robust).
|
| I have yet to see my principal technical frustrations
| summarized so concisely. This is at the heart of
| _everything_.
|
| If the business and the engineers could get over their
| ridiculous obsession with statistical outcomes and strict
| determinism, they would be able to arrive at a much more
| cost-effective, simple and human-friendly solution.
|
| The # of businesses that are _actually_ sensitive to >1
| minute of annual downtime are already running on top of IBM
| mainframes and have been for decades. No one's business is
| as important as the federal reserve or pentagon, but they
| don't want to admit it to themselves or others.
| marcosdumay wrote:
| > The # of businesses that are actually sensitive to >1
| minute of annual downtime are already running on top of
| IBM mainframes and have been for decades.
|
| Is there any?
|
| My bank certainly has way less than 5 9s of availability.
| It's not a problem at all. Credit/debit card processors
| seem to stay around 5 nines, and nobody is losing sleep
| over it. As long as your unavailability isn't all on the
| Christmas promotion day, I never saw anybody losing any
| sleep over web-store unavailability. The FED probably
| doesn't have 5 9's of availability. It's way overkill for
| a central bank, even if it's one that processes online
| interbank transfers (which the FED doesn't).
|
| The organizations that need more than 5 9's are probably
| all in the military and science sectors. And those aren't
| using mainframes, they certainly use good old redundancy
| of equipment with simple failure modes.
| bob1029 wrote:
| > simultaneously throwing away incredible CPU and PCIe
| performance gains
|
| We _really_ need to double down on this point. I worry that
| some developers believe they can defeat the laws of physics
| with clever protocols.
|
| The amount of time it takes to round trip the network _in the
| same datacenter_ is roughly 100,000 to 1,000,000 nanoseconds.
|
| The amount of time it takes to round trip L1 cache is around
| half a nanosecond.
|
| A trip down PCIe isn't much worse, relatively speaking. Maybe
| hundreds of nanoseconds.
|
| Lots of assumptions and hand waving here, but L1 cache _can
| be_ around 1,000,000x faster than going across the network.
| SIX orders of magnitude of performance are _instantly_
| sacrificed to the gods of basic physics the moment you decide
| to spread that SQLite instance across US-EAST-1. Sure, it
| might not wind up a million times slower on a relative basis,
| but you'll never get access to those zeroes again.
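|
| As a back-of-the-envelope illustration of those orders of
| magnitude (the nanosecond figures below are the rough ballpark
| numbers quoted above, not measurements):
|
|   // Latency budget for a request that does 20 sequential lookups.
|   const L1_NS = 0.5;          // L1 cache hit
|   const PCIE_NS = 300;        // PCIe round trip (order of magnitude)
|   const NETWORK_NS = 500_000; // round trip within one datacenter
|
|   const lookups = 20;
|   for (const [name, ns] of [["L1", L1_NS], ["PCIe", PCIE_NS],
|                             ["network", NETWORK_NS]] as const) {
|     console.log(`${name}: ${(lookups * ns) / 1e6} ms`);
|   }
|   // L1: 0.00001 ms, PCIe: 0.006 ms, network: 10 ms - the same 20
|   // lookups go from invisible to user-visible latency once each
|   // one crosses the network.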
| roflyear wrote:
| I agree! Our "distributed cloud database" just went down last
| night for a couple of HOURS. Well, not entirely down. But
| there were connection issues for hours.
|
| Guess what never, never had this issue? The hardware I keep
| in a datacenter lol!
| dvfjsdhgfv wrote:
| > The line of thinking you follow is what is plaguing this
| industry with too much complexity and simultaneously throwing
| away incredible CPU and PCIe performance gains in favor of
| using the network.
|
| It will die out naturally once people realize how much the
| times have changed and that the old solutions based on weaker
| hardware are no longer optimal.
| deathanatos wrote:
| > _2) What 's the benefit of multiple AZs if the SLA of a
| single AZ is greater than your intended availability goals?
| (Have you checked your provider's single AZ SLA?)_
|
| ... my provider's single AZ SLA is less than my company's
| intended availability goals.
|
| (IMO our goals are also nuts, too, but it is what it is.)
|
| Our provider, in the worst case (a VM using a managed hard
| disk), has an SLA of 95% within a month (I ... think. Their
| SLA page uses incorrect units on the top line items. The
| examples in the legalese -- examples are normative, right? --
| use a unit of % / mo...).
|
| You're also assuming a provider (a) typically meets their
| SLAs and (b) if they don't, honors them. IME, (a) is highly
| service dependent, with some services being just _stellar_ at
| it, and (b) is usually "they will if you can _prove_ to them
| with your own metrics they had an outage, and push for a
| credit." Also (c) the service doesn't fail in a way that's
| impactful but not covered by the SLA. (E.g., I had a cloud
| provider once whose SLA was over "the APIs should return
| 2xx", and the APIs during the outage always returned "2xx,
| I'm processing your request". You then polled the API and got
| "2xx, your request is pending". Nothing was happening, because
| they were having an outage, but that outage could continue
| indefinitely without impacting the SLA! _That_ was a fun
| support call...)
|
| There's also (d) AZs are a myth; I've seen multiple global
| outages. E.g., when something like the global authentication
| service falls over and takes basically every other service
| with it. (Because nothing can authenticate. What's even
| better is the provider then listing those services as "up" /
| not in an outage, because _technically_ it's not _that_
| service that's down, it is just the authentication service.
| 'Cause God forbid you'd have to give out _that_ credit. But
| the provider calling a service "up" that is failing 100% of
| the requests sent its way is just rich, from the customer's
| view.)
| ericd wrote:
| On a big server, you would probably be running VMs rather than
| serving directly. And then it becomes easy to do most of what
| you're talking about - the big server is just a pool of
| resources from which to make small, single purpose VMs as you
| need them.
| Koshkin wrote:
| Why VMs when you can use containers?
| ericd wrote:
| If you prefer those, go for it. I like my infra tech to be
| about as boring and battle tested as I can get it without
| big negatives in flexibility.
| Koshkin wrote:
| In theory, VMs should only be needed to run different
| OSes on one big box. Otherwise, what should have sufficed
| (speaking of what I 'prefer') is a multiuser OS that does
| not require additional layers to ensure security and
| proper isolation of users and their work environments
| from each other. Unfortunately, it looks like UNIX and its
| descendants could not deliver on this basic need. (I
| wonder if Multics had something of a better design in
| this regard.)
| cestith wrote:
| Why containers when you can use unikernel applications?
| Koshkin wrote:
| But can unikernel applications share a big server
| (without themselves running inside VMs)?
| mixmastamyk wrote:
| Better support when at least in the neighborhood of the
| herd.
| PeterCorless wrote:
| We have a different take on running "one big database." At
| ScyllaDB we prefer vertical scaling because you get better
| utilization of all your vCPUs, but we still will keep a
| replication factor of 3 to ensure that you can maintain [at
| least] quorum reads and writes.
|
| So we would likely recommend running 3x big servers. For those
| who want to plan for failure, though, they might prefer to have
| 6x medium servers, because then you don't take as much of a
| "torpedo hit" when any one server goes offline.
|
| So it's a balance. You want to be big, but you don't want to be
| monolithic. You want an HA architecture so that no one node kills
| your entire business.
|
| I also suggest that people planning systems create their own
| "torpedo test." We often benchmark to tell maximal optimum
| performance, presuming that everything is going to go right.
|
| But people who are concerned about real-world outage planning may
| want to "torpedo" a node to see how a 2-out-of-3-nodes-up cluster
| operates, versus a 5-out-of-6-nodes-up cluster.
|
| This is like planning for engine loss on jets, to see if you can
| keep flying with 2 of 3 engines, or 1 of 2.
|
| Obviously, if you have 1 engine, there is nothing you can do if
| you lose that single point of failure. At that point, you are
| updating your resume, and checking on the quality of your
| parachute.
| vlovich123 wrote:
| > At that point, you are updating your resume, and checking on
| the quality of your parachute
|
| The ordering of these events seems off but that's
| understandable considering we're talking about distributed
| systems.
| pclmulqdq wrote:
| I think this is the right approach, and I really admire the
| work you do at ScyllaDB. For something truly critical, you
| really do want to have multiple nodes available (at least 2,
| and probably 3 is better). However, you really should want to
| have backup copies in multiple datacenters, not just the one.
|
| Today, if I were running something that absolutely needed to be
| up 24/7, I would run a 2x2 or 2x3 configuration with async
| replication between primary and backup sites.
| PeterCorless wrote:
| Exactly. Regional distribution can be vital. Our customer
| Kiwi.com had a datacenter fire. 10 of their 30 nodes were
| turned to a slag heap of ash and metal. But 20 of 30 nodes in
| their cluster were in completely different datacenters so
| they lost zero data and kept running non-stop. This is a rare
| story, but you do NOT want to be one of the thousands of
| others that only had one datacenter, and their backups were
| also stored there and burned up with their main servers. Oof!
|
| https://www.scylladb.com/2021/03/23/kiwi-com-nonstop-
| operati...
| zokier wrote:
| If you have just two servers, how are you going to load-balance
| and fail over between them? Generally you need at least 3 nodes
| for any sort of quorum.
| titzer wrote:
| Last year I did some consulting for a client using Google cloud
| services such as Spanner and cloud storage. Storing and indexing
| mostly timeseries data with a custom index for specific types of
| queries. It was difficult for them to define a schema to handle
| the write bandwidth needed for their ingestion. In particular it
| required a careful hashing scheme to balance load across shards
| of the various tables. (It seems to be a pattern with many
| databases to suck at append-often, read-very-often patterns, like
| logs).
|
| We designed some custom in-memory data structures in Java, but
| also used some of the standard high-performance concurrent data
| structures, plus some reader/writer locks. gRPC and some pub/sub to get
| updates on the order of a few hundred or thousand qps. In the
| end, we ended up with JVM instances that had memory requirements
| in the 10GB range. Replicate that 3-4x for failover, and we could
| serve queries at higher rates and lower latency than hitting
| Spanner. The main thing cloud was good for was the storage of the
| underlying timeseries data (600GB maybe?) for fast server
| startup, so that they could load the index off disk in less than
| a minute. We designed a custom binary disk format to make that
| blazingly fast, and then just threw binary files into a cloud
| filesystem.
|
| If you need to serve < 100GB of data and most of it is
| static... IMHO, screw the cloud; use a big server and replicate
| it for fail-over. Unless you've got really high write rates or
| seriously stringent transactional requirements, a couple of
| servers will do it.
|
| YMMV, but holy crap, servers are huge these days.
| eastbound wrote:
| When you say "screw the cloud", you mean "administer an EC2
| machine yourself" or really "buy your own hardware"?
| titzer wrote:
| The former, mostly. You don't necessarily have to use EC2,
| but that's easy to do. There are many other, smaller
| providers if you really want to get out from under the big 3.
| I have no experience managing hardware, so I personally
| wouldn't take that on myself.
| sllabres wrote:
| I would think that it can hold 1TB of RAM _per_socket_ (with 64GB
| DIMM), so _2TB_ total.
| bob1029 wrote:
| > 1 million IOPS on a NoSQL database
|
| I have gone well beyond this figure by doing clever tricks in
| software and batching multiple transactions into IO blocks where
| feasible. If your average transaction is substantially smaller
| than the IO block size, then you are probably leaving a lot of
| throughput on the table.
|
| The point I am trying to make is that even if you think "One Big
| Server" might have issues down the road, there are always some
| optimizations that can be made. Have some faith in the vertical.
|
| This path has worked out _really_ well for us over the last
| ~decade. New employees can pick things up much more quickly when
| you don't have to show them the equivalent of a nuclear reactor
| CAD drawing to get started.
| mathisonturing wrote:
| > batching multiple transactions into IO blocks where feasible.
| If your average transaction is substantially smaller than the
| IO block size, then you are probably leaving a lot of
| throughput on the table.
|
| Could you expand on this? A quick Google search didn't help.
| Link to an article or a brief explanation would be nice!
| bob1029 wrote:
| Sure. If you are using some micro-batched event processing
| abstraction, such as the LMAX Disruptor, you have an
| opportunity to take small batches of transactions and process
| them as a single unit to disk.
|
| For event sourcing applications, multiple transactions can be
| coalesced into a single IO block & operation without much
| drama using this technique.
|
| Surprisingly, this technique also _lowers_ the amount of
| latency that any given user should experience, despite the
| fact that you are "blocking" multiple users to take
| advantage of small batching effects.
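|
| A toy sketch of that coalescing idea (not the LMAX Disruptor
| itself - just a hypothetical batcher that groups small serialized
| transactions into one block-sized append):
|
|   import { appendFileSync } from "node:fs";
|
|   const BLOCK_SIZE = 4096; // one IO block
|   let pending: Buffer[] = [];
|   let pendingBytes = 0;
|
|   // Callers enqueue small serialized transactions; the batcher
|   // flushes them as a single block-sized write instead of one
|   // write per transaction.
|   function submit(txn: Buffer) {
|     pending.push(txn);
|     pendingBytes += txn.length;
|     if (pendingBytes >= BLOCK_SIZE) flush();
|   }
|
|   function flush() {
|     if (pending.length === 0) return;
|     // Many transactions, one write of (roughly) one block.
|     appendFileSync("events.log", Buffer.concat(pending));
|     pending = [];
|     pendingBytes = 0;
|   }
|
|   // A short timer bounds the latency of a not-yet-full batch.
|   setInterval(flush, 1);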
| lanstin wrote:
| I didn't see the point made that cloudy services are easier to
| manage. If some team gets a capital budget to buy that one big
| server, they will put everything on it, no matter your
| architectural standards: cron jobs editing state on disk, tmux
| sessions shared between teams, random web servers doing who knows
| what, non-DBA-team Postgres installs, etc. At least in the cloud
| you can limit certain features and do chargeback calculations.
|
| Not sure if that is a net win for cloud or physical, of course,
| but I think it is a factor
| kgeist wrote:
| One of our projects uses 1 big server and indeed, everyone
| started putting everything on it (because it's powerful): the
| project itself, a bunch of corporate sites, a code review tool,
| and god knows what else. Last week we started having issues
| with the projects going down because something is overloading
| the system and they still can't find out what exactly without
| stopping services/moving them to a different machine
| (fortunately, it's internal corporate stuff, not user-facing
| systems). The main problem I've found with this setup is that
| random stuff can accumulate with time and then one
| tool/process/project/service going out of control can bring
| down the whole machine. If it's N small machines, there's
| greater isolation.
| pclmulqdq wrote:
| It sounds like you need some containers.
| kbenson wrote:
| One server is for a hobby, not a business. Maybe that's fine, but
| keep that in mind. Backups at that level are something that keeps
| you from losing all data, not something that keeps you running
| and gets you up in any acceptable timeframe for most businesses.
|
| That doesn't mean you need to use the cloud, it just means one
| big piece of hardware with all its single points of failure is
| often not enough. Two servers gets you so much more than one. You
| can make one a hot spare, or actually split services between them
| and have each be ready to take over for specific services for the
| other, greatly increasing your burst handling capability and
| giving you time to put more resources in place to keep n+1
| redundancy going if you're using more than half of a server's
| resources.
| secabeen wrote:
| This is exactly the OPs recommended solution:
|
| > One Server (Plus a Backup) is Usually Plenty
| kbenson wrote:
| Then I guess my first sentence is about _equally_ as
| click-baity as the article title. ;)
| vitro wrote:
| Let's Encrypt's database server [1] would beg to differ. For
| businesses at a certain scale, two servers really are overkill.
|
| [1] https://letsencrypt.org/2021/01/21/next-gen-database-
| servers...
| mh- wrote:
| That says they use a single _database_ , as in a logical
| MySQL database. I don't see any claim that they use a single
| _server_. In fact, the title of the article you've linked
| suggests they use multiple.
| simonw wrote:
| https://letsencrypt.status.io/ shows a list of their
| servers, which look to be spread across three data centers
| (one "public", two "high availability").
| kbenson wrote:
| Do we know if it shows cold spares? That's all I think is
| needed at a minimum to avoid the problems I'm talking
| about, and I doubt they would note those if they don't
| necessarily have a hostname.
| kbenson wrote:
| Do they actually say they don't have a slave to that database
| ready to take over? I seriously doubt Let's Encrypt has no
| spare.
|
| Note I didn't say you shouldn't run one service (as in
| daemon) or set of services from one box, just that one box is
| not enough and you need that spare.
|
| If Let's Encrypt actually has no spare for their database
| server and they're one hardware failure away from being down
| for what may be a large chunk of time (I highly doubt it),
| then I wouldn't want to use them even if free. Thankfully, I
| doubt your interpretation of what that article is saying.
| vitro wrote:
| You're right, from the article:
|
| > The new AMD EPYC CPUs sit at about 25%. You can see in
| this graph where we promoted the new database server from
| replica (read-only) to primary (read/write) on September
| 15.
| kubb wrote:
| As per usual, don't copy Google if you don't have the same
| requirements. Google Search never goes down. HN goes down from
| time and nobody minds. Google serves tens (hundreds?) of
| thousands of queries per second. HN serves ten. HN is fine with
| one server because it's small. How big is your service going to
| be? Do that boring math :)
| FartyMcFarter wrote:
| Even Google search has gone down apparently, for five minutes
| in 2013:
|
| https://www.cnet.com/tech/services-and-software/google-goes-...
| terafo wrote:
| There were huge availability issues as recently as December
| 14th 2020, for 45 minutes.
| roflyear wrote:
| Correct. I like to ask "how much money do we lose if the site
| goes down for 1hr? a day?" etc., and plan around that. If you
| are losing $1M an hour, or $50M if it goes down for a day, hell
| yeah you should spend a few million on making sure your site
| stays online!
|
| But, it is amazing how often c-levels cannot answer this
| question!
| _nhh wrote:
| I agree
| rbanffy wrote:
| I wouldn't recommend one, but at least two, for redundancy.
| londons_explore wrote:
| Don't be scared of 'one big server' for reliability. I'd bet that
| if you hired a big server today in a datacenter, the hardware
| will have more uptime than something cloud-native with az-
| failover hosted on AWS.
|
| Just make sure you have a tested 30 minute restoration plan in
| case of permanent hardware failure. You'll probably only use it
| once every 50 years on average, but it will be an expensive event
| when it happens.
| cpursley wrote:
| You've got features to ship. Stick your stuff on Render.com and
| don't think about it again. Even a dummy like me can manage that.
| alexpotato wrote:
| My favorite summary of why not to use microservices is from Grug:
|
| "grug wonder why big brain take hardest problem, factoring system
| correctly, and introduce network call too
|
| seem very confusing to grug"
|
| https://grugbrain.dev/#grug-on-microservices
| fleddr wrote:
| Our industry summarized:
|
| Hardware engineers are pushing the absolute physical limits of
| getting state (memory/storage) as close as possible to compute. A
| monumental accomplishment as impactful as the invention of
| agriculture and the industrial revolution.
|
| Software engineers: let's completely undo all that engineering by
| moving everything apart as far as possible. Hmmm, still too fast.
| Let's next add virtualization and software stacks with shitty
| abstractions.
|
| Fast and powerful browser? Let's completely ignore 20 years of
| performance engineering and reinvent...rendering. Hmm, sucks a
| bit. Let's add back server rendering. Wait, now we have to render
| twice. Ah well, let's just call it a "best practice".
|
| The mouse that I'm using right now (an expensive one) has a 2GB
| desktop Electron app that seems to want to update itself twice a
| week.
|
| The state of us, the absolute garbage that we put out, and the
| creative ways in which we try to justify it. It's like a mind
| virus.
|
| I want my downvotes now.
| GuB-42 wrote:
| Actually, those who push for these cloudy solutions do it in
| part to make data close to you. I am talking mostly about CDNs;
| I don't think YouTube and Netflix would have been possible
| without them.
|
| Google is a US company, but you don't want people in Australia
| to connect to the other side of the globe every time they need
| to access Google services, it would be an awful waste of
| intercontinental bandwidth. Instead, Google has data centers in
| Australia to serve people in Australia, and they only hit US
| servers when absolutely needed. And that's when you need to
| abstract things out. If something becomes relevant in
| Australia, move it there, and move it out when it no longer
| matters. When something big happens, copy it everywhere, and
| replace the copies with something else as interest wanes.
|
| Big companies need to split everything, they can't centralize
| because the world isn't centralized. The problem is when small
| businesses try to do the same because "if Google is so
| successful doing that, it must be right". Scale matters.
| Foomf wrote:
| You've more or less described Wirth's Law:
| https://en.wikipedia.org/wiki/Wirth%27s_law
| fleddr wrote:
| I had no idea, thanks. Consider this a broken clock being
| sometimes right.
| kkielhofner wrote:
| Great article overall with many good points worth considering.
| Nothing is one size fits all so I won't get into the crux of the
| article: "just get one big server". I recently posted a comment
| breaking down the math for my situation:
|
| https://news.ycombinator.com/item?id=32250470#32253635
|
| For the most "extreme" option of buying your own $40k server from
| Dell I'm always surprised at how many people don't consider
| leasing. No matter what it breaks the cost into an operating
| expense vs a capital one which is par with the other options in
| terms of accounting and doesn't require laying out $40k.
|
| Adding on that, in the US we have some absolutely wild tax
| advantages for large "capital expenditures" that also apply to
| leasing:
|
| https://www.section179.org/section_179_leases/
| phendrenad2 wrote:
| The problem with "one big server" is, you really need good
| IT/ops/sysadmin people who can think in non-cloud terms. (If you
| catch them installing docker on it, throw them into a lava pit
| immediately).
| henry700 wrote:
| What's the problem with installing Docker so you can run
| containers of different distros, languages & flavors on the
| same one big server, though?
| londons_explore wrote:
| One-big-VM is another approach...
|
| A big benefit is some providers will let you resize the VM bigger
| as you grow. The behind-the-scenes implementation is they migrate
| your VM to another machine with near-zero downtime. Pretty cool
| tech, and takes away a big disadvantage of bare metal which is
| growth pains.
| lrvick wrote:
| A consequence of one-big-server is decreased security. You become
| discouraged from applying patches because you must reboot. Also
| if one part of the system is compromised, every service is now
| compromised.
|
| Microservices on distinct systems offer damage control.
| jvanderbot wrote:
| No thanks. I have a few hobby sites, a personal vanity page, and
| some basic CPU expensive services that I use.
|
| Moving to AWS serverless has saved me so much headache with
| system updates, certificate management, archival and backup,
| networking, and so much more. Not to mention with my low-but-
| spikey load, my breakeven is a long way off.
| SassyGrapefruit wrote:
| >Use the Cloud, but don't be too Cloudy
|
| The number of applications I have inherited that were messes
| falling apart at the seams because of misguided attempts to avoid
| "vendor lock-in" with the cloud cannot be overstated. There is
| something I find ironic about people paying to use a platform but
| not using it because they feel like using it too much will make
| them feel compelled to stay there. It's basically starving
| yourself so you don't get too familiar with eating regularly.
|
| Kids, this PSA is for you. Auto Scaling Groups are just fine, as
| are all the other "Cloud Native" services. Most business partners
| will tell you a dollar of growth is worth 5x-10x the value of a
| dollar of savings. Building a huge tall computer will be cheaper,
| but if it isn't 10x cheaper (and that is Total Cost of Ownership,
| not the cost of the metal) and you are moving more slowly than
| you otherwise would, it's almost a certainty you are leaving
| money on the table.
| meeks wrote:
| The whole argument comes down to bursty vs. non-bursty workloads.
| What type of workloads make up the fat part of the distribution?
| If most use cases are bursty (which I would argue they are) then
| the author's argument only applies to specific applications.
| Therefore, most people do indeed see cost benefits from the
| cloud.
| galkk wrote:
| One of the first experiences in my professional career was a
| situation where the "one big server" serving the system that was
| making money failed on a Friday, and HP's warranty was something
| like one or two business days to get a replacement.
|
| The entire situation ended up in a conference call with multiple
| department directors deciding which server from other systems to
| cannibalize (even if it was underpowered) to get the system
| going.
|
| Since that time I'm quite skeptical about "one", and to me this
| is one of the big benefits of cloud providers: most likely there
| is another instance available, and stockouts are rarer.
| jmull wrote:
| The article is really talking about one big server plus a
| backup vs. cloud providers.
| mochomocha wrote:
| > Why Should I Pay for Peak Load? [...] someone in that supply
| chain is charging you based on their peak load
|
| Oh, it's even worse than that: this someone oversubscribes your
| hardware a little during your peak and a lot during your trough,
| padding their great margins at the expense of extra cache
| misses/perf degradation of your software that most of the time
| you won't notice if they do their job well.
|
| This is one of the reasons why large companies such as my
| employer (Netflix) are able to invest in their own compute
| platforms to reclaim some of these gains, so that any
| oversubscription & colocation gains materialize into a lower
| cloud bill - instead of having your spare CPU cycles funneled
| to a random co-tenant customer of your cloud provider, with the
| latter capturing the extra value.
| robertlagrant wrote:
| This is why I like Cloudflare's worker model. It feels like the
| usefulness of cloud deployments, but with a pretty restrained
| pricing model.
| system2 wrote:
| It blows my mind that people are spending $2000+ per month for a
| server they can get used for a $4000-5000 one-time cost.
|
| VMware + Synology Business Backup + Synology C2 backup is our way
| of doing business and it has never failed us in over 7 years. Why
| do people spend so much money on cloud when they can host it
| themselves for less than 5% of the cost (2-year usage assumed)?
| adlpz wrote:
| I've tried it all except this, including renting bare metal.
| Nowadays I'm in the cloud-but-not-_cloudy_ camp. Still, I'm
| intrigued.
|
| Apart from the $4-5k server, what are your running costs?
| Licenses? Colocation? Network?
| vgeek wrote:
| https://www.he.net/colocation.html
|
| They have been around forever and their $400 deal is good,
| but that is for 42U, 1G and only 15 amps. With beefier
| servers, you will need more (both bandwidth and amperage) if
| you intend on filling the rack.
| soruly wrote:
| that's why letsencrypt uses a single database on a powerful server
| https://letsencrypt.org/2021/01/21/next-gen-database-servers...
| wahnfrieden wrote:
| I've started augmenting one big server with iCloud (CloudKit)
| storage, specifically syncing local Realm DBs to the user's own
| iCloud storage. This means I can avoid taking custody of
| PII/problematic data, can include non-custodial privacy in the
| product's value/marketing, and can charge enough of a premium for
| the one big server to keep it affordable. I know how to scale
| servers in and out, so I feel the value of avoiding all that
| complexity. This is a business approach that leans into that,
| with a way to keep the business growing with domain
| complexity/scope/adoption (iCloud storage, and probably other
| good APIs like this to work with along similar lines).
| dugmartin wrote:
| I think Elixir/Erlang is uniquely positioned to get more traction
| in the inevitable microservice/kubernetes backlash and the return
| to single server deploys (with a hot backup). Not only does it
| usually sip server resources but it also scales naturally as more
| cores/threads are available on a server.
| lliamander wrote:
| Going _from_ an Erlang "monolith" to a Java/k8s cluster, I was
| amazed at how much more work it takes to build a "modern"
| microservice. Erlang still feels like the future to me.
| dougmoscrop wrote:
| Can you imagine if even a fraction of the effort poured in to
| k8s tooling had gone in to the Erlang/OTP ecosystem instead?
| dboreham wrote:
| This is the norm. It's only weird things like Node.js and Ruby
| that don't have this property.
| hunterloftis wrote:
| While individual Node.js processes are single-threaded,
| Node.js includes a standard API that distributes its load
| across multiple processes, and therefore cores.
|
| - https://nodejs.org/api/cluster.html#cluster
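|
| For reference, a minimal TypeScript sketch of that cluster API
| (one worker per core, all sharing one listening port; the port
| number here is arbitrary):
|
|   import cluster from "node:cluster";
|   import { createServer } from "node:http";
|   import { cpus } from "node:os";
|
|   if (cluster.isPrimary) {
|     // Fork one worker per core; the primary only supervises.
|     for (let i = 0; i < cpus().length; i++) cluster.fork();
|     cluster.on("exit", () => cluster.fork()); // replace dead workers
|   } else {
|     // Workers share the same port; connections are distributed
|     // across them by the cluster module.
|     createServer((req, res) => res.end(`pid ${process.pid}\n`))
|       .listen(8000);
|   }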
| throwaway787544 wrote:
| I have been doing this for two decades. Let me tell you about
| bare metal.
|
| Back in the day we had 1,000 physical servers to run a large
| scale web app. 90% of that capacity was used only for two months.
| So we had to buy 900 servers just to make most of our money over
| two events in two seasons.
|
| We also had to have 900 servers because even one beefy machine
| has bandwidth and latency limits. Your network switch simply
| can't pump more than a set amount of traffic through its
| backplane or your NICs, and the OS may have piss-poor packet
| performance too. Lots of smaller machines allow easier scaling of
| network load.
|
| But you can't just buy 900 servers. You always need more
| capacity, so you have to predict what your peak load will be, and
| buy for that. And you have to do it well in advance because it
| takes a long time to build and ship 900 servers and then assemble
| them, run burn-in, replace the duds, and prep the OS, firmware,
| software. And you have to do this every 3 years (minimum) because
| old hardware gets obsolete and slow, hardware dies, disks die,
| support contracts expire. But not all at once, because who knows
| what logistics problems you'd run into and possibly not get all
| the machines in time to make your projected peak load.
|
| If back then you told me I could turn on 900 servers for 1 month
| and then turn them off, no planning, no 3 year capital outlay, no
| assembly, burn in, software configuration, hardware repair, etc
| etc, I'd call you crazy. Hosting providers existed but _nobody_
| could just give you 900 servers in an hour, _nobody_ had that
| capacity.
|
| And by the way: cloud prices are _retail prices_. Get on a
| savings plan or reserve some instances and the cost can be half.
| Spot instances are a quarter or less the price. Serverless is
| pennies on the dollar with no management overhead.
|
| If you don't want to learn new things, buy one big server. I just
| pray it doesn't go down for you, as it can take up to several
| days for some cloud vendors to get some hardware classes in some
| regions. And I pray you were doing daily disk snapshots, and can
| get your dead disks replaced quickly.
| MrStonedOne wrote:
| I handled an 8x increase in traffic to my website from a
| youtuber reviewing our game by increasing the cache timer and
| fixing the wiki creating session table entries for logged-out
| users on a wiki that required accounts to edit it.
|
| We were already getting multiple millions of page hits a month
| before this happened.
|
| This server had 8 cores, but 5 of them were reserved for the
| game servers (10 TB a month in bandwidth) running on the same
| machine.
|
| If you needed 1,000 physical computers to run your webapp, you
| fucked up somewhere along the line.
| toast0 wrote:
| > I have been doing this for two decades. Let me tell you about
| bare metal.
|
| > Back in the day we had 1,000 physical servers to run a large
| scale web app. 90% of that capacity was used only for two
| months. So we had to buy 900 servers just to make most of our
| money over two events in two seasons.
|
| > We also had to have 900 servers because even one beefy
| machine has bandwidth and latency limits. Your network switch
| simply can't pump more than a set amount of traffic through its
| backplane or your NICs, and the OS may have piss-poor packet
| performance too. Lots of smaller machines allow easier scaling
| of network load.
|
| I started working with real (bare metal) servers on real
| internet loads in 2004 and retired in 2019. While there's truth
| here, there's also missing information. In 2004, all my servers
| had 100M ethernet, but in 2019, all my new servers had 4x10G
| ethernet (2x public, 2x private), actually some of them had 6x,
| but with 2x unconnected, I dunno why. In the meantime, cpu,
| nics, and operating systems have improved such that if you're
| not getting line rate for full mtu packets, it's probably
| becsause your application uses a lot of cpu, or you've hit a
| pathological case in the OS (which happens, but if you're
| running 1000 servers, you've probably got someone to debug
| that).
|
| If you still need 1000 beefy 10G servers, you've got a pretty
| formidable load, but splitting it up into many more smaller
| servers is asking for problems of different kinds. Otoh, if
| your load really scales to 10x for a month, and you're at that
| scale, cloud economics are going to work for you.
|
| My seasonal loads were maybe 50% more than normal, but usage
| trends (and development trends) meant that the seasonal peak
| would become the new normal soon enough; cloud managing the
| peaks would help a bit, but buying for the peak and keeping it
| running for the growth was fine. Daily peaks were maybe 2-3x
| the off-peak usage, 5 or 6 days a week; a tightly managed cloud
| provisioning could reduce costs here, but probably not enough
| to compete with having bare metal for the full day.
| taylodl wrote:
| That's a good point about cloud services being retail. My
| company gets a very large discount from one of the most well-
| known cloud providers. This is available to everybody -
| typically if you commit to 12 months of a minimum usage then
| you can get substantial discounts. What I know is so far
| everything we've migrated to the cloud has resulted in
| _significantly_ reduced total costs, increased reliability,
| improved scalability, and is easier to enhance and remediate.
| Faster, cheaper, better - that's been a huge win for us!
| fleddr wrote:
| The entire point of the article is that your dated example no
| longer applies: you can fit the vast majority of common loads
| on a single server now, they are this powerful.
|
| Redundancy concerns are also addressed in the article.
| PaulDavisThe1st wrote:
| > If you don't want to learn new things, buy one big server. I
| just pray it doesn't go down for you
|
| There's intermediate ground here. Rent one big server, reserved
| instance. Cloudy in the sense that you get the benefits of the
| cloud provider's infrastructure skills and experience, and
| uptime, plus easy backup provisioning; non-cloudy in that you
| can just treat that one server instance like your own hardware,
| running (more or less) your own preferred OS/distro, with
| "traditional" services running on it (e.g. in our case: nginx,
| gitea, discourse, mantis, ssh)
| yardie wrote:
| Let me take you back to March, 2020. When millions of Americans
| woke up to find out there was a pandemic and they would be
| working from home now. Not a problem, I'll just call up our
| cloud provider and request more cloud compute. You join a queue
| of a thousand other customers calling in that morning for the
| exact same thing. A few hours on hold and the CSR tells you
| they aren't provisioning any more compute resources. east-us is
| tapped out, central-europe tapped out hours ago, California got
| a clue and they already called to reserve so you can't have
| that either.
|
| I use cloud all the time, but there are also black swan events
| where your IaaS can't do any more for you.
| tempnow987 wrote:
| I never had this problem on AWS though I did see some
| startups struggle with some more specialized instances. Are
| midsize companies actually running into issues with non-
| specialized compute on AWS?
| kardianos wrote:
| That sounds like you have burst load. Per the article, cloud
| away, great fit.
|
| The point was most people don't have that and even their bursts
| can fit in a single server. This is my experience as well.
| maxbond wrote:
| The thing that confuses me is, isn't every publicly
| accessible service bursty on a long timescale? Everything
| looks seasonal and predictable until you hit the front page
| of Reddit, and you don't know what day that will be. You
| don't decide how much traffic you get, the world does.
| genousti wrote:
| Funnily, hitting the Reddit front page might ruin you if you run
| on AWS.
| NorwegianDude wrote:
| Hitting the front page of reddit is insignificant, it's not
| like you'll get anywhere near thousands upon thousands of
| requests each second. If you have a somewhat normal website
| and you're not doing something weird then it's easily
| handled with a single low-end server.
|
| If I get so much traffic that scaling becomes a problem
| then I'll be happy as I would make a ton of money. No need
| to build to be able to handle the whole world at the same
| time, that's just a waste of money in nearly all
| situations.
| taylodl wrote:
| If you're hosting on-prem then you have a cluster to configure
| and manage, you have multiple data centers you need to provision,
| you need data backups you have to manage plus the storage
| required for all those backups. Data centers also require power,
| cooling, real estate taxes, administration - and you need at
| least two of them to handle systemic outages. Now you have to
| manage and coordinate your data between those data centers. None
| of this is impossible of course, companies have been doing this
| everyday for decades now. But let's not pretend it doesn't all
| have a cost - and unless your business is running a data center,
| none of these costs are aligned with your business' core mission.
|
| If you're running a start-up it's pretty much a no-brainer you're
| going to start off in the cloud.
|
| What's the real criteria to evaluate on-prem versus the cloud?
| Load consistency. As the article notes, serverless cloud
| architectures are perfect for bursty loads. If your traffic is
| highly variable then the ability to quickly scale-up and then
| scale-down will be of benefit to you - and there's a lot of
| complexity you don't have to manage to boot! Generally speaking
| such a solution is going to be cheaper and easier to configure
| and manage. That's a win-win!
|
| If your load isn't as variable and you therefore have cloud
| resources always running, then it's almost always cheaper to host
| those applications on-prem - assuming you have on-prem hosting
| available to you. As I noted above, building data centers isn't
| cheap and it's almost always cheaper to stay in the cloud than it
| is to build a new data center, but if you already have data
| center(s) then your calculus is different.
|
| Another thing to keep in mind at the moment is even if you decide
| to deploy on-prem you may not be able to get the hardware you
| need. A colleague of mine is working on a large project that's to
| be hosted on-prem. It's going to take 6-12 months to get all the
| required hardware. Even prior to the pandemic the backlog was 3-6
| months because the major cloud providers are consuming all the
| hardware. Vendors would rather deal with buyers buying hardware
| by the tens of thousands than a shop buying a few dozen servers.
| You might even find your hardware delivery date getting pushed
| out as the "big guys" get their orders filled. It happens.
| ozim wrote:
| You know you can run a server in the cellar under your stairs.
|
| You know that if you are a startup you can just keep servers in
| a closet and hope that no one turns on the coffee machine while
| the aircon runs, because that will pop circuit breakers, which
| will take down your server - or maybe you at least have a UPS,
| so maybe not :)
|
| I have read horror stories about companies having such setups.
|
| While they don't need multiple data centers, power, cooling and
| redundancy sound to them like some kind of STD - getting a
| cheap VPS should be the default for such people. That is a win
| as well.
| nostrebored wrote:
| As someone who's worked in cloud sales and no longer has any skin
| in the game, I've seen firsthand how cloud native architectures
| improve developer velocity, offer enhanced reliability and
| availability, and actually decrease lock-in over time.
|
| Every customer I worked with who had one of these huge servers
| introduced coupling and state in some unpleasant way. They were
| locked in to persisted state, and couldn't scale out to handle
| variable load even if they wanted to. Beyond that, hardware
| utilization became contentious at any mid-enterprise scale.
| Everyone views the resource pool as theirs, and organizational
| initiatives often push people towards consuming the same types of
| resources.
|
| When it came time to scale out or do international expansion,
| every single one of my customers who had adopted this strategy
| had assumptions baked into their access patterns that made sense
| given their single server. When it came time to store some part
| of the state in a way that made sense for geographically
| distributed consumers, it was months, not sprints, spent
| figuring out how to hammer this into a model that's
| fundamentally at odds.
|
| From a reliability and availability standpoint, I'd often see
| customers tell me that 'we're highly available within a single
| data center' or 'we're split across X data centers' without
| considering the shared failure modes that each of these data
| centers had. Would a fiber outage knock out both of your DCs?
| Would a natural disaster likely knock something over? How about
| _power grids_? People often don't realize the failure modes
| they've already accepted.
|
| This is obviously not true for every workload. It's tech, there
| are tradeoffs you're making. But I would strongly caution any
| company that expects large growth against sitting on a single-
| server model for very long.
| secabeen wrote:
| The common element in the above is scaling and reliability.
| While lots of startups and companies are focused on the 1%
| chance that they are the next Google or Shopify, the reality is
| that nearly all aren't, and the overengineering and redundancy-
| first model that cloud pushes does cost them a lot of runway.
|
| It's even less useful for large companies; there is no world in
| which Kellogg is going to increase sales by 100x, or even 10x.
| nostrebored wrote:
| But most companies aren't startups. Many companies are
| established, growing businesses with a need to be able to
| easily implement new initiatives and products.
|
| The benefits of cloud for LE are completely different. I'm
| happy to break down why, but I addressed the smb and mid-
| enterprise space here because most large enterprises already
| know they shouldn't run on a single rack.
| secabeen wrote:
| > I addressed the smb and mid-enterprise space here because
| most large enterprises already know they shouldn't run on a
| single rack.
|
| This is a straw man. No one, anywhere in this thread or in
| the OP's original article, proposed a single-rack solution.
|
| From the OP: > Running a primary and a backup server is
| usually enough, keeping them in different datacenters.
| nostrebored wrote:
| This is just a complete lack of engagement with the post.
| Most LE's know they shouldn't run a two rack setup
| either. That is not the size or layout of any LE that
| I've interacted with. The closest is a bank in the
| developing world that had a few racks split across data
| centers in the same city and was desperately trying to
| move away given power instability in the country.
| tboyd47 wrote:
| Could confirmation bias affect your analysis at all?
|
| How many companies went cloud-first and then ran out of money?
| You wouldn't necessarily know anything about them.
|
| Were the scaling problems your single-server customers called
| you in to solve unpleasant enough to put their core business
| in danger? Or was the expense just a rounding error for them?
| nostrebored wrote:
| From this and the other comment, it looks like I wasn't clear
| about talking about SMB/ME rather than a seed/pre-seed
| startup, which I understand can be confusing given that we're
| on HN.
|
| I can tell you that I've never seen a company run out of
| money from going cloud-first (sample size of over 200 that I
| worked with directly). I did see multiple businesses scale
| down their consumption to near-zero and ride out the
| pandemic.
|
| The answer to scaling problems being unpleasant enough to put
| the business in danger is yes, but that was also during the
| pandemic when companies needed to make pivots to slightly
| different markets. Doing this was often unaffordable from an
| implementation cost perspective at the time when it had to
| happen. I've seen acquisitions fall through due to an
| inability to meet technical requirements because of stateful
| monstrosities. I've also seen top-line revenue get severely
| impacted when resource contention causes outages.
|
| The only times I've seen 'cloud-native' truly backfire were
| when companies didn't have the technical experience to move
| forward with these initiatives in-house. There are a lot of
| partners in the cloud implementation ecosystem who will
| fleece you for everything you have. One such example was a
| k8s microservices shop with a single contract developer
| managing the infra and a partner doing the heavy lifting. The
| partner gave them the spiel on how cloud-native provides
| flexibility and allows for reduced opex and the customer was
| very into it. They stored images in an RDBMS. Their database
| costs were almost 10% of the company's operating expenses by
| the time the customer noticed that something was wrong.
| stevenjgarner wrote:
| If you are not maxing out or even getting above 50% utilization
| of _128 physical cores (256 threads), 512 GB of memory, and 50
| Gbps of bandwidth for $1,318/month_, I really like the approach
| of multiple low-end consumable computers as servers. I have been
| using arrays of Intel NUCs at some customer sites for years with
| considerable cost savings over cloud offerings. Keep an extra
| redundant one in the array ready to swap out a failure.
|
| Another often overlooked option is that in several fly-over
| states it is quite easy and cheap to register as a public
| telecommunication utility. This allows you to place a powered
| pedestal in the public right-of-way, where you can get situated
| adjacent to an optical meet point and get considerable savings on
| installation costs of optical Internet, even from a tier 1
| provider. If your server bandwidth is peak utilized during
| business hours and there is an apartment complex nearby you can
| use that utility designation and competitively provide
| residential Internet service to offset costs.
| warmwaffles wrote:
| > I have been using arrays of Intel NUCs at some customer sites
| for years
|
| Stares at the 3 NUCs on my desk waiting to be clustered for a
| local sandbox.
| titzer wrote:
| This is pretty devious and I love it.
| tzs wrote:
| I don't understand the pedestal approach. Do you put your
| server in the pedestal, so the pedestal is in effect your data
| center?
| saulrh wrote:
| > competitively provide residential Internet service to
| offset costs.
|
| I uh. Providing residential Internet for an apartment complex
| feels like an entire business in and of itself and wildly out
| of scope for a small business? That's a whole extra competency
| and a major customer support commitment. Is there something I'm
| missing here?
| stevenjgarner wrote:
| It depends on the scale - it does not have to be a major
| undertaking. You are right, it is _a whole extra competency
| and a major customer support commitment_ , but for a lot of
| the entrepreneurial folk on HN quite a rewarding and
| accessible learning experience.
|
| The first time I did anything like this was in late 1984 in a
| small town in Iowa where GTE was the local telecommunication
| utility. Absolutely abysmal Internet service, nothing
| broadband from them at the time or from the MSO (Mediacom). I
| found out there was a statewide optical provider with cable
| going through the town. I incorporated an LLC, became a
| utility and built out less than 2 miles of single mode fiber
| to interconnect some of my original software business
| customers at first. Our internal motto was "how hard can it
| be?" (more as a rebuke to GTE). We found out. The whole 24x7
| public utility thing was very difficult for just a couple of
| guys. But it grew from there. I left after about 20 years and
| today it is a thriving provider.
|
| Technology has made the whole process so much easier today. I
| am amazed more people do not do it. You can get a small rack-
| mount sheet metal pedestal with an AC power meter and an HVAC
| unit for under $2k. Being a utility will allow you to place
| that on a concrete pad or vault in the utility corridor
| (often without any monthly fee from the city or county). You
| place a few bollards around it so no one drives into it. You
| want to get quotes from some tier 1 providers [0]. They will
| help you identify the best locations to engineer an optical
| meet and those are the locations you run by the
| city/county/state utilities board or commission.
|
| For a network engineer wanting to implement a fault tolerant
| network, you can place multiple pedestals at different
| locations on your provider's/peer's network to create a route
| diversified protected network.
|
| After all, when you are buying expensive cloud based services
| that literally is all your cloud provider is doing ... just
| on a completely more massive scale. The barrier to entry is
| not as high as you might think. You have technology offerings
| like OpenStack [1], where multiple competitive vendors will
| also help you engineer a solution. The government also
| provides (financial) support [2].
|
| The best perk is the number of parking spaces the requisite
| orange utility traffic cone opens up for you.
|
| [0] https://en.wikipedia.org/wiki/Tier_1_network
|
| [1] https://www.openstack.org/
|
| [2] https://www.usda.gov/reconnect
| MockObject wrote:
| In 1984, I am guessing the only use case for broadband
| internet was running an NNTP server?
| marktangotango wrote:
| This is some old school stuff right here. I have a hard
| time believing this sort of gumption and moxie are as
| prevalent today.
|
| > The best perk is the number of parking spaces the
| requisite orange utility traffic cone opens up for you.
|
| That's hilarious.
| bombcar wrote:
| You're missing "apartment complex" - you as the service
| provider contract with the apartment management company to
| basically cover your costs, and they handle the day-to-day
| along with running the apartment building.
|
| Done right, it'll be cheaper for them (they can advertise
| "high speed internet included!" or whatever) and you won't
| have much to do assuming everything on your end just works.
|
| The days where small ISPs provided things like email, web
| hosting, etc, are long gone; you're just providing a DHCP IP
| and potentially not even that if you roll out carrier-grade
| NAT.
| erichocean wrote:
| > _it is quite easy and cheap to register as a public
| telecommunication utility_
|
| Is North Carolina one of those states? I'm intrigued...
| stevenjgarner wrote:
| I have only done a few midwestern states. Call them and ask
| [0] - (919) 733-7328. You may want to first call your
| proposed county commissioner's office or city hall (if you
| are not rural), and ask them who to talk with about a new
| local business providing Internet service. If you can show
| the Utilities Commission that you are working with someone at
| the local level I have found they will treat you more
| seriously. In certain rural counties, you can even qualify
| for funding from the Rural Utilities Service of the USDA.
|
| [0] https://www.ncuc.net/
|
| EDIT: typos + also most states distinguish between
| facilities-based ISPs (i.e. with physical plant in the
| regulated public right-of-way) and other ISPs. Tell them you
| are looking to become a facilities-based ISP.
| xen2xen1 wrote:
| What other benefits are there to being a "public
| telecommunication utility"?
| stevenjgarner wrote:
| The benefit that is obvious to the regulators is that you
| can charge money for services. So for example, offering
| telephone services requires being a LEC (local exchange
| carrier) or CLEC (competitive local exchange carrier).
| But even telephone services have become considerably
| unregulated through VoIP. It's just that at some point,
| the VoIP has to terminate/interface with a (C)LEC
| offering real dial tone and telephone numbering. You can
| put in your own Asterisk server [0] and provide VoIP
| service on your burgeoning optical utilities network,
| together with other bundled services including
| television, movies, gaming, metering etc.. All of these
| offerings can be resold from wholesale services, where
| all you need is an Internet feed.
|
| Other benefits to being a "public telecommunication
| utility" include the competitive right to place your own
| facilities on telephone/power poles or underground in
| public right-of-way under the Telecommunications Act of
| 1996. You will need to enter into and pay for a pole
| attachment agreement. Of course local governments can
| reserve the right to tariff your facilities, which has
| its own ugliness.
|
| One potentially valuable thing a utility can do is place
| empty conduit in public right of way that can be
| used/resold in the future at a (considerable) gain. For
| example, before highways, roadways, airports and other
| infrastructure is built, it is orders of magnitude
| cheaper just to plow conduit under bare ground before the
| improvements are placed.
|
| [0] https://www.asterisk.org/
| eek2121 wrote:
| > Other benefits to being a "public telecommunication
| utility" include the competitive right to place your own
| facilities on telephone/power poles or underground in
| public right-of-way under the Telecommunications Act of
| 1996. You will need to enter into and pay for a pole
| attachment agreement. Of course local governments can
| reserve the right to tariff your facilities, which has
| its own ugliness.
|
| Note that in many parts of the country, the
| telcos/cablecos themselves own the poles. Google had a
| ton of trouble with AT&T in my state thanks to this. They
| lost to AT&T in court and gave up.
| count wrote:
| While VOIP is mostly unregulated, be acutely aware of
| e-911 laws and requirements. This isn't the Wild West
| shitshow it was in 2003 when I was doing similar things
| :)
|
| https://www.intrado.com/life-safety/e911-regulations has
| a good overview and links to applicable CFR/rules.
| erichocean wrote:
| Thanks!
| stevenjgarner wrote:
| Feel free to reach out at my gmail [0]
|
| [0] https://news.ycombinator.com/user?id=stevenjgarner
| cfors wrote:
| Yep, there's a premium on making your architecture more cloudy.
| However, the strongest case for Use One Big Server is not
| necessarily your big monolithic API server, but your database.
|
| Use One Big Database.
|
| Seriously. If you are a backend engineer, nothing is worse than
| breaking up your data into self-contained service databases,
| where everything is passed over REST/RPC. Your product asks will
| consistently want to combine these data sources (they don't know
| how your distributed databases look, and oftentimes they really
| do not care).
|
| It is so much easier to do these joins efficiently in a single
| database than fanning out RPC calls to multiple different
| databases, not to mention dealing with inconsistencies, lack of
| atomicity, etc. etc. Spin up a specific reader of that database
| if there needs to be OLAP queries, or use a message bus. But keep
| your OLTP data within one database for as long as possible.
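|
| To make that concrete, here is a rough sketch of the kind of
| product ask that is one query against one big database but
| several RPC round trips plus an in-memory join against
| service-per-database (table, column and connection names are
| all invented):
|
|     import psycopg2
|
|     conn = psycopg2.connect("dbname=app")  # the one big database
|     with conn.cursor() as cur:
|         # One query, consistent and atomic. The service-per-database
|         # version becomes N RPC calls plus a join in application code.
|         cur.execute("""
|             SELECT o.id, o.total, c.email, s.carrier, s.eta
|             FROM orders o
|             JOIN customers c ON c.id = o.customer_id
|             JOIN shipments s ON s.order_id = o.id
|             WHERE o.created_at > now() - interval '1 day'
|         """)
|         recent_orders = cur.fetchall()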
|
| You can break apart a stateless microservice, but there are few
| things as stagnant in the world of software as data. It will
| keep you nimble for new product features. The boxes that they
| offer on cloud vendors today for managed databases are giant!
| s_dev wrote:
| >Use One Big Database.
|
| It may be reasonable to have two databases, e.g. a class A and
| class B for PCI compliance. So context still deeply matters.
|
| Also having a dev DB with mock data and a live DB with real
| data is a common setup in many companies.
| belak wrote:
| This is absolutely true - when I was at Bitbucket (ages ago at
| this point) and we were having issues with our DB server
| (mostly due to scaling), almost everyone we talked to said "buy
| a bigger box until you can't any more" because of how complex
| (and indirectly expensive) the alternatives are - sharding and
| microservices both have a ton more failure points than a single
| large box.
|
| I'm sure they eventually moved off that single primary box, but
| for many years Bitbucket was run off 1 primary in each
| datacenter (with a failover), and a few read-only copies. If
| you're getting to the point where one database isn't enough,
| you're either doing something pretty weird, are working on a
| specific problem which needs a more complicated setup, or have
| grown to the point where investing in a microservice
| architecture starts to make sense.
| thayne wrote:
| One issue I've seen with this is that if you have a single,
| very large database, it can take a very, very long time to
| restore from backups. Or for that matter just taking backups.
|
| I'd be interested to know if anyone has a good solution for
| that.
| rszorness wrote:
| Try out pg_probackup. It works on database files directly.
| Restore is as fast as you can write to your SSD.
|
| I've set up a pgsql server with TimescaleDB recently.
| Continuous backup based on WAL takes seconds each hour, and
| a complete restore takes 15 minutes for almost 300 GB of
| data because the 1 GBit connection to the backup server is
| the bottleneck.
| Svenstaro wrote:
| I found this approach pretty cool in that regard:
| https://github.com/pgbackrest/pgbackrest
| dsr_ wrote:
| Here's the way it works for, say, Postgresql:
|
| - you rsync or zfs send the database files from machine A
| to machine B. You would like the database to be off during
| this process, which will make it consistent. The big
| advantage of ZFS is that you can stop PG, snapshot the
| filesystem, and turn PG on again immediately, then send the
| snapshot. Machine B is now a cold backup replica of A. Your
| loss potential is limited to the time between backups.
|
| - after the previous step is completed, you arrange for
| machine A to send WAL files to machine B. It's well
| documented. You could use rsync or scp here. It happens
| automatically and frequently. Machine B is now a warm
| replica of A -- if you need to turn it on in an emergency,
| you will only have lost one WAL file's worth of changes.
|
| - after that step is completed, you give machine B
| credentials to login to A for live replication. Machine B
| is now a live, very slightly delayed read-only replica of
| A. Anything that A processes will be updated on B as soon
| as it is received.
|
| You can go further and arrange to load balance requests
| between read-only replicas, while sending the write
| requests to the primary; you can look at Citus (now open
| source) to add multi-primary clustering.
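|
| To make the WAL-shipping step concrete: archive_command just
| has to be something that copies each segment somewhere safe
| and returns non-zero on failure. A toy helper (the paths and
| the mount of machine B are made up) could be a small script
| like:
|
|     #!/usr/bin/env python3
|     # Called from postgresql.conf as, for example:
|     #   archive_command = '/usr/local/bin/archive_wal.py %p %f'
|     # where %p is the path to the WAL segment and %f its file name.
|     import shutil
|     import sys
|     from pathlib import Path
|
|     def main() -> int:
|         wal_path, wal_name = sys.argv[1], sys.argv[2]
|         dest = Path("/mnt/machine-b/wal") / wal_name  # hypothetical mount
|         if dest.exists():
|             return 1  # never silently overwrite an archived segment
|         shutil.copy2(wal_path, dest)
|         return 0  # zero tells PostgreSQL the segment is safely archived
|
|     if __name__ == "__main__":
|         sys.exit(main())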
| hamandcheese wrote:
| Do you even have to stop Postgres if using ZFS snapshots?
| ZFS snapshots are atomic, so I'd expect that to be fine.
| If it wasn't fine, that would also mean Postgres couldn't
| handle power failure or other sudden failures.
| dsr_ wrote:
| You have choices.
|
| * shut down PG. Gain perfect consistency.
|
| * use pg_dump. Perfect consistency at the cost of a
| longer transaction. Gain portability for major version
| upgrades.
|
| * Don't shut down PG: here's what the manual says:
|
| However, a backup created in this way saves the database
| files in a state as if the database server was not
| properly shut down; therefore, when you start the
| database server on the backed-up data, it will think the
| previous server instance crashed and will replay the WAL
| log. This is not a problem; just be aware of it (and be
| sure to include the WAL files in your backup). You can
| perform a CHECKPOINT before taking the snapshot to reduce
| recovery time.
|
| * Midway: use SELECT pg_start_backup('label', false,
| false); and SELECT * FROM pg_stop_backup(false, true); to
| generate WAL files while you are running the backup, and
| add those to your backup.
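|
| A minimal sketch of that midway option, wiring the two calls
| around a filesystem snapshot (the DSN, snapshot name and ZFS
| dataset are made up; on PostgreSQL 15+ the functions are named
| pg_backup_start/pg_backup_stop):
|
|     import subprocess
|     import psycopg2
|
|     conn = psycopg2.connect("dbname=app")  # hypothetical DSN
|     conn.autocommit = True
|     with conn.cursor() as cur:
|         # Non-exclusive backup: pg_stop_backup must run on this same session.
|         cur.execute("SELECT pg_start_backup('nightly', false, false)")
|         subprocess.run(["zfs", "snapshot", "tank/pgdata@nightly"], check=True)
|         cur.execute("SELECT * FROM pg_stop_backup(false, true)")
|         _, label_file, tablespace_map = cur.fetchone()
|         # Store label_file/tablespace_map alongside the snapshot, plus
|         # the WAL generated while the backup was running.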
| mgiampapa wrote:
| This isn't really a backup, it's redundancy, which is a good
| thing but not the same as a backup solution. You can't
| get out of a drop table production type event this way.
| hamandcheese wrote:
| If you stop at the first bullet point then you have a
| backup solution.
| dsr_ wrote:
| Precisely so.
| thayne wrote:
| It doesn't solve the problem that sending that snapshot
| to a backup location takes a long time.
| maxclark wrote:
| Going back 20 years with Oracle DB it was common to use
| "triple mirror" on storage to make a block level copy of
| the database. Lock the DB for changes, flush the logs,
| break the mirror. You now have a point in time copy of
| the database that could be mounted by a second system to
| create a tape backup, or as a recovery point to restore.
|
| It was the way to do it, and very easy to manage.
| Twisell wrote:
| The previous commenter was probably unaware of the
| various ways to back up recent PostgreSQL releases.
|
| For what you describe, a "point in time recovery" backup
| would probably be the more appropriate flavor:
| https://www.postgresql.org/docs/current/continuous-
| archiving...
|
| It was first released around 2010 and has gained robustness
| with every release, hence not everyone is aware of it.
|
| For instance, I don't think it's really required anymore to
| shut down the database to do the initial sync if you use the
| proper tooling (pg_basebackup, if I remember correctly).
| mike_hearn wrote:
| Presumably it doesn't matter if you break your DB up into
| smaller DBs, you still have the same amount of data to back
| up no matter what. However, now you also have the problem
| of snapshot consistency to worry about.
|
| If you need to backup/restore just one set of tables, you
| can do that with a single DB server without taking the rest
| offline.
| thayne wrote:
| > you still have the same amount of data to back up no
| matter what
|
| But you can restore/back up the databases in parallel.
|
| > If you need to backup/restore just one set of tables,
| you can do that with a single DB server without taking
| the rest offline.
|
| I'm not aware of a good way to restore just a few tables
| from a full db backup. At least not one that doesn't require
| copying over all the data (because the backup is stored
| over the network, not on a local disk). And that may be
| desirable to recover from say a bug corrupting or
| deleting a customer's data.
| nick__m wrote:
| On MariaDB you can tell the replica to enter a snapshotable
| state[1], take a simple LVM snapshot, tell the database it's
| over, back up your snapshot somewhere else and finally delete
| the snapshot.
|
| 1) https://mariadb.com/kb/en/storage-snapshots-and-backup-
| stage...
| altdataseller wrote:
| What if your product simply stores a lot of data (i.e. a search
| engine)? How is that weird?
| skeeter2020 wrote:
| This is not typically going to be stored in an ACID-
| compliant RDBMS, which is where the most common scaling
| problem occurs. Search engines, document stores, adtech,
| eventing, etc. are likely going to have a different storage
| mechanism where consistency isn't as important.
| rmbyrro wrote:
| a search engine won't need joins, just other things (i.e. text
| indexing) that can be split up relatively easily.
| belak wrote:
| That's fair - I added "are working on a specific problem
| which needs a more complicated setup" to my original
| comment as a nicer way of referring to edge cases like
| search engines. I still believe that 99% of applications
| would function perfectly fine with a single primary DB.
| zasdffaa wrote:
| Depends what you mean by a database I guess. I take it to
| mean an RDBMS.
|
| RDBMSs provide guarantees that web searching doesn't need.
| You can afford to lose a piece of data or provide not-quite-
| perfect results for web stuff. That's just wrong for an
| RDBMS.
| altdataseller wrote:
| What if you are using the database as a system of record
| to index into a real search engine like Elasticsearch?
| For a product where you have tons of data to search from
| (i.e. text from web pages)?
| IggleSniggle wrote:
| In regards to Elasticsearch, you basically opt-in to
| which behavior you want/need. You end up in the same
| place: potentially losing some data points or introducing
| some "fuzziness" to the results in exchange for speed.
| When you ask Elasticsearch to behave in a guaranteed
| atomic manner across all records, performing locks on
| data, you end up with similar constraints as in a RDBMS.
|
| Elasticsearch is for search.
|
| If you're asking about "what if you use an RDBMS as a
| pointer to Elasticsearch" then I guess I would ask: why
| would you do this? Elasticsearch can be used as a system
| of record. You could use an RDBMS over top of
| Elasticsearch without configuring Elasticsearch as a
| system of record, but then you would be lying when you
| refer to your RDBMS as a "system of record." It's not a
| "system of record" for your actual data, just a record of
| where pointers to actual data were at one point in time.
|
| I feel like I must be missing what you're suggesting
| here.
| altdataseller wrote:
| Having just an Elasticsearch index without also having
| the data in a primary store like an RDBMS is an anti-
| pattern and not recommended by almost all experts.
| Whether you want to call it a "system of record", I won't
| argue semantics. But the point is, it's recommended to have
| your data in a primary store from which you can index into
| Elasticsearch.
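|
| The pattern is roughly: the RDBMS is the source of truth and
| the index is disposable, rebuilt from it whenever needed. A
| hedged sketch using the plain Elasticsearch REST API (table,
| index and host names are invented):
|
|     import psycopg2
|     import requests
|
|     conn = psycopg2.connect("dbname=app")  # the primary store
|     with conn.cursor() as cur:
|         cur.execute("SELECT id, url, body FROM pages WHERE indexed = false")
|         for page_id, url, body in cur.fetchall():
|             # Index (or re-index) the document; losing the index is fine,
|             # since it can always be rebuilt from the rows above.
|             requests.put(
|                 f"http://localhost:9200/pages/_doc/{page_id}",
|                 json={"url": url, "body": body},
|                 timeout=10,
|             )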
| ladyattis wrote:
| At my current job we have four different databases so I concur
| with this assessment. I think it's okay to have some data in
| different DBs if they're significantly different like say the
| user login data could be in its own database. But anything that
| we do which is a combination of e-commerce and
| testing/certification I think they should be in one big
| database so I can do reasonable queries for information that we
| need. This doesn't include two other databases we have on-prem:
| one is a Salesforce setup and the other is an internal
| application system that essentially marries Salesforce to it.
| It's a weird wild environment to navigate when adding features.
| jasonwatkinspdx wrote:
| A relative worked for a hedge fund that used this idea. They
| were a C#/MSSQL shop, so they just bought whatever was the
| biggest MSSQL server at the time, updating frequently. They
| said it was a huge advantage, where the limit in scale was more
| than offset by productivity.
|
| I think it's an underrated idea. There's a lot of people out
| there building a lot of complexity for datasets that in the end
| are less than 100 TB.
|
| But it also has limits. Infamously Twitter delayed going to a
| sharded architecture a bit too long, making it more of an ugly
| migration.
| manigandham wrote:
| Server hardware is so cheap and fast today that 99% of
| companies will never hit that limit in scale either.
| AtNightWeCode wrote:
| If you get your services right there is little or no
| communication between the services, since a microservice should
| have all the data it needs in its own store.
| HeavyStorm wrote:
| > they don't know how your distributed databases look, and
| oftentimes they really do not care
|
| Nor should they.
| markandrewj wrote:
| Just FYI, you can have one big database, without running it on
| one big server. As an example, databases like Cassandra are
| designed to be scaled horizontally (i.e. scale out, instead of
| scale up).
|
| https://cassandra.apache.org/_/cassandra-basics.html
| 1500100900 wrote:
| Cassandra may be great when you have to scale your database
| that you no longer develop significantly. The problem with
| this DB system is that you have to know all the queries
| before you can define the schema.
| threeseed wrote:
| > The problem with this DB system is that you have to know
| all the queries before you can define the schema
|
| Not true.
|
| You just need to optimise your schema if you want the best
| performance. Exactly the same as an RDBMS.
| mdasen wrote:
| There are trade-offs when you scale horizontally even if a
| database is designed for it. For example, DataStax's Storage
| Attached Indexes or Cassandra's hidden-table secondary
| indexing allow for indexing on columns that aren't part of
| the clustering/partitioning, but when you're reading you're
| going to have to ask all the nodes to look for something if
| you aren't including a clustering/partitioning criteria to
| narrow it down.
|
| You've now scaled out, but you now have to ask each node when
| searching by secondary index. If you're asking every node for
| your queries, you haven't really scaled horizontally. You've
| just increased complexity.
|
| Now, maybe 95% of your queries can be handled with a
| clustering key and you just need secondary indexes to handle
| 5% of your stuff. In that case, Cassandra does offer an easy
| way to handle that last 5%. However, it can be problematic if
| people take shortcuts too much and you end up putting too
| much load on the cluster. You're also putting your latency
| for reads at the highest latency of all the machines in your
| cluster. For example, if you have 100 machines in your
| cluster with a mean response time of 2ms and a 99th
| percentile response time of 150ms, you're potentially going
| to be providing a bad experience to users waiting on that
| last box on secondary index queries.
|
| This isn't to say that Cassandra isn't useful - Cassandra has
| been making some good decisions to balance the problems
| engineers face. However, it does come with trade-offs when
| you distribute the data. When you have a well-defined
| problem, it's a lot easier to design your data for efficient
| querying and partitioning. When you're trying to figure
| things out, the flexibility of a single machine and much
| cheaper secondary index queries can be important - and if you
| hit a massive scale, you figure out how you want to partition
| it then.
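|
| A small sketch of that distinction with the Python driver
| (contact point, keyspace and table are invented; assume events
| has PRIMARY KEY (tenant_id, ts) and a secondary index on kind):
|
|     import uuid
|     from cassandra.cluster import Cluster
|
|     session = Cluster(["10.0.0.1"]).connect("app")
|     tenant_id = uuid.uuid4()
|
|     # Includes the partition key: routed only to the replicas
|     # that own this partition.
|     session.execute(
|         "SELECT ts, kind FROM events WHERE tenant_id = %s", (tenant_id,))
|
|     # Secondary-index lookup without the partition key: every
|     # node in the cluster has to be consulted.
|     session.execute("SELECT ts FROM events WHERE kind = %s", ("login",))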
| markandrewj wrote:
| Cassandra was just an example, but most databases can be
| scaled either vertically or horizontally via sharding. You
| are right that, if misconfigured, performance can be hindered,
| but this is also true for a database which is being scaled
| vertically. Generally speaking, with a large dataset you will
| get better performance by growing horizontally than you would
| by growing vertically.
|
| https://stackoverflow.blog/2022/03/14/how-sharding-a-
| databas...
| robertlagrant wrote:
| > Your product asks will consistently want to combine these
| data sources (they don't know how your distributed databases
| look, and oftentimes they really do not care).
|
| I'm not sure how to parse this. What should "asks" be?
| cfors wrote:
| The feature requests (asks) that product wants to build -
| sorry for the confusion there.
| delecti wrote:
| The phrase "Your product asks will consistently " can be de-
| abbreviated to "product owners/product managers you work with
| will consistently request".
| wefarrell wrote:
| "Your product asks will consistently want to combine these data
| sources (they don't know how your distributed databases look,
| and oftentimes they really do not care)."
|
| This isn't a problem if state is properly divided along
| business domain lines and the people who need to access the
| data have access to it. In fact many use cases require it -
| publicly traded companies can't let anyone in the organization
| access financial info and healthcare companies can't let anyone
| access patient data. And of course there are performance
| concerns as well if anyone in the organization can arbitrarily
| execute
| queries on any of the organization's data.
|
| I would say YAGNI applies to data segregation as well and
| separations shouldn't be introduced until they are necessary.
| Mavvie wrote:
| "combine these data sources" doesn't necessarily mean data
| analytics. Just as an example, it could be something like
| "show a badge if it's the user's birthday", which if you had
| a separate microservice for birthdays would be much harder
| than joining a new table.
| wefarrell wrote:
| Replace "people" with "features" and my comment still
| holds. As software, features, and organizations become more
| complex the core feature data becomes a smaller and smaller
| proportion of the overall state and that's when
| microservices and separate data stores become necessary.
| lmm wrote:
| If you do this then you'll have the hardest possible migration
| when the time comes to split it up. It will take you literally
| years, perhaps even a decade.
|
| Shard your datastore from day 1, get your dataflow right so
| that you don't need atomicity, and it'll be painless and scale
| effortlessly. More importantly, you won't be able to paper over
| crappy dataflow. It's like using proper types in your code:
| yes, it takes a bit more effort up-front compared to just
| YOLOing everything, but it pays dividends pretty quickly.
| riku_iki wrote:
| > Shard your datastore from day 1
|
| what about using something like CockroachDB from day 1?
| lmm wrote:
| I don't know the characteristics of bikesheddb's upstream
| in detail (if there's ever a production-quality release of
| bikesheddb I'll take another look), but in general using
| something that can scale horizontally (like Cassandra or
| Riak, or even - for all its downsides - MongoDB) is a great
| approach - I guess it's a question of terminology whether
| you call that "sharding" or not. Personally I prefer that
| kind of datastore over an SQL database.
| riku_iki wrote:
| > over an SQL database
|
| it is actually distributed SQL Db with auto sharding,
| their goal is to be SQL compatible with Postgres.
| Rantenki wrote:
| This is true IFF you get to the point where you have to split
| up.
|
| I know we're all hot and bothered about getting our apps to
| scale up to be the next unicorn, but most apps never need to
| scale past the limit of a single very high-performance
| database. For most people, this single huge DB is sufficient.
|
| Also, for many (maybe even most) applications, designated
| outages for maintenance are not only acceptable, but industry
| standard. Banks have had, and continue to have designated
| outages all the time, usually on weekends when the impact is
| reduced.
|
| Sure, what I just wrote is bad advice for mega-scale SaaS
| offerings with millions of concurrent users, but most of us
| aren't building those, as much as we would like to pretend
| that we are.
|
| I will say that TWO of those servers, with some form of
| synchronous replication, and point in time snapshots, are
| probably a better choice, but that's hair-splitting.
|
| (and I am a dyed in the wool microservices, scale-out Amazon
| WS fanboi).
| lmm wrote:
| > I know we're all hot and bothered about getting our apps
| to scale up to be the next unicorn, but most apps never
| need to scale past the limit of a single very high-
| performance database. For most people, this single huge DB
| is sufficient.
|
| True _if_ the reliability is good enough. I agree that many
| organisations will never get to the scale where they need
| it as a performance/data size measure, but you often will
| grow past the reliability level that's possible to achieve
| on a single node. And it's worth saying that the various
| things that people do to mitigate these problems - read
| replicas, WAL shipping, and all that - can have a pretty
| high operational cost. Whereas if you just slap in a
| horizontal autoscaling datastore with true master-master HA
| from day 1, you bypass all of that trouble and just never
| worry about it.
|
| > Also, for many (maybe even most) applications, designated
| outages for maintenance are not only acceptable, but
| industry standard. Banks have had, and continue to have
| designated outages all the time, usually on weekends when
| the impact is reduced.
|
| IME those are a minority of applications. Anything
| consumer-facing, you absolutely do lose out (and even if
| it's not a serious issue in itself, it makes you look bush-
| league) if someone can't log into your system at 5AM on
| Sunday. Even if you're B2B, if your clients are serving
| customers then they want you to be online whenever their
| customers are.
| johnbellone wrote:
| I agree with this sentiment but it is often misunderstood as a
| means to force everything into a single database schema. More
| people need to learn about logically separating schemas with
| their database servers!
| clairity wrote:
| > "Use One Big Database."
|
| yah, this is something i learned when designing my first server
| stack (using sun machines) for a real business back during the
| dot-com boom/bust era. our single database server was the
| beefiest machine by far in the stack, 5U in the rack (we also
| had a hot backup), while the other servers were 1U or 2U in
| size. most of that girth was for memory and disk space, with
| decent but not the fastest processors.
|
| one big db server with a hot backup was our best tradeoff for
| price, performance, and reliability. part of the mitigation was
| that the other servers could be scaled horizontally to
| compensate for a decent amount of growth without needing to
| scale the db horizontally.
| FpUser wrote:
| >"Use One Big Database."
|
| I do, it is running on the same big (relatively) server as my
| native C++ backend talking to the database. The performance
| smokes your standard cloudy setup big time. Serving a thousand
| requests per second on 16 cores without breaking a sweat. I am
| all for monoliths running on real, non-cloudy hardware. As long
| as the business scale is reasonable and does not approach FAANG
| (like for 90% of businesses) this solution is superior to
| everything else money-, maintenance-, and development-time wise.
| BenoitEssiambre wrote:
| I'm glad this is becoming conventional wisdom. I used to argue
| this in these pages a few years ago and would get downvoted
| below the posts telling people to split everything into
| microservices separated by queues (although I suppose it's
| making me lose my competitive advantage when everyone else is
| building lean and mean infrastructure too).
|
| In my mind, reasons involve keeping transactional integrity,
| ACID compliance, better error propagation, avoiding the
| hundreds of impossible to solve roadblocks of distributed
| systems (https://groups.csail.mit.edu/tds/papers/Lynch/MIT-LCS-
| TM-394...).
|
| But also it is about pushing the limits of what is physically
| possible in computing. As Admiral Grace Hopper would point out
| (https://www.youtube.com/watch?v=9eyFDBPk4Yw ) doing distance
| over network wires involves hard latency constraints, not to
| mention dealing with congestion on these wires.
|
| Physical efficiency is about keeping data close to where it's
| processed. Monoliths can make much better use of L1, L2, L3,
| and ram caches than distributed systems for speedups often in
| the order of 100X to 1000X.
|
| Sure it's easier to throw more hardware at the problem with
| distributed systems but the downsides are significant so be
| sure you really need it.
|
| Now there is a corollary to using monoliths. Since you only
| have one db, that db should be treated as somewhat sacred, you
| want to avoid wasting resources inside it. This means being a
| bit more careful about how you are storing things, using the
| smallest data structures, normalizing when you can etc. This is
| not to save disk, disk is cheap. This is to make efficient use
| of L1,L2,L3 and ram.
|
| I've seen boolean true or false values saved as large JSON
| documents: {"usersetting1": true, "usersetting2": false,
| "setting1name": "name", etc.} with 10 bits of data ending up as
| a 1k JSON document. Avoid this! Storing documents means the
| keys - the full table schema - are in every row. It has its uses
| but if you can predefine your schema and use the smallest types
| needed, you are gaining much performance mostly through much
| higher cache efficiency!
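|
| A quick back-of-the-envelope illustration of the difference
| (the setting names are invented):
|
|     import json
|
|     # Ten boolean settings as a JSON document vs. a packed bitfield.
|     settings = {f"usersetting{i}": (i % 2 == 0) for i in range(10)}
|
|     as_json = json.dumps(settings).encode()  # keys repeated in every row
|     as_bits = sum(b << i for i, b in enumerate(settings.values()))
|     packed = as_bits.to_bytes(2, "little")   # 10 bits fit in 2 bytes
|
|     print(f"{len(as_json)} bytes as a document, {len(packed)} as a bitfield")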
| Swizec wrote:
| > I'm glad this is becoming conventional wisdom
|
| My hunch is that computers caught up. Back in the early
| 2000's horizontal scaling was the only way. You simply
| couldn't handle even reasonably mediocre loads on a single
| machine.
|
| As computing becomes cheaper, horizontal scaling is starting
| to look more and more like unnecessary complexity for even
| surprisingly large/popular apps.
|
| I mean you can buy a consumer off-the-shelf machine with
| 1.5TB of memory these days. 20 years ago, when microservices
| started gaining popularity, 1.5TB RAM in a single machine was
| basically unimaginable.
| FpUser wrote:
| >"I'm glad this is becoming conventional wisdom. "
|
| Yup, this is what I've always done and it works wonders.
| Since I do not have bosses, just clients, I do not give a
| flying fuck about the latest fashion and do what actually makes
| sense for me and said clients.
| tsmarsh wrote:
| 'over the wire' is less obvious than it used to be.
|
| If you're in a k8s pod, those calls are really kernel calls.
| Sure you're serializing and process switching where you could
| be just making a method call, but we had to do something.
|
| I'm seeing fewer 'balls of mud' with microservices. That's not
| zero balls of mud. But it's not a given for almost every code
| base I wander into.
| threeseed wrote:
| > I'm glad this is becoming conventional wisdom
|
| It's not though. You're just seeing the most popular opinion
| on HN.
|
| In reality it is nuanced like most real-world tech decisions
| are. Some use cases necessitate a distributed or sharded
| database, some work better with a single server and some are
| simply going to outsource the problem to some vendor.
| rbanffy wrote:
| > Use One Big Database.
|
| I emphatically disagree.
|
| I've seen this evolve into tightly coupled microservices that
| could be deployed independently in theory, but required
| exquisite coordination to work.
|
| If you want them to be on a single server, that's fine, but
| having multiple databases or schemas will help enforce
| separation.
|
| And, if you need one single place for analytics, push changes
| to that space asynchronously.
|
| Having said that, I've seen silly optimizations being employed
| that make sense when you are Twitter, and to nobody else. Slice
| services up to the point they still do something meaningful in
| terms of the solution and avoid going any further.
| marcosdumay wrote:
| Yeah... Dividing your work into microservices while your data
| is in an interdependent database doesn't lead to great
| results.
|
| If you are creating microservices, you must segment them all
| the way through.
| zmmmmm wrote:
| I have to say I disagree with this ... you can only
| separate them if they are really, truly independent. Trying
| to separate things that are actually coupled will quickly
| take you on a path to hell.
|
| The problem here is that most of the microservice
| architecture divisions are going to be driven by Conway's
| law, not what makes any technical sense. So if you insist
| on separate databases per microservice, you're at high risk
| of ending up with massive amounts of duplicated and
| incoherent state models and half the work of the team
| devoted to synchronizing between them.
|
| I quite like an architecture where services are split
| _except_ the database, which is considered a service of its
| own.
| Joeri wrote:
| I have done both models. At my previous job we had a monolith
| on top of a 1,200-table database. Now I work in an ecosystem of
| 400 microservices, most with their own database.
|
| What it fundamentally boils down to is that your org chart
| determines your architecture. We had a single team in charge
| of the monolith, and it was ok, and then we wanted to add
| teams and it broke down. On the microservices architecture,
| we have many teams, which can work independently quite well,
| until there is a big project that needs coordinated changes,
| and then the fun starts.
|
| Like always there is no advice that is absolutely right.
| Monoliths, microservices, function stores. One big server vs
| kubernetes. Any of those things become the right answer in
| the right context.
|
| Although I'm still in favor of starting with a modular
| monolith and splitting off services when it becomes apparent
| they need to change at a different pace from the main body.
| That is right in most contexts I think.
| zmmmmm wrote:
| > splitting off services when it becomes apparent they need
| to change at a different pace from the main body
|
| yes - this seems to get lost, but the microservice argument
| is no different to the bigger picture software design in
| general. When things change independently, separate and
| decouple them. It works in code and so there is no reason
| it shouldn't apply at the infrastructure layer.
|
| If I am responsible for the FooBar and need to update it
| once a week and know I am not going to break the FroggleBot
| or the Bazlibee which are run by separate teams who don't
| care about my needs and update their code once a year, hell
| yeah I want to develop and deploy it as a separate service.
| manigandham wrote:
| There's no need for "microservices" in the first place then.
| That's just logical groupings of functionality that can be
| separate as classes, namespaces or other modules without
| being entirely separate processes with a network boundary.
| danpalmer wrote:
| To clarify the advice, at least how I believe it should be
| done...
|
| Use One Big Database Server...
|
| ... and on it, use one software database per application.
|
| For example, one Postgres server can host many databases that
| are mostly* independent from each other. Each application or
| service should have its own database and be unaware of the
| others, communicating with them via the services if
| necessary. This makes splitting up into multiple database
| servers fairly straightforward if needed later. In reality
| most businesses will have a long tail of tiny databases that
| can all be on the same server, with only bigger databases
| needing dedicated resources.
|
| *you can have interdependencies when you're using deep
| features sometimes, but in an application-first development
| model I'd advise against this.
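|
| A hedged sketch of that setup (host, role and database names
| are invented): one Postgres server, one database and one role
| per application, so splitting later is a host change in a DSN
| rather than a rewrite.
|
|     import psycopg2
|
|     admin = psycopg2.connect("host=db.internal dbname=postgres user=admin")
|     admin.autocommit = True  # CREATE DATABASE cannot run inside a transaction
|     with admin.cursor() as cur:
|         for app in ("billing", "catalog", "reports"):
|             cur.execute(f"CREATE ROLE {app}_svc LOGIN PASSWORD 'change-me'")
|             cur.execute(f"CREATE DATABASE {app} OWNER {app}_svc")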
| goodoldneon wrote:
| OP mentioned joining, so they were definitely talking about
| a single database
| danpalmer wrote:
| You can still do a ton of joining.
|
| I'd start with a monolith, that's a single app, single
| database, single point of ownership of the data model,
| and a ton of joins.
|
| Then as services are added after the monolith they can
| still use the main database for ease of infra
| development, simpler backups and replication, etc. but
| those wouldn't be able to be joined because they're
| cross-service.
| [deleted]
| riquito wrote:
| Not suggesting it, but for the sake of knowledge you can
| join tables living in different databases, as long as
| they are on the same server (e.g. mysql, postgresql, SQL
| server supports it - doesn't necessarily come for free)
| yellowapple wrote:
| In PostgreSQL's case, it doesn't even need to be the same
| server: https://www.postgresql.org/docs/current/postgres-
| fdw.html
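|
| For reference, a hedged sketch of wiring that up (server,
| database and credential names are invented); after the import,
| local tables can be joined against billing_remote.* directly:
|
|     import psycopg2
|
|     conn = psycopg2.connect("dbname=app")
|     conn.autocommit = True
|     with conn.cursor() as cur:
|         cur.execute("CREATE EXTENSION IF NOT EXISTS postgres_fdw")
|         cur.execute("""CREATE SERVER billing_srv
|                        FOREIGN DATA WRAPPER postgres_fdw
|                        OPTIONS (host 'db2.internal', dbname 'billing')""")
|         cur.execute("""CREATE USER MAPPING FOR CURRENT_USER SERVER billing_srv
|                        OPTIONS (user 'reporter', password 'change-me')""")
|         cur.execute("CREATE SCHEMA billing_remote")
|         cur.execute("""IMPORT FOREIGN SCHEMA public
|                        FROM SERVER billing_srv INTO billing_remote""")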
| giardini wrote:
| _" >Use One Big Database Server...
|
| ... and on it, use one software database per
| application.<"_
|
| FWIW that is how it is usually done (and has been done
| for decades) on mainframes (IBM & UNISYS).
|
| -----------------------
|
| _" Plus ca change, plus c'est la meme chose."_
|
| English: _" the more things change, the more they stay the
| same."_
|
| - old French expression.
| ryanisnan wrote:
| Definitely use a big database, until you can't. My advice to
| anyone starting with a relational data store is to use a proxy
| from day 1 (or some point before adding something like that
| becomes scary).
|
| When you need to start sharding your database, having a proxy
| is like having a super power.
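|
| Even a thin routing layer in the application buys you much of
| the same superpower. A minimal sketch (tenant-based sharding
| and the DSNs are assumptions): today every shard entry points
| at the one big server, and tenants can be re-homed later
| without touching query code.
|
|     import hashlib
|
|     SHARDS = ["host=db.internal dbname=app"] * 4  # all one server, for now
|
|     def dsn_for(tenant_id: str) -> str:
|         digest = int(hashlib.sha1(tenant_id.encode()).hexdigest(), 16)
|         return SHARDS[digest % len(SHARDS)]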
| chromatin wrote:
| Are there postgres proxies that can specifically facilitate
| sharding / partitioning later?
| _ben_ wrote:
| Disclaimer: I am the founder of PolyScale [1].
|
| We see both use cases: single large database vs multiple
| small, decoupled. I agree with the sentiment that a large
| database offers simplicity, until access patterns change.
|
| We focus on distributing database data to the edge using
| caching. Typically this eliminates read-replicas and a lot of
| the headache that goes with app logic rewrites or scaling
| "One Big Database".
|
| [1] https://www.polyscale.ai/
| bartread wrote:
| Not to mention, backups, restores, and disaster recovery are so
| much easier with One Big Database(tm).
| 1500100900 wrote:
| How is backup restoration any easier if your whole PostgreSQL
| cluster goes back in time when you only wanted to rewind that
| one tenant?
| fleddr wrote:
| Your scenario is data recovery, not backup restoration.
| Wildly different things.
| wizofaus wrote:
| Surely having separate DBs all sit on the One Big Server is
| preferable in many cases. For cases where you really need to
| extract large amounts of data derived from multiple DBs,
| there's no real harm in having some cross-DB joins defined in
| views somewhere. If there are sensible logical ways to break a
| monolithic service into component stand-alone services, and
| good business reasons to do so (or it's already been designed
| that way), then having each talk to its own DB on a shared
| server should be able to scale pretty well.
| abraae wrote:
| Another area for consolidation is auth. Use one giant Keycloak,
| with individual realms for every one of the individual apps you
| are running. Your Keycloak is backed by your one giant
| database.
| doctor_eval wrote:
| I agree that 1BDB is a good idea, but having one ginormous
| schema has its own costs. So I still think data should be
| logically partitioned between applications/microservices - in
| PG terms, one "cluster" but multiple "databases".
|
| We solved the problem of collecting data from the various
| databases for end users by having a GraphQL layer which could
| integrate all the data sources. This turned out to be
| absolutely awesome. You could also do something similar using
| FDW. The effort was not significant relative to the size of the
| application.
|
| The benefits of this architecture were manifold but one of the
| main ones is that it reduces the complexity of each individual
| database, which dramatically improved performance, and we knew
| that if we needed more performance we could pull those
| individual databases out into their own machine.
| throwaway894345 wrote:
| I'm pretty happy to pay a cloud provider to deal with managing
| databases and hosts. It doesn't seem to cause me much grief,
| and maybe I could do it better but my time is worth more than
| our RDS bill. I can always come back and Do It Myself if I run
| out of more valuable things to work on.
|
| Similarly, paying for EKS or GKE or the higher-level container
| offerings seems like a much better place to spend my resources
| than figuring out how to run infrastructure on bare VMs.
|
| Every time I've seen a normal-sized firm running on VMs, they
| have one team who is responsible for managing the VMs, and
| _either_ that team is expecting a Docker image artifact or
| they're expecting to manage the environment in which the
| application runs (making sure all of the application
| dependencies are installed in the environment, etc) which
| typically implies a lot of coordination between the ops team
| and the application teams (especially regarding deployment).
| I've never seen that work as smoothly as deploying to
| ECS/EKS/whatever and letting the ops team work on automating
| things at a higher level of abstraction (automatic certificate
| rotation, automatic DNS, etc).
|
| That said, I've never tried the "one big server" approach,
| although I wouldn't want to run fewer than 3 replicas, and I
| would want reproducibility so I know I can stand up the exact
| same thing if one of the replicas go down as well as for
| higher-fidelity testing in lower environments. And since we
| have that kind of reproducibility, there's no significant
| difference in operational work between running fewer larger
| servers and more smaller servers.
| cogman10 wrote:
| > Use One Big Database.
|
| > Seriously. If you are a backend engineer, nothing is worse
| than breaking up your data into self contained service
| databases, where everything is passed over Rest/RPC. Your
| product asks will consistently want to combine these data
| sources (they don't know how your distributed databases look,
| and oftentimes they really do not care).
|
| This works until it doesn't and then you land in the position
| my company finds itself in where our databases can't handle the
| load we generate. We can't get bigger or faster hardware
| because we are using the biggest and fastest hardware you can
| buy.
|
| Distributed systems suck, sure, and they make querying across
| systems a nightmare. However, by giving those aspects up, what
| you gain is the ability to add new services, features, etc.
| without running into Scotty yelling "She can't take much more
| of it!"
|
| Once you get to that point, it becomes SUPER hard to start
| splitting things out. All of a sudden you have 10,000 "just a
| one off" queries against several domains that are broken by
| trying to carve out a domain into a single owner.
| Flow wrote:
| Do you have Spectre countermeasures active in the kernel of
| that machine?
| runjake wrote:
| What does it matter, in this context?
|
| If it's about bare metal vs. virtual machines, know that
| Spectre affects virtual machines, too.
| rkagerer wrote:
| I think they are implying disabling them (if on) could
| squeeze you out a bit more performance.
| kedean wrote:
| Many databases can be distributed horizontally if you put in
| the extra work, would that not solve the problems you're
| describing? MariaDB supports at least two forms of
| replication (one master/replica and one multi-master), for
| example, and if you're willing to shell out for a MaxScale
| license it's a breeze to load balance it and have automatic
| failover.
| hot_gril wrote:
| Not without big compromises and a lot of extra work. If you
| want a truly horizontally scaling database, and not just
| multi-master for the purpose of availability, a good
| example solution is Spanner. You have to lay your data out
| differently, you're very restricted in what kinds of
| queries you can make, etc.
| kbenson wrote:
| For what it's worth, I think distributing horizontally is
| also much easier if you've already limited your database to
| specific concerns by splitting it up in different ways.
| Sharding a very large database with lots of data deeply
| linked sounds like much more of a pain than something with
| a limited scope that isn't too deeply linked with data
| because it's already in other databases.
|
| To some degree, sharding brings in a lot of the same
| complexities as different microservices with their own data
| store, in that you sometimes have to query across multiple
| sources and combine in the client.
| throwaway9870 wrote:
| How do you use one big database when some of your info is stuck
| in an ERP system?
| marcosdumay wrote:
| > Use One Big Database
|
| Yep, with a passive replica or online (log) backup.
|
| Keeping things centralized can reduce your hardware requirement
| by multiple orders of magnitude. The one huge exception is a
| traditional web service, those scale very well, so you may not
| even want to get big servers for them (until you need them).
| Closi wrote:
| Breaking apart a stateless microservice and then basing it
| around a giant single monolithic database is pretty pointless -
| at that stage you might as well just build a monolith and get
| on with it as every microservice is tightly coupled to the db.
| adrianmsmith wrote:
| That's true, unless you need
|
| (1) Different programming languages, e.g. you've written your
| app in Java but now you need to do something for which the
| perfect Python library is available.
|
| (2) Different parts of your software need different types of
| hardware. Maybe one part needs a huge amount of RAM for a
| cache, but other parts are just a web server. It'd be a shame
| to have to buy huge amounts of RAM for every server.
| Splitting the software up and deploying the different parts
| on different machines can be a win here.
|
| I reckon the average startup doesn't need any of that, not
| suggesting that monoliths aren't the way to go 90% of the
| time. But if you do need these things, you can still go the
| microservices route, but it still makes sense to stick to a
| single database if at all possible, for consistency and
| easier JOINs for ad-hoc queries, etc.
| Closi wrote:
| These are both true - but neither requires service-
| oriented-architecture.
|
| You can split up your application into chunks that are
| deployed on separate hardware, and use different languages,
| without composing your whole architecture into
| microservices.
|
| A monolith can still have a separate database server and a
| web server, or even many different functions split across
| different servers which are horizontally scalable, and be
| written in both java and python.
|
| Monoliths have had separate database servers since the 80s
| (and probably before that!). In fact, part of these
| applications' defining characteristics at the enterprise
| level is that they often shared one big central database,
| as often they were composed of lots of small applications
| that would all make changes to the central database, which
| would often end up in a right mess of software that was
| incredibly hard to unpick! (And all the software writing
| to that database would, as you described, be written in
| lots of different languages). People would then come along
| and cake these central databases full of stored procedures
| to make magic changes to implement functionality that
| wasn't available in the legacy applications that they can't
| change because of the risk and then you have even more of a
| mess!
| AtNightWeCode wrote:
| Agree. Nothing worse than having different programs changing
| data in the same database. The database should not be an
| integration point between services.
| jethro_tell wrote:
| If you have multiple microservices updating the database
| you need to have a database access layer service as well.
|
| There's some real value in abstraction and microservices,
| but you can still run them against a monolithic database
| service.
| bergkvist wrote:
| No amount of abstraction is going to save you from the
| problem of 2 processes manipulating the same state
| machine.
| noduerme wrote:
| I disagree. Suppose you have an enormous DB that's mainly
| written to by workers inside a company, but has to be widely
| read by the public outside. You want your internal services
| on machines with extra layers of security, perhaps only
| accessible by VPN. Your external facing microservices have
| other things like e.g. user authentication (which may be tied
| to a different monolithic database), and you want to put them
| closer to users, spread out in various data centers or on the
| edge. Even if they're all bound to one database, there's a
| lot to recommend keeping them on separate, light cheap
| servers that are built for http traffic and occasional DB
| reads. And even more so if those services do a lot of
| processing on the data that's accessed, such as building up
| reports, etc.
| Closi wrote:
| You've not really built microservices then in the purest
| sense though - i.e. all the microservices aren't
| independently deployable components.
|
| I'm not saying what you are proposing isn't a perfectly
| valid architectural approach - it's just usually considered
| an anti-pattern with microservices (because if all the
| services depend on a single monolith, and a change to a
| microservice functionality also mandates a change to the
| shared monolith which then can impact/break the other
| services, we have lost the 'independence' benefit that
| microservices supposedly give us, where changes to one
| microservice do not impact another).
|
| Monoliths can still have layers to support business logic
| that are separate from the database anyway.
| roflyear wrote:
| Absolutely. I know someone who considers "different domains"
| (as in web domains) to count as a microservice!
|
| What is the point of that? It doesn't add anything. Just more
| shit to remember and get right (and get wrong!)
| manigandham wrote:
| Why would you break apart a microservice? And why do you need
| to use/split into microservices anyway?
|
| 99% of apps are best fit as monolithic apps _and_ databases
| and should focus on business value rather than scale they'll
| never see.
| Gigachad wrote:
| Where I work we are looking at it because we are starting
| to exceed the capabilities of one big database. Several
| tables are reaching the billions of rows mark and just
| plain inserts are starting to become too much.
| nicoburns wrote:
| Yeah, at the billions of rows mark it definitely
| makes sense to start looking at splitting things up. On
| the other hand, the company I worked for split things up
| from the start, and when I joined - 4 years down the line
| - their biggest table had something like 50k rows, but
| their query performance was awful (tens of seconds in
| cases) because the data was so spread out.
| threeseed wrote:
| > 99% of apps are best fit as monolithic apps and databases
| and should focus on business value rather than scale
| they'll never see
|
| You incorrectly assume that 99% of apps are building these
| architectures for scalability reasons.
|
| When in reality it's far more for development productivity,
| security, use of third party services, different languages
| etc.
| jethro_tell wrote:
| reliability, sometimes sharding just means you don't have
| to get up in the middle of the night.
| Closi wrote:
| Totally agree.
|
| I guess I just don't see the value in having a monolith
| made up of microservices - you might as well just build a
| monolith if you are going down that route.
|
| And if your application fits the microservices pattern
| better, then you might as well go down the microservices
| pattern properly and not give them a big central DB.
| adgjlsfhk1 wrote:
| The one advantage of microservice on a single database
| model is that it lets you test the independent components
| much more easily while avoiding the complexity of
| database sharding.
| [deleted]
| radu_floricica wrote:
| Note that quite a bit of the performance problems come
| from writes. You can get away with A LOT if you accept that
| 1. the current service doesn't do (much) writing and 2. it
| can live with slightly old data. Which I think covers 90% of
| use cases.
|
| So you can end up with those services living on separate
| machines and connecting to read only db replicas, for
| virtually limitless scalability. And when it realizes it
| needs to do an update, it either switches the db connection
| to a master, or it forwards the whole request to another
| instance connected to a master db.
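|
| A minimal sketch of that routing, assuming PostgreSQL streaming
| replication and psycopg2 (the hostnames are placeholders; in a
| real app this usually lives in the framework's DB router or
| connection pool):
|             import psycopg2
|
|             # Hypothetical hosts - point these at your own boxes.
|             primary = psycopg2.connect("host=db-primary dbname=app")
|             replica = psycopg2.connect("host=db-replica dbname=app")
|
|             def run(sql, params=()):
|                 # Crude rule: reads go to the (slightly stale) replica,
|                 # anything that writes goes to the primary.
|                 is_read = sql.lstrip().upper().startswith("SELECT")
|                 conn = replica if is_read else primary
|                 with conn, conn.cursor() as cur:
|                     cur.execute(sql, params)
|                     return cur.fetchall() if cur.description else None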
| cfors wrote:
| No disagreement here. I love a good monolith.
| Guid_NewGuid wrote:
| I think a strong test a lot of "let's use Google scale
| architecture for our MVP" advocates fail is: can your
| architecture support a performant paginated list with dynamic
| sort, filter and search where eventual consistency isn't
| acceptable?
|
| Pretty much every CRUD app needs this at some point and if
| every join needs a network call your app is going to suck to
| use and suck to develop.
| SkyPuncher wrote:
| > Pretty much every CRUD app needs this at some point and if
| every join needs a network call your app is going to suck to
| use and suck to develop.
|
| _at some point_ is the key word here.
|
| Most startups (and businesses) can likely get away with this
| well into Series A or Series B territory.
| threeseed wrote:
| > if every join needs a network call your app is going to
| suck to use and suck to develop.
|
| And yet developers do this every single day without any
| issue.
|
| It is bad practice to have your authentication database be
| the same as your app database. Or you have data coming from
| SaaS products, third party APIs or a cloud service. Or even
| simply another service in your stack. And with complex
| schemas often it's far easier to do that join in your
| application layer.
|
| All of these require a network call and join.
| mhoad wrote:
| I've found the following resource invaluable for designing
| and creating "cloud native" APIs where I can tackle that kind
| of thing from the very start without a huge amount of hassle
| https://google.aip.dev/general
|
| The patterns section covers all of this and more
| gnat wrote:
| This is a great resource but the RFC-style documentation
| says what you SHOULD and MUST do, not HOW to do it ...
| lmm wrote:
| I don't believe you. Eventual consistency is how the real
| world works, what possible use case is there where it
| wouldn't be acceptable? Even if you somehow made the display
| widget part of the database, you can't make the reader's
| eyeballs ACID-compliant.
| skyde wrote:
| thanks a lot for this comment. I will borrow this as an
| interview question :)
| cdkmoose wrote:
| >>(they don't know how your distributed databases look, and
| oftentimes they really do not care)
|
| Nor should they, it's the engineer's/team's job to provide the
| database layer to them with high levels of service without them
| having to know the details
| z3t4 wrote:
| The rule is: Keep related data together. Exceptions are:
| Different customers (usually don't require each other's data)
| can be isolated. And if the database becomes the bottleneck you
| can separate unrelated services.
| bebrws wrote:
| Someone call Brahm
| notacoward wrote:
| At various points in my career, I worked on Very Big Machines and
| on Swarms Of Tiny Machines (relative to the technology of their
| respective times). Both kind of sucked. Different reasons, but
| sucked nonetheless. I've come to believe that the best approach
| is generally somewhere in the middle - enough servers to ensure a
| sufficient level of protection against failure, _but no more_ to
| minimize coordination costs and data movement. Even then there
| are exceptions. The key is _don't run blindly toward the
| extremes_. Your utility function is probably bell shaped, so you
| need to build at least a rudimentary model to explore the problem
| space and find the right balance.
| mamcx wrote:
| Yes, totally.
|
| Among the setups, the one that I think is _the golden one_ is a
| BIG DB server plus 1-4 front-end (web/api/cache) servers, and
| hand off the backups and CDN.
|
| That is.
| rcarmo wrote:
| I once fired up an Azure instance with 4TB of RAM and hundreds of
| cores for a performance benchmark.
|
| htop felt incredibly roomy, and I couldn't help thinking how my three
| previous projects would fit in with room to spare (albeit lacking
| redundancy, of course).
| gregmac wrote:
| > However, cloud providers have often had global outages in the
| past, and there is no reason to assume that cloud datacenters
| will be down any less often than your individual servers.
|
| A nice thing about being in a big provider is when they go down a
| massive portion of the internet goes down, and it makes news
| headlines. Users are much less likely to complain about _your_
| service being down when it's clear you're just caught up in the
| global outage that's affecting 10 other things they use.
| arwhatever wrote:
| When migrating from [no-name CRM] to [big-name CRM] at a recent
| job, the manager pointed out that when [big-name CRM] goes
| down, it's in the Wall Street Journal, and when [no-name] goes
| down, it's hard to get their own Support Team to care!
| ramesh31 wrote:
| Nobody ever got fired for buying IBM!
| notjustanymike wrote:
| We may need to update this one, I would definitely fire
| someone today for buying IBM.
| kkielhofner wrote:
| Nobody ever got fired for buying AWS!
| lanstin wrote:
| The AWS people now are just like the IBM people in the
| 80s - mastering a complex and not standards based array
| of products and optional product add-ons. The internet
| solutions were open and free for a few decades and now
| it's AWS SNADS I mean AWS load balancers and edge
| networks.
| namose wrote:
| AWS services are usually based on standards anyway. If
| you use an architecturally sound approach to AWS you
| could learn to develop for GCP or Azure pretty easily.
| riku_iki wrote:
| that's funny, since IBM is actually promoting one very fat
| and reliable server.
| dtparr wrote:
| These days we just call it licensing Red Hat.
| ustolemyname wrote:
| This has given me a brilliant idea: deferring maintenance
| downtime until some larger user-visible service is down.
|
| This is terrible for many reasons, but I wouldn't be surprised
| to hear someone has done this.
| gorjusborg wrote:
| Ah yes, the 'who cut the cheese?' maintenance window.
| pdpi wrote:
| Another advantage is that the third-party services you depend
| on are also likely to be on one of the big providers, so it's
| one less point of failure.
| hsn915 wrote:
| No. Your users have no idea that you rely on AWS (they don't
| even know what it is), and they don't think of it as a valid or
| reasonable excuse as to why your service is down.
| andrepew wrote:
| This is a huge one -- value in outsourcing blame. If you're
| down because of a major provider outage in the news, you're
| viewed more as a victim of a natural disaster rather than
| someone to be blamed.
| oceanplexian wrote:
| I hear this repeated so many times at my workplace, and it's
| so totally and completely uninformed.
|
| Customers who have invested millions of dollars into making
| their stack multi-region, multi-cloud, or multi-datacenter
| aren't going to calmly accept the excuse that "AWS Went Down"
| when you can't deliver the services you contractually agreed
| to deliver. There are industries out there where having your
| service casually go down a few times a year is totally
| unacceptable (Healthcare, Government, Finance, etc). I worked
| adjacent to a department that did online retail a while ago
| and even an hour of outage would lose us $1M+ in business.
| darkr wrote:
| > Customers who have invested millions of dollars > ... >
| an hour of outage would lose us $1M+ in business
|
| Given (excluding us-east-1) you're looking at maybe an hour
| a year on average of regional outage, sounds like best case
| break even on that investment?
| oceanplexian wrote:
| I'm going to say that an hour a year is wildly
| optimistic. But even then, that puts you at 4 nines
| (99.99%) which is comparatively awful, consider that an
| old fashioned telephone using technology from the 1970s
| will achieve on average, 5 9's of reliability, or 5.26
| minutes of downtime per year, and that most IT shops
| operating their own infrastructure contractually expect 5
| 9's from even fairly average datacenters and transit
| providers.
| nicoburns wrote:
| I was amused when I joined my current company to find
| that our contracts only stipulate one 9 of reliability
| (98%). So ~30 mins a day or ~14 hours a month is
| permissible.
| rapind wrote:
| I wonder if the aggregate outage time from misconfigured
| and over-architected high availability services is greater
| than the average AWS outage per year.
|
| Similar to security, the last few 9s of availability come
| at a heavily increasing (log) complexity / price. The
| cutoff will vary case by case, and I'm sure the decision on
| how many 9s you need is often irrational (CEO says it can
| never go down! People need their pet food delivered on
| time!).
| mahidhar wrote:
| Agreed. Recently I was discussing the same point with a non-
| technical friend who was explaining that his CTO had decided
| to move from Digital Ocean to AWS, after DO experienced some
| outage. Apparently the CEO is furious at him and has assumed
| that DO are the worst service provider because their services
| were down for almost an entire business day. The CTO probably
| knows that AWS could also fail in a similar fashion, but by
| moving to AWS it becomes more or less an Act of God type of
| situation and he can wash his hands of it.
| tjoff wrote:
| This seems like a recently popular exaggeration; I'd wager no
| one but a select few in the HN bubble actually cares.
|
| You will primarily be judged by how much of an inconvenience
| the outage was to every individual.
|
| The best you can hope for is that the local ISP gets the
| blame, but honestly. It can't be more than a rounding error
| in the end.
| treis wrote:
| I think it's more of a shield against upper management. AWS
| going down is treated like an act of god rendering everyone
| blameless. But if it's your one big server that goes down
| then it's your fault.
| phkahler wrote:
| >> AWS going down is treated like an act of god rendering
| everyone blameless.
|
| Someone decided to use AWS, so there is blame to go
| around. I'm not saying if that blame is warranted or not,
| just that it sounds like a valid thing to say for people
| who want to blame someone.
| flatiron wrote:
| "Nobody gets fired for using aws" is pretty big now a
| days. We use GCP but if they have an issue and it bubbles
| down to me nobody bats an eye when I say the magical
| cloud man made ut oh whoopsie and it wasn't me.
| sebzim4500 wrote:
| I doubt anyone has ever been fired for choosing AWS. I
| know for a fact that people have been fired after
| deciding to do it on bare metal and then it didn't work
| very well.
| jasonlotito wrote:
| "I think it's more of a shield against upper management."
|
| "Someone decided to use AWS, so there is blame to go
| around."
|
| Upper management.
| ozim wrote:
| So it does not really work in B2B.
|
| I don't really have much to do with contracts - but my
| company is stating that we have uptime of 99.xx%.
|
| In terms of contract customers don't care if I have Azure/AWS
| or I keep my server in the box under the stairs. Yes they do
| due diligence and would not buy my services if I kept them in a
| shoe box.
|
| But then if they lose business they come to me. I can go
| after Azure/AWS, but I am so small they will throw some free
| credits at me and tell me to go away.
|
| Maybe if you are in the B2C area then yeah - your customers will
| probably shrug and say it was M$ or Amazon if you write a sad
| blog post with excuses.
| zerkten wrote:
| It's going to depend on the penalties for being
| unavailable. Small B2B customers are very different from
| enterprise B2B customers too, so you ultimately have to
| build for your context.
|
| If you have to give service credits to customers then with
| "one box" you have to give 100% of customers a credit. If
| your services are partitioned across two "shards" then one
| of those shards can go down, but your credits are only paid
| out at 50%.
|
| Getting to this place doesn't prevent a 100% outage and it
| imposes complexity. This kind of design can be planned for
| enterprise B2B apps when the team are experienced with
| enterprise clients. Many B2B SaaS are tech folk with zero
| enterprise experience, so they have no idea of relatively
| simple things that can be done to enable a shift to this
| architecture.
|
| Enterprise customers do care where things are hosted. They
| very likely have some users in the EU, or other locations,
| which care more about data protection and sovereignty than
| the average US organization. Since they are used to hosting
| on-prem and doing their own due diligence they will often
| have preferences over hosting. In industries like
| healthcare, you can find out what the hosting preferences
| are, as well as understand how the public clouds are
| addressing them. While not viewed as applicable by many on
| HN due to the focus on B2C and smaller B2B here, this is
| the kind of thing that can put a worse product ahead in the
| enterprise scenario.
| HWR_14 wrote:
| Because you have a vendor/customer relationship. The big
| thing for AWS is employer/employee relationships. If you
| were a larger company, and AWS goes down, who blames you?
| Who blames anyone in the company? At the C-level, does the
| CEO expect more uptime than _Amazon_? Of course not. And so
| it goes.
|
| Whereas if you do something other than the industry
| standard of AWS (or Azure/GCP) and it goes down, clearly
| it's _your fault_.
| andrepew wrote:
| Depends on scale of B2B. Between enterprises, not as much.
| Between small businesses, works very well (at least in my
| experience, we are tiny B2B).
| lanstin wrote:
| It really varies a lot. I have seen very large lazy sites
| suddenly pick up a client that wanted RCA for each bad
| transaction, and suddenly get religion (well, as quickly
| as a large org can). Those are precious clients
| because they force investment into useful directions of
| availability instead of just new features.
| travisgriggs wrote:
| "Value in outsourcing blame"
|
| The real reason that talented engineers secretly support all
| of the middle management we vocally complain about.
| ocdtrekkie wrote:
| I find this entire attitude disappointing. Engineering has
| moved from "provide the best reliability" to "provide the
| reliability we won't get blamed for the failure of". Folks
| who have this attitude missed out on the dang ethics course
| their college was teaching.
|
| If rolling your own is faster, cheaper, and more reliable (it
| is), then the only justification for cloud is assigning
| blame. But you know what you also don't get? Accolades.
|
| I throw a little party of one here when Office 365 or Azure
| or AWS or whatever Google calls its cloud products this week
| is down but all our staff are able to work without issue. =)
| jeroenhd wrote:
| If you work in B2B you can put the blame on Amazon and your
| customers will ask "understandable, take the necessary steps
| to make sure it doesn't happen again". AWS going down isn't
| an act of God, it's something you should've planned for,
| especially if it happened before.
| nrmitchi wrote:
| There is also the consideration that this isn't even an
| argument of "other things are down too!" or "outsourcing blame"
| as much as, depending on what your service is of course, you
| are unlikely to be operating in a bubble. You likely have some
| form of external dependencies, or you are an external
| dependency, or have correlated/cross-dependency usage with
| another service.
|
| Guaranteeing isolation between all of these different moving
| parts is _very difficult_. Even if you're not directly
| affected by a large cloud outage, it's becoming less and less
| common that you, or your customers, are truly isolated.
|
| As well, if your AWS-hosted service mostly exists to service
| AWS-hosted customers, and AWS is down, it doesn't matter if you
| are down. None of your customers are operational anyways. Is
| this a 100% acceptable solution? Of course not. But for 95% of
| services/SaaS out there, it really doesn't matter.
| [deleted]
| taylodl wrote:
| Users are much more sympathetic to outages when they're
| widespread. But, if there's a contractual SLA then their
| sympathy doesn't matter. You have to meet your SLA. That
| usually isn't a big problem as SLAs tend to account for some
| amount of downtime, but it's important to keep the SLA in mind.
| hans1729 wrote:
| This just holds when you are b2b. If you're serving end
| users, they don't care about the contract, they care about
| their UX.
| z3t4 wrote:
| You also have to calculate in the complexity of running
| thousands of servers vs running just one server. If you run
| just one server it's unlikely to go down even once in its
| lifetime. Meanwhile cloud providers are guaranteed to have
| outages due to the sheer complexity of managing thousands of
| servers.
| bilekas wrote:
| I can't tell if this is a good thing or a bad thing though!
|
| Imagine the clout of saying : "we stayed online while AWS died"
| dghlsakjg wrote:
| Depends on how technical your customer base is. Even as a
| developer I would tend not to ascribe too much signal to that
| message. All it tells me is that you don't use AWS.
|
| "We stayed online when GCP, AWS, and Azure go down" is a
| different story. On the other hand, if those three go down
| simultaneously, I suspect the state of the world will be such
| that I'm not worried about the internet.
| lanstin wrote:
| I would expect there are BGP issues that could do that, at
| least for large swaths of the internet.
| [deleted]
| namose wrote:
| I do also remember in one of the recent AWS outages, the
| google cloud compute service had lower availability due to
| failovers hitting all at once
| Nextgrid wrote:
| HN implicitly gets this clout - it became the _real_ status
| page of most of the internet.
| cal85 wrote:
| > In comparison, buying servers takes about 8 months to break
| even compared to using cloud servers, and 30 months to break even
| compared to renting.
|
| Can anyone help me understand why the cloud/renting is still this
| expensive? I'm not familiar with this area, but it seems to me
| that big data centers must have some pretty big cost-saving
| advantages (maintenance? heat management?). And there are several
| major providers all competing in a thriving marketplace, so I
| would expect that to drive the cost down. How can it still be so
| much cheaper to run your own on-prem server?
| WJW wrote:
| Several points:
|
| - The price for on-prem conveniently omits costs for power,
| cooling, networking, insurance and building space; it's only
| the purchase price.
|
| - The price for the cloud server includes (your share of) the
| costs of replacing a broken power supply or hard drive, which
| is not included in the list price for on-prem. You will have to
| make sure enough of your devs know how to do that or else hire
| a few sysadmin types.
|
| - As the article already mentions, the cloud has to provision
| for peak usage instead of average usage. If you buy an on-prem
| server you always have the same amount of computing power
| available and can't scale up quickly if you need 5x the
| capacity because of a big event. That kind of flexibility costs
| money.
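|
| If you want to put rough numbers on it, the structure of the
| break-even calculation is simple - every figure below is a made-
| up placeholder, so plug in your own quotes:
|             server_purchase = 12_000  # one-off hardware cost (assumed)
|             colo_and_power  = 250     # per month: rack, power, remote hands (assumed)
|             cloud_monthly   = 1_800   # comparable cloud instance, per month (assumed)
|
|             months = server_purchase / (cloud_monthly - colo_and_power)
|             print(f"break even after {months:.1f} months")  # ~7.7 with these numbers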
| cal85 wrote:
| Thank you, that explains it.
| zucker42 wrote:
| Not included in the break-even calculation was the cost of
| colocation, the cost of hiring someone to make sure the
| computer is in working order, or the reduced hassle when
| hardware fails.
|
| Also, as the author even mentions in the article, a modern server
| basically obsoletes a 10-year-old server. So you're going to
| have to replace your server at least every 10 years. So the
| break-even in the case of renting makes sense when you consider
| that the server depreciates really quickly.
| manigandham wrote:
| You're paying a premium for _flexibility_. If you don't need
| that then there are far cheaper options like some managed
| hosting from your local datacenter.
| klysm wrote:
| The huge capital required to get a data center with those cost
| savings serves as a nice moat to let people price things high.
| marcosdumay wrote:
| Renting is not very expensive. 30 months is a large share of a
| computer's lifetime, and you are paying for space, electricity,
| and internet access too.
| merb wrote:
| > If you compare to the OVHCloud rental price for the same
| server, the price premium of buying your compute through AWS
| lambda is a factor of 25
|
| and there is a factor of 25 that ovh is not a company where you
| should rent servers:
|
| https://www.google.com/search?q=ovh+fire
| siliconc0w wrote:
| One thing to keep in mind is separation. The prod environment
| should be completely separated from the dev ones (plural, it
| should be cheap/fast to spin up dev environments). Access to
| production data should be limited to those that need it (ideally
| for just the time they need it). Teams should be able to deploy
| their app separately and not have to share dependencies (i.e
| operating system libraries) and it should be possible to test OS
| upgrades (containers do not make you immune from this). It's
| _kinda_ possible to sort of do this with 'one big server' but
| then you're running your own virtualized infrastructure which has
| its own costs/pains.
|
| Definitely also don't recommend one big database, as that becomes
| a hairball quickly - it's possible to have several logical
| databases on one physical database 'server' though.
| lordleft wrote:
| Interesting write-up that acknowledges the benefits of cloud
| computing while starkly demonstrating the value proposition of
| just one powerful, on-prem server. If it's accurate, I think a
| lot of people are underestimating the mark-up cloud providers
| charge for their services.
|
| I think one of the major issues I have with moving to the cloud
| is a loss of sysadmin knowledge. The more locked in you become to
| the cloud, the more that knowledge atrophies within your
| organization. Which might be worth it to be nimble, but it's a
| vulnerability.
| phpisthebest wrote:
| Given that AWS holds up the entire Amazon company, and is a
| large part of Bezos's personal wealth, I think the mark-up is
| pretty good.
| evilotto wrote:
| Many people will respond that "one big server" is a massive
| single point of failure, but in doing so they miss that it is
| also a single point of success. If you have a distributed system,
| you have to test and monitor lots of different failure scenarios.
| With a SPOS, you only have one thing to monitor. For a lot of
| cases the reliability of that SPOS is plenty.
|
| Bonus: Just move it to the cloud, because AWS is definitely not
| its own SPOF and it never goes down taking half the internet with
| it.
| MrStonedOne wrote:
| /tg/station, the largest open source multiplayer video game on
| github, gets cloudheads trying to help us "modernize" the game
| server for the cloud all the time.
|
| Here's how that breaks down:
|
| The servers (sorry, i mean compute) cost the same (before
| bandwidth, more on that at the bottom) to host one game server as
| we pay (amortized) per game server to host 5 game servers on a
| rented dedicated server. ($175/month for the rented server with
| 64gb of ram and a 10gbit uplink)
|
| They run twice as slow, because high-core-count, low-clock-speed
| servers aren't all they're cracked up to be and our game engine
| is single threaded. Even if it weren't, the overhead of
| multithreading, combined with the slow clock speeds of most
| high-core-count servers, rarely squares out to an actual
| increase in real-world performance.
|
| You can get the high-clock-speed units, but they are two to
| three times as expensive, and they still run 20% slower than
| Windows VMs on rented bare metal, because the sad fact is that
| enterprise CPUs from either Intel or AMD have slower clock
| speeds and single-threaded performance than their gaming CPU
| counterparts. Getting gaming CPUs for rented servers is piss
| easy, but next to impossible for cloud servers.
|
| Each game server uses 2TB of bandwidth to host 70-player high
| pops. This works with 5 servers on 1 machine because our hosting
| provider gives us 15TB of bandwidth included in the price of the
| server.
|
| Well, now the cloud bill just got a new zero. Being 10 to 30x
| more expensive once you remember to price in bandwidth isn't
| looking too great.
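|
| The bandwidth part alone is easy to ballpark - the ~$0.09/GB
| egress rate below is an assumption for a typical big cloud, so
| check your provider's actual pricing:
|             tb_per_month  = 10     # 5 game servers x ~2 TB each
|             egress_per_gb = 0.09   # assumed cloud egress price, $/GB
|             print(tb_per_month * 1000 * egress_per_gb)
|             # ~$900/month for egress alone, vs $175/month all-in for
|             # the rented dedicated box with bandwidth included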
|
| "but it would make it cheaper for small downstreams to start out"
| until another youtuber mentions our tiny game, and every game
| server is hitting the 120 hard pop cap, and a bunch of
| downstreams get a surprise 4 digit bill for what would normally
| run 2 digits.
|
| The takeaway from this being that even adding in Docker or k8s
| deployment support to the game server is seen as creating the
| risk that some kid bankrupts themselves trying to host a game
| server of their favorite game off their McDonald's paycheck, and
| we tell such tech "pros" to sod off with their trendy money
| wasters.
| mwcampbell wrote:
| > $175/month for the rented server with 64gb of ram and a
| 10gbit uplink)
|
| Wow, what provider is that?
| corford wrote:
| Hetzner's PX line offers 64GB ECC RAM, Xeon CPU, dual 1TB
| NVME for < $100/month. A dedicated 10Gbit b/w link (plus
| 10Gbit NIC) is then an extra ~$40/month on top (incls.
| 20TB/month traffic, with overage billed at $1/TB).
| twblalock wrote:
| YetAnotherNick wrote:
| This post raises small issues like reliability, but misses a lot
| of much bigger issues like testing, upgrades, reproducibility,
| backups and even deployments. Also, the author is comparing on-
| demand pricing, which to me doesn't make sense if you could be
| paying for the server with reserved pricing. Still, I agree there
| would be a difference of 2-3x (unless your price is dominated by
| AWS egress fees), but for most servers with a fixed workload,
| even for very popular but simple sites, it could be done for
| $1k/month in the cloud, less than 10% of one developer's salary.
| For non-fixed workloads like ML training, you would need some
| cloudy setup anyway.
| softfalcon wrote:
| So... I guess these folks haven't heard of latency before? Fairly
| sure you have to have "one big server" in every country if you do
| this. I feel like that would get rather costly compared to
| geographically distributed cloud services long term.
| gostsamo wrote:
| The article explicitly mentions CDNs as something that you can
| outsource and also notes that the market there is competitive
| and the prices are low.
| Nextgrid wrote:
| As opposed to "many small servers" in every country? The vast
| majority of startups out there run out of a single AWS region
| with a CDN caching read-only content. You can apply the same
| CDN approach to a bare-metal server.
| softfalcon wrote:
| Yeah, but if I'm a startup and running only a small server,
| the cloud hosting costs are minimal. I'm not sure how you
| think it's cheaper to host tiny servers in lots of countries
| and pay someone to manage that for you. You'll need IT in
| every one of those locations to handle the service of your
| "small servers".
|
| I run services globally for my company, there is no way we
| could do it. The fact that we just deploy containers to k8s
| all over the world works very well for us.
|
| Before you give me the "oh k8s, well you don't know bare
| metal" please note that I'm an old hat that has done the
| legacy C# ASP.NET IIS workflows on bare metal for a long
| time. I have learned and migrated to k8s on AWS/GCloud and it
| is a huge improvement compared to what I used to deal with.
|
| Lastly, as for your CDN discussion, we don't just host CDN's
| globally. We also host geo-located DB + k8s pods. Our service
| uses web sockets and latency is a real issue. We can't have
| 500 ms ping if we want to live update our client. We choose
| to host locally (in what is usually NOT a small server) so we
| get optimal ping for the live-interaction portion of our
| services that are used by millions of people every day.
| Nextgrid wrote:
| > the cloud hosting costs are minimal
|
| Disagreed. The cloud equivalent of a small server is still
| a few hundred bucks a month + bandwidth. Sure, it's still a
| relatively small cost but you're still overpaying
| significantly over the Hetzner equivalent which will be
| sub-$100.
|
| > pay someone to manage that for you
|
| The same guy that manages your AWS can do this. Having
| bare-metal servers doesn't mean renting colo space and
| having people on-site - you can get them from
| Hetzner/OVH/etc and they will manage all the hardware for
| you.
|
| > The fact that we just deploy containers to k8s all over
| the world works very well for us.
|
| It's great that it works well for you and I am in no way
| suggesting you should change, but I wouldn't say it would
| apply to everyone - the cloud adds significant costs with
| regards to bandwidth alone and makes some services outright
| impossible with that pricing model.
|
| > We also host geo-located DB
|
| That's a complex use-case that's not representative of most
| early/small SaaS which are just a CRUD app backed by a DB.
| If your business case requires distributed databases and
| you've already done the work, great - but a lot of services
| don't need that (at least not yet) and can do just fine
| with a single big DB server + application server and good
| backups, and that will be dirt-cheap on bare-metal.
| nostrebored wrote:
| Claiming that Hetzner is equivalent is fallacious. The
| offerings are completely different.
|
| Agreed on networking though!
| Nextgrid wrote:
| In context of a "small server", I think they are
| equivalent. AWS gives you a lot more functionality but
| you're unlikely to be using any of it if you're just
| running a single small "pet" server.
| kkielhofner wrote:
| You don't need IT in every location or even different
| hosting facility contracts. Most colo hosting companies
| have multiple regions. From the 800lb gorilla (Equinix):
|
| https://www.equinix.com/data-centers
|
| Or a smaller US focused colo provider:
|
| https://www.coresite.com/data-centers/locations
|
| Between vendor (Dell, HP, IBM, etc) and the remote hands
| offered by the hosting facility you don't ever have to have
| a member of your team even enter a facility. Anywhere.
| Depending on the warranty/support package the vendor will
| dispatch someone to show up to the facility to replace
| failed components with little action from you.
|
| The vendor will be happy to ship the server directly to the
| facility (anywhere) and for a nominal fee the colo provider
| will rack it and get IPMI, iLo, IP KVM, whatever up for you
| to do your thing. When/if something ever "hits the fan"
| they have on site 24 hour "remote hands" that can either
| take basic pre-prescribed steps/instructions -or- work with
| your team directly and remotely.
|
| Interestingly, at my first startup we had a facility in the
| nearest big metro area that not only hosted our hardware
| but also provided an easy, cheap, and readily available
| meeting space:
|
| https://www.coresite.com/data-centers/data-center-
| design/ame...
| kgeist wrote:
| >The vast majority of startups out there run out of a single
| AWS region with a CDN caching read-only content.
|
| I wonder how many of them violate GDPR and similar laws in
| other countries in regards to personal data processing by
| processing everything in the US.
| treis wrote:
| This is one of those problems that basically no one has. RTT
| from Japan to Washington D.C. is 160ms. There are very few
| applications where that amount of additional latency matters.
| naavis wrote:
| It adds up surprisingly quickly when you have to do a TLS
| handshake, download many resources on pageload, etc. Setting up
| a fresh TCP + TLS connection alone costs around 3 round-trips.
| treis wrote:
| TLS sessions are cached though. Your 3 round trips cost about
| half a second on the initial load, but the connection should be
| reused for subsequent requests.
|
| Resources should be served through a CDN so you'll get
| local servers for those.
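|
| Back-of-the-envelope with the 160ms RTT mentioned above (the
| round-trip counts are approximate and depend on TLS version and
| connection reuse):
|             rtt = 0.160                        # Japan <-> US East coast, seconds
|             first_request = (1 + 2 + 1) * rtt  # TCP + TLS 1.2 handshake + request
|             reused = 1 * rtt                   # keep-alive connection, request only
|             print(first_request, reused)       # ~0.64s vs ~0.16s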
| yomkippur wrote:
| What holds me back from doing this is how will I reduce latency
| from the calls coming from other side of the world when OVHcloud
| seemingly does not have datacenters all over the world? There is
| a noticeable lag when it comes to multiplayer games or even web
| applications.
| tonymet wrote:
| People don't account for the CPU & wall-time cost of encode-
| decode. I've seen it take up 70% of CPU on a fleet. That means
| 700/1000 servers are just doing encode/decode.
|
| You can see high efficiency setups like stackexchange &
| hackernews are orders of magnitude more efficient.
| pclmulqdq wrote:
| This is exactly correct. If you have a microservice running a
| REST API, you are probably spending most of your CPU time on
| HTTP and JSON handling.
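|
| One quick way to sanity-check that on your own service is to time
| a (de)serialization round-trip of a representative payload and
| multiply by your request rate (the payload shape here is made up):
|             import json, timeit
|
|             payload = {"user_id": 12345,
|                        "items": [{"sku": i, "qty": 1} for i in range(50)]}
|
|             n = 10_000
|             secs = timeit.timeit(lambda: json.loads(json.dumps(payload)), number=n)
|             print(f"{secs / n * 1e6:.1f} us per encode+decode round-trip")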
| adam_arthur wrote:
| I'm building an app with Cloudflare serverless and you can
| emulate everything locally with a single command and debug
| directly... It's pretty amazing.
|
| But the way their offerings are structured means it will be quite
| expensive to run at scale without a multi-cloud setup. You can't
| globally cache the results of a worker function in the CDN, so any
| call to a semi-dynamic endpoint incurs one paid invocation, and
| there's no mechanism to bypass this via CDN caching because the
| workers live in front of the CDN, not behind it.
|
| Despite their messaging about lowering cloud costs, they have
| explicitly designed their products to keep people in a cost
| structure similar to, but different from, egress fees. And in
| fact it's quite easily bypassed by using a non-Cloudflare CDN in
| front of Cloudflare serverless.
|
| Anyway, I reached a similar conclusion that for my app a single
| large server instance works best. And actually I can fit my whole
| dataset in RAM, so disk/JSON storage and load on startup is even
| simpler than trying to use multiple systems and databases.
|
| Further, I can run this on a laptop for effectively free, and
| cache everything via CDN, rather than pay ~$100/month for a
| cloud instance.
|
| When you're small, development time is going to be your biggest
| constraint, and I highly advocate all new projects start with a
| monolithic approach, though with a structure that's conducive to
| decoupling later.
| bilekas wrote:
| I don't agree with EVERYTHING in the article, such as getting 2
| big servers rather than multiple smaller ones, but this is really
| just a cost/requirement issue.
|
| The biggest cost I've noticed with enterprises who go full cloud
| is that they are locked in for the long term. I don't mean
| contractually though: the way they design and implement any
| system or service MUST follow the provider's "way", which can be
| very detrimental when leaving the provider, or if, god forbid,
| the provider decides to sunset certain service versions etc.
|
| That said, for enterprise it can make a lot of sense and the
| article covers it well by admitting some "clouds" are beneficial.
|
| For anything I've ever done outside of large businesses, the go-to
| has always been "if it doesn't require an SRE to maintain, just
| host your own".
| amelius wrote:
| Nice until your server gets hugged by HN.
| runeks wrote:
| > The big drawback of using a single big server is availability.
| Your server is going to need downtime, and it is going to break.
| Running a primary and a backup server is usually enough, keeping
| them in different datacenters.
|
| What about replication? I assume the 70k postgres IOPS fall to
| the floor when needing to replicate the primary database to a
| backup server in a different region.
| arwhatever wrote:
| Recent team I was on used one big server.
|
| Wound up spawning off a separate thread from our would-be
| stateless web api to run recurring bulk processing jobs.
|
| Then coupled our web api to the global singleton-esque bulk
| processing jobs thread in a stateful manner.
|
| Then wrapped actors up on actors on top of everything to try to
| wring as much performance as possible out of the big server.
|
| Then decided they wanted to have a failover/backup server but it
| was too difficult due to the coupling to the global singleton-
| esque bulk processing job.
|
| [I resigned at this point.]
|
| So yeah color me skeptical. I know every project's needs are
| different, but I'm a huge fan of dumping my code into some cloud
| host that auto-scaled horizontally, and then getting back to
| writing more code that provides some freeeking business value.
| the_duke wrote:
| This has nothing to do with cloud vs big server. You can build
| horrible, tightly coupled architectures anywhere. You can also
| cleanly separate workloads on a single server just fine.
| malkia wrote:
| It was all good until NUMA came along, and now you have to
| carefully rethink your process or you get lots of performance
| issues in your (otherwise) well-threaded code. Speaking from
| first-hand experience: when our level editor ended up being used
| by artists on a server-class machine, the supposedly 4x faster
| machine was actually going 2x slower. Why? Lots of
| std::shared_ptr<> use on our side - any atomic reference counting
| caused slowdowns, as the cache lines (my understanding) had to be
| synchronized between the two physical CPUs, each having 12
| threads.
|
| And that's not the only issue. I'm just pointing out that you
| can't expect everything to scale smoothly there unless it's well
| thought out - e.g. ask your OS to allocate your threads/memory
| only on one of the physical CPUs (and its threads), put some big,
| disconnected part of your process(es) on the other one(s), and
| make sure the communication between them is minimal... which
| actually wants a micro-services-style design again at that level.
|
| So why not go with micro-services instead...
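|
| For example, on Linux you can pin the whole process to one socket
| (numactl with --cpunodebind/--membind does this more thoroughly,
| including memory placement); the core IDs below are an assumption,
| so check lscpu for your machine's actual NUMA layout:
|             import os
|
|             # Cores 0-11 assumed to belong to NUMA node 0. New threads
|             # inherit this mask, and first-touch allocation then keeps
|             # their memory on the same node.
|             os.sched_setaffinity(0, set(range(12)))
|             print(os.sched_getaffinity(0))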
| faizshah wrote:
| In the paper on Twitter's "Who to Follow" service they mention
| that they designed the service around storing the entire twitter
| graph in the memory of a single node:
|
| > An interesting design decision we made early in the Wtf project
| was to assume in-memory processing on a single server. At first,
| this may seem like an odd choice, running counter to the
| prevailing wisdom of "scaling out" on cheap, commodity clusters
| instead of "scaling up" with more cores and more memory. This
| decision was driven by two rationales: first, because the
| alternative (a partitioned, distributed graph processing
| engine) is significantly more complex and difficult to build, and,
| second, because we could! We elaborate on these two arguments
| below.
|
| > Requiring the Twitter graph to reside completely in memory is
| in line with the design of other high-performance web services
| that have high-throughput, low-latency requirements. For
| example, it is well-known that Google's web indexes are served
| from memory; database-backed services such as Twitter and
| Facebook require prodigious amounts of cache servers to operate
| smoothly, routinely achieving cache hit rates well above 99% and
| thus only occasionally require disk access to perform common
| operations. However, the additional limitation that the graph
| fits in memory on a single machine might seem excessively
| restrictive.
|
| I always wondered if they still do this and if this influenced
| any other architectures at other companies.
|
| Paper:
| https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.69...
| 3pt14159 wrote:
| Yeah I think single machine has its place, and I once sped up a
| program by 10000x by just converting it to Cython and having it
| all fit in the CPU cache, but the cloud still does have a
| place! Even for non-bursty loads. Even for loads that
| theoretically could fit in a single big server.
|
| Uptime.
|
| Or are you going to go down as all your workers finish? Long
| connections? Etc.
|
| It is way easier to gradually hand over across multiple API
| servers as you do an upgrade than it is to figure out what to
| do with a single beefy machine.
|
| I'm not saying it is always worth it, but I don't even think
| about the API servers when a deploy happens anymore.
|
| Furthermore if you build your whole stack this way it will be
| non-distributed by default code. Easy to transition for some
| things, hell for others. Some access patterns or algorithms are
| fine when everything is in a CPU cache or memory but would fall
| over completely across multiple machines. Part of the nice part
| about starting with cloud first is that it is generally easier
| to scale to billions of people afterwards.
|
| That said, I think the original article makes a nuanced case
| with several great points and I think your highlighting of the
| Twitter example is a good showcase for where single machine
| makes sense.
| efortis wrote:
| Some comments wrongly equate bare-metal with on-premise. Bare-
| metal servers can be rented, colocated, or installed on-
| premise.
|
| Also, when renting, the company takes care of hardware failures.
| Furthermore, as hard disk failures are the most common issue, you
| can have hot spares and opt to let damaged disks rot, instead of
| replacing them.
|
| For example, in ZFS, you can mirror disks 1 and 2, while having 3
| and 4 as hot spares, with the following command:
| zpool create pool mirror $d1 $d2 spare $d3 $d4
|
| ---
|
| The 400Gbps are now 700Gbps
|
| https://twitter.com/DanRayburn/status/1519077127575855104
|
| ---
|
| About the break even point:
|
| Disregarding the security risks of multi-tenant cloud instances,
| bare-metal is more cost-effective once your cloud bill exceeds
| $3,000 per year, which is the cost of renting two bare-metal
| servers.
|
| ---
|
| Here's how you can create a two-server infrastructure:
|
| https://blog.uidrafter.com/freebsd-jails-network-setup
| drewg123 wrote:
| 720Gb/s actually. Those last 20-30Gb/s were pretty hard fought
| :)
| efortis wrote:
| Yeah. Thank you!
| zhoujianfu wrote:
| 10 years ago I had a completely dynamic site written in PHP,
| running MySQL locally, on an 8GB RAM VM ($80/mo?), serving over
| 200K daily active users.
| Super fast and never went down!
| porker wrote:
| I like One Big (virtual) Server until you come to software
| updates. At a current project we have one server running the
| website in production. It runs an old version of Centos, the web
| server, MySQL and Elasticsearch all on the one machine.
|
| No network RTTs when doing too many MySQL queries on each page -
| great! But when you want to upgrade one part of that stack... we
| end up cloning the server, upgrading it, testing everything, and
| then repeating the upgrade in-place on the production server.
|
| I don't like that. I'd far rather have separate web, DB and
| Elasticsearch servers where each can be upgraded without fear of
| impacting the other services.
| rlpb wrote:
| You could just run system containers (eg. lxd) for each
| component, but still on one server. That gets you multiple
| "servers" for the purposes of upgrades, but without the rest of
| the paradigm shift that Docker requires.
| 0xbadcafebee wrote:
| Which is great until there's a security vuln in an end-of-
| life piece of core software (the distro, the kernel, lxc,
| etc) and you need to upgrade the whole thing, and then it's a
| 4+ week slog of building a new server, testing the new
| software, fixing bugs, moving the apps, finding out you
| missed some stuff and moving that stuff, shutting down the
| old one. Better to occasionally upgrade/reinstall the whole
| thing with a script and get used to not making one-off
| changes on servers.
|
| If I were to buy one big server, it would be as a hypervisor.
| Run Xen or something and that way I can spin up and down VMs
| as I choose, LVM+XFS for snapshots, logical disk management,
| RAID, etc. But at that point you're just becoming a personal
| cloud provider; might as well buy smaller VMs from the cloud
| with a savings plan, never have to deal with hardware, make
| complex changes with a single API call. Resizing an instance
| is one (maybe two?) API call. Or snapshot, create new
| instance, delete old instance: 3 API calls. Frickin' magic.
|
| _" the EC2 Instance Savings Plans offer up to 72% savings
| compared to On-Demand pricing on your Amazon EC2 Instances"_
| - https://aws.amazon.com/savingsplans/
| rlpb wrote:
| Huh? Using lxd would be identical to what you suggest (VMs
| on Xen) from a security upgrade and management perspective.
| Architecturally and operationally they're basically the
| equivalent, except that VMs need memory slicing up but lxd
| containers don't. There are security isolation differences
| but you're not talking about that here?
| 0xbadcafebee wrote:
| I would want the memory slicing + isolation, plus a
| hypervisor like Xen doesn't need an entire host OS so
| there's less complexity, vulns, overhead, etc, and I'm
| not aware if LXD does the kind of isolation that e.g.
| allows for IKE IPsec tunnels? Non-hypervisors don't allow
| for it iirc. Would rather use Docker for containers
| because the whole container ecosystem is built around it.
| rlpb wrote:
| > I would want the memory slicing + isolation...
|
| Fine, but then that's your reason. "until there's a
| security vuln in an end-of-life piece of core
| software...and then it's a 4+ week slog of building a new
| server" isn't a difference in the context of comparing
| Xen VMs and lxd containers. As an aside, lxd does support
| cgroup memory slicing. It has the advantage that it's not
| mandatory like it is in VMs, but you can do it if you
| want it.
|
| > Would rather use Docker for containers because the
| whole container ecosystem is built around it.
|
| This makes no sense. You're hearing the word "container"
| and inferring an equivalence that does not exist. The
| "whole container ecosystem" is something that exists for
| Docker-style containers, and is entirely irrelevant for
| lxd containers.
|
| lxd containers are equivalent to full systems, and exist
| in the "Use one big server" ecosystem. If you're familiar
| with running a full system into a VM, then you're
| familiar with the inside of a lxd container. They're the
| same. In userspace, there's no significant difference.
| YetAnotherNick wrote:
| Even lxd has updates, many of them security updates.
| ansible wrote:
| I use LXC a lot for our relatively small production setup.
| And yes, I'm treating the servers like pets, not cattle.
|
| What's nice is that I can snapshot a container and move it to
| another physical machine. Handy for (manual) load balancing
| and upgrades to the physical infrastructure. It is also easy
| to run a snapshot of the entire server and then run an
| upgrade, then if the upgrade fails, you roll back to the old
| snapshot.
| pclmulqdq wrote:
| Containers are your friend here. The sysadmin tools that have
| grown out of the cloud era are actually really helpful if you
| don't cloud too much.
| cxromos wrote:
| is this clickbait?
|
| although i do like the alternate version: use servers, but don't
| be too serverly.
| jedberg wrote:
| I'm a huge advocate of cloud services, and have been since 2007
| (not sure where this guy got 2010 as the start of the "cloud
| revolution"). That out of the way, there is something to be said
| for starting off with a monolith on a single beefy server. You'll
| definitely iterate faster.
|
| Where you'll get into trouble is if you get popular quickly. You
| may run into scaling issues early on, and then have to scramble
| to scale. It's just a tradeoff you have to consider when starting
| your project -- iterate quickly early and then scramble to scale,
| or start off more slowly but have a better ramping up story.
|
| One other nitpick I had is that OP complains that even in the
| cloud you still have to pay for peak load, but while that's
| strictly true, it's amortized over so many customers that you
| really aren't paying for it unless you're very large. The more
| you take advantage of auto-scaling, the less of the peak load
| you're paying. The customers who aren't auto-scaling are the ones
| who are covering most of that cost.
|
| You can run a pretty sizable business in the free tier on AWS and
| let everyone else subsidize your peak (and base!) costs.
| rmbyrro wrote:
| Isn't this simplistic?
|
| It really depends on the service, how it is used, the shape of
| the data generated/consumed, what type of queries are needed,
| etc.
|
| I've worked for a startup that hit scaling issues with ~50
| customers. And have seen services with +million users on a
| single machine.
|
| And what does "quickly" and "popular" even mean? It also
| depends a lot on the context. We need to start discussing
| mental models for developers to think about scaling in a
| contextual way.
| Phil_Latio wrote:
| > Where you'll get into trouble is if you get popular quickly.
| You may run into scaling issues early on
|
| Did it ever occur to you that you can still use the cloud for
| on demand scaling? =)
| jedberg wrote:
| Sure but only if you architect it that way, which most people
| don't if they're using one big beefy server, because the
| whole reason they're doing that is to iterate quickly. It's
| hard to build something that can bust to the cloud while
| moving quickly.
|
| Also, the biggest issue is where your data is. If you want to
| bust to the cloud, you'll probably need a copy of your data
| in the cloud. Now you aren't saving all that much money
| anymore and adding in architectural overhead. If you're going
| to bust to the cloud, you might as well just build in the
| cloud. :)
| EddySchauHai wrote:
| > But if I use Cloud Architecture, I Don't Have to Hire Sysadmins
|
| > Yes you do. They are just now called "Cloud Ops" and are under
| a different manager. Also, their ability to read the arcane
| documentation that comes from cloud companies and keep up with
| the corresponding torrents of updates and deprecations makes them
| 5x more expensive than system administrators.
|
| I don't believe "Cloud Ops" is more complex than system
| administration - though having only studied for the CCNA, I'm
| probably on the Valley of Despair slope of the Dunning-Kruger
| curve. If keeping up with cloud companies' updates is that much
| of a challenge to warrant a 5x price over a sysadmin, then that's
| telling you something about their DX...
| abrax3141 wrote:
| I may be misunderstanding, but it looks like the micro-services
| comparison here is based on very high usage. Another use for
| micro-services, like lambda, is exactly the opposite. If you have
| very low usage, you aren't paying for cycles you don't use the
| way you would be if you either owned the machine, or rented it
| from AWS or DO and left it on all the time (which you'd have to
| do in order to serve that randomly-arriving one hit per day!)
| pclmulqdq wrote:
| If you have microservices that truly need to be separate
| services and have very little usage, you probably should use
| things like serverless computing. It scales down to 0 really
| well.
|
| However, if you have a microservice with very little usage,
| turning that service into a library is probably a good idea.
| abrax3141 wrote:
| Yes. I think that the former case is the situation we're in.
| Lambdas are annoying (the whole AWS is annoying!) but, as you
| say, scales to 0 very well.
| marcosdumay wrote:
| Why open yourself to random $300k bills from Amazon when the
| alternative is wasting a $5/month server?
| abrax3141 wrote:
| I don't understand what these numbers are referring to.
| marcosdumay wrote:
| One is a typical size for those rare, but not too rare, bills
| people get from Amazon when their unused, unoptimized
| application gets some surprise usage.
|
| The other is how much it costs to have an always-on paid VPS
| capable of answering the once-a-day request you
| specified.
___________________________________________________________________
(page generated 2022-08-02 23:00 UTC)