[HN Gopher] Use one big server
___________________________________________________________________
Use one big server
Author : pclmulqdq
Score : 841 points
Date : 2022-08-02 14:43 UTC (8 hours ago)
(HTM) web link (specbranch.com)
(TXT) w3m dump (specbranch.com)
| londons_explore wrote:
| Hybrid!
|
| If you are at all cost sensitive, you should have some of your
| own infrastructure, some rented, and some cloud.
|
| You should design your stuff to be relatively easily moved and
| scaled between these. Build with docker and kubernetes and that's
| pretty easy to do.
|
| As your company grows, the infrastructure team can schedule which
| jobs run where, and get more computation done for less money than
| just running everything in AWS, and without the scaling headaches
| of on-site stuff.
| dekhn wrote:
| Science advances as RAM on a single machine increases.
|
| For many years, genomics software was non-parallel and depended
| on having a lot of RAM - often a terabyte or more - to store data
| in big hash tables. Converting that to distributed computing was
| a major effort, and to this day many people still just get a Big
| Server With Lots of Cores, RAM, and SSD.
|
| Personally, after many years of working with distributed
| systems, I absolutely enjoy working on a big fat server that I
| have all to myself.
| bee_rider wrote:
| On the other hand in science, it sure is annoying that the size
| of problems that fit in a single node is always increasing.
| PARDISO running on a single node will always be nipping at your
| heels if you are designing a distributed linear system
| solver...
| notacoward wrote:
| > Science advances as RAM on a single machine increases.
|
| Also as people learn that correlation does not equal causation.
| ;)
| rstephenson2 wrote:
| It seems like lots of companies start in the cloud due to low
| commitments, and then later when they have more stability and
| demand and want to save costs, making bigger cloud commitments
| (RIs, enterprise agreements, etc.) is a turnkey way to save money
| but always leaves you on the lower-efficiency cloud track. Has
| anyone had good experiences selectively offloading workloads from
| the cloud to bare metal servers nearby?
| reillyse wrote:
| Nope. Multiple small servers.
|
| 1) You need to get over the hump and build multiple servers into
| your architecture from the get-go (the author says you need two
| servers minimum), so really we are talking about two big
| servers.
|
| 2) having multiple small servers allows us to spread our service
| into different availability zones
|
| 3) multiple small servers allows us to do rolling deploys without
| bringing down our entire service
|
| 4) Once we use the multiple-small-servers approach, it's easy to
| scale our compute up and down by adding or removing machines.
| With one server it's difficult to scale up or down without
| buying more machines. Small servers we can add incrementally, but
| with the large-server approach scaling up requires downtime and
| buying a new server.
| zhte415 wrote:
| It completely depends on what you're doing. This was pointed out
| in the first paragraph of the article:
|
| > By thinking about the real operational considerations of our
| systems, we can get some insight into whether we actually need
| distributed systems for most things.
| Nextgrid wrote:
| > you need to get over the hump and build in multiple servers
| into your architecture from the get go (the author says you
| need two servers minimum), so really we are talking about two
| big servers.
|
| Managing a handful of big servers can be done manually if
| needed - it's not pretty but it works and people have been
| doing it just fine before the cloud came along. If you
| intentionally plan on having dozens/hundreds of small servers,
| manual management becomes unsustainable and now you need a
| control plane such as Kubernetes, and all the complexity and
| failure modes it brings.
|
| > having multiple small servers allows us to spread our service
| into different availability zones
|
| So will 2 big servers in different AZs (whether cloud AZs or
| old-school hosting providers such as OVH).
|
| > multiple small servers allows us to do rolling deploys
| without bringing down our entire service
|
| Nothing prevents you from starting multiple instances of your
| app on one big server, or doing rolling deploys with big bare
| metal, assuming one server can handle the peak load (so you take
| your first server out of the LB, upgrade it, put it back in the
| LB, then do the same for the second and so on).
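|
| A minimal sketch of that drain/upgrade/re-add loop in TypeScript
| (the load-balancer admin API, backend list, and /healthz endpoint
| here are hypothetical placeholders, not anything from the
| article):
|
|   // Rolling deploy across a small, fixed set of big servers.
|   const backends = ["10.0.0.1", "10.0.0.2"]; // e.g. two big servers
|   const lbAdmin = "http://lb.internal:8080";  // assumed LB admin API
|
|   async function healthy(host: string): Promise<boolean> {
|     try {
|       return (await fetch(`http://${host}/healthz`)).ok;
|     } catch {
|       return false;
|     }
|   }
|
|   async function rollingDeploy(deploy: (host: string) => Promise<void>) {
|     for (const host of backends) {
|       // 1. Drain: stop sending new traffic to this backend.
|       await fetch(`${lbAdmin}/backends/${host}/drain`, { method: "POST" });
|       // 2. Upgrade the app on the drained host (ssh/rsync/restart...).
|       await deploy(host);
|       // 3. Wait until the new version reports healthy.
|       while (!(await healthy(host))) {
|         await new Promise((r) => setTimeout(r, 2000));
|       }
|       // 4. Put it back in the LB before moving to the next host.
|       await fetch(`${lbAdmin}/backends/${host}/enable`, { method: "POST" });
|     }
|   }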
|
| > once we use the multiple small servers approach it's easy to
| scale up and down our compute by adding or removing machines.
| Having one server it's difficult to scale up or down without
| buying more machines. Small servers we can add incrementally
| but with the large server approach scaling up requires downtime
| and buying a new server.
|
| True but the cost premium of the cloud often offsets the
| savings of autoscaling. A bare-metal server capable of handling peak
| load is often cheaper than your autoscaling stack at low load,
| therefore you can just overprovision to always meet peak load
| and still come out ahead.
| SoftTalker wrote:
| I manage hundreds of servers, and use Ansible. It's simple
| and it gets the job done. I tried to install Kubernetes on a
| cluster and couldn't get it to work. I mean I know it works,
| obviously, but I could not figure it out and decided to stay
| with what works for me.
| eastbound wrote:
| But it's specific, and no-one will want to take over your
| job.
|
| The upside of a standard AWS CloudFormation file is that
| engineers are replaceable. They're cargo-cult engineers,
| but they're not worried for their career.
| Nextgrid wrote:
| > But it's specific, and no-one will want to take over
| your job.
|
| It really depends what's on the table. Offer just half of the
| cost savings vs an equivalent AWS setup as a bonus and I'm sure
| you'll find people who will happily do it (and you'll be happy to
| pocket the other half). For a lot of companies even just _half_
| of the cost savings would be a significant sum (reminds me of an
| old client who spent _thousands_ per month on an RDS cluster that
| not only was slower than my entry-level MacBook, but ended up
| crapping out, stuck in an inconsistent state for 12 hours, and
| required manual intervention from AWS to recover - so much for
| managed services - they ended up restoring a backup, but I wish I
| could've SSH'd in and recovered it in-place).
|
| As someone who uses tech as a means to an end and is more
| worried about the _output_ said tech produces than the tech
| itself (aka I'm not looking for a job nor resume clout nor
| invites to AWS/Hashicorp/etc conferences; instead I bank on the
| business problems my tech solves), I'm personally very happy to
| get my hands dirty with old-school sysadmin stuff if it means I
| don't spend 10-20x the money on infrastructure just to make Jeff
| Bezos richer - my end customers don't know nor care either way
| while my wallet appreciates the cost savings.
| [deleted]
| rubiquity wrote:
| The line of thinking you follow is what is plaguing this
| industry with too much complexity and simultaneously throwing
| away incredible CPU and PCIe performance gains in favor of
| using the network.
|
| Any technical decisions about how many instances to have and
| how they should be spread out need to start as a business
| decision and end in crisp numbers about recovery point/time
| objectives, and yet somehow that nearly never happens.
|
| To answer your points:
|
| 1) Not necessarily. You can stream data backups to remote
| storage and recover from that on a new single server as long as
| that recovery fits your Recovery Time Objective (RTO).
|
| 2) What's the benefit of multiple AZs if the SLA of a single AZ
| is greater than your intended availability goals? (Have you
| checked your provider's single AZ SLA?)
|
| 3) You can absolutely do rolling deploys on a single server.
|
| 4) Using one large server doesn't mean you can't complement it
| with smaller servers on an as-needed basis. AWS even has a
| service for doing this.
|
| Which is to say: there aren't any prescriptions when it comes
| to such decisions. Some businesses warrant your choices, the
| vast majority do not.
| reillyse wrote:
| Ok, so to your points.
|
| "It depends" is the correct answer to the question, but the
| least informative.
|
| One Big Server or multiple small servers? It depends.
|
| It always depends. There are many workloads where one big
| server is the perfect size. There are many workloads where
| many small servers are the perfect solution.
|
| My point is that the ideas put forward in the article are flawed
| for the vast majority of use cases.
|
| I'm saying that multiple small servers are a better solution on a
| number of different axes.
|
| For 1) "One Server (Plus a Backup) is Usually Plenty": now I need
| some kind of remote storage streaming system and some kind of
| manual recovery. Am I going to fail over to the backup (and so it
| needs to be as big as my "one server"), or will I need to
| manually recover from my backup?
|
| 2) Yes, it depends on your availability goals, but you get this
| as a side effect of having more than one small instance.
|
| 3) Maybe I was ambiguous here. I don't just mean rolling deploys
| of code. I also mean changing the server code, restarting,
| upgrading and changing out the server. What happens when you
| migrate to a new server (when you scale up by purchasing a
| different box)? Now we have a manual process that doesn't get
| executed very often and is bound to cause downtime.
|
| 4) Now we have "Use one Big Server - and a bunch of small
| ones"
|
| I'm going to add a final point on reliability. By far the
| biggest risk factor for reliability is me the engineer. I'm
| responsible for bringing down my own infra way more than any
| software bug or hardware issue. The probability of me messing
| up everything when there is one server that everything
| depends on is much much higher, speaking from experience.
|
| So, like I said, I could have said "It depends" but instead I
| tried to give a response that was somewhat illuminating and
| helpful, especially given the strong opinions expressed in the
| article.
|
| I'll give a little color with the current setup for a site I
| run.
|
| moustachecoffeeclub.com runs on ECS
|
| I have 2 on-demand instances and 3 spot instances
|
| One tiny instance running my caches (redis, memcache)
|
| One "permanent" small instance running my web server
|
| Two small spot instances running web server
|
| One small spot instance running background jobs
|
| "small" being about 3 GB and 1024 CPU units
|
| And an RDS instance with backup about $67 / month
|
| All in I'm well under $200 per month including database.
|
| So you can do multiple small servers inexpensively.
|
| Another aspect is that I appreciate being able to go on
| vacation for a couple of weeks, go camping or take a plane
| flight without worrying if my one server is going to fall
| over when I'm away and my site is going to be down for a
| week. In a big company maybe there is someone paid to monitor
| this, but with a small company I could come back to a smoking
| hulk of a company and that wouldn't be fun.
| bombcar wrote:
| > Any technical decisions about how many instances to have and
| how they should be spread out need to start as a business
| decision and end in crisp numbers about recovery point/time
| objectives, and yet somehow that nearly never happens.
|
| Nobody wants to admit that their business or their department
| actually has a SLA of "as soon as you can, maybe tomorrow, as
| long as it usually works". So everything is pretend-
| engineered to be fifteen nines of reliability (when in
| reality it sometimes explodes _because_ of the "attempts" to
| make it robust).
|
| Being honest about the _actual_ requirements can be extremely
| helpful.
| bob1029 wrote:
| > Nobody wants to admit that their business or their
| department actually has a SLA of "as soon as you can, maybe
| tomorrow, as long as it usually works". So everything is
| pretend-engineered to be fifteen nines of reliability (when
| in reality it sometimes explodes because of the "attempts"
| to make it robust).
|
| I have yet to see my principal technical frustrations
| summarized so concisely. This is at the heart of
| _everything_.
|
| If the business and the engineers could get over their
| ridiculous obsession with statistical outcomes and strict
| determinism, they would be able to arrive at a much more
| cost-effective, simple and human-friendly solution.
|
| The # of businesses that are _actually_ sensitive to >1
| minute of annual downtime are already running on top of IBM
| mainframes and have been for decades. No one's business is
| as important as the federal reserve or pentagon, but they
| don't want to admit it to themselves or others.
| marcosdumay wrote:
| > The # of businesses that are actually sensitive to >1
| minute of annual downtime are already running on top of
| IBM mainframes and have been for decades.
|
| Is there any?
|
| My bank certainly has way less than 5 9s of availability.
| It's not a problem at all. Credit/debit card processors
| seem to stay around 5 nines, and nobody is losing sleep
| over it. As long as your unavailability isn't all on the
| Christmas promotion day, I never saw anybody losing any
| sleep over web-store unavailability. The FED probably
| doesn't have 5 9's of availability. It's way overkill for
| a central bank, even if it's one that processes online
| interbank transfers (which the FED doesn't).
|
| The organizations that need more than 5 9's are probably
| all in the military and science sectors. And those aren't
| using mainframes, they certainly use good old redundancy
| of equipment with simple failure modes.
| bob1029 wrote:
| > simultaneously throwing away incredible CPU and PCIe
| performance gains
|
| We _really_ need to double down on this point. I worry that
| some developers believe they can defeat the laws of physics
| with clever protocols.
|
| The amount of time it takes to round trip the network _in the
| same datacenter_ is roughly 100,000 to 1,000,000 nanoseconds.
|
| The amount of time it takes to round trip L1 cache is around
| half a nanosecond.
|
| A trip down PCIe isn't much worse, relatively speaking. Maybe
| hundreds of nanoseconds.
|
| Lots of assumptions and hand waving here, but L1 cache _can
| be_ around 1,000,000x faster than going across the network.
| SIX orders of magnitude of performance are _instantly_
| sacrificed to the gods of basic physics the moment you decide
| to spread that SQLite instance across US-EAST-1. Sure, it
| might not wind up a million times slower on a relative basis,
| but you'll never get access to those zeroes again.
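|
| As a back-of-the-envelope illustration of those orders of
| magnitude (the nanosecond figures below are the rough ballpark
| numbers quoted above, not measurements):
|
|   // Latency budget for a request that does 20 sequential lookups.
|   const L1_NS = 0.5;          // L1 cache hit
|   const PCIE_NS = 300;        // PCIe round trip (order of magnitude)
|   const NETWORK_NS = 500_000; // round trip within one datacenter
|
|   const lookups = 20;
|   for (const [name, ns] of [["L1", L1_NS], ["PCIe", PCIE_NS],
|                             ["network", NETWORK_NS]] as const) {
|     console.log(`${name}: ${(lookups * ns) / 1e6} ms`);
|   }
|   // L1: 0.00001 ms, PCIe: 0.006 ms, network: 10 ms - the same 20
|   // lookups go from invisible to user-visible latency once each
|   // one crosses the network.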
| roflyear wrote:
| I agree! Our "distributed cloud database" just went down last
| night for a couple of HOURS. Well, not entirely down. But
| there were connection issues for hours.
|
| Guess what never, never had this issue? The hardware I keep
| in a datacenter lol!
| dvfjsdhgfv wrote:
| > The line of thinking you follow is what is plaguing this
| industry with too much complexity and simultaneously throwing
| away incredible CPU and PCIe performance gains in favor of
| using the network.
|
| It will die out naturally once people realize how much the
| times have changed and that the old solutions based on weaker
| hardware are no longer optimal.
| deathanatos wrote:
| > _2) What 's the benefit of multiple AZs if the SLA of a
| single AZ is greater than your intended availability goals?
| (Have you checked your provider's single AZ SLA?)_
|
| ... my provider's single AZ SLA is less than my company's
| intended availability goals.
|
| (IMO our goals are also nuts, too, but it is what it is.)
|
| Our provider, in the worst case (a VM using a managed hard
| disk), has an SLA of 95% within a month (I ... think. Their
| SLA page uses incorrect units on the top line items. The
| examples in the legalese -- examples are normative, right? --
| use a unit of % / mo...).
|
| You're also assuming a provider (a) typically meets their
| SLAs and (b) if they don't, honors them. IME, (a) is highly
| service dependent, with some services being just _stellar_ at
| it, and (b) is usually "they will if you can _prove_ to them
| with your own metrics they had an outage, and push for a
| credit." Also (c) the service doesn't fail in a way that's
| impactful but not covered by the SLA. (E.g., I had a cloud
| provider once whose SLA was over "the APIs should return
| 2xx", and the APIs during the outage always returned "2xx,
| I'm processing your request". You then polled the API and got
| "2xx, your request is pending". Nothing was happening, because
| they were having an outage, but that outage could continue
| indefinitely without impacting the SLA! _That_ was a fun
| support call...)
|
| There's also (d) AZs are a myth; I've seen multiple global
| outages. E.g., when something like the global authentication
| service falls over and takes basically every other service
| with it. (Because nothing can authenticate. What's even
| better is the provider then listing those services as "up" /
| not in an outage, because _technically_ it's not _that_
| service that's down, it is just the authentication service.
| 'Cause God forbid you'd have to give out _that_ credit. But
| the provider calling a service "up" that is failing 100% of
| the requests sent its way is just rich, from the customer's
| view.)
| ericd wrote:
| On a big server, you would probably be running VMs rather than
| serving directly. And then it becomes easy to do most of what
| you're talking about - the big server is just a pool of
| resources from which to make small, single purpose VMs as you
| need them.
| Koshkin wrote:
| Why VMs when you can use containers?
| ericd wrote:
| If you prefer those, go for it. I like my infra tech to be
| about as boring and battle tested as I can get it without
| big negatives in flexibility.
| Koshkin wrote:
| In theory, VMs should only be needed to run different
| OSes on one big box. Otherwise, what should have sufficed
| (speaking of what I 'prefer') is a multiuser OS that does
| not require additional layers to ensure security and
| proper isolation of users and their work environments
| from each other. Unfortunately, it looks like UNIX and its
| descendants could not deliver on this basic need. (I
| wonder if Multics had something of a better design in
| this regard.)
| cestith wrote:
| Why containers when you can use unikernel applications?
| Koshkin wrote:
| But can unikernel applications share a big server
| (without themselves running inside VMs)?
| mixmastamyk wrote:
| Better support when at least in the neighborhood of the
| herd.
| PeterCorless wrote:
| We have a different take on running "one big database." At
| ScyllaDB we prefer vertical scaling because you get better
| utilization of all your vCPUs, but we still will keep a
| replication factor of 3 to ensure that you can maintain [at
| least] quorum reads and writes.
|
| So we would likely recommend running 3x big servers. For those
| who want to plan for failure, though, they might prefer to have
| 6x medium servers, because then you don't take as much of a
| "torpedo hit" when any one server goes offline.
|
| So it's a balance. You want to be big, but you don't want to be
| monolithic. You want an HA architecture so that no one node kills
| your entire business.
|
| I also suggest that people planning systems create their own
| "torpedo test." We often benchmark to tell maximal optimum
| performance, presuming that everything is going to go right.
|
| But people who are concerned about real-world outage planning may
| want to "torpedo" a node to see how a 2-out-of-3-nodes-up cluster
| operates, versus a 5-out-of-6-nodes-up cluster.
|
| This is like planning for engine loss on jets, to see if you can
| keep flying with 2 of 3 engines, or 1 of 2.
|
| Obviously, if you have 1 engine, there is nothing you can do if
| you lose that single point of failure. At that point, you are
| updating your resume, and checking on the quality of your
| parachute.
| vlovich123 wrote:
| > At that point, you are updating your resume, and checking on
| the quality of your parachute
|
| The ordering of these events seems off but that's
| understandable considering we're talking about distributed
| systems.
| pclmulqdq wrote:
| I think this is the right approach, and I really admire the
| work you do at ScyllaDB. For something truly critical, you
| really do want to have multiple nodes available (at least 2,
| and probably 3 is better). However, you really should want to
| have backup copies in multiple datacenters, not just the one.
|
| Today, if I were running something that absolutely needed to be
| up 24/7, I would run a 2x2 or 2x3 configuration with async
| replication between primary and backup sites.
| PeterCorless wrote:
| Exactly. Regional distribution can be vital. Our customer
| Kiwi.com had a datacenter fire. 10 of their 30 nodes were
| turned to a slag heap of ash and metal. But 20 of 30 nodes in
| their cluster were in completely different datacenters so
| they lost zero data and kept running non-stop. This is a rare
| story, but you do NOT want to be one of the thousands of
| others that only had one datacenter, and their backups were
| also stored there and burned up with their main servers. Oof!
|
| https://www.scylladb.com/2021/03/23/kiwi-com-nonstop-
| operati...
| zokier wrote:
| If you have just two servers, how are you going to load-balance
| and fail over between them? Generally you need at least 3 nodes
| for any sort of quorum.
| titzer wrote:
| Last year I did some consulting for a client using Google cloud
| services such as Spanner and cloud storage. Storing and indexing
| mostly timeseries data with a custom index for specific types of
| queries. It was difficult for them to define a schema to handle
| the write bandwidth needed for their ingestion. In particular it
| required a careful hashing scheme to balance load across shards
| of the various tables. (It seems to be a pattern with many
| databases to suck at append-often, read-very-often patterns, like
| logs).
|
| We designed some custom in-memory data structures in Java, but
| also used some of the standard high-performance concurrent data
| structures, plus some reader/writer locks. gRPC and some pub/sub to get
| updates on the order of a few hundred or thousand qps. In the
| end, we ended up with JVM instances that had memory requirements
| in the 10GB range. Replicate that 3-4x for failover, and we could
| serve queries at higher rates and lower latency than hitting
| Spanner. The main thing cloud was good for was the storage of the
| underlying timeseries data (600GB maybe?) for fast server
| startup, so that they could load the index off disk in less than
| a minute. We designed a custom binary disk format to make that
| blazingly fast, and then just threw binary files into a cloud
| filesystem.
|
| If you need to serve < 100GB of data and most of it is
| static... IMHO, screw the cloud; use a big server and replicate
| it for fail-over. Unless you've got really high write rates or
| seriously stringent transactional requirements, a couple of
| servers will do it.
|
| YMMV, but holy crap, servers are huge these days.
| eastbound wrote:
| When you say "screw the cloud", you mean "administer an EC2
| machine yourself" or really "buy your own hardware"?
| titzer wrote:
| The former, mostly. You don't necessarily have to use EC2,
| but that's easy to do. There are many other, smaller
| providers if you really want to get out from under the big 3.
| I have no experience managing hardware, so I personally
| wouldn't take that on myself.
| sllabres wrote:
| I would think that it can hold 1TB of RAM _per_socket_ (with 64GB
| DIMM), so _2TB_ total.
| bob1029 wrote:
| > 1 million IOPS on a NoSQL database
|
| I have gone well beyond this figure by doing clever tricks in
| software and batching multiple transactions into IO blocks where
| feasible. If your average transaction is substantially smaller
| than the IO block size, then you are probably leaving a lot of
| throughput on the table.
|
| The point I am trying to make is that even if you think "One Big
| Server" might have issues down the road, there are always some
| optimizations that can be made. Have some faith in the vertical.
|
| This path has worked out _really_ well for us over the last
| ~decade. New employees can pick things up much more quickly when
| you don't have to show them the equivalent of a nuclear reactor
| CAD drawing to get started.
| mathisonturing wrote:
| > batching multiple transactions into IO blocks where feasible.
| If your average transaction is substantially smaller than the
| IO block size, then you are probably leaving a lot of
| throughput on the table.
|
| Could you expand on this? A quick Google search didn't help.
| Link to an article or a brief explanation would be nice!
| bob1029 wrote:
| Sure. If you are using some micro-batched event processing
| abstraction, such as the LMAX Disruptor, you have an
| opportunity to take small batches of transactions and process
| them as a single unit to disk.
|
| For event sourcing applications, multiple transactions can be
| coalesced into a single IO block & operation without much
| drama using this technique.
|
| Surprisingly, this technique also _lowers_ the amount of
| latency that any given user should experience, despite the
| fact that you are "blocking" multiple users to take
| advantage of small batching effects.
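|
| A toy sketch of that coalescing idea (not the LMAX Disruptor
| itself - just a hypothetical batcher that groups small serialized
| transactions into one block-sized append):
|
|   import { appendFileSync } from "node:fs";
|
|   const BLOCK_SIZE = 4096; // one IO block
|   let pending: Buffer[] = [];
|   let pendingBytes = 0;
|
|   // Callers enqueue small serialized transactions; the batcher
|   // flushes them as a single block-sized write instead of one
|   // write per transaction.
|   function submit(txn: Buffer) {
|     pending.push(txn);
|     pendingBytes += txn.length;
|     if (pendingBytes >= BLOCK_SIZE) flush();
|   }
|
|   function flush() {
|     if (pending.length === 0) return;
|     // Many transactions, one write of (roughly) one block.
|     appendFileSync("events.log", Buffer.concat(pending));
|     pending = [];
|     pendingBytes = 0;
|   }
|
|   // A short timer bounds the latency of a not-yet-full batch.
|   setInterval(flush, 1);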
| lanstin wrote:
| I didn't see the point made that cloudy services are easier to
| manage. If some team gets a capital budget to buy that one big
| server, they will put everything on it, no matter your
| architectural standards: cron jobs editing state on disk, tmux
| sessions shared between teams, random web servers doing who knows
| what, non-DBA-team Postgres installs, etc. At least in the cloud
| you can limit certain features and do chargeback calculations.
|
| Not sure if that is a net win for cloud or physical, of course,
| but I think it is a factor
| kgeist wrote:
| One of our projects uses 1 big server and indeed, everyone
| started putting everything on it (because it's powerful): the
| project itself, a bunch of corporate sites, a code review tool,
| and god knows what else. Last week we started having issues
| with the projects going down because something is overloading
| the system and they still can't find out what exactly without
| stopping services/moving them to a different machine
| (fortunately, it's internal corporate stuff, not user-facing
| systems). The main problem I've found with this setup is that
| random stuff can accumulate with time and then one
| tool/process/project/service going out of control can bring
| down the whole machine. If it's N small machines, there's
| greater isolation.
| pclmulqdq wrote:
| It sounds like you need some containers.
| kbenson wrote:
| One server is for a hobby, not a business. Maybe that's fine, but
| keep that in mind. Backups at that level are something that keeps
| you from losing all data, not something that keeps you running
| and gets you up in any acceptable timeframe for most businesses.
|
| That doesn't mean you need to use the cloud, it just means one
| big piece of hardware with all its single points of failure is
| often not enough. Two servers gets you so much more than one. You
| can make one a hot spare, or actually split services between them
| and have each be ready to take over for specific services for the
| other, greatly increasing your burst handling capability and
| giving you time to put more resources in place to keep n+1
| redundancy going if you're using more than half of a server's
| resources.
| secabeen wrote:
| This is exactly the OPs recommended solution:
|
| > One Server (Plus a Backup) is Usually Plenty
| kbenson wrote:
| Then I guess my first sentence is about _equally_ as
| click-baity as the article title. ;)
| vitro wrote:
| Let's Encrypt's database server [1] would beg to differ. For
| businesses at a certain scale, two servers really are overkill.
|
| [1] https://letsencrypt.org/2021/01/21/next-gen-database-
| servers...
| mh- wrote:
| That says they use a single _database_ , as in a logical
| MySQL database. I don't see any claim that they use a single
| _server_. In fact, the title of the article you've linked
| suggests they use multiple.
| simonw wrote:
| https://letsencrypt.status.io/ shows a list of their
| servers, which look to be spread across three data centers
| (one "public", two "high availability").
| kbenson wrote:
| Do we know if it shows cold spares? That's all I think is
| needed at a minimum to avoid the problems I'm talking
| about, and I doubt they would note those if they don't
| necessarily have a hostname.
| kbenson wrote:
| Do they actually say they don't have a slave to that database
| ready to take over? I seriously doubt Let's Encrypt has no
| spare.
|
| Note I didn't say you shouldn't run one service (as in
| daemon) or set of services from one box, just that one box is
| not enough and you need that spare.
|
| If Let's Encrypt actually has no spare for their database
| server and they're one hardware failure away from being down
| for what may be a large chunk of time (I highly doubt it),
| then I wouldn't want to use them even if free. Thankfully, I
| doubt your interpretation of what that article is saying.
| vitro wrote:
| You're right, from the article:
|
| > The new AMD EPYC CPUs sit at about 25%. You can see in
| this graph where we promoted the new database server from
| replica (read-only) to primary (read/write) on September
| 15.
| kubb wrote:
| As per usual, don't copy Google if you don't have the same
| requirements. Google Search never goes down. HN goes down from
| time and nobody minds. Google serves tens (hundreds?) of
| thousands of queries per second. HN serves ten. HN is fine with
| one server because it's small. How big is your service going to
| be? Do that boring math :)
| FartyMcFarter wrote:
| Even Google search has gone down apparently, for five minutes
| in 2013:
|
| https://www.cnet.com/tech/services-and-software/google-goes-...
| terafo wrote:
| There were huge availability issues as recently as December
| 14th 2020, for 45 minutes.
| roflyear wrote:
| Correct. I like to ask "how much money do we lose if the site
| goes down for 1hr? a day?" etc., and plan around that. If you
| are losing $1M an hour, or $50M if it goes down for a day, hell
| yeah you should spend a few million on making sure your site
| stays online!
|
| But, it is amazing how often c-levels cannot answer this
| question!
| _nhh wrote:
| I agree
| rbanffy wrote:
| I wouldn't recommend one, but at least two, for redundancy.
| londons_explore wrote:
| Don't be scared of 'one big server' for reliability. I'd bet that
| if you hired a big server today in a datacenter, the hardware
| will have more uptime than something cloud-native with az-
| failover hosted on AWS.
|
| Just make sure you have a tested 30 minute restoration plan in
| case of permanent hardware failure. You'll probably only use it
| once every 50 years on average, but it will be an expensive event
| when it happens.
| cpursley wrote:
| You've got features to ship. Stick your stuff on Render.com and
| don't think about it again. Even a dummy like me can manage that.
| alexpotato wrote:
| My favorite summary of why not to use microservices is from Grug:
|
| "grug wonder why big brain take hardest problem, factoring system
| correctly, and introduce network call too
|
| seem very confusing to grug"
|
| https://grugbrain.dev/#grug-on-microservices
| fleddr wrote:
| Our industry summarized:
|
| Hardware engineers are pushing the absolute physical limits of
| getting state (memory/storage) as close as possible to compute. A
| monumental accomplishment as impactful as the invention of
| agriculture and the industrial revolution.
|
| Software engineers: let's completely undo all that engineering by
| moving everything apart as far as possible. Hmmm, still too fast.
| Let's next add virtualization and software stacks with shitty
| abstractions.
|
| Fast and powerful browser? Let's completely ignore 20 years of
| performance engineering and reinvent...rendering. Hmm, sucks a
| bit. Let's add back server rendering. Wait, now we have to render
| twice. Ah well, let's just call it a "best practice".
|
| The mouse that I'm using right now (an expensive one) has a 2GB
| desktop Electron app that seems to want to update itself twice a
| week.
|
| The state of us, the absolute garbage that we put out, and the
| creative ways in which we try to justify it. It's like a mind
| virus.
|
| I want my downvotes now.
| GuB-42 wrote:
| Actually, those who push for these cloudy solutions do it in
| part to make data close to you. I am talking mostly about CDNs;
| I don't think YouTube and Netflix would have been possible
| without them.
|
| Google is a US company, but you don't want people in Australia
| to connect to the other side of the globe every time they need
| to access Google services, it would be an awful waste of
| intercontinental bandwidth. Instead, Google has data centers in
| Australia to serve people in Australia, and they only hit US
| servers when absolutely needed. And that's when you need to
| abstract things out. If something becomes relevant in
| Australia, move it there, and move it out when it no longer
| matters. When something big happens, copy it everywhere, and
| replace the copies with something else as interest wanes.
|
| Big companies need to split everything, they can't centralize
| because the world isn't centralized. The problem is when small
| businesses try to do the same because "if Google is so
| successful doing that, it must be right". Scale matters.
| Foomf wrote:
| You've more or less described Wirth's Law:
| https://en.wikipedia.org/wiki/Wirth%27s_law
| fleddr wrote:
| I had no idea, thanks. Consider this a broken clock being
| sometimes right.
| kkielhofner wrote:
| Great article overall with many good points worth considering.
| Nothing is one size fits all so I won't get into the crux of the
| article: "just get one big server". I recently posted a comment
| breaking down the math for my situation:
|
| https://news.ycombinator.com/item?id=32250470#32253635
|
| For the most "extreme" option of buying your own $40k server from
| Dell I'm always surprised at how many people don't consider
| leasing. No matter what it breaks the cost into an operating
| expense vs a capital one which is par with the other options in
| terms of accounting and doesn't require laying out $40k.
|
| Adding on that, in the US we have some absolutely wild tax
| advantages for large "capital expenditures" that also apply to
| leasing:
|
| https://www.section179.org/section_179_leases/
| phendrenad2 wrote:
| The problem with "one big server" is, you really need good
| IT/ops/sysadmin people who can think in non-cloud terms. (If you
| catch them installing docker on it, throw them into a lava pit
| immediately).
| henry700 wrote:
| What's the problem with installing Docker so you can run
| containers of different distros, languages & flavors on the
| same one big server, though?
| londons_explore wrote:
| One-big-VM is another approach...
|
| A big benefit is some providers will let you resize the VM bigger
| as you grow. The behind-the-scenes implementation is they migrate
| your VM to another machine with near-zero downtime. Pretty cool
| tech, and takes away a big disadvantage of bare metal which is
| growth pains.
| lrvick wrote:
| A consequence of one-big-server is decreased security. You become
| discouraged from applying patches because you must reboot. Also
| if one part of the system is compromised, every service is now
| compromised.
|
| Microservices on distinct systems offer damage control.
| jvanderbot wrote:
| No thanks. I have a few hobby sites, a personal vanity page, and
| some basic CPU expensive services that I use.
|
| Moving to AWS serverless has saved me so much headache with
| system updates, certificate management, archival and backup,
| networking, and so much more. Not to mention with my low-but-
| spikey load, my breakeven is a long way off.
| SassyGrapefruit wrote:
| >Use the Cloud, but don't be too Cloudy
|
| The number of applications I have inherited that were messes
| falling apart at the seams because of misguided attempts to avoid
| "vendor lock-in" with the cloud cannot be overstated. There is
| something I find ironic about people paying to use a platform but
| not using it because they feel like using it too much will make
| them feel compelled to stay there. It's basically starving
| yourself so you don't get too familiar with eating regularly.
|
| Kids, this PSA is for you. Auto Scaling Groups are just fine, as
| are all the other "Cloud Native" services. Most business partners
| will tell you a dollar of growth is worth 5x-10x the value of a
| dollar of savings. Building a huge tall computer will be cheaper,
| but if it isn't 10x cheaper (and that is Total Cost of Ownership,
| not the cost of the metal) and you are moving more slowly than
| you otherwise would, it's almost a certainty you are leaving
| money on the table.
| meeks wrote:
| The whole argument comes down to bursty vs. non-bursty workloads.
| What type of workloads make up the fat part of the distribution?
| If most use cases are bursty (which I would argue they are) then
| the author's argument only applies to specific applications.
| Therefore, most people do indeed see cost benefits from the
| cloud.
| galkk wrote:
| One of the first experiences in my professional career was a
| situation where the "one big server" serving the system that was
| making money failed on a Friday, and HP's warranty was something
| like one or two business days to get a replacement.
|
| The entire situation ended up in a conference call with multiple
| department directors deciding which server from other systems to
| cannibalize (even if it was underpowered) to get the system
| going.
|
| Since that time I'm quite skeptical about "one", and to me this
| is one of the big benefits of cloud providers: most likely there
| is another instance available, and stockouts are rarer.
| jmull wrote:
| The article is really talking about one big server plus a
| backup vs. cloud providers.
| mochomocha wrote:
| > Why Should I Pay for Peak Load? [...] someone in that supply
| chain is charging you based on their peak load
|
| Oh, it's even worse than that: this someone oversubscribes your
| hardware a little during your peak and a lot during your trough,
| padding their great margins at the expense of extra cache
| misses/perf degradation of your software that most of the time
| you won't notice if they do their job well.
|
| This is one of the reasons why large companies such as my
| employer (Netflix) are able to invest in their own compute
| platforms to reclaim some of these gains, so that any
| oversubscription & colocation gains materialize into a lower
| cloud bill - instead of having your spare CPU cycles funneled
| to a random co-tenant customer of your cloud provider, with the
| latter capturing the extra value.
| robertlagrant wrote:
| This is why I like Cloudflare's worker model. It feels like the
| usefulness of cloud deployments, but with a pretty restrained
| pricing model.
| system2 wrote:
| It blows my mind that people are spending $2000+ per month for a
| server they can get used for a $4000-5000 one-time cost.
|
| VMware + Synology Business Backup + Synology C2 backup is our way
| of doing business and it has never failed us in over 7 years. Why
| do people spend so much money on cloud when they can host it
| themselves for less than 5% of the cost (2-year usage assumed)?
| adlpz wrote:
| I've tried it all except this, including renting bare metal.
| Nowadays I'm in the cloud-but-not-_cloudy_ camp. Still, I'm
| intrigued.
|
| Apart from the $4-5k server, what are your running costs?
| Licenses? Colocation? Network?
| vgeek wrote:
| https://www.he.net/colocation.html
|
| They have been around forever and their $400 deal is good,
| but that is for 42U, 1G and only 15 amps. With beefier
| servers, you will need more (both bandwidth and amperage) if
| you intend on filling the rack.
| soruly wrote:
| that's why letsencrypt uses a single database on a powerful server
| https://letsencrypt.org/2021/01/21/next-gen-database-servers...
| wahnfrieden wrote:
| I've started augmenting one big server with iCloud (CloudKit)
| storage, specifically syncing local Realm DBs to the user's own
| iCloud storage. This means I can avoid taking custody of
| PII/problematic data, can include non-custodial privacy in the
| product's value/marketing, and can charge enough of a premium for
| the one big server to keep it affordable. I know how to scale
| servers in and out, so I feel the value of avoiding all that
| complexity. This is a business approach that leans into that,
| with a way to keep the business growing with domain
| complexity/scope/adoption (iCloud storage, and probably other
| good APIs like this to work with along similar lines).
| dugmartin wrote:
| I think Elixir/Erlang is uniquely positioned to get more traction
| in the inevitable microservice/kubernetes backlash and the return
| to single server deploys (with a hot backup). Not only does it
| usually sip server resources but it also scales naturally as more
| cores/threads are available on a server.
| lliamander wrote:
| Going _from_ an Erlang "monolith" to a Java/k8s cluster, I was
| amazed at how much more work it takes to build a "modern"
| microservice. Erlang still feels like the future to me.
| dougmoscrop wrote:
| Can you imagine if even a fraction of the effort poured in to
| k8s tooling had gone in to the Erlang/OTP ecosystem instead?
| dboreham wrote:
| This is the norm. It's only weird things like Node.js and Ruby
| that don't have this property.
| hunterloftis wrote:
| While individual Node.js processes are single-threaded,
| Node.js includes a standard API that distributes its load
| across multiple processes, and therefore cores.
|
| - https://nodejs.org/api/cluster.html#cluster
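|
| For reference, a minimal TypeScript sketch of that cluster API
| (one worker per core, all sharing one listening port; the port
| number here is arbitrary):
|
|   import cluster from "node:cluster";
|   import { createServer } from "node:http";
|   import { cpus } from "node:os";
|
|   if (cluster.isPrimary) {
|     // Fork one worker per core; the primary only supervises.
|     for (let i = 0; i < cpus().length; i++) cluster.fork();
|     cluster.on("exit", () => cluster.fork()); // replace dead workers
|   } else {
|     // Workers share the same port; connections are distributed
|     // across them by the cluster module.
|     createServer((req, res) => res.end(`pid ${process.pid}\n`))
|       .listen(8000);
|   }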
| throwaway787544 wrote:
| I have been doing this for two decades. Let me tell you about
| bare metal.
|
| Back in the day we had 1,000 physical servers to run a large
| scale web app. 90% of that capacity was used only for two months.
| So we had to buy 900 servers just to make most of our money over
| two events in two seasons.
|
| We also had to have 900 servers because even one beefy machine
| has bandwidth and latency limits. Your network switch simply
| can't pump more than a set amount of traffic through its
| backplane or your NICs, and the OS may have piss-poor packet
| performance too. Lots of smaller machines allow easier scaling of
| network load.
|
| But you can't just buy 900 servers. You always need more
| capacity, so you have to predict what your peak load will be, and
| buy for that. And you have to do it well in advance because it
| takes a long time to build and ship 900 servers and then assemble
| them, run burn-in, replace the duds, and prep the OS, firmware,
| software. And you have to do this every 3 years (minimum) because
| old hardware gets obsolete and slow, hardware dies, disks die,
| support contracts expire. But not all at once, because who knows
| what logistics problems you'd run into and possibly not get all
| the machines in time to make your projected peak load.
|
| If back then you told me I could turn on 900 servers for 1 month
| and then turn them off, no planning, no 3 year capital outlay, no
| assembly, burn in, software configuration, hardware repair, etc
| etc, I'd call you crazy. Hosting providers existed but _nobody_
| could just give you 900 servers in an hour, _nobody_ had that
| capacity.
|
| And by the way: cloud prices are _retail prices_. Get on a
| savings plan or reserve some instances and the cost can be half.
| Spot instances are a quarter or less the price. Serverless is
| pennies on the dollar with no management overhead.
|
| If you don't want to learn new things, buy one big server. I just
| pray it doesn't go down for you, as it can take up to several
| days for some cloud vendors to get some hardware classes in some
| regions. And I pray you were doing daily disk snapshots, and can
| get your dead disks replaced quickly.
| MrStonedOne wrote:
| I handled an 8x increase in traffic to my website from a
| youtuber reviewing our game by increasing the cache timer and
| fixing the wiki creating session table entries for logged-out
| users on a wiki that required accounts to edit it.
|
| We were already getting multiple millions of page hits a month
| before this happened.
|
| This server had 8 cores, but 5 of them were reserved for the
| game servers (10 TB a month in bandwidth) running on the same
| machine.
|
| If you needed 1,000 physical computers to run your webapp, you
| fucked up somewhere along the line.
| toast0 wrote:
| > I have been doing this for two decades. Let me tell you about
| bare metal.
|
| > Back in the day we had 1,000 physical servers to run a large
| scale web app. 90% of that capacity was used only for two
| months. So we had to buy 900 servers just to make most of our
| money over two events in two seasons.
|
| > We also had to have 900 servers because even one beefy
| machine has bandwidth and latency limits. Your network switch
| simply can't pump more than a set amount of traffic through its
| backplane or your NICs, and the OS may have piss-poor packet
| performance too. Lots of smaller machines allow easier scaling
| of network load.
|
| I started working with real (bare metal) servers on real
| internet loads in 2004 and retired in 2019. While there's truth
| here, there's also missing information. In 2004, all my servers
| had 100M ethernet, but in 2019, all my new servers had 4x10G
| ethernet (2x public, 2x private), actually some of them had 6x,
| but with 2x unconnected, I dunno why. In the meantime, cpu,
| nics, and operating systems have improved such that if you're
| not getting line rate for full mtu packets, it's probably
| becsause your application uses a lot of cpu, or you've hit a
| pathological case in the OS (which happens, but if you're
| running 1000 servers, you've probably got someone to debug
| that).
|
| If you still need 1000 beefy 10G servers, you've got a pretty
| formidable load, but splitting it up into many more smaller
| servers is asking for problems of different kinds. Otoh, if
| your load really scales to 10x for a month, and you're at that
| scale, cloud economics are going to work for you.
|
| My seasonal loads were maybe 50% more than normal, but usage
| trends (and development trends) meant that the seasonal peak
| would become the new normal soon enough; cloud managing the
| peaks would help a bit, but buying for the peak and keeping it
| running for the growth was fine. Daily peaks were maybe 2-3x
| the off-peak usage, 5 or 6 days a week; a tightly managed cloud
| provisioning could reduce costs here, but probably not enough
| to compete with having bare metal for the full day.
| taylodl wrote:
| That's a good point about cloud services being retail. My
| company gets a very large discount from one of the most well-
| known cloud providers. This is available to everybody -
| typically if you commit to 12 months of a minimum usage then
| you can get substantial discounts. What I know is so far
| everything we've migrated to the cloud has resulted in
| _significantly_ reduced total costs, increased reliability,
| improved scalability, and is easier to enhance and remediate.
| Faster, cheaper, better - that's been a huge win for us!
| fleddr wrote:
| The entire point of the article is that your dated example no
| longer applies: you can fit the vast majority of common loads
| on a single server now, they are this powerful.
|
| Redundancy concerns are also addressed in the article.
| PaulDavisThe1st wrote:
| > If you don't want to learn new things, buy one big server. I
| just pray it doesn't go down for you
|
| There's intermediate ground here. Rent one big server, reserved
| instance. Cloudy in the sense that you get the benefits of the
| cloud provider's infrastructure skills and experience, and
| uptime, plus easy backup provisioning; non-cloudy in that you
| can just treat that one server instance like your own hardware,
| running (more or less) your own preferred OS/distro, with
| "traditional" services running on it (e.g. in our case: nginx,
| gitea, discourse, mantis, ssh)
| yardie wrote:
| Let me take you back to March, 2020. When millions of Americans
| woke up to find out there was a pandemic and they would be
| working from home now. Not a problem, I'll just call up our
| cloud provider and request more cloud compute. You join a queue
| of a thousand other customers calling in that morning for the
| exact same thing. A few hours on hold and the CSR tells you
| they aren't provisioning any more compute resources. east-us is
| tapped out, central-europe tapped out hours ago, California got
| a clue and they already called to reserve so you can't have
| that either.
|
| I use cloud all the time, but there are also black swan events
| where your IaaS can't do any more for you.
| tempnow987 wrote:
| I never had this problem on AWS though I did see some
| startups struggle with some more specialized instances. Are
| midsize companies actually running into issues with non-
| specialized compute on AWS?
| kardianos wrote:
| That sounds like you have burst load. Per the article, cloud
| away, great fit.
|
| The point was most people don't have that and even their bursts
| can fit in a single server. This is my experience as well.
| maxbond wrote:
| The thing that confuses me is, isn't every publicly
| accessible service bursty on a long timescale? Everything
| looks seasonal and predictable until you hit the front page
| of Reddit, and you don't know what day that will be. You
| don't decide how much traffic you get, the world does.
| genousti wrote:
| Funnily, hitting the Reddit front page might ruin you if you run
| on AWS.
| NorwegianDude wrote:
| Hitting the front page of reddit is insignificant, it's not
| like you'll get anywhere near thousands upon thousands of
| requests each second. If you have a somewhat normal website
| and you're not doing something weird then it's easily
| handled with a single low-end server.
|
| If I get so much traffic that scaling becomes a problem
| then I'll be happy as I would make a ton of money. No need
| to build to be able to handle the whole world at the same
| time, that's just a waste of money in nearly all
| situations.
| taylodl wrote:
| If you're hosting on-prem then you have a cluster to configure
| and manage, you have multiple data centers you need to provision,
| you need data backups you have to manage plus the storage
| required for all those backups. Data centers also require power,
| cooling, real estate taxes, administration - and you need at
| least two of them to handle systemic outages. Now you have to
| manage and coordinate your data between those data centers. None
| of this is impossible of course, companies have been doing this
| everyday for decades now. But let's not pretend it doesn't all
| have a cost - and unless your business is running a data center,
| none of these costs are aligned with your business' core mission.
|
| If you're running a start-up it's pretty much a no-brainer you're
| going to start off in the cloud.
|
| What's the real criteria to evaluate on-prem versus the cloud?
| Load consistency. As the article notes, serverless cloud
| architectures are perfect for bursty loads. If your traffic is
| highly variable then the ability to quickly scale-up and then
| scale-down will be of benefit to you - and there's a lot of
| complexity you don't have to manage to boot! Generally speaking
| such a solution is going to be cheaper and easier to configure
| and manage. That's a win-win!
|
| If your load isn't as variable and you therefore have cloud
| resources always running, then it's almost always cheaper to host
| those applications on-prem - assuming you have on-prem hosting
| available to you. As I noted above, building data centers isn't
| cheap and it's almost always cheaper to stay in the cloud than it
| is to build a new data center, but if you already have data
| center(s) then your calculus is different.
|
| Another thing to keep in mind at the moment is even if you decide
| to deploy on-prem you may not be able to get the hardware you
| need. A colleague of mine is working on a large project that's to
| be hosted on-prem. It's going to take 6-12 months to get all the
| required hardware. Even prior to the pandemic the backlog was 3-6
| months because the major cloud providers are consuming all the
| hardware. Vendors would rather deal with buyers buying hardware
| by the tens of thousands than a shop buying a few dozen servers.
| You might even find your hardware delivery date getting pushed
| out as the "big guys" get their orders filled. It happens.
| ozim wrote:
| You know you can run a server in the cellar under your stairs.
|
| You know that if you are a startup you can just keep servers in
| a closet and hope that no one turns on the coffee machine while
| the aircon runs, because that will pop circuit breakers, which
| will take down your server - or maybe you at least have a UPS,
| so maybe not :)
|
| I have read horror stories about companies having such setups.
|
| While they don't need multiple data centers, power, cooling and
| redundancy sound to them like some kind of STD - getting a
| cheap VPS should be the default for such people. That is a win
| as well.
| nostrebored wrote:
| As someone who's worked in cloud sales and no longer has any skin
| in the game, I've seen firsthand how cloud native architectures
| improve developer velocity, offer enhanced reliability and
| availability, and actually decrease lock-in over time.
|
| Every customer I worked with who had one of these huge servers
| introduced coupling and state in some unpleasant way. They were
| locked in to persisted state, and couldn't scale out to handle
| variable load even if they wanted to. Beyond that, hardware
| utilization became contentious at any mid-enterprise scale.
| Everyone views the resource pool as theirs, and organizational
| initiatives often push people towards consuming the same types of
| resources.
|
| When it came time to scale out or do international expansion,
| every single one of my customers who had adopted this strategy
| had assumptions baked into their access patterns that made sense
| given their single server. When it came time to store some part
| of the state in a way that made sense for geographically
| distributed consumers, it was months, not sprints, spent
| figuring out how to hammer this into a model that's
| fundamentally at odds.
|
| From a reliability and availability standpoint, I'd often see
| customers tell me that 'we're highly available within a single
| data center' or 'we're split across X data centers' without
| considering the shared failure modes that each of these data
| centers had. Would a fiber outage knock out both of your DCs?
| Would a natural disaster likely knock something over? How about
| _power grids_? People often don't realize the failure modes
| they've already accepted.
|
| This is obviously not true for every workload. It's tech, there
| are tradeoffs you're making. But I would strongly caution any
| company that expects large growth against sitting on a single-
| server model for very long.
| secabeen wrote:
| The common element in the above is scaling and reliability.
| While lots of startups and companies are focused on the 1%
| chance that they are the next Google or Shopify, the reality is
| that nearly all aren't, and the overengineering and redundancy-
| first model that cloud pushes does cost them a lot of runway.
|
| It's even less useful for large companies; there is no world in
| which Kellogg is going to increase sales by 100x, or even 10x.
| nostrebored wrote:
| But most companies aren't startups. Many companies are
| established, growing businesses with a need to be able to
| easily implement new initiatives and products.
|
| The benefits of cloud for LE are completely different. I'm
| happy to break down why, but I addressed the smb and mid-
| enterprise space here because most large enterprises already
| know they shouldn't run on a single rack.
| secabeen wrote:
| > I addressed the smb and mid-enterprise space here because
| most large enterprises already know they shouldn't run on a
| single rack.
|
| This is a straw man. No one, anywhere in this thread or in
| the OP's original article, proposed a single-rack solution.
|
| From the OP: > Running a primary and a backup server is
| usually enough, keeping them in different datacenters.
| nostrebored wrote:
| This is just a complete lack of engagement with the post.
| Most LE's know they shouldn't run a two rack setup
| either. That is not the size or layout of any LE that
| I've interacted with. The closest is a bank in the
| developing world that had a few racks split across data
| centers in the same city and was desperately trying to
| move away given power instability in the country.
| tboyd47 wrote:
| Could confirmation bias affect your analysis at all?
|
| How many companies went cloud-first and then ran out of money?
| You wouldn't necessarily know anything about them.
|
| Were the scaling problems your single-server customers called
| you in to solve unpleasant enough to put their core business
| in danger? Or was the expense just a rounding error for them?
| nostrebored wrote:
| From this and the other comment, it looks like I wasn't clear
| about talking about SMB/ME rather than a seed/pre-seed
| startup, which I understand can be confusing given that we're
| on HN.
|
| I can tell you that I've never seen a company run out of
| money from going cloud-first (sample size of over 200 that I
| worked with directly). I did see multiple businesses scale
| down their consumption to near-zero and ride out the
| pandemic.
|
| The answer to scaling problems being unpleasant enough to put
| the business in danger is yes, but that was also during the
| pandemic when companies needed to make pivots to slightly
| different markets. Doing this was often unaffordable from an
| implementation cost perspective at the time when it had to
| happen. I've seen acquisitions fall through due to an
| inability to meet technical requirements because of stateful
| monstrosities. I've also seen top-line revenue get severely
| impacted when resource contention causes outages.
|
| The only times I've seen 'cloud-native' truly backfire were
| when companies didn't have the technical experience to move
| forward with these initiatives in-house. There are a lot of
| partners in the cloud implementation ecosystem who will
| fleece you for everything you have. One such example was a
| k8s microservices shop with a single contract developer
| managing the infra and a partner doing the heavy lifting. The
| partner gave them the spiel on how cloud-native provides
| flexibility and allows for reduced opex and the customer was
| very into it. They stored images in an RDBMS. Their database
| costs were almost 10% of the company's operating expenses by
| the time the customer noticed that something was wrong.
| stevenjgarner wrote:
| If you are not maxing out or even getting above 50% utilization
| of _128 physical cores (256 threads), 512 GB of memory, and 50
| Gbps of bandwidth for $1,318/month_, I really like the approach
| of multiple low-end consumable computers as servers. I have been
| using arrays of Intel NUCs at some customer sites for years with
| considerable cost savings over cloud offerings. Keep an extra
| redundant one in the array ready to swap out a failure.
|
| Another often overlooked option is that in several fly-over
| states it is quite easy and cheap to register as a public
| telecommunication utility. This allows you to place a powered
| pedestal in the public right-of-way, where you can get situated
| adjacent to an optical meet point and get considerable savings on
| installation costs of optical Internet, even from a tier 1
| provider. If your server bandwidth is peak utilized during
| business hours and there is an apartment complex nearby you can
| use that utility designation and competitively provide
| residential Internet service to offset costs.
| warmwaffles wrote:
| > I have been using arrays of Intel NUCs at some customer sites
| for years
|
| Stares at the 3 NUCs on my desk waiting to be clustered for a
| local sandbox.
| titzer wrote:
| This is pretty devious and I love it.
| tzs wrote:
| I don't understand the pedestal approach. Do you put your
| server in the pedestal, so the pedestal is in effect your data
| center?
| saulrh wrote:
| > competitively provide residential Internet service to
| offset costs.
|
| I uh. Providing residential Internet for an apartment complex
| feels like an entire business in and of itself and wildly out
| of scope for a small business? That's a whole extra competency
| and a major customer support commitment. Is there something I'm
| missing here?
| stevenjgarner wrote:
| It depends on the scale - it does not have to be a major
| undertaking. You are right, it is _a whole extra competency
| and a major customer support commitment_ , but for a lot of
| the entrepreneurial folk on HN quite a rewarding and
| accessible learning experience.
|
| The first time I did anything like this was in late 1984 in a
| small town in Iowa where GTE was the local telecommunication
| utility. Absolutely abysmal Internet service, nothing
| broadband from them at the time or from the MSO (Mediacom). I
| found out there was a statewide optical provider with cable
| going through the town. I incorporated an LLC, became a
| utility and built out less than 2 miles of single mode fiber
| to interconnect some of my original software business
| customers at first. Our internal motto was "how hard can it
| be?" (more as a rebuke to GTE). We found out. The whole 24x7
| public utility thing was very difficult for just a couple of
| guys. But it grew from there. I left after about 20 years and
| today it is a thriving provider.
|
| Technology has made the whole process so much easier today. I
| am amazed more people do not do it. You can get a small rack-
| mount sheet metal pedestal with an AC power meter and an HVAC
| unit for under $2k. Being a utility will allow you to place
| that on a concrete pad or vault in the utility corridor
| (often without any monthly fee from the city or county). You
| place a few bollards around it so no one drives into it. You
| want to get quotes from some tier 1 providers [0]. They will
| help you identify the best locations to engineer an optical
| meet and those are the locations you run by the
| city/county/state utilities board or commission.
|
| For a network engineer wanting to implement a fault tolerant
| network, you can place multiple pedestals at different
| locations on your provider's/peer's network to create a route
| diversified protected network.
|
| After all, when you are buying expensive cloud based services
| that literally is all your cloud provider is doing ... just
| on a completely more massive scale. The barrier to entry is
| not as high as you might think. You have technology offerings
| like OpenStack [1], where multiple competitive vendors will
| also help you engineer a solution. The government also
| provides (financial) support [2].
|
| The best perk is the number of parking spaces the requisite
| orange utility traffic cone opens up for you.
|
| [0] https://en.wikipedia.org/wiki/Tier_1_network
|
| [1] https://www.openstack.org/
|
| [2] https://www.usda.gov/reconnect
| MockObject wrote:
| In 1984, I am guessing the only use case for broadband
| internet was running an NNTP server?
| marktangotango wrote:
| This is some old school stuff right here. I have a hard
| time believing this sort of gumption and moxie are as
| prevalent today.
|
| > The best perk is the number of parking spaces the
| requisite orange utility traffic cone opens up for you.
|
| That's hilarious.
| bombcar wrote:
| You're missing "apartment complex" - you as the service
| provider contract with the apartment management company to
| basically cover your costs, and they handle the day-to-day
| along with running the apartment building.
|
| Done right, it'll be cheaper for them (they can advertise
| "high speed internet included!" or whatever) and you won't
| have much to do assuming everything on your end just works.
|
| The days where small ISPs provided things like email, web
| hosting, etc, are long gone; you're just providing a DHCP IP
| and potentially not even that if you roll out carrier-grade
| NAT.
| erichocean wrote:
| > _it is quite easy and cheap to register as a public
| telecommunication utility_
|
| Is North Carolina one of those states? I'm intrigued...
| stevenjgarner wrote:
| I have only done a few midwestern states. Call them and ask
| [0] - (919) 733-7328. You may want to first call your
| proposed county commissioner's office or city hall (if you
| are not rural), and ask them who to talk with about a new
| local business providing Internet service. If you can show
| the Utilities Commission that you are working with someone at
| the local level I have found they will treat you more
| seriously. In certain rural counties, you can even qualify
| for funding from the Rural Utilities Service of the USDA.
|
| [0] https://www.ncuc.net/
|
| EDIT: typos + also most states distinguish between
| facilities-based ISPs (i.e. with physical plant in the
| regulated public right-of-way) and other ISPs. Tell them you
| are looking to become a facilities-based ISP.
| xen2xen1 wrote:
| What other benefits are there to being a "public
| telecommunication utility"?
| stevenjgarner wrote:
| The benefit that is obvious to the regulators is that you
| can charge money for services. So for example, offering
| telephone services requires being a LEC (local exchange
| carrier) or CLEC (competitive local exchange carrier).
| But even telephone services have become considerably
| unregulated through VoIP. It's just that at some point,
| the VoIP has to terminate/interface with a (C)LEC
| offering real dial tone and telephone numbering. You can
| put in your own Asterisk server [0] and provide VoIP
| service on your burgeoning optical utilities network,
| together with other bundled services including
| television, movies, gaming, metering etc.. All of these
| offerings can be resold from wholesale services, where
| all you need is an Internet feed.
|
| Other benefits to being a "public telecommunication
| utility" include the competitive right to place your own
| facilities on telephone/power poles or underground in
| public right-of-way under the Telecommunications Act of
| 1996. You will need to enter into and pay for a pole
| attachment agreement. Of course local governments can
| reserve the right to tariff your facilities, which has
| its own ugliness.
|
| One potentially valuable thing a utility can do is place
| empty conduit in public right of way that can be
| used/resold in the future at a (considerable) gain. For
| example, before highways, roadways, airports and other
| infrastructure is built, it is orders of magnitude
| cheaper just to plow conduit under bare ground before the
| improvements are placed.
|
| [0] https://www.asterisk.org/
| eek2121 wrote:
| > Other benefits to being a "public telecommunication
| utility" include the competitive right to place your own
| facilities on telephone/power poles or underground in
| public right-of-way under the Telecommunications Act of
| 1996. You will need to enter into and pay for a pole
| attachment agreement. Of course local governments can
| reserve the right to tariff your facilities, which has
| its own ugliness.
|
| Note that in many parts of the country, the
| telcos/cablecos themselves own the poles. Google had a
| ton of trouble with AT&T in my state thanks to this. They
| lost to AT&T in court and gave up.
| count wrote:
| While VOIP is mostly unregulated, be acutely aware of
| e-911 laws and requirements. This isn't the Wild West
| shitshow it was in 2003 when I was doing similar things
| :)
|
| https://www.intrado.com/life-safety/e911-regulations has
| a good overview and links to applicable CFR/rules.
| erichocean wrote:
| Thanks!
| stevenjgarner wrote:
| Feel free to reach out at my gmail [0]
|
| [0] https://news.ycombinator.com/user?id=stevenjgarner
| cfors wrote:
| Yep, there's a premium on making your architecture more cloudy.
| However, the strongest case for Use One Big Server is not
| necessarily your big monolithic API server, but your database.
|
| Use One Big Database.
|
| Seriously. If you are a backend engineer, nothing is worse than
| breaking up your data into self-contained service databases,
| where everything is passed over REST/RPC. Your product asks will
| consistently want to combine these data sources (they don't know
| how your distributed databases look, and oftentimes they really
| do not care).
|
| It is so much easier to do these joins efficiently in a single
| database than fanning out RPC calls to multiple different
| databases, not to mention dealing with inconsistencies, lack of
| atomicity, etc. etc. Spin up a specific reader of that database
| if there needs to be OLAP queries, or use a message bus. But keep
| your OLTP data within one database for as long as possible.
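|
| To make that concrete, here is a rough sketch of the kind of
| product ask that is one query against one big database but
| several RPC round trips plus an in-memory join against
| service-per-database (table, column and connection names are
| all invented):
|
|     import psycopg2
|
|     conn = psycopg2.connect("dbname=app")  # the one big database
|     with conn.cursor() as cur:
|         # One query, consistent and atomic. The service-per-database
|         # version becomes N RPC calls plus a join in application code.
|         cur.execute("""
|             SELECT o.id, o.total, c.email, s.carrier, s.eta
|             FROM orders o
|             JOIN customers c ON c.id = o.customer_id
|             JOIN shipments s ON s.order_id = o.id
|             WHERE o.created_at > now() - interval '1 day'
|         """)
|         recent_orders = cur.fetchall()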
|
| You can break apart a stateless microservice, but there are few
| things as stagnant in the world of software as data. It will
| keep you nimble for new product features. The boxes that they
| offer on cloud vendors today for managed databases are giant!
| s_dev wrote:
| >Use One Big Database.
|
| It may be reasonable to have two databases, e.g. a class A and
| class B for PCI compliance. So context still deeply matters.
|
| Also having a dev DB with mock data and a live DB with real
| data is a common setup in many companies.
| belak wrote:
| This is absolutely true - when I was at Bitbucket (ages ago at
| this point) and we were having issues with our DB server
| (mostly due to scaling), almost everyone we talked to said "buy
| a bigger box until you can't any more" because of how complex
| (and indirectly expensive) the alternatives are - sharding and
| microservices both have a ton more failure points than a single
| large box.
|
| I'm sure they eventually moved off that single primary box, but
| for many years Bitbucket was run off 1 primary in each
| datacenter (with a failover), and a few read-only copies. If
| you're getting to the point where one database isn't enough,
| you're either doing something pretty weird, are working on a
| specific problem which needs a more complicated setup, or have
| grown to the point where investing in a microservice
| architecture starts to make sense.
| thayne wrote:
| One issue I've seen with this is that if you have a single,
| very large database, it can take a very, very long time to
| restore from backups. Or for that matter just taking backups.
|
| I'd be interested to know if anyone has a good solution for
| that.
| rszorness wrote:
| Try out pg_probackup. It works on database files directly.
| Restore is as fast as you can write to your SSD.
|
| I've set up a pgsql server with TimescaleDB recently.
| Continuous backup based on WAL takes seconds each hour, and
| a complete restore takes 15 minutes for almost 300 GB of
| data because the 1 GBit connection to the backup server is
| the bottleneck.
| Svenstaro wrote:
| I found this approach pretty cool in that regard:
| https://github.com/pgbackrest/pgbackrest
| dsr_ wrote:
| Here's the way it works for, say, Postgresql:
|
| - you rsync or zfs send the database files from machine A
| to machine B. You would like the database to be off during
| this process, which will make it consistent. The big
| advantage of ZFS is that you can stop PG, snapshot the
| filesystem, and turn PG on again immediately, then send the
| snapshot. Machine B is now a cold backup replica of A. Your
| loss potential is limited to the time between backups.
|
| - after the previous step is completed, you arrange for
| machine A to send WAL files to machine B. It's well
| documented. You could use rsync or scp here. It happens
| automatically and frequently. Machine B is now a warm
| replica of A -- if you need to turn it on in an emergency,
| you will only have lost one WAL file's worth of changes.
|
| - after that step is completed, you give machine B
| credentials to login to A for live replication. Machine B
| is now a live, very slightly delayed read-only replica of
| A. Anything that A processes will be updated on B as soon
| as it is received.
|
| You can go further and arrange to load balance requests
| between read-only replicas, while sending the write
| requests to the primary; you can look at Citus (now open
| source) to add multi-primary clustering.
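|
| To make the WAL-shipping step concrete: archive_command just
| has to be something that copies each segment somewhere safe
| and returns non-zero on failure. A toy helper (the paths and
| the mount of machine B are made up) could be a small script
| like:
|
|     #!/usr/bin/env python3
|     # Called from postgresql.conf as, for example:
|     #   archive_command = '/usr/local/bin/archive_wal.py %p %f'
|     # where %p is the path to the WAL segment and %f its file name.
|     import shutil
|     import sys
|     from pathlib import Path
|
|     def main() -> int:
|         wal_path, wal_name = sys.argv[1], sys.argv[2]
|         dest = Path("/mnt/machine-b/wal") / wal_name  # hypothetical mount
|         if dest.exists():
|             return 1  # never silently overwrite an archived segment
|         shutil.copy2(wal_path, dest)
|         return 0  # zero tells PostgreSQL the segment is safely archived
|
|     if __name__ == "__main__":
|         sys.exit(main())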
| hamandcheese wrote:
| Do you even have to stop Postgres if using ZFS snapshots?
| ZFS snapshots are atomic, so I'd expect that to be fine.
| If it wasn't fine, that would also mean Postgres couldn't
| handle power failure or other sudden failures.
| dsr_ wrote:
| You have choices.
|
| * shut down PG. Gain perfect consistency.
|
| * use pg_dump. Perfect consistency at the cost of a
| longer transaction. Gain portability for major version
| upgrades.
|
| * Don't shut down PG: here's what the manual says:
|
| However, a backup created in this way saves the database
| files in a state as if the database server was not
| properly shut down; therefore, when you start the
| database server on the backed-up data, it will think the
| previous server instance crashed and will replay the WAL
| log. This is not a problem; just be aware of it (and be
| sure to include the WAL files in your backup). You can
| perform a CHECKPOINT before taking the snapshot to reduce
| recovery time.
|
| * Midway: use SELECT pg_start_backup('label', false,
| false); and SELECT * FROM pg_stop_backup(false, true); to
| generate WAL files while you are running the backup, and
| add those to your backup.
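|
| A minimal sketch of that midway option, wiring the two calls
| around a filesystem snapshot (the DSN, snapshot name and ZFS
| dataset are made up; on PostgreSQL 15+ the functions are named
| pg_backup_start/pg_backup_stop):
|
|     import subprocess
|     import psycopg2
|
|     conn = psycopg2.connect("dbname=app")  # hypothetical DSN
|     conn.autocommit = True
|     with conn.cursor() as cur:
|         # Non-exclusive backup: pg_stop_backup must run on this same session.
|         cur.execute("SELECT pg_start_backup('nightly', false, false)")
|         subprocess.run(["zfs", "snapshot", "tank/pgdata@nightly"], check=True)
|         cur.execute("SELECT * FROM pg_stop_backup(false, true)")
|         _, label_file, tablespace_map = cur.fetchone()
|         # Store label_file/tablespace_map alongside the snapshot, plus
|         # the WAL generated while the backup was running.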
| mgiampapa wrote:
| This isn't really a backup, it's redundancy, which is a good
| thing but not the same as a backup solution. You can't
| get out of a drop table production type event this way.
| hamandcheese wrote:
| If you stop at the first bullet point then you have a
| backup solution.
| dsr_ wrote:
| Precisely so.
| thayne wrote:
| It doesn't solve the problem that sending that snapshot
| to a backup location takes a long time.
| maxclark wrote:
| Going back 20 years with Oracle DB it was common to use
| "triple mirror" on storage to make a block level copy of
| the database. Lock the DB for changes, flush the logs,
| break the mirror. You now have a point in time copy of
| the database that could be mounted by a second system to
| create a tape backup, or as a recovery point to restore.
|
| It was the way to do it, and very easy to manage.
| Twisell wrote:
| The previous commenter was probably unaware of the
| various ways to back up recent PostgreSQL releases.
|
| For what you describe, a "point in time recovery" backup
| would probably be the more appropriate flavor:
| https://www.postgresql.org/docs/current/continuous-
| archiving...
|
| It was first released around 2010 and has gained robustness
| with every release, hence not everyone is aware of it.
|
| For instance, I don't think it's really required anymore to
| shut down the database to do the initial sync if you use the
| proper tooling (pg_basebackup, if I remember correctly).
| mike_hearn wrote:
| Presumably it doesn't matter if you break your DB up into
| smaller DBs, you still have the same amount of data to back
| up no matter what. However, now you also have the problem
| of snapshot consistency to worry about.
|
| If you need to backup/restore just one set of tables, you
| can do that with a single DB server without taking the rest
| offline.
| thayne wrote:
| > you still have the same amount of data to back up no
| matter what
|
| But you can restore/back up the databases in parallel.
|
| > If you need to backup/restore just one set of tables,
| you can do that with a single DB server without taking
| the rest offline.
|
| I'm not aware of a good way to restore just a few tables
| from a full db backup. At least not one that doesn't require
| copying over all the data (because the backup is stored
| over the network, not on a local disk). And that may be
| desirable to recover from say a bug corrupting or
| deleting a customer's data.
| nick__m wrote:
| On MariaDB you can tell the replica to enter a snapshotable
| state[1], take a simple LVM snapshot, tell the database it's
| over, back up your snapshot somewhere else and finally delete
| the snapshot.
|
| 1) https://mariadb.com/kb/en/storage-snapshots-and-backup-
| stage...
| altdataseller wrote:
| What if your product simply stores a lot of data (i.e. a search
| engine)? How is that weird?
| skeeter2020 wrote:
| This is not typically going to be stored in an ACID-
| compliant RDBMS, which is where the most common scaling
| problem occurs. Search engines, document stores, adtech,
| eventing, etc. are likely going to have a different storage
| mechanism where consistency isn't as important.
| rmbyrro wrote:
| a search engine won't need joins, just other things (i.e. text
| indexing) that can be split up relatively easily.
| belak wrote:
| That's fair - I added "are working on a specific problem
| which needs a more complicated setup" to my original
| comment as a nicer way of referring to edge cases like
| search engines. I still believe that 99% of applications
| would function perfectly fine with a single primary DB.
| zasdffaa wrote:
| Depends what you mean by a database I guess. I take it to
| mean an RDBMS.
|
| RDBMSs provide guarantees that web searching doesn't need.
| You can afford to lose a piece of data or provide not-quite-
| perfect results for web stuff. That's just wrong for an
| RDBMS.
| altdataseller wrote:
| What if you are using the database as a system of record
| to index into a real search engine like Elasticsearch?
| For a product where you have tons of data to search from
| (i.e. text from web pages)?
| IggleSniggle wrote:
| In regards to Elasticsearch, you basically opt-in to
| which behavior you want/need. You end up in the same
| place: potentially losing some data points or introducing
| some "fuzziness" to the results in exchange for speed.
| When you ask Elasticsearch to behave in a guaranteed
| atomic manner across all records, performing locks on
| data, you end up with similar constraints as in a RDBMS.
|
| Elasticsearch is for search.
|
| If you're asking about "what if you use an RDBMS as a
| pointer to Elasticsearch" then I guess I would ask: why
| would you do this? Elasticsearch can be used as a system
| of record. You could use an RDBMS over top of
| Elasticsearch without configuring Elasticsearch as a
| system of record, but then you would be lying when you
| refer to your RDBMS as a "system of record." It's not a
| "system of record" for your actual data, just a record of
| where pointers to actual data were at one point in time.
|
| I feel like I must be missing what you're suggesting
| here.
| altdataseller wrote:
| Having just an Elasticsearch index without also having
| the data in a primary store like an RDBMS is an anti-
| pattern and not recommended by almost all experts.
| Whether you want to call it a "system of record", I won't
| argue semantics. But the point is, it's recommended to have
| your data in a primary store from which you can index into
| Elasticsearch.
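|
| The pattern is roughly: the RDBMS is the source of truth and
| the index is disposable, rebuilt from it whenever needed. A
| hedged sketch using the plain Elasticsearch REST API (table,
| index and host names are invented):
|
|     import psycopg2
|     import requests
|
|     conn = psycopg2.connect("dbname=app")  # the primary store
|     with conn.cursor() as cur:
|         cur.execute("SELECT id, url, body FROM pages WHERE indexed = false")
|         for page_id, url, body in cur.fetchall():
|             # Index (or re-index) the document; losing the index is fine,
|             # since it can always be rebuilt from the rows above.
|             requests.put(
|                 f"http://localhost:9200/pages/_doc/{page_id}",
|                 json={"url": url, "body": body},
|                 timeout=10,
|             )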
| ladyattis wrote:
| At my current job we have four different databases so I concur
| with this assessment. I think it's okay to have some data in
| different DBs if they're significantly different like say the
| user login data could be in its own database. But anything that
| we do which is a combination of e-commerce and
| testing/certification I think they should be in one big
| database so I can do reasonable queries for information that we
| need. This doesn't include two other databases we have on-prem:
| one is a Salesforce setup and the other is an internal
| application system that essentially marries Salesforce to it.
| It's a weird wild environment to navigate when adding features.
| jasonwatkinspdx wrote:
| A relative worked for a hedge fund that used this idea. They
| were a C#/MSSQL shop, so they just bought whatever was the
| biggest MSSQL server at the time, updating frequently. They
| said it was a huge advantage, where the limit in scale was more
| than offset by productivity.
|
| I think it's an underrated idea. There's a lot of people out
| there building a lot of complexity for datasets that in the end
| are less than 100 TB.
|
| But it also has limits. Infamously Twitter delayed going to a
| sharded architecture a bit too long, making it more of an ugly
| migration.
| manigandham wrote:
| Server hardware is so cheap and fast today that 99% of
| companies will never hit that limit in scale either.
| AtNightWeCode wrote:
| If you get your services right there is little or no
| communication between the services, since a microservice should
| have all the data it needs in its own store.
| HeavyStorm wrote:
| > they don't know how your distributed databases look, and
| oftentimes they really do not care
|
| Nor should they.
| markandrewj wrote:
| Just FYI, you can have one big database, without running it on
| one big server. As an example, databases like Cassandra are
| designed to be scaled horizontally (i.e. scale out, instead of
| scale up).
|
| https://cassandra.apache.org/_/cassandra-basics.html
| 1500100900 wrote:
| Cassandra may be great when you have to scale your database
| that you no longer develop significantly. The problem with
| this DB system is that you have to know all the queries
| before you can define the schema.
| threeseed wrote:
| > The problem with this DB system is that you have to know
| all the queries before you can define the schema
|
| Not true.
|
| You just need to optimise your schema if you want the best
| performance. Exactly the same as an RDBMS.
| mdasen wrote:
| There are trade-offs when you scale horizontally even if a
| database is designed for it. For example, DataStax's Storage
| Attached Indexes or Cassandra's hidden-table secondary
| indexing allow for indexing on columns that aren't part of
| the clustering/partitioning, but when you're reading you're
| going to have to ask all the nodes to look for something if
| you aren't including a clustering/partitioning criteria to
| narrow it down.
|
| You've now scaled out, but you now have to ask each node when
| searching by secondary index. If you're asking every node for
| your queries, you haven't really scaled horizontally. You've
| just increased complexity.
|
| Now, maybe 95% of your queries can be handled with a
| clustering key and you just need secondary indexes to handle
| 5% of your stuff. In that case, Cassandra does offer an easy
| way to handle that last 5%. However, it can be problematic if
| people take shortcuts too much and you end up putting too
| much load on the cluster. You're also putting your latency
| for reads at the highest latency of all the machines in your
| cluster. For example, if you have 100 machines in your
| cluster with a mean response time of 2ms and a 99th
| percentile response time of 150ms, you're potentially going
| to be providing a bad experience to users waiting on that
| last box on secondary index queries.
|
| This isn't to say that Cassandra isn't useful - Cassandra has
| been making some good decisions to balance the problems
| engineers face. However, it does come with trade-offs when
| you distribute the data. When you have a well-defined
| problem, it's a lot easier to design your data for efficient
| querying and partitioning. When you're trying to figure
| things out, the flexibility of a single machine and much
| cheaper secondary index queries can be important - and if you
| hit a massive scale, you figure out how you want to partition
| it then.
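|
| A small sketch of that distinction with the Python driver
| (contact point, keyspace and table are invented; assume events
| has PRIMARY KEY (tenant_id, ts) and a secondary index on kind):
|
|     import uuid
|     from cassandra.cluster import Cluster
|
|     session = Cluster(["10.0.0.1"]).connect("app")
|     tenant_id = uuid.uuid4()
|
|     # Includes the partition key: routed only to the replicas
|     # that own this partition.
|     session.execute(
|         "SELECT ts, kind FROM events WHERE tenant_id = %s", (tenant_id,))
|
|     # Secondary-index lookup without the partition key: every
|     # node in the cluster has to be consulted.
|     session.execute("SELECT ts FROM events WHERE kind = %s", ("login",))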
| markandrewj wrote:
| Cassandra was just an example, but most databases can be
| scaled either vertically or horizontally via sharding. You
| are right that, if misconfigured, performance can be hindered,
| but this is also true for a database which is being scaled
| vertically. Generally speaking, with a large dataset you will
| get better performance by growing horizontally than you would
| by growing vertically.
|
| https://stackoverflow.blog/2022/03/14/how-sharding-a-
| databas...
| robertlagrant wrote:
| > Your product asks will consistently want to combine these
| data sources (they don't know how your distributed databases
| look, and oftentimes they really do not care).
|
| I'm not sure how to parse this. What should "asks" be?
| cfors wrote:
| The feature requests (asks) that product wants to build -
| sorry for the confusion there.
| delecti wrote:
| The phrase "Your product asks will consistently " can be de-
| abbreviated to "product owners/product managers you work with
| will consistently request".
| wefarrell wrote:
| "Your product asks will consistently want to combine these data
| sources (they don't know how your distributed databases look,
| and oftentimes they really do not care)."
|
| This isn't a problem if state is properly divided along
| business domain lines and the people who need to access the
| data have access to it. In fact many use cases require it -
| publicly traded companies can't let anyone in the organization
| access financial info and healthcare companies can't let anyone
| access patient data. And of course there are performance
| concerns as well if anyone in the organization can arbitrarily
| execute
| queries on any of the organization's data.
|
| I would say YAGNI applies to data segregation as well and
| separations shouldn't be introduced until they are necessary.
| Mavvie wrote:
| "combine these data sources" doesn't necessarily mean data
| analytics. Just as an example, it could be something like
| "show a badge if it's the user's birthday", which if you had
| a separate microservice for birthdays would be much harder
| than joining a new table.
| wefarrell wrote:
| Replace "people" with "features" and my comment still
| holds. As software, features, and organizations become more
| complex the core feature data becomes a smaller and smaller
| proportion of the overall state and that's when
| microservices and separate data stores become necessary.
| lmm wrote:
| If you do this then you'll have the hardest possible migration
| when the time comes to split it up. It will take you literally
| years, perhaps even a decade.
|
| Shard your datastore from day 1, get your dataflow right so
| that you don't need atomicity, and it'll be painless and scale
| effortlessly. More importantly, you won't be able to paper over
| crappy dataflow. It's like using proper types in your code:
| yes, it takes a bit more effort up-front compared to just
| YOLOing everything, but it pays dividends pretty quickly.
| riku_iki wrote:
| > Shard your datastore from day 1
|
| what about using something like CockroachDB from day 1?
| lmm wrote:
| I don't know the characteristics of bikesheddb's upstream
| in detail (if there's ever a production-quality release of
| bikesheddb I'll take another look), but in general using
| something that can scale horizontally (like Cassandra or
| Riak, or even - for all its downsides - MongoDB) is a great
| approach - I guess it's a question of terminology whether
| you call that "sharding" or not. Personally I prefer that
| kind of datastore over an SQL database.
| riku_iki wrote:
| > over an SQL database
|
| it is actually distributed SQL Db with auto sharding,
| their goal is to be SQL compatible with Postgres.
| Rantenki wrote:
| This is true IFF you get to the point where you have to split
| up.
|
| I know we're all hot and bothered about getting our apps to
| scale up to be the next unicorn, but most apps never need to
| scale past the limit of a single very high-performance
| database. For most people, this single huge DB is sufficient.
|
| Also, for many (maybe even most) applications, designated
| outages for maintenance are not only acceptable, but industry
| standard. Banks have had, and continue to have designated
| outages all the time, usually on weekends when the impact is
| reduced.
|
| Sure, what I just wrote is bad advice for mega-scale SaaS
| offerings with millions of concurrent users, but most of us
| aren't building those, as much as we would like to pretend
| that we are.
|
| I will say that TWO of those servers, with some form of
| synchronous replication, and point in time snapshots, are
| probably a better choice, but that's hair-splitting.
|
| (and I am a dyed in the wool microservices, scale-out Amazon
| WS fanboi).
| lmm wrote:
| > I know we're all hot and bothered about getting our apps
| to scale up to be the next unicorn, but most apps never
| need to scale past the limit of a single very high-
| performance database. For most people, this single huge DB
| is sufficient.
|
| True _if_ the reliability is good enough. I agree that many
| organisations will never get to the scale where they need
| it as a performance/data size measure, but you often will
| grow past the reliability level that's possible to achieve
| on a single node. And it's worth saying that the various
| things that people do to mitigate these problems - read
| replicas, WAL shipping, and all that - can have a pretty
| high operational cost. Whereas if you just slap in a
| horizontal autoscaling datastore with true master-master HA
| from day 1, you bypass all of that trouble and just never
| worry about it.
|
| > Also, for many (maybe even most) applications, designated
| outages for maintenance are not only acceptable, but
| industry standard. Banks have had, and continue to have
| designated outages all the time, usually on weekends when
| the impact is reduced.
|
| IME those are a minority of applications. Anything
| consumer-facing, you absolutely do lose out (and even if
| it's not a serious issue in itself, it makes you look bush-
| league) if someone can't log into your system at 5AM on
| Sunday. Even if you're B2B, if your clients are serving
| customers then they want you to be online whenever their
| customers are.
| johnbellone wrote:
| I agree with this sentiment but it is often misunderstood as a
| means to force everything into a single database schema. More
| people need to learn about logically separating schemas with
| their database servers!
| clairity wrote:
| > "Use One Big Database."
|
| yah, this is something i learned when designing my first server
| stack (using sun machines) for a real business back during the
| dot-com boom/bust era. our single database server was the
| beefiest machine by far in the stack, 5U in the rack (we also
| had a hot backup), while the other servers were 1U or 2U in
| size. most of that girth was for memory and disk space, with
| decent but not the fastest processors.
|
| one big db server with a hot backup was our best tradeoff for
| price, performance, and reliability. part of the mitigation was
| that the other servers could be scaled horizontally to
| compensate for a decent amount of growth without needing to
| scale the db horizontally.
| FpUser wrote:
| >"Use One Big Database."
|
| I do, it is running on the same big (relatively) server as my
| native C++ backend talking to the database. The performance
| smokes your standard cloudy setup big time. Serving a thousand
| requests per second on 16 cores without breaking a sweat. I am
| all for monoliths running on real, non-cloudy hardware. As long
| as the business scale is reasonable and does not approach FAANG
| (like for 90% of businesses) this solution is superior to
| everything else money-, maintenance-, and development-time wise.
| BenoitEssiambre wrote:
| I'm glad this is becoming conventional wisdom. I used to argue
| this in these pages a few years ago and would get downvoted
| below the posts telling people to split everything into
| microservices separated by queues (although I suppose it's
| making me lose my competitive advantage when everyone else is
| building lean and mean infrastructure too).
|
| In my mind, reasons involve keeping transactional integrity,
| ACID compliance, better error propagation, avoiding the
| hundreds of impossible to solve roadblocks of distributed
| systems (https://groups.csail.mit.edu/tds/papers/Lynch/MIT-LCS-
| TM-394...).
|
| But also it is about pushing the limits of what is physically
| possible in computing. As Admiral Grace Hopper would point out
| (https://www.youtube.com/watch?v=9eyFDBPk4Yw ) doing distance
| over network wires involves hard latency constraints, not to
| mention dealing with congestion on these wires.
|
| Physical efficiency is about keeping data close to where it's
| processed. Monoliths can make much better use of L1, L2, L3,
| and ram caches than distributed systems for speedups often in
| the order of 100X to 1000X.
|
| Sure it's easier to throw more hardware at the problem with
| distributed systems but the downsides are significant so be
| sure you really need it.
|
| Now there is a corollary to using monoliths. Since you only
| have one db, that db should be treated as somewhat sacred, you
| want to avoid wasting resources inside it. This means being a
| bit more careful about how you are storing things, using the
| smallest data structures, normalizing when you can etc. This is
| not to save disk, disk is cheap. This is to make efficient use
| of L1,L2,L3 and ram.
|
| I've seen boolean true or false values saved as large JSON
| documents: {"usersetting1": true, "usersetting2": false,
| "setting1name": "name", etc.} with 10 bits of data ending up as
| a 1k JSON document. Avoid this! Storing documents means the
| keys - the full table schema - are in every row. It has its uses
| but if you can predefine your schema and use the smallest types
| needed, you are gaining much performance mostly through much
| higher cache efficiency!
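|
| A quick back-of-the-envelope illustration of the difference
| (the setting names are invented):
|
|     import json
|
|     # Ten boolean settings as a JSON document vs. a packed bitfield.
|     settings = {f"usersetting{i}": (i % 2 == 0) for i in range(10)}
|
|     as_json = json.dumps(settings).encode()  # keys repeated in every row
|     as_bits = sum(b << i for i, b in enumerate(settings.values()))
|     packed = as_bits.to_bytes(2, "little")   # 10 bits fit in 2 bytes
|
|     print(f"{len(as_json)} bytes as a document, {len(packed)} as a bitfield")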
| Swizec wrote:
| > I'm glad this is becoming conventional wisdom
|
| My hunch is that computers caught up. Back in the early
| 2000's horizontal scaling was the only way. You simply
| couldn't handle even reasonably mediocre loads on a single
| machine.
|
| As computing becomes cheaper, horizontal scaling is starting
| to look more and more like unnecessary complexity for even
| surprisingly large/popular apps.
|
| I mean you can buy a consumer off-the-shelf machine with
| 1.5TB of memory these days. 20 years ago, when microservices
| started gaining popularity, 1.5TB RAM in a single machine was
| basically unimaginable.
| FpUser wrote:
| >"I'm glad this is becoming conventional wisdom. "
|
| Yup, this is what I've always done and it works wonders.
| Since I do not have bosses, just clients, I do not give a
| flying fuck about the latest fashion and do what actually makes
| sense for me and said clients.
| tsmarsh wrote:
| 'over the wire' is less obvious than it used to be.
|
| If you're in a k8s pod, those calls are really kernel calls.
| Sure you're serializing and process switching where you could
| be just making a method call, but we had to do something.
|
| I'm seeing fewer 'balls of mud' with microservices. That's not
| zero balls of mud. But it's not a given for almost every code
| base I wander into.
| threeseed wrote:
| > I'm glad this is becoming conventional wisdom
|
| It's not though. You're just seeing the most popular opinion
| on HN.
|
| In reality it is nuanced like most real-world tech decisions
| are. Some use cases necessitate a distributed or sharded
| database, some work better with a single server and some are
| simply going to outsource the problem to some vendor.
| rbanffy wrote:
| > Use One Big Database.
|
| I emphatically disagree.
|
| I've seen this evolve into tightly coupled microservices that
| could be deployed independently in theory, but required
| exquisite coordination to work.
|
| If you want them to be on a single server, that's fine, but
| having multiple databases or schemas will help enforce
| separation.
|
| And, if you need one single place for analytics, push changes
| to that space asynchronously.
|
| Having said that, I've seen silly optimizations being employed
| that make sense when you are Twitter, and to nobody else. Slice
| services up to the point they still do something meaningful in
| terms of the solution and avoid going any further.
| marcosdumay wrote:
| Yeah... Dividing your work into microservices while your data
| is in an interdependent database doesn't lead to great
| results.
|
| If you are creating microservices, you must segment them all
| the way through.
| zmmmmm wrote:
| I have to say I disagree with this ... you can only
| separate them if they are really, truly independent. Trying
| to separate things that are actually coupled will quickly
| take you on a path to hell.
|
| The problem here is that most of the microservice
| architecture divisions are going to be driven by Conway's
| law, not what makes any technical sense. So if you insist
| on separate databases per microservice, you're at high risk
| of ending up with massive amounts of duplicated and
| incoherent state models and half the work of the team
| devoted to synchronizing between them.
|
| I quite like an architecture where services are split
| _except_ the database, which is considered a service of its
| own.
| Joeri wrote:
| I have done both models. At my previous job we had a monolith
| on top of a 1,200-table database. Now I work in an ecosystem of
| 400 microservices, most with their own database.
|
| What it fundamentally boils down to is that your org chart
| determines your architecture. We had a single team in charge
| of the monolith, and it was ok, and then we wanted to add
| teams and it broke down. On the microservices architecture,
| we have many teams, which can work independently quite well,
| until there is a big project that needs coordinated changes,
| and then the fun starts.
|
| Like always there is no advice that is absolutely right.
| Monoliths, microservices, function stores. One big server vs
| kubernetes. Any of those things become the right answer in
| the right context.
|
| Although I'm still in favor of starting with a modular
| monolith and splitting off services when it becomes apparent
| they need to change at a different pace from the main body.
| That is right in most contexts I think.
| zmmmmm wrote:
| > splitting off services when it becomes apparent they need
| to change at a different pace from the main body
|
| yes - this seems to get lost, but the microservice argument
| is no different to the bigger picture software design in
| general. When things change independently, separate and
| decouple them. It works in code and so there is no reason
| it shouldn't apply at the infrastructure layer.
|
| If I am responsible for the FooBar and need to update it
| once a week and know I am not going to break the FroggleBot
| or the Bazlibee which are run by separate teams who don't
| care about my needs and update their code once a year, hell
| yeah I want to develop and deploy it as a separate service.
| manigandham wrote:
| There's no need for "microservices" in the first place then.
| That's just logical groupings of functionality that can be
| separate as classes, namespaces or other modules without
| being entirely separate processes with a network boundary.
| danpalmer wrote:
| To clarify the advice, at least how I believe it should be
| done...
|
| Use One Big Database Server...
|
| ... and on it, use one software database per application.
|
| For example, one Postgres server can host many databases that
| are mostly* independent from each other. Each application or
| service should have its own database and be unaware of the
| others, communicating with them via the services if
| necessary. This makes splitting up into multiple database
| servers fairly straightforward if needed later. In reality
| most businesses will have a long tail of tiny databases that
| can all be on the same server, with only bigger databases
| needing dedicated resources.
|
| *you can have interdependencies when you're using deep
| features sometimes, but in an application-first development
| model I'd advise against this.
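|
| A hedged sketch of that setup (host, role and database names
| are invented): one Postgres server, one database and one role
| per application, so splitting later is a host change in a DSN
| rather than a rewrite.
|
|     import psycopg2
|
|     admin = psycopg2.connect("host=db.internal dbname=postgres user=admin")
|     admin.autocommit = True  # CREATE DATABASE cannot run inside a transaction
|     with admin.cursor() as cur:
|         for app in ("billing", "catalog", "reports"):
|             cur.execute(f"CREATE ROLE {app}_svc LOGIN PASSWORD 'change-me'")
|             cur.execute(f"CREATE DATABASE {app} OWNER {app}_svc")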
| goodoldneon wrote:
| OP mentioned joining, so they were definitely talking about
| a single database
| danpalmer wrote:
| You can still do a ton of joining.
|
| I'd start with a monolith, that's a single app, single
| database, single point of ownership of the data model,
| and a ton of joins.
|
| Then as services are added after the monolith they can
| still use the main database for ease of infra
| development, simpler backups and replication, etc. but
| those wouldn't be able to be joined because they're
| cross-service.
| [deleted]
| riquito wrote:
| Not suggesting it, but for the sake of knowledge you can
| join tables living in different databases, as long as
| they are on the same server (e.g. mysql, postgresql, SQL
| server supports it - doesn't necessarily come for free)
| yellowapple wrote:
| In PostgreSQL's case, it doesn't even need to be the same
| server: https://www.postgresql.org/docs/current/postgres-
| fdw.html
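|
| For reference, a hedged sketch of wiring that up (server,
| database and credential names are invented); after the import,
| local tables can be joined against billing_remote.* directly:
|
|     import psycopg2
|
|     conn = psycopg2.connect("dbname=app")
|     conn.autocommit = True
|     with conn.cursor() as cur:
|         cur.execute("CREATE EXTENSION IF NOT EXISTS postgres_fdw")
|         cur.execute("""CREATE SERVER billing_srv
|                        FOREIGN DATA WRAPPER postgres_fdw
|                        OPTIONS (host 'db2.internal', dbname 'billing')""")
|         cur.execute("""CREATE USER MAPPING FOR CURRENT_USER SERVER billing_srv
|                        OPTIONS (user 'reporter', password 'change-me')""")
|         cur.execute("CREATE SCHEMA billing_remote")
|         cur.execute("""IMPORT FOREIGN SCHEMA public
|                        FROM SERVER billing_srv INTO billing_remote""")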
| giardini wrote:
| _" >Use One Big Database Server...
|
| ... and on it, use one software database per
| application.<"_
|
| FWIW that is how it is usually done (and has been done
| for decades) on mainframes (IBM & UNISYS).
|
| -----------------------
|
| _" Plus ca change, plus c'est la meme chose."_
|
| English: _" the more things change, the more they stay the
| same."_
|
| - old French expression.
| ryanisnan wrote:
| Definitely use a big database, until you can't. My advice to
| anyone starting with a relational data store is to use a proxy
| from day 1 (or some point before adding something like that
| becomes scary).
|
| When you need to start sharding your database, having a proxy
| is like having a super power.
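|
| Even a thin routing layer in the application buys you much of
| the same superpower. A minimal sketch (tenant-based sharding
| and the DSNs are assumptions): today every shard entry points
| at the one big server, and tenants can be re-homed later
| without touching query code.
|
|     import hashlib
|
|     SHARDS = ["host=db.internal dbname=app"] * 4  # all one server, for now
|
|     def dsn_for(tenant_id: str) -> str:
|         digest = int(hashlib.sha1(tenant_id.encode()).hexdigest(), 16)
|         return SHARDS[digest % len(SHARDS)]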
| chromatin wrote:
| Are there postgres proxies that can specifically facilitate
| sharding / partitioning later?
| _ben_ wrote:
| Disclaimer: I am the founder of PolyScale [1].
|
| We see both use cases: single large database vs multiple
| small, decoupled. I agree with the sentiment that a large
| database offers simplicity, until access patterns change.
|
| We focus on distributing database data to the edge using
| caching. Typically this eliminates read-replicas and a lot of
| the headache that goes with app logic rewrites or scaling
| "One Big Database".
|
| [1] https://www.polyscale.ai/
| bartread wrote:
| Not to mention, backups, restores, and disaster recovery are so
| much easier with One Big Database(tm).
| 1500100900 wrote:
| How is backup restoration any easier if your whole PostgreSQL
| cluster goes back in time when you only wanted to rewind that
| one tenant?
| fleddr wrote:
| Your scenario is data recovery, not backup restoration.
| Wildly different things.
| wizofaus wrote:
| Surely having separate DBs all sit on the One Big Server is
| preferable in many cases. For cases where you really need to
| extract large amounts of data derived from multiple DBs,
| there's no real harm in having some cross-DB joins defined in
| views somewhere. If there are sensible logical ways to break a
| monolithic service into component stand-alone services, and
| good business reasons to do so (or it's already been designed
| that way), then having each talk to its own DB on a shared
| server should be able to scale pretty well.
| abraae wrote:
| Another area for consolidation is auth. Use one giant Keycloak,
| with individual realms for every one of the individual apps you
| are running. Your Keycloak is backed by your one giant
| database.
| doctor_eval wrote:
| I agree that 1BDB is a good idea, but having one ginormous
| schema has its own costs. So I still think data should be
| logically partitioned between applications/microservices - in
| PG terms, one "cluster" but multiple "databases".
|
| We solved the problem of collecting data from the various
| databases for end users by having a GraphQL layer which could
| integrate all the data sources. This turned out to be
| absolutely awesome. You could also do something similar using
| FDW. The effort was not significant relative to the size of the
| application.
|
| The benefits of this architecture were manifold but one of the
| main ones is that it reduces the complexity of each individual
| database, which dramatically improved performance, and we knew
| that if we needed more performance we could pull those
| individual databases out into their own machine.
| throwaway894345 wrote:
| I'm pretty happy to pay a cloud provider to deal with managing
| databases and hosts. It doesn't seem to cause me much grief,
| and maybe I could do it better but my time is worth more than
| our RDS bill. I can always come back and Do It Myself if I run
| out of more valuable things to work on.
|
| Similarly, paying for EKS or GKE or the higher-level container
| offerings seems like a much better place to spend my resources
| than figuring out how to run infrastructure on bare VMs.
|
| Every time I've seen a normal-sized firm running on VMs, they
| have one team who is responsible for managing the VMs, and
| _either_ that team is expecting a Docker image artifact or
| they're expecting to manage the environment in which the
| application runs (making sure all of the application
| dependencies are installed in the environment, etc) which
| typically implies a lot of coordination between the ops team
| and the application teams (especially regarding deployment).
| I've never seen that work as smoothly as deploying to
| ECS/EKS/whatever and letting the ops team work on automating
| things at a higher level of abstraction (automatic certificate
| rotation, automatic DNS, etc).
|
| That said, I've never tried the "one big server" approach,
| although I wouldn't want to run fewer than 3 replicas, and I
| would want reproducibility so I know I can stand up the exact
| same thing if one of the replicas go down as well as for
| higher-fidelity testing in lower environments. And since we
| have that kind of reproducibility, there's no significant
| difference in operational work between running fewer larger
| servers and more smaller servers.
| cogman10 wrote:
| > Use One Big Database.
|
| > Seriously. If you are a backend engineer, nothing is worse
| than breaking up your data into self contained service
| databases, where everything is passed over Rest/RPC. Your
| product asks will consistently want to combine these data
| sources (they don't know how your distributed databases look,
| and oftentimes they really do not care).
|
| This works until it doesn't and then you land in the position
| my company finds itself in where our databases can't handle the
| load we generate. We can't get bigger or faster hardware
| because we are using the biggest and fastest hardware you can
| buy.
|
| Distributed systems suck, sure, and they make querying across
| systems a nightmare. However, by giving those aspects up, what
| you gain is the ability to add new services, features, etc.
| without running into Scotty yelling "She can't take much more
| of it!"
|
| Once you get to that point, it becomes SUPER hard to start
| splitting things out. All of a sudden you have 10,000 "just a
| one off" queries against several domains that are broken by
| trying to carve out a domain into a single owner.
| Flow wrote:
| Do you have Spectre countermeasures active in the kernel of
| that machine?
| runjake wrote:
| What does it matter, in this context?
|
| If it's about bare metal vs. virtual machines, know that
| Spectre affects virtual machines, too.
| rkagerer wrote:
| I think they are implying disabling them (if on) could
| squeeze you out a bit more performance.
| kedean wrote:
| Many databases can be distributed horizontally if you put in
| the extra work, would that not solve the problems you're
| describing? MariaDB supports at least two forms of
| replication (one master/replica and one multi-master), for
| example, and if you're willing to shell out for a MaxScale
| license it's a breeze to load balance it and have automatic
| failover.
| hot_gril wrote:
| Not without big compromises and a lot of extra work. If you
| want a truly horizontally scaling database, and not just
| multi-master for the purpose of availability, a good
| example solution is Spanner. You have to lay your data out
| differently, you're very restricted in what kinds of
| queries you can make, etc.
| kbenson wrote:
| For what it's worth, I think distributing horizontally is
| also much easier if you've already limited your database to
| specific concerns by splitting it up in different ways.
| Sharding a very large database with lots of data deeply
| linked sounds like much more of a pain than something with
| a limited scope that isn't too deeply linked with data
| because it's already in other databases.
|
| To some degree, sharding brings in a lot of the same
| complexities as different microservices with their own data
| store, in that you sometimes have to query across multiple
| sources and combine in the client.
| throwaway9870 wrote:
| How do you use one big database when some of your info is stuck
| in an ERP system?
| marcosdumay wrote:
| > Use One Big Database
|
| Yep, with a passive replica or online (log) backup.
|
| Keeping things centralized can reduce your hardware requirement
| by multiple orders of magnitude. The one huge exception is a
| traditional web service, those scale very well, so you may not
| even want to get big servers for them (until you need them).
| Closi wrote:
| Breaking apart a stateless microservice and then basing it
| around a giant single monolithic database is pretty pointless -
| at that stage you might as well just build a monolith and get
| on with it as every microservice is tightly coupled to the db.
| adrianmsmith wrote:
| That's true, unless you need
|
| (1) Different programming languages, e.g. you've written your
| app in Java but now you need to do something for which the
| perfect Python library is available.
|
| (2) Different parts of your software need different types of
| hardware. Maybe one part needs a huge amount of RAM for a
| cache, but other parts are just a web server. It'd be a shame
| to have to buy huge amounts of RAM for every server.
| Splitting the software up and deploying the different parts
| on different machines can be a win here.
|
| I reckon the average startup doesn't need any of that, not
| suggesting that monoliths aren't the way to go 90% of the
| time. But if you do need these things, you can still go the
| microservices route, but it still makes sense to stick to a
| single database if at all possible, for consistency and
| easier JOINs for ad-hoc queries, etc.
| Closi wrote:
| These are both true - but neither requires service-
| oriented-architecture.
|
| You can split up your application into chunks that are
| deployed on separate hardware, and use different languages,
| without composing your whole architecture into
| microservices.
|
| A monolith can still have a separate database server and a
| web server, or even many different functions split across
| different servers which are horizontally scalable, and be
| written in both java and python.
|
| Monoliths have had separate database servers since the 80s
| (and probably before that!). In fact, part of these
| applications' defining characteristics at the enterprise
| level is that they often shared one big central database,
| as often they were composed of lots of small applications
| that would all make changes to the central database, which
| would often end up in a right mess of software that was
| incredibly hard to unpick! (And all the software writing
| to that database would, as you described, be written in
| lots of different languages). People would then come along
| and cake these central databases full of stored procedures
| to make magic changes to implement functionality that
| wasn't available in the legacy applications that they can't
| change because of the risk and then you have even more of a
| mess!
| AtNightWeCode wrote:
| Agree. Nothing worse than having different programs changing
| data in the same database. The database should not be an
| integration point between services.
| jethro_tell wrote:
| If you have multiple microservices updating the database
| you need to have a database access layer service as well.
|
| There's some real value in abstraction and microservices,
| but you can still run them against a monolithic database
| service.
| bergkvist wrote:
| No amount of abstraction is going to save you from the
| problem of 2 processes manipulating the same state
| machine.
| noduerme wrote:
| I disagree. Suppose you have an enormous DB that's mainly
| written to by workers inside a company, but has to be widely
| read by the public outside. You want your internal services
| on machines with extra layers of security, perhaps only
| accessible by VPN. Your external facing microservices have
| other things like e.g. user authentication (which may be tied
| to a different monolithic database), and you want to put them
| closer to users, spread out in various data centers or on the
| edge. Even if they're all bound to one database, there's a
| lot to recommend keeping them on separate, light cheap
| servers that are built for http traffic and occasional DB
| reads. And even more so if those services do a lot of
| processing on the data that's accessed, such as building up
| reports, etc.
| Closi wrote:
| You've not really built microservices then in the purest
| sense though - i.e. all the microservices aren't
| independently deployable components.
|
| I'm not saying what you are proposing isn't a perfectly
| valid architectural approach - it's just usually considered
| an anti-pattern with microservices (because if all the
| services depend on a single monolith, and a change to a
| microservice functionality also mandates a change to the
| shared monolith which then can impact/break the other
| services, we have lost the 'independence' benefit that
| microservices supposedly give us, where changes to one
| microservice do not impact another).
|
| Monoliths can still have layers to support business logic
| that are separate from the database anyway.
| roflyear wrote:
| Absolutely. I know someone who considers "different domains"
| (as in web domains) to count as a microservice!
|
| What is the point of that? It doesn't add anything. Just more
| shit to remember and get right (and get wrong!)
| manigandham wrote:
| Why would you break apart a microservice? And why do you need
| to use/split into microservices anyway?
|
| 99% of apps are best fit as monolithic apps _and_ databases
| and should focus on business value rather than scale they'll
| never see.
| Gigachad wrote:
| Where I work we are looking at it because we are starting
| to exceed the capabilities of one big database. Several
| tables are reaching the billions of rows mark and just
| plain inserts are starting to become too much.
| nicoburns wrote:
| Yeah, at the billions of rows mark it definitely
| makes sense to start looking at splitting things up. On
| the other hand, the company I worked for split things up
| from the start, and when I joined - 4 years down the line
| - their biggest table had something like 50k rows, but
| their query performance was awful (tens of seconds in
| cases) because the data was so spread out.
| threeseed wrote:
| > 99% of apps are best fit as monolithic apps and databases
| and should focus on business value rather than scale
| they'll never see
|
| You incorrectly assume that 99% of apps are building these
| architectures for scalability reasons.
|
| When in reality it's far more for development productivity,
| security, use of third party services, different languages
| etc.
| jethro_tell wrote:
| reliability, sometimes sharding just means you don't have
| to get up in the middle of the night.
| Closi wrote:
| Totally agree.
|
| I guess I just don't see the value in having a monolith
| made up of microservices - you might as well just build a
| monolith if you are going down that route.
|
| And if your application fits the microservices pattern
| better, then you might as well go down the microservices
| pattern properly and not give them a big central DB.
| adgjlsfhk1 wrote:
| The one advantage of microservice on a single database
| model is that it lets you test the independent components
| much more easily while avoiding the complexity of
| database sharding.
| [deleted]
| radu_floricica wrote:
| Note that quite a bit of the performance problems come
| from writes. You can get away with A LOT if you accept that
| 1. the current service doesn't do (much) writing and 2. it
| can live with slightly old data. Which I think covers 90% of
| use cases.
|
| So you can end up with those services living on separate
| machines and connecting to read only db replicas, for
| virtually limitless scalability. And when it realizes it
| needs to do an update, it either switches the db connection
| to a master, or it forwards the whole request to another
| instance connected to a master db.
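|
| A minimal sketch of that routing, assuming PostgreSQL streaming
| replication and psycopg2 (the hostnames are placeholders; in a
| real app this usually lives in the framework's DB router or
| connection pool):
|             import psycopg2
|
|             # Hypothetical hosts - point these at your own boxes.
|             primary = psycopg2.connect("host=db-primary dbname=app")
|             replica = psycopg2.connect("host=db-replica dbname=app")
|
|             def run(sql, params=()):
|                 # Crude rule: reads go to the (slightly stale) replica,
|                 # anything that writes goes to the primary.
|                 is_read = sql.lstrip().upper().startswith("SELECT")
|                 conn = replica if is_read else primary
|                 with conn, conn.cursor() as cur:
|                     cur.execute(sql, params)
|                     return cur.fetchall() if cur.description else None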
| cfors wrote:
| No disagreement here. I love a good monolith.
| Guid_NewGuid wrote:
| I think a strong test a lot of "let's use Google scale
| architecture for our MVP" advocates fail is: can your
| architecture support a performant paginated list with dynamic
| sort, filter and search where eventual consistency isn't
| acceptable?
|
| Pretty much every CRUD app needs this at some point and if
| every join needs a network call your app is going to suck to
| use and suck to develop.
| SkyPuncher wrote:
| > Pretty much every CRUD app needs this at some point and if
| every join needs a network call your app is going to suck to
| use and suck to develop.
|
| _at some point_ is the key word here.
|
| Most startups (and businesses) can likely get away with this
| well into Series A or Series B territory.
| threeseed wrote:
| > if every join needs a network call your app is going to
| suck to use and suck to develop.
|
| And yet developers do this every single day without any
| issue.
|
| It is bad practice to have your authentication database be
| the same as your app database. Or you have data coming from
| SaaS products, third party APIs or a cloud service. Or even
| simply another service in your stack. And with complex
| schemas often it's far easier to do that join in your
| application layer.
|
| All of these require a network call and join.
| mhoad wrote:
| I've found the following resource invaluable for designing
| and creating "cloud native" APIs where I can tackle that kind
| of thing from the very start without a huge amount of hassle
| https://google.aip.dev/general
|
| The patterns section covers all of this and more
| gnat wrote:
| This is a great resource but the RFC-style documentation
| says what you SHOULD and MUST do, not HOW to do it ...
| lmm wrote:
| I don't believe you. Eventual consistency is how the real
| world works, what possible use case is there where it
| wouldn't be acceptable? Even if you somehow made the display
| widget part of the database, you can't make the reader's
| eyeballs ACID-compliant.
| skyde wrote:
| thanks a lot for this comment. I will borrow this as an
| interview question :)
| cdkmoose wrote:
| >>(they don't know how your distributed databases look, and
| oftentimes they really do not care)
|
| Nor should they, it's the engineer's/team's job to provide the
| database layer to them with high levels of service without them
| having to know the details
| z3t4 wrote:
| The rule is: Keep related data together. Exceptions are:
| Different customers (usually don't require each other's data)
| can be isolated. And if the database becomes the bottleneck you
| can separate unrelated services.
| bebrws wrote:
| Someone call Brahm
| notacoward wrote:
| At various points in my career, I worked on Very Big Machines and
| on Swarms Of Tiny Machines (relative to the technology of their
| respective times). Both kind of sucked. Different reasons, but
| sucked nonetheless. I've come to believe that the best approach
| is generally somewhere in the middle - enough servers to ensure a
| sufficient level of protection against failure, _but no more_ to
| minimize coordination costs and data movement. Even then there
| are exceptions. The key is _don't run blindly toward the
| extremes_. Your utility function is probably bell shaped, so you
| need to build at least a rudimentary model to explore the problem
| space and find the right balance.
| mamcx wrote:
| Yes, totally.
|
| Among the setups, the one that I think is _the golden one_ is a
| BIG DB server plus 1-4 front-end (web/api/cache) servers, and
| hand off the backups and CDN.
|
| That is.
| rcarmo wrote:
| I once fired up an Azure instance with 4TB of RAM and hundreds of
| cores for a performance benchmark.
|
| htop felt incredibly roomy, and I couldn't help thinking how my three
| previous projects would fit in with room to spare (albeit lacking
| redundancy, of course).
| gregmac wrote:
| > However, cloud providers have often had global outages in the
| past, and there is no reason to assume that cloud datacenters
| will be down any less often than your individual servers.
|
| A nice thing about being in a big provider is when they go down a
| massive portion of the internet goes down, and it makes news
| headlines. Users are much less likely to complain about _your_
| service being down when it's clear you're just caught up in the
| global outage that's affecting 10 other things they use.
| arwhatever wrote:
| When migrating from [no-name CRM] to [big-name CRM] at a recent
| job, the manager pointed out that when [big-name CRM] goes
| down, it's in the Wall Street Journal, and when [no-name] goes
| down, it's hard to get their own Support Team to care!
| ramesh31 wrote:
| Nobody ever got fired for buying IBM!
| notjustanymike wrote:
| We may need to update this one, I would definitely fire
| someone today for buying IBM.
| kkielhofner wrote:
| Nobody ever got fired for buying AWS!
| lanstin wrote:
| The AWS people now are just like the IBM people in the
| 80s - mastering a complex and not standards based array
| of products and optional product add-ons. The internet
| solutions were open and free for a few decades and now
| it's AWS SNADS I mean AWS load balancers and edge
| networks.
| namose wrote:
| AWS services are usually based on standards anyway. If
| you use an architecturally sound approach to AWS you
| could learn to develop for GCP or Azure pretty easily.
| riku_iki wrote:
| that's funny, since IBM is actually promoting one very fat
| and reliable server.
| dtparr wrote:
| These days we just call it licensing Red Hat.
| ustolemyname wrote:
| This has given me a brilliant idea: deferring maintenance
| downtime until some larger user-visible service is down.
|
| This is terrible for many reasons, but I wouldn't be surprised
| to hear someone has done this.
| gorjusborg wrote:
| Ah yes, the 'who cut the cheese?' maintenance window.
| pdpi wrote:
| Another advantage is that the third-party services you depend
| on are also likely to be on one of the big providers, so it's
| one less point of failure.
| hsn915 wrote:
| No. Your users have no idea that you rely on AWS (they don't
| even know what it is), and they don't think of it as a valid or
| reasonable excuse as to why your service is down.
| andrepew wrote:
| This is a huge one -- value in outsourcing blame. If you're
| down because of a major provider outage in the news, you're
| viewed more as a victim of a natural disaster rather than
| someone to be blamed.
| oceanplexian wrote:
| I hear this repeated so many times at my workplace, and it's
| so totally and completely uninformed.
|
| Customers who have invested millions of dollars into making
| their stack multi-region, multi-cloud, or multi-datacenter
| aren't going to calmly accept the excuse that "AWS Went Down"
| when you can't deliver the services you contractually agreed
| to deliver. There are industries out there where having your
| service casually go down a few times a year is totally
| unacceptable (Healthcare, Government, Finance, etc). I worked
| adjacent to a department that did online retail a while ago
| and even an hour of outage would lose us $1M+ in business.
| darkr wrote:
| > Customers who have invested millions of dollars > ... >
| an hour of outage would lose us $1M+ in business
|
| Given (excluding us-east-1) you're looking at maybe an hour
| a year on average of regional outage, sounds like best case
| break even on that investment?
| oceanplexian wrote:
| I'm going to say that an hour a year is wildly
| optimistic. But even then, that puts you at 4 nines
| (99.99%) which is comparatively awful, consider that an
| old fashioned telephone using technology from the 1970s
| will achieve on average, 5 9's of reliability, or 5.26
| minutes of downtime per year, and that most IT shops
| operating their own infrastructure contractually expect 5
| 9's from even fairly average datacenters and transit
| providers.
| nicoburns wrote:
| I was amused when I joined my current company to find
| that our contracts only stipulate one 9 of reliability
| (98%). So ~30 mins a day or ~14 hours a month is
| permissible.
| rapind wrote:
| I wonder if the aggregate outage time from misconfigured
| and over-architected high availability services is greater
| than the average AWS outage per year.
|
| Similar to security, the last few 9s of availability come
| at a heavily increasing (log) complexity / price. The
| cutoff will vary case by case, and I'm sure the decision on
| how many 9s you need is often irrational (CEO says it can
| never go down! People need their pet food delivered on
| time!).
| mahidhar wrote:
| Agreed. Recently I was discussing the same point with a non-
| technical friend who was explaining that his CTO had decided
| to move from Digital Ocean to AWS, after DO experienced some
| outage. Apparently the CEO is furious at him and has assumed
| that DO are the worst service provider because their services
| were down for almost an entire business day. The CTO probably
| knows that AWS could also fail in a similar fashion, but by
| moving to AWS it becomes more or less an Act of God type of
| situation and he can wash his hands of it.
| tjoff wrote:
| This seems like a recently popular exaggeration; I'd wager no
| one but a select few in the HN bubble actually cares.
|
| You will primarily be judged by how much of an inconvenience
| the outage was to every individual.
|
| The best you can hope for is that the local ISP gets the
| blame, but honestly. It can't be more than a rounding error
| in the end.
| treis wrote:
| I think it's more of a shield against upper management. AWS
| going down is treated like an act of god rendering everyone
| blameless. But if it's your one big server that goes down
| then it's your fault.
| phkahler wrote:
| >> AWS going down is treated like an act of god rendering
| everyone blameless.
|
| Someone decided to use AWS, so there is blame to go
| around. I'm not saying if that blame is warranted or not,
| just that it sounds like a valid thing to say for people
| who want to blame someone.
| flatiron wrote:
| "Nobody gets fired for using aws" is pretty big now a
| days. We use GCP but if they have an issue and it bubbles
| down to me nobody bats an eye when I say the magical
| cloud man made ut oh whoopsie and it wasn't me.
| sebzim4500 wrote:
| I doubt anyone has ever been fired for choosing AWS. I
| know for a fact that people have been fired after
| deciding to do it on bare metal and then it didn't work
| very well.
| jasonlotito wrote:
| "I think it's more of a shield against upper management."
|
| "Someone decided to use AWS, so there is blame to go
| around."
|
| Upper management.
| ozim wrote:
| So it does not really work in B2B.
|
| I don't really have much to do with contracts - but my
| company is stating that we have uptime of 99.xx%.
|
| In terms of contract customers don't care if I have Azure/AWS
| or I keep my server in the box under the stairs. Yes they do
| due diligence and would not buy my services if I kept them in a
| shoe box.
|
| But then if they lose business they come to me. I can go
| after Azure/AWS, but I am so small they will throw some free
| credits at me and tell me to go away.
|
| Maybe if you are in the B2C area then yeah - your customers will
| probably shrug and say it was M$ or Amazon if you write a sad
| blog post with excuses.
| zerkten wrote:
| It's going to depend on the penalties for being
| unavailable. Small B2B customers are very different from
| enterprise B2B customers too, so you ultimately have to
| build for your context.
|
| If you have to give service credits to customers then with
| "one box" you have to give 100% of customers a credit. If
| your services are partitioned across two "shards" then one
| of those shards can go down, but your credits are only paid
| out at 50%.
|
| Getting to this place doesn't prevent a 100% outage and it
| imposes complexity. This kind of design can be planned for
| enterprise B2B apps when the team are experienced with
| enterprise clients. Many B2B SaaS are tech folk with zero
| enterprise experience, so they have no idea of relatively
| simple things that can be done to enable a shift to this
| architecture.
|
| Enterprise customers do care where things are hosted. They
| very likely have some users in the EU, or other locations,
| which care more about data protection and sovereignty than
| the average US organization. Since they are used to hosting
| on-prem and doing their own due diligence they will often
| have preferences over hosting. In industries like
| healthcare, you can find out what the hosting preferences
| are, as well as understand how the public clouds are
| addressing them. While not viewed as applicable by many on
| HN due to the focus on B2C and smaller B2B here, this is
| the kind of thing that can put a worse product ahead in the
| enterprise scenario.
| HWR_14 wrote:
| Because you have a vendor/customer relationship. The big
| thing for AWS is employer/employee relationships. If you
| were a larger company, and AWS goes down, who blames you?
| Who blames anyone in the company? At the C-level, does the
| CEO expect more uptime than _Amazon_? Of course not. And so
| it goes.
|
| Whereas if you do something other than the industry
| standard of AWS (or Azure/GCP) and it goes down, clearly
| it's _your fault_.
| andrepew wrote:
| Depends on scale of B2B. Between enterprises, not as much.
| Between small businesses, works very well (at least in my
| experience, we are tiny B2B).
| lanstin wrote:
| It really varies a lot. I have seen very large lazy sites
| suddenly pick up a client that wanted RCA for each bad
| transaction, and suddenly get religion (well, as quickly
| as a large org can). Those are precious clients
| because they force investment into useful directions of
| availability instead of just new features.
| travisgriggs wrote:
| "Value in outsourcing blame"
|
| The real reason that talented engineers secretly support all
| of the middle management we vocally complain about.
| ocdtrekkie wrote:
| I find this entire attitude disappointing. Engineering has
| moved from "provide the best reliability" to "provide the
| reliability we won't get blamed for the failure of". Folks
| who have this attitude missed out on the dang ethics course
| their college was teaching.
|
| If rolling your own is faster, cheaper, and more reliable (it
| is), then the only justification for cloud is assigning
| blame. But you know what you also don't get? Accolades.
|
| I throw a little party of one here when Office 365 or Azure
| or AWS or whatever Google calls its cloud products this week
| is down but all our staff are able to work without issue. =)
| jeroenhd wrote:
| If you work in B2B you can put the blame on Amazon and your
| customers will ask "understandable, take the necessary steps
| to make sure it doesn't happen again". AWS going down isn't
| an act of God, it's something you should've planned for,
| especially if it happened before.
| nrmitchi wrote:
| There is also the consideration that this isn't even an
| argument of "other things are down too!" or "outsourcing blame"
| as much as, depending on what your service is of course, you
| are unlikely to be operating in a bubble. You likely have some
| form of external dependencies, or you are an external
| dependency, or have correlated/cross-dependency usage with
| another service.
|
| Guaranteeing isolation between all of these different moving
| parts is _very difficult_. Even if you're not directly
| affected by a large cloud outage, it's becoming less and less
| common that you, or your customers, are truly isolated.
|
| As well, if your AWS-hosted service mostly exists to service
| AWS-hosted customers, and AWS is down, it doesn't matter if you
| are down. None of your customers are operational anyways. Is
| this a 100% acceptable solution? Of course not. But for 95% of
| services/SaaS out there, it really doesn't matter.
| [deleted]
| taylodl wrote:
| Users are much more sympathetic to outages when they're
| widespread. But, if there's a contractual SLA then their
| sympathy doesn't matter. You have to meet your SLA. That
| usually isn't a big problem as SLAs tend to account for some
| amount of downtime, but it's important to keep the SLA in mind.
| hans1729 wrote:
| This just holds when you are b2b. If you're serving end
| users, they don't care about the contract, they care about
| their UX.
| z3t4 wrote:
| You also have to calculate in the complexity of running
| thousands of servers vs running just one server. If you run
| just one server it's unlikely to go down even once in its
| lifetime. Meanwhile cloud providers are guaranteed to have
| outages due to the sheer complexity of managing thousands of
| servers.
| bilekas wrote:
| I can't tell if this is a good thing or a bad thing though!
|
| Imagine the clout of saying : "we stayed online while AWS died"
| dghlsakjg wrote:
| Depends on how technical your customer base is. Even as a
| developer I would tend not to ascribe too much signal to that
| message. All it tells me is that you don't use AWS.
|
| "We stayed online when GCP, AWS, and Azure go down" is a
| different story. On the other hand, if those three go down
| simultaneously, I suspect the state of the world will be such
| that I'm not worried about the internet.
| lanstin wrote:
| I would expect there are BGP issues that could do that, at
| least for large swaths of the internet.
| [deleted]
| namose wrote:
| I do also remember in one of the recent AWS outages, the
| google cloud compute service had lower availability due to
| failovers hitting all at once
| Nextgrid wrote:
| HN implicitly gets this clout - it became the _real_ status
| page of most of the internet.
| cal85 wrote:
| > In comparison, buying servers takes about 8 months to break
| even compared to using cloud servers, and 30 months to break even
| compared to renting.
|
| Can anyone help me understand why the cloud/renting is still this
| expensive? I'm not familiar with this area, but it seems to me
| that big data centers must have some pretty big cost-saving
| advantages (maintenance? heat management?). And there are several
| major providers all competing in a thriving marketplace, so I
| would expect that to drive the cost down. How can it still be so
| much cheaper to run your own on-prem server?
| WJW wrote:
| Several points:
|
| - The price for on-prem conveniently omits costs for power,
| cooling, networking, insurance and building space; it's only
| the purchase price.
|
| - The price for the cloud server includes (your share of) the
| costs of replacing a broken power supply or hard drive, which
| is not included in the list price for on-prem. You will have to
| make sure enough of your devs know how to do that or else hire
| a few sysadmin types.
|
| - As the article already mentions, the cloud has to provision
| for peak usage instead of average usage. If you buy an on-prem
| server you always have the same amount of computing power
| available and can't scale up quickly if you need 5x the
| capacity because of a big event. That kind of flexibility costs
| money.
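|
| If you want to put rough numbers on it, the structure of the
| break-even calculation is simple - every figure below is a made-
| up placeholder, so plug in your own quotes:
|             server_purchase = 12_000  # one-off hardware cost (assumed)
|             colo_and_power  = 250     # per month: rack, power, remote hands (assumed)
|             cloud_monthly   = 1_800   # comparable cloud instance, per month (assumed)
|
|             months = server_purchase / (cloud_monthly - colo_and_power)
|             print(f"break even after {months:.1f} months")  # ~7.7 with these numbers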
| cal85 wrote:
| Thank you, that explains it.
| zucker42 wrote:
| Not included in the break-even calculation was the cost of
| colocation, the cost of hiring someone to make sure the
| computer is in working order, or the reduced hassle when
| hardware fails.
|
| Also, as the author even mentions in the article, a modern server
| basically obsoletes a 10-year-old server. So you're going to
| have to replace your server at least every 10 years. So the
| break-even in the case of renting makes sense when you consider
| that the server depreciates really quickly.
| manigandham wrote:
| You're paying a premium for _flexibility_. If you don't need
| that then there are far cheaper options like some managed
| hosting from your local datacenter.
| klysm wrote:
| The huge capital required to get a data center with those cost
| savings serves as a nice moat to let people price things high.
| marcosdumay wrote:
| Renting is not very expensive. 30 months is a large share of a
| computer's lifetime, and you are paying for space, electricity,
| and internet access too.
| merb wrote:
| > If you compare to the OVHCloud rental price for the same
| server, the price premium of buying your compute through AWS
| lambda is a factor of 25
|
| and there is a factor of 25 that ovh is not a company where you
| should rent servers:
|
| https://www.google.com/search?q=ovh+fire
| siliconc0w wrote:
| One thing to keep in mind is separation. The prod environment
| should be completely separated from the dev ones (plural, it
| should be cheap/fast to spin up dev environments). Access to
| production data should be limited to those that need it (ideally
| for just the time they need it). Teams should be able to deploy
| their app separately and not have to share dependencies (i.e
| operating system libraries) and it should be possible to test OS
| upgrades (containers do not make you immune from this). It's
| _kinda_ possible to sort of do this with 'one big server' but
| then you're running your own virtualized infrastructure which has
| its own costs/pains.
|
| Definitely also don't recommend one big database, as that becomes
| a hairball quickly - it's possible to have several logical
| databases on one physical database 'server' though.
| lordleft wrote:
| Interesting write-up that acknowledges the benefits of cloud
| computing while starkly demonstrating the value proposition of
| just one powerful, on-prem server. If it's accurate, I think a
| lot of people are underestimating the mark-up cloud providers
| charge for their services.
|
| I think one of the major issues I have with moving to the cloud
| is a loss of sysadmin knowledge. The more locked in you become to
| the cloud, the more that knowledge atrophies within your
| organization. Which might be worth it to be nimble, but it's a
| vulnerability.
| phpisthebest wrote:
| Given that AWS holds up the entire Amazon company, and is a
| large part of Bezos's personal wealth, I think the mark-up is
| pretty good.
| evilotto wrote:
| Many people will respond that "one big server" is a massive
| single point of failure, but in doing so they miss that it is
| also a single point of success. If you have a distributed system,
| you have to test and monitor lots of different failure scenarios.
| With a SPOS, you only have one thing to monitor. For a lot of
| cases the reliability of that SPOS is plenty.
|
| Bonus: Just move it to the cloud, because AWS is definitely not
| its own SPOF and it never goes down taking half the internet with
| it.
| MrStonedOne wrote:
| /tg/station, the largest open source multiplayer video game on
| github, gets cloudheads trying to help us "modernize" the game
| server for the cloud all the time.
|
| Here's how that breaks down:
|
| The servers (sorry, i mean compute) cost the same (before
| bandwidth, more on that at the bottom) to host one game server as
| we pay (amortized) per game server to host 5 game servers on a
| rented dedicated server. ($175/month for the rented server with
| 64gb of ram and a 10gbit uplink)
|
| They run twice as slow, because high-core-count, low-clock-speed
| servers aren't all they're cracked up to be and our game engine
| is single threaded. Even if it weren't, the overhead of
| multithreading, combined with the slow clock speeds of most
| high-core-count servers, rarely squares out to an actual
| increase in real-world performance.
|
| You can get the high-clock-speed units, but they are two to
| three times as expensive, and they still run 20% slower than
| Windows VMs on rented bare metal, because the sad fact is that
| enterprise CPUs from either Intel or AMD have slower clock
| speeds and single-threaded performance than their gaming CPU
| counterparts. Getting gaming CPUs for rented servers is piss
| easy, but next to impossible for cloud servers.
|
| Each game server uses 2TB of bandwidth to host 70-player high
| pops. This works with 5 servers on 1 machine because our hosting
| provider gives us 15TB of bandwidth included in the price of the
| server.
|
| Well, now the cloud bill just got a new zero. Being 10 to 30x
| more expensive once you remember to price in bandwidth isn't
| looking too great.
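|
| The bandwidth part alone is easy to ballpark - the ~$0.09/GB
| egress rate below is an assumption for a typical big cloud, so
| check your provider's actual pricing:
|             tb_per_month  = 10     # 5 game servers x ~2 TB each
|             egress_per_gb = 0.09   # assumed cloud egress price, $/GB
|             print(tb_per_month * 1000 * egress_per_gb)
|             # ~$900/month for egress alone, vs $175/month all-in for
|             # the rented dedicated box with bandwidth included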
|
| "but it would make it cheaper for small downstreams to start out"
| until another youtuber mentions our tiny game, and every game
| server is hitting the 120 hard pop cap, and a bunch of
| downstreams get a surprise 4 digit bill for what would normally
| run 2 digits.
|
| The takeaway from this being that even adding in Docker or k8s
| deployment support to the game server is seen as creating the
| risk that some kid bankrupts themselves trying to host a game
| server of their favorite game off their McDonald's paycheck, and
| we tell such tech "pros" to sod off with their trendy money
| wasters.
| mwcampbell wrote:
| > $175/month for the rented server with 64gb of ram and a
| 10gbit uplink)
|
| Wow, what provider is that?
| corford wrote:
| Hetzner's PX line offers 64GB ECC RAM, Xeon CPU, dual 1TB
| NVME for < $100/month. A dedicated 10Gbit b/w link (plus
| 10Gbit NIC) is then an extra ~$40/month on top (incls.
| 20TB/month traffic, with overage billed at $1/TB).
| twblalock wrote:
| YetAnotherNick wrote:
| This post raises small issues like reliability, but misses a lot
| of much bigger issues like testing, upgrades, reproducibility,
| backups and even deployments. Also, the author is comparing on-
| demand pricing, which to me doesn't make sense if you could be
| paying for the server with reserved pricing. Still, I agree there
| would be a difference of 2-3x (unless your price is dominated by
| AWS egress fees), but for most servers with a fixed workload,
| even for very popular but simple sites, it could be done for
| $1k/month in the cloud, less than 10% of one developer's salary.
| For non-fixed workloads like ML training, you would need some
| cloudy setup anyway.
| softfalcon wrote:
| So... I guess these folks haven't heard of latency before? Fairly
| sure you have to have "one big server" in every country if you do
| this. I feel like that would get rather costly compared to
| geographically distributed cloud services long term.
| gostsamo wrote:
| The article explicitly mentions CDNs as something that you can
| outsource and also notes that the market there is competitive
| and the prices are low.
| Nextgrid wrote:
| As opposed to "many small servers" in every country? The vast
| majority of startups out there run out of a single AWS region
| with a CDN caching read-only content. You can apply the same
| CDN approach to a bare-metal server.
| softfalcon wrote:
| Yeah, but if I'm a startup and running only a small server,
| the cloud hosting costs are minimal. I'm not sure how you
| think it's cheaper to host tiny servers in lots of countries
| and pay someone to manage that for you. You'll need IT in
| every one of those locations to handle the service of your
| "small servers".
|
| I run services globally for my company, there is no way we
| could do it. The fact that we just deploy containers to k8s
| all over the world works very well for us.
|
| Before you give me the "oh k8s, well you don't know bare
| metal" please note that I'm an old hat that has done the
| legacy C# ASP.NET IIS workflows on bare metal for a long
| time. I have learned and migrated to k8s on AWS/GCloud and it
| is a huge improvement compared to what I used to deal with.
|
| Lastly, as for your CDN discussion, we don't just host CDN's
| globally. We also host geo-located DB + k8s pods. Our service
| uses web sockets and latency is a real issue. We can't have
| 500 ms ping if we want to live update our client. We choose
| to host locally (in what is usually NOT a small server) so we
| get optimal ping for the live-interaction portion of our
| services that are used by millions of people every day.
| Nextgrid wrote:
| > the cloud hosting costs are minimal
|
| Disagreed. The cloud equivalent of a small server is still
| a few hundred bucks a month + bandwidth. Sure, it's still a
| relatively small cost but you're still overpaying
| significantly over the Hetzner equivalent which will be
| sub-$100.
|
| > pay someone to manage that for you
|
| The same guy that manages your AWS can do this. Having
| bare-metal servers doesn't mean renting colo space and
| having people on-site - you can get them from
| Hetzner/OVH/etc and they will manage all the hardware for
| you.
|
| > The fact that we just deploy containers to k8s all over
| the world works very well for us.
|
| It's great that it works well for you and I am in no way
| suggesting you should change, but I wouldn't say it would
| apply to everyone - the cloud adds significant costs with
| regards to bandwidth alone and makes some services outright
| impossible with that pricing model.
|
| > We also host geo-located DB
|
| That's a complex use-case that's not representative of most
| early/small SaaS which are just a CRUD app backed by a DB.
| If your business case requires distributed databases and
| you've already done the work, great - but a lot of services
| don't need that (at least not yet) and can do just fine
| with a single big DB server + application server and good
| backups, and that will be dirt-cheap on bare-metal.
| nostrebored wrote:
| Claiming that Hetzner is equivalent is fallacious. The
| offerings are completely different.
|
| Agreed on networking though!
| Nextgrid wrote:
| In context of a "small server", I think they are
| equivalent. AWS gives you a lot more functionality but
| you're unlikely to be using any of it if you're just
| running a single small "pet" server.
| kkielhofner wrote:
| You don't need IT in every location or even different
| hosting facility contracts. Most colo hosting companies
| have multiple regions. From the 800lb gorilla (Equinix):
|
| https://www.equinix.com/data-centers
|
| Or a smaller US focused colo provider:
|
| https://www.coresite.com/data-centers/locations
|
| Between vendor (Dell, HP, IBM, etc) and the remote hands
| offered by the hosting facility you don't ever have to have
| a member of your team even enter a facility. Anywhere.
| Depending on the warranty/support package the vendor will
| dispatch someone to show up to the facility to replace
| failed components with little action from you.
|
| The vendor will be happy to ship the server directly to the
| facility (anywhere) and for a nominal fee the colo provider
| will rack it and get IPMI, iLo, IP KVM, whatever up for you
| to do your thing. When/if something ever "hits the fan"
| they have on site 24 hour "remote hands" that can either
| take basic pre-prescribed steps/instructions -or- work with
| your team directly and remotely.
|
| Interestingly, at my first startup we had a facility in the
| nearest big metro area that not only hosted our hardware
| but also provided an easy, cheap, and readily available
| meeting space:
|
| https://www.coresite.com/data-centers/data-center-
| design/ame...
| kgeist wrote:
| >The vast majority of startups out there run out of a single
| AWS region with a CDN caching read-only content.
|
| I wonder how many of them violate GDPR and similar laws in
| other countries in regards to personal data processing by
| processing everything in the US.
| treis wrote:
| This is one of those problems that basically no one has. RTT
| from Japan to Washington D.C. is 160ms. There are very few
| applications where that amount of additional latency matters.
| naavis wrote:
| It adds up surprisingly quickly when you have to do a TLS
| handshake, download many resources on pageload, etc. Setting up
| a fresh TCP + TLS connection alone costs around 3 round-trips.
| treis wrote:
| TLS sessions are cached though. Your 3 round trips cost about
| half a second on the initial load, but the connection should be
| reused for subsequent requests.
|
| Resources should be served through a CDN so you'll get
| local servers for those.
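|
| Back-of-the-envelope with the 160ms RTT mentioned above (the
| round-trip counts are approximate and depend on TLS version and
| connection reuse):
|             rtt = 0.160                        # Japan <-> US East coast, seconds
|             first_request = (1 + 2 + 1) * rtt  # TCP + TLS 1.2 handshake + request
|             reused = 1 * rtt                   # keep-alive connection, request only
|             print(first_request, reused)       # ~0.64s vs ~0.16s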
| yomkippur wrote:
| What holds me back from doing this is how will I reduce latency
| from the calls coming from other side of the world when OVHcloud
| seemingly does not have datacenters all over the world? There is
| a noticeable lag when it comes to multiplayer games or even web
| applications.
| tonymet wrote:
| People don't account for the CPU & wall-time cost of encode-
| decode. I've seen it take up 70% of CPU on a fleet. That means
| 700/1000 servers are just doing encode/decode.
|
| You can see high efficiency setups like stackexchange &
| hackernews are orders of magnitude more efficient.
| pclmulqdq wrote:
| This is exactly correct. If you have a microservice running a
| REST API, you are probably spending most of your CPU time on
| HTTP and JSON handling.
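|
| One quick way to sanity-check that on your own service is to time
| a (de)serialization round-trip of a representative payload and
| multiply by your request rate (the payload shape here is made up):
|             import json, timeit
|
|             payload = {"user_id": 12345,
|                        "items": [{"sku": i, "qty": 1} for i in range(50)]}
|
|             n = 10_000
|             secs = timeit.timeit(lambda: json.loads(json.dumps(payload)), number=n)
|             print(f"{secs / n * 1e6:.1f} us per encode+decode round-trip")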
| adam_arthur wrote:
| I'm building an app with Cloudflare serverless and you can
| emulate everything locally with a single command and debug
| directly... It's pretty amazing.
|
| But the way their offerings are structured means it will be quite
| expensive to run at scale without a multi-cloud setup. You can't
| globally cache the results of a worker function in the CDN, so any
| call to a semi-dynamic endpoint incurs one paid invocation, and
| there's no mechanism to bypass this via CDN caching because the
| workers live in front of the CDN, not behind it.
|
| Despite their messaging about lowering cloud costs, they have
| explicitly designed their products to keep people in a cost
| structure similar to, but different from, egress fees. And in
| fact it's quite easily bypassed by using a non-Cloudflare CDN in
| front of Cloudflare serverless.
|
| Anyway, I reached a similar conclusion that for my app a single
| large server instance works best. And actually I can fit my whole
| dataset in RAM, so disk/JSON storage and load on startup is even
| simpler than trying to use multiple systems and databases.
|
| Further, I can run this on a laptop for effectively free, and
| cache everything via CDN, rather than pay ~$100/month for a
| cloud instance.
|
| When you're small, development time is going to be your biggest
| constraint, and I highly advocate all new projects start with a
| monolithic approach, though with a structure that's conducive to
| decoupling later.
| bilekas wrote:
| I don't agree with EVERYTHING in the article, such as getting 2
| big servers rather than multiple smaller ones, but this is really
| just a cost/requirement issue.
|
| The biggest cost I've noticed with enterprises who go full cloud
| is that they are locked in for the long term. I don't mean
| contractually though: the way they design and implement any
| system or service MUST follow the provider's "way", which can be
| very detrimental when leaving the provider, or if, god forbid,
| the provider decides to sunset certain service versions etc.
|
| That said, for enterprise it can make a lot of sense and the
| article covers it well by admitting some "clouds" are beneficial.
|
| For anything I've ever done outside of large businesses, the go-to
| has always been "if it doesn't require an SRE to maintain, just
| host your own".
| amelius wrote:
| Nice until your server gets hugged by HN.
| runeks wrote:
| > The big drawback of using a single big server is availability.
| Your server is going to need downtime, and it is going to break.
| Running a primary and a backup server is usually enough, keeping
| them in different datacenters.
|
| What about replication? I assume the 70k postgres IOPS fall to
| the floor when needing to replicate the primary database to a
| backup server in a different region.
| arwhatever wrote:
| Recent team I was on used one big server.
|
| Wound up spawning off a separate thread from our would-be
| stateless web api to run recurring bulk processing jobs.
|
| Then coupled our web api to the global singleton-esque bulk
| processing jobs thread in a stateful manner.
|
| Then wrapped actors up on actors on top of everything to try to
| wring as much performance as possible out of the big server.
|
| Then decided they wanted to have a failover/backup server but it
| was too difficult due to the coupling to the global singleton-
| esque bulk processing job.
|
| [I resigned at this point.]
|
| So yeah color me skeptical. I know every project's needs are
| different, but I'm a huge fan of dumping my code into some cloud
| host that auto-scaled horizontally, and then getting back to
| writing more code that provides some freeeking business value.
| the_duke wrote:
| This has nothing to do with cloud vs big server. You can build
| horrible, tightly coupled architectures anywhere. You can also
| cleanly separate workloads on a single server just fine.
| malkia wrote:
| It was all good until NUMA came along, and now you have to
| carefully rethink your process or you get lots of performance
| issues in your (otherwise) well-threaded code. Speaking from
| first-hand experience: when our level editor ended up being used
| by artists on a server-class machine, the supposedly 4x faster
| machine was actually going 2x slower. Why? Lots of
| std::shared_ptr<> use on our side - any atomic reference counting
| caused slowdowns, as the cache lines (my understanding) had to be
| synchronized between the two physical CPUs, each having 12
| threads.
|
| And that's not the only issue. I'm just pointing out that you
| can't expect everything to scale smoothly there unless it's well
| thought out - e.g. ask your OS to allocate your threads/memory
| only on one of the physical CPUs (and its threads), put some big,
| disconnected part of your process(es) on the other one(s), and
| make sure the communication between them is minimal... which
| actually wants a micro-services-style design again at that level.
|
| So why not go with micro-services instead...
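|
| For example, on Linux you can pin the whole process to one socket
| (numactl with --cpunodebind/--membind does this more thoroughly,
| including memory placement); the core IDs below are an assumption,
| so check lscpu for your machine's actual NUMA layout:
|             import os
|
|             # Cores 0-11 assumed to belong to NUMA node 0. New threads
|             # inherit this mask, and first-touch allocation then keeps
|             # their memory on the same node.
|             os.sched_setaffinity(0, set(range(12)))
|             print(os.sched_getaffinity(0))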
| faizshah wrote:
| In the paper on Twitter's "Who to Follow" service they mention
| that they designed the service around storing the entire twitter
| graph in the memory of a single node:
|
| > An interesting design decision we made early in the Wtf project
| was to assume in-memory processing on a single server. At first,
| this may seem like an odd choice, running counter to the
| prevailing wisdom of "scaling out" on cheap, commodity clusters
| instead of "scaling up" with more cores and more memory. This
| decision was driven by two rationales: first, because the
| alternative (a partitioned, distributed graph processing
| engine) is significantly more complex and difficult to build, and,
| second, because we could! We elaborate on these two arguments
| below.
|
| > Requiring the Twitter graph to reside completely in memory is
| in line with the design of other high-performance web services
| that have high-throughput, low-latency requirements. For
| example, it is well-known that Google's web indexes are served
| from memory; database-backed services such as Twitter and
| Facebook require prodigious amounts of cache servers to operate
| smoothly, routinely achieving cache hit rates well above 99% and
| thus only occasionally require disk access to perform common
| operations. However, the additional limitation that the graph
| fits in memory on a single machine might seem excessively
| restrictive.
|
| I always wondered if they still do this and if this influenced
| any other architectures at other companies.
|
| Paper:
| https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.69...
| 3pt14159 wrote:
| Yeah I think single machine has its place, and I once sped up a
| program by 10000x by just converting it to Cython and having it
| all fit in the CPU cache, but the cloud still does have a
| place! Even for non-bursty loads. Even for loads that
| theoretically could fit in a single big server.
|
| Uptime.
|
| Or are you going to go down as all your workers finish? Long
| connections? Etc.
|
| It is way easier to gradually hand over across multiple API
| servers as you do an upgrade than it is to figure out what to
| do with a single beefy machine.
|
| I'm not saying it is always worth it, but I don't even think
| about the API servers when a deploy happens anymore.
|
| Furthermore if you build your whole stack this way it will be
| non-distributed by default code. Easy to transition for some
| things, hell for others. Some access patterns or algorithms are
| fine when everything is in a CPU cache or memory but would fall
| over completely across multiple machines. Part of the nice part
| about starting with cloud first is that it is generally easier
| to scale to billions of people afterwards.
|
| That said, I think the original article makes a nuanced case
| with several great points and I think your highlighting of the
| Twitter example is a good showcase for where single machine
| makes sense.
| efortis wrote:
| Some comments wrongly equate bare-metal with on-premise. Bare-
| metal servers can be rented, colocated, or installed on-
| premise.
|
| Also, when renting, the company takes care of hardware failures.
| Furthermore, as hard disk failures are the most common issue, you
| can have hot spares and opt to let damaged disks rot, instead of
| replacing them.
|
| For example, in ZFS, you can mirror disks 1 and 2, while having 3
| and 4 as hot spares, with the following command:
| zpool create pool mirror $d1 $d2 spare $d3 $d4
|
| ---
|
| The 400Gbps are now 700Gbps
|
| https://twitter.com/DanRayburn/status/1519077127575855104
|
| ---
|
| About the break even point:
|
| Disregarding the security risks of multi-tenant cloud instances,
| bare-metal is more cost-effective once your cloud bill exceeds
| $3,000 per year, which is the cost of renting two bare-metal
| servers.
|
| ---
|
| Here's how you can create a two-server infrastructure:
|
| https://blog.uidrafter.com/freebsd-jails-network-setup
| drewg123 wrote:
| 720Gb/s actually. Those last 20-30Gb/s were pretty hard fought
| :)
| efortis wrote:
| Yeah. Thank you!
| zhoujianfu wrote:
| 10 years ago I had a completely dynamic site written in PHP,
| running MySQL locally, on an 8GB RAM VM ($80/mo?), serving over
| 200K daily active users.
| Super fast and never went down!
| porker wrote:
| I like One Big (virtual) Server until you come to software
| updates. At a current project we have one server running the
| website in production. It runs an old version of Centos, the web
| server, MySQL and Elasticsearch all on the one machine.
|
| No network RTTs when doing too many MySQL queries on each page -
| great! But when you want to upgrade one part of that stack... we
| end up cloning the server, upgrading it, testing everything, and
| then repeating the upgrade in-place on the production server.
|
| I don't like that. I'd far rather have separate web, DB and
| Elasticsearch servers where each can be upgraded without fear of
| impacting the other services.
| rlpb wrote:
| You could just run system containers (eg. lxd) for each
| component, but still on one server. That gets you multiple
| "servers" for the purposes of upgrades, but without the rest of
| the paradigm shift that Docker requires.
| 0xbadcafebee wrote:
| Which is great until there's a security vuln in an end-of-
| life piece of core software (the distro, the kernel, lxc,
| etc) and you need to upgrade the whole thing, and then it's a
| 4+ week slog of building a new server, testing the new
| software, fixing bugs, moving the apps, finding out you
| missed some stuff and moving that stuff, shutting down the
| old one. Better to occasionally upgrade/reinstall the whole
| thing with a script and get used to not making one-off
| changes on servers.
|
| If I were to buy one big server, it would be as a hypervisor.
| Run Xen or something and that way I can spin up and down VMs
| as I choose, LVM+XFS for snapshots, logical disk management,
| RAID, etc. But at that point you're just becoming a personal
| cloud provider; might as well buy smaller VMs from the cloud
| with a savings plan, never have to deal with hardware, make
| complex changes with a single API call. Resizing an instance
| is one (maybe two?) API call. Or snapshot, create new
| instance, delete old instance: 3 API calls. Frickin' magic.
|
| _" the EC2 Instance Savings Plans offer up to 72% savings
| compared to On-Demand pricing on your Amazon EC2 Instances"_
| - https://aws.amazon.com/savingsplans/
| rlpb wrote:
| Huh? Using lxd would be identical to what you suggest (VMs
| on Xen) from a security upgrade and management perspective.
| Architecturally and operationally they're basically the
| equivalent, except that VMs need memory slicing up but lxd
| containers don't. There are security isolation differences
| but you're not talking about that here?
| 0xbadcafebee wrote:
| I would want the memory slicing + isolation, plus a
| hypervisor like Xen doesn't need an entire host OS so
| there's less complexity, vulns, overhead, etc, and I'm
| not aware if LXD does the kind of isolation that e.g.
| allows for IKE IPsec tunnels? Non-hypervisors don't allow
| for it iirc. Would rather use Docker for containers
| because the whole container ecosystem is built around it.
| rlpb wrote:
| > I would want the memory slicing + isolation...
|
| Fine, but then that's your reason. "until there's a
| security vuln in an end-of-life piece of core
| software...and then it's a 4+ week slog of building a new
| server" isn't a difference in the context of comparing
| Xen VMs and lxd containers. As an aside, lxd does support
| cgroup memory slicing. It has the advantage that it's not
| mandatory like it is in VMs, but you can do it if you
| want it.
|
| > Would rather use Docker for containers because the
| whole container ecosystem is built around it.
|
| This makes no sense. You're hearing the word "container"
| and inferring an equivalence that does not exist. The
| "whole container ecosystem" is something that exists for
| Docker-style containers, and is entirely irrelevant for
| lxd containers.
|
| lxd containers are equivalent to full systems, and exist
| in the "Use one big server" ecosystem. If you're familiar
| with running a full system into a VM, then you're
| familiar with the inside of a lxd container. They're the
| same. In userspace, there's no significant difference.
| YetAnotherNick wrote:
| Even lxd has updates, many of them security updates.
| ansible wrote:
| I use LXC a lot for our relatively small production setup.
| And yes, I'm treating the servers like pets, not cattle.
|
| What's nice is that I can snapshot a container and move it to
| another physical machine. Handy for (manual) load balancing
| and upgrades to the physical infrastructure. It is also easy
| to run a snapshot of the entire server and then run an
| upgrade, then if the upgrade fails, you roll back to the old
| snapshot.
| pclmulqdq wrote:
| Containers are your friend here. The sysadmin tools that have
| grown out of the cloud era are actually really helpful if you
| don't cloud too much.
| cxromos wrote:
| is this clickbait?
|
| although i do like the alternate version: use servers, but don't
| be too serverly.
| jedberg wrote:
| I'm a huge advocate of cloud services, and have been since 2007
| (not sure where this guy got 2010 as the start of the "cloud
| revolution"). That out of the way, there is something to be said
| for starting off with a monolith on a single beefy server. You'll
| definitely iterate faster.
|
| Where you'll get into trouble is if you get popular quickly. You
| may run into scaling issues early on, and then have to scramble
| to scale. It's just a tradeoff you have to consider when starting
| your project -- iterate quickly early and then scramble to scale,
| or start off more slowly but have a better ramping up story.
|
| One other nitpick I had is that OP complains that even in the
| cloud you still have to pay for peak load, but while that's
| strictly true, it's amortized over so many customers that you
| really aren't paying for it unless you're very large. The more
| you take advantage of auto-scaling, the less of the peak load
| you're paying. The customers who aren't auto-scaling are the ones
| who are covering most of that cost.
|
| You can run a pretty sizable business in the free tier on AWS and
| let everyone else subsidize your peak (and base!) costs.
| rmbyrro wrote:
| Isn't this simplistic?
|
| It really depends on the service, how it is used, the shape of
| the data generated/consumed, what type of queries are needed,
| etc.
|
| I've worked for a startup that hit scaling issues with ~50
| customers. And have seen services with +million users on a
| single machine.
|
| And what does "quickly" and "popular" even mean? It also
| depends a lot on the context. We need to start discussing
| mental models for developers to think about scaling in a
| contextual way.
| Phil_Latio wrote:
| > Where you'll get into trouble is if you get popular quickly.
| You may run into scaling issues early on
|
| Did it ever occur to you that you can still use the cloud for
| on demand scaling? =)
| jedberg wrote:
| Sure but only if you architect it that way, which most people
| don't if they're using one big beefy server, because the
| whole reason they're doing that is to iterate quickly. It's
| hard to build something that can bust to the cloud while
| moving quickly.
|
| Also, the biggest issue is where your data is. If you want to
| bust to the cloud, you'll probably need a copy of your data
| in the cloud. Now you aren't saving all that much money
| anymore and adding in architectural overhead. If you're going
| to bust to the cloud, you might as well just build in the
| cloud. :)
| EddySchauHai wrote:
| > But if I use Cloud Architecture, I Don't Have to Hire Sysadmins
|
| > Yes you do. They are just now called "Cloud Ops" and are under
| a different manager. Also, their ability to read the arcane
| documentation that comes from cloud companies and keep up with
| the corresponding torrents of updates and deprecations makes them
| 5x more expensive than system administrators.
|
| I don't believe "Cloud Ops" is more complex than system
| administration - though having only studied for the CCNA, I'm
| probably on the Valley of Despair slope of the Dunning-Kruger
| curve. If keeping up with cloud companies' updates is that much
| of a challenge to warrant a 5x price over a sysadmin, then that's
| telling you something about their DX...
| abrax3141 wrote:
| I may be misunderstanding, but it looks like the micro-services
| comparison here is based on very high usage. Another use for
| micro-services, like lambda, is exactly the opposite. If you have
| very low usage, you aren't paying for cycles you don't use the
| way you would be if you either owned the machine, or rented it
| from AWS or DO and left it on all the time (which you'd have to
| do in order to serve that randomly-arriving one hit per day!)
| pclmulqdq wrote:
| If you have microservices that truly need to be separate
| services and have very little usage, you probably should use
| things like serverless computing. It scales down to 0 really
| well.
|
| However, if you have a microservice with very little usage,
| turning that service into a library is probably a good idea.
| abrax3141 wrote:
| Yes. I think that the former case is the situation we're in.
| Lambdas are annoying (the whole AWS is annoying!) but, as you
| say, scales to 0 very well.
| marcosdumay wrote:
| Why open yourself to random $300k bills from Amazon when the
| alternative is wasting a $5/month server?
| abrax3141 wrote:
| I don't understand what these numbers are referring to.
| marcosdumay wrote:
| One is a typical size for those rare, but not too rare, bills
| people get from Amazon when their unused, unoptimized
| application gets some surprise usage.
|
| The other is how much it costs to have an always-on paid VPS
| capable of answering the once-a-day request you
| specified.
___________________________________________________________________
(page generated 2022-08-02 23:00 UTC)