[HN Gopher] Why we moved from AWS RDS to Postgres in Kubernetes
___________________________________________________________________
Why we moved from AWS RDS to Postgres in Kubernetes
Author : elitan
Score : 101 points
Date : 2022-09-26 18:15 UTC (4 hours ago)
(HTM) web link (nhost.io)
(TXT) w3m dump (nhost.io)
| techn00 wrote:
| So what solution did you end up using? Crunchy operator?
| nesmanrique wrote:
| We evaluated several operators, but in the end we decided it
| would be best to deploy our own setup for the Postgres
| workloads using Helm instead.
| geggam wrote:
| I would love to see the monitoring on this.
|
| Is network IOPS and NAT nastiness, or disk IO, the bigger issue?
| qeternity wrote:
| These threads are always full of people who have always used an
| AWS/GCP/Azure service, or have never actually run the service
| themselves.
|
| Running HA Postgres is not easy...but at any sort of scale where
| this stuff matters, nothing is easy. It's not as if AWS has 100%
| uptime, nor is it super cheap/performant. There are tradeoffs for
| everyone's use-case but every thread is full of people at one end
| of the cloud / roll-your-own spectrum.
| 988747 wrote:
| I've been successfully running Postgres in Kubernetes with the
| Operator from Crunchy Data. It makes HA setup really easy with
| a tool called Patroni, which basically takes care of all the
| hard stuff. Running 1 primary and 2 replicas is really no
| harder than running single-node Postgres.
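| A quick way to sanity-check that both replicas are actually
| streaming from the primary (a minimal psycopg2 sketch; the DSN
| is a placeholder for whatever your operator exposes as the
| primary service):
|
|     import psycopg2  # assumes psycopg2 is installed
|
|     # Connect to the current primary (placeholder DSN).
|     conn = psycopg2.connect("host=my-cluster-primary dbname=postgres")
|     with conn.cursor() as cur:
|         cur.execute("""
|             SELECT client_addr, state, sync_state
|             FROM pg_stat_replication
|         """)
|         # Expect two rows in 'streaming' state for
|         # 1 primary + 2 replicas.
|         for addr, state, sync_state in cur.fetchall():
|             print(addr, state, sync_state)
|     conn.close()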
| api wrote:
| I wonder how many people use things like CockroachDB, Yugabyte,
| or TiDB? They're at least in theory far easier to run in HA
| configurations at the cost of some additional overhead and in
| some cases more limited SQL functionality.
|
| They seem like a huge step up from the arcane "1980s Unix"
| nightmare of Postgres clustering but I don't hear about them
| that much. Are they not used much or are their users just happy
| and quiet?
|
| (These are all "NewSQL" databases.)
| belmont_sup wrote:
| New user of Cockroach. We'll find out! If this startup ever
| makes it to any meaningful user size.
| ftufek wrote:
| Honestly, that's what I initially thought when trying to run HA
| Postgres on k8s, but Zalando's postgres operator made things so
| much easier (maybe even easier than RDS). It's very easy to roll
| out as many Postgres clusters as you want, with whatever size
| you want. We've been running our production DB on it for the
| last 6 months or so, no outage yet. Though I guess if you have
| to have a very custom setup, it might be more difficult.
| qubit23 wrote:
| I was hoping to see a bit more of an explanation of how this was
| implemented.
| elitan wrote:
| We need a follow up: *How* we're running thousands of Postgres
| databases in Kubernetes.
| KaiserPro wrote:
| In this instance I can see the point: being able to give raw
| access to customers' own psql instances is a good feature.
|
| But it sounds bloody expensive to develop and maintain a
| reliable psql service on k8s.
| jmarbach wrote:
| $0.50 per extra GB seems high, especially for a storage-intensive
| app. Given the cost of cloud Object Storage services it doesn't
| seem to make much sense.
|
| Examples of alternatives for managed Postgres:
|
| * Supabase is $0.125 per GB
|
| * DigitalOcean managed Postgres is ~$0.35 per GB
| makestuff wrote:
| Supabase runs on AWS so they are either losing a ton of money,
| have some amazing deal with AWS, or the $0.50 is inaccurate.
| kiwicopple wrote:
| (supabase ceo)
|
| EBS pricing is here: https://aws.amazon.com/ebs/pricing/
|
| I'd have to check with the team but I'm 80% sure we're on gp3
| ($0.08/GB-month).
|
| That said, we have a very generous free tier. With AWS we
| have an enterprise plan + savings plan + reserved instances.
| Not all of these affect EBS pricing, but we end up paying a
| lot less than the average AWS user due to our high-usage.
| neilv wrote:
| I didn't see "backups" mentioned in that, though I'm sure they
| have them. Depending on your needs, it's a big thing to keep in
| mind while weighing options.
|
| For a small startup or operation, a managed service having
| credible snapshots, PITR backups, failover, etc. is going to save
| a business a lot of ops cost, compared to DIY designing,
| implementing, testing, and drilling, to the same level of
| credibility.
|
| At one recent early startup, I looked at the amount of work for
| me or a contractor/consultant/hire to upgrade our Postgres
| recovery capability (including testing and drills) with
| confidence. I soon decided to move from self-hosted Postgres to
| RDS Postgres.
|
| RDS was a significant chunk of our modest AWS bill (otherwise,
| almost entirely plain EC2, S3, and traffic), but easy to justify
| to the founders, just by mentioning the costs it saved us for
| the business-existential protection we needed.
| nunopato wrote:
| Thanks for bringing this up. We do have backups running daily,
| and we will have "backups on demand" soon as well.
| nunopato wrote:
| (Nhost)
|
| Sorry for not answering everyone individually, but I see some
| confusion due to the lack of context about what we do as a
| company.
|
| First things first, Nhost falls into the category of backend-as-
| a-service. We provision and operate infrastructure at scale, and
| we also provide and run the necessary services for features such
| as user authentication and file storage, for users creating
| applications and businesses. A project/backend consists of a
| Postgres database and the aforementioned services, none of which
| is shared. You get your own GraphQL engine, your own auth
| service,
| etc. We also provide the means to interface with the backend
| through our official SDKs.
|
| Some points I see mentioned below that are worth exploring:
|
| - One RDS instance per tenant is prohibitive from a cost
| perspective, obviously. RDS is expensive and we have a very
| generous free tier.
|
| - We run the infrastructure for thousands of projects/backends,
| and we have absolutely no control over what they are used for.
| Users might be building a simple job board, or the next Facebook
| (please don't). This means we have no idea what the workloads and
| access patterns will look like.
|
| - RDS is mature and a great product, AWS is a billion dollar
| company, etc - that is all true. But it is also true that we do
| not control whether a user's project is missing an index, and
| RDS does not provide any means to limit CPU/memory usage per
| database/tenant (see the sketch after this list).
|
| - We had a couple of discussions with folks at AWS and for the
| reasons already mentioned, there was no obvious solution to our
| problem. Let me reiterate this, the folks that own the service
| didn't have a solution to our problem given our constraints.
|
| - Yes, this is a DIY scenario, but this is part of our core
| business.
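|
| To make the CPU/memory point above concrete: in Kubernetes every
| tenant's Postgres container can carry its own resource requests
| and limits, which is exactly the per-tenant knob RDS doesn't
| expose. A minimal illustrative sketch (the image, names and
| sizes are made up, not our actual setup):
|
|     import yaml  # PyYAML, assumed installed
|
|     def tenant_postgres_container(tenant, cpu, mem):
|         # Illustrative container spec for one tenant's Postgres pod.
|         return {
|             "name": f"postgres-{tenant}",
|             "image": "postgres:14",
|             "resources": {
|                 "requests": {"cpu": cpu, "memory": mem},
|                 "limits": {"cpu": cpu, "memory": mem},
|             },
|         }
|
|     print(yaml.safe_dump(tenant_postgres_container("acme", "500m", "1Gi")))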
|
| I hope this clarifies some of the doubts. And I expect to have a
| more detailed and technical blog post about our experience soon.
|
| By the way, we are hiring. If you think what we're doing is
| interesting and you have experience operating Postgres at scale,
| please write me an email at nuno@nhost.io. And don't forget to
| star us at https://github.com/nhost/nhost.
| cloudbee wrote:
| And what are your cost savings compared to RDS? I had a similar
| problem where we had to provision 5 databases for 5 different
| teams. RDS is really expensive. And is your solution open
| source? I would like to try it.
| SOLAR_FIELDS wrote:
| RDS and similar managed databases are over half of our total
| cloud bill at my place of work. Managed databases in general
| are _really expensive_.
| nunopato wrote:
| I hope to have a more detailed analysis to share when we have
| more accurate data. We launched individual instances recently
| and although I don't have exact numbers, the price difference
| will be significant. Just imagine how much it would cost to
| have 1 RDS instance per tenant (we have thousands).
|
| We haven't open-sourced any of this work yet but we hope to
| do it soon. Join us on discord if you want to follow along
| (https://nhost.io/discord).
| mp3tricord wrote:
| In a production database, why are people executing long-running
| queries on the primary? They should be using a read replica.
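|
| Something like this, with the reporting query pointed at a read
| replica instead of the primary (psycopg2 sketch; hostnames and
| the table are hypothetical):
|
|     import psycopg2  # assumes psycopg2 is installed
|
|     PRIMARY_DSN = "host=db-primary.internal dbname=app"    # OLTP writes
|     REPLICA_DSN = "host=db-replica-1.internal dbname=app"  # long reads
|
|     def run_report():
|         # The long-running analytical query goes to the replica so
|         # it can't starve the primary of CPU, memory, or locks.
|         with psycopg2.connect(REPLICA_DSN) as conn, conn.cursor() as cur:
|             cur.execute("SELECT date_trunc('day', created_at), count(*) "
|                         "FROM orders GROUP BY 1 ORDER BY 1")
|             return cur.fetchall()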
| xwowsersx wrote:
| I've recently been spending a fair amount of time trying to
| improve query performance on RDS. This includes reviewing and
| optimizing particularly nasty queries, tuning PG configuration
| (min_wal_size, random_page_cost, work_mem, etc). I am using a
| db.t3.xlarge with general purpose SSD (gp2) for a web server that
| sees moderate writes and a lot of reads. I know there's no real
| way to know other than through testing, but I'm not clear on
| which instance type best serves our needs -- I think it may very
| well be the case that the t3 family isn't fit for our purposes.
| I'm also unclear on whether we ought to switch to provisioned
| IOPS SSD. Does anyone have any general pointers here? I know the
| question is pretty open-ended, but it would be great if anyone
| has general advice from personal experience.
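|
| For reference, this is roughly how I've been dumping the current
| values before touching anything (psycopg2 sketch; the DSN is a
| placeholder):
|
|     import psycopg2  # assumes psycopg2 is installed
|
|     DSN = "host=my-rds-endpoint dbname=app"  # placeholder
|
|     with psycopg2.connect(DSN) as conn, conn.cursor() as cur:
|         cur.execute("""
|             SELECT name, setting, unit
|             FROM pg_settings
|             WHERE name IN ('work_mem', 'shared_buffers',
|                            'effective_cache_size', 'random_page_cost',
|                            'min_wal_size', 'max_wal_size')
|         """)
|         for name, setting, unit in cur.fetchall():
|             print(name, setting, unit)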
| notac wrote:
| I'd recommend hopping off of t3 asap if you're searching for
| performance gains - performance can be extremely variable (by
| design). M class will even you out.
|
| General storage IOPS is governed by your provisioned storage
| size. You can again get much more consistent performance by
| using provisioned IOPS.
|
| Feel free to email me if you want to chat through things
| specific to your env - email is in my about:
| xwowsersx wrote:
| Thank you so much, will definitely take you up on the offer.
| Nextgrid wrote:
| It's hard to say without metrics; what does your CPU load look
| like? In general, unless your CPU is often maxing out, changing
| the CPU is unlikely to help, so you're left with either memory
| or IO.
|
| Unused memory on Linux will be automatically used to cache IO
| operations, and you can also tweak PG itself to use more memory
| during queries (search for "work_mem", though there are
| others).
|
| If your workload is read-heavy, just giving it more memory so
| that the majority of your dataset is always in the kernel IO
| cache will give you an immediate performance boost, without
| even having to tweak PG's config (though that might help even
| further). This won't transfer to writes - those still require
| an actual, uncached IO operation to complete (unless you want
| to put your data at risk, in which case there are parameters
| that can be used to override that).
|
| For write-heavy workloads, you will need to upgrade IO; there's
| no way around the "provisioned IOPS" disks.
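|
| One quick signal for whether more memory would pay off is the
| buffer cache hit ratio; if it's well below ~0.99 on a read-heavy
| workload, the working set doesn't fit in RAM. A rough sketch
| (psycopg2, placeholder DSN):
|
|     import psycopg2  # assumes psycopg2 is installed
|
|     with psycopg2.connect("host=db-primary dbname=app") as conn:
|         with conn.cursor() as cur:
|             # Share of block reads served from shared_buffers
|             # (ignores the kernel page cache, but still useful).
|             cur.execute("""
|                 SELECT sum(blks_hit)::float
|                        / nullif(sum(blks_hit) + sum(blks_read), 0)
|                 FROM pg_stat_database
|             """)
|             print("buffer cache hit ratio:", cur.fetchone()[0])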
| xwowsersx wrote:
| Thanks very much for the reply. CPU is not often maxing out.
| Here's a graph of max CPU utilization from the last week
| https://ibb.co/tzw5p3L
| Nextgrid wrote:
| You've got some spikes that could signify some large or
| unoptimized queries, but otherwise yes, the CPU doesn't
| look _that_ hot.
|
| I suggest upgrading to an instance type which gives you
| 32GB or more of memory. You'll get a bigger CPU along with
| it as well, but don't make the CPU your priority, it's not
| your main bottleneck at the moment.
| xwowsersx wrote:
| Makes sense, thank you. Sounds like M class is the way to
| go as other commenter suggested. Also, yes. There are
| many awful queries that I'm aware of and working to
| correct.
| stunt wrote:
| What's the benefit of running Postgres in Kubernetes vs VMs (with
| replication obviously)?
| radimm wrote:
| Having recently heard a lot about PostgreSQL in Kubernetes
| (CloudNativePG, for example), it always makes me wonder about
| the actual load and the complexity of the cluster in question.
|
| > This is the reason why we were able to easily cope with 2M+
| requests in less than 24h when Midnight Society launched
|
| This gives the answer: while it's probably not evenly
| distributed, it works out to about 23 req/sec (a guessed peak of
| 60 - 100 might already be stretching it). I always wonder about
| use cases with 3 - 5k req/sec as a minimum.
|
| [edit] PS: not really ditching either k8s pg or AWS RDS or
| similar solutions. Just being curious.
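|
| (Back-of-the-envelope, assuming the 2M figure really is per 24h
| and guessing a 4x peak factor:)
|
|     requests = 2_000_000
|     seconds_per_day = 24 * 60 * 60            # 86_400
|     avg_rps = requests / seconds_per_day      # ~23.1 req/sec
|     peak_rps = avg_rps * 4                    # ~93 req/sec (guess)
|     print(round(avg_rps, 1), round(peak_rps, 1))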
| kccqzy wrote:
| > This is the reason why we were able to easily cope with 2M+
| requests in less than 24h
|
| I thought this was referring to 2M+ requests per second over a
| ramp period of 24h, not 2M requests per 24h?
| xani_ wrote:
| It's essentially just a process running in a cgroup so
| performance shouldn't be all that different from bare-metal/VM
| PostgreSQL.
|
| The main difference would be storage speed and how exactly it is
| attached to a container.
| brand wrote:
| I've personally deployed O(TBs) and O(10^4 TPS) Postgres
| clusters on Kubernetes with a CNPG-style operator based
| deployment. There are some subtleties to it but it's not
| exceedingly complicated, and a good project like CNPG goes a
| long way to shaving off those sharp edges. As other commenters
| have
| suggested it's good to really understand Kubernetes if you want
| to do it, though.
| radimm wrote:
| Thanks for the confirmation. As mentioned I'm not saying no
| to it. It is really that "really understand" part which holds
| me back for now - mainly the observability and dealing with
| edge cases in a high-throughput environment.
| Nextgrid wrote:
| > 23 req/sec (guess peak 60 - 100 might be already stretching
| it)
|
| That kind of load is something a decent developer laptop with
| an NVME drive can serve, nothing to write home about.
|
| It is sad that the "cloud" and all these supposedly "modern"
| DevOps systems managed to redefine the concept of "performance"
| for a large chunk of the industry.
| rrampage wrote:
| It depends a lot on the backend architecture. Number of DB
| requests per web request can also be high due to the
| pathological cases in some ORMs which can result in N+1 query
| problems or eagerly fetching entire object hierarchies. Such
| problems in application code can get brushed under the carpet
| due to "magical" autoscaling (be it RDS or K8s). There can
| also be fanout to async services/job queues which will in
| turn run even more DB queries.
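|
| The classic shape of the problem, roughly (psycopg2 sketch;
| DSN, tables and columns are made up):
|
|     import psycopg2  # assumes psycopg2 is installed
|
|     conn = psycopg2.connect("host=db-primary dbname=app")  # placeholder
|     cur = conn.cursor()
|
|     # N+1: one query for the posts, then one more per post
|     # for its author -- 51 round trips in total.
|     cur.execute("SELECT id, author_id FROM posts LIMIT 50")
|     posts = cur.fetchall()
|     for _post_id, author_id in posts:
|         cur.execute("SELECT name FROM authors WHERE id = %s",
|                     (author_id,))
|         cur.fetchone()
|
|     # The same data in a single extra round trip.
|     author_ids = [a for _, a in posts]
|     cur.execute("SELECT id, name FROM authors WHERE id = ANY(%s)",
|                 (author_ids,))
|     authors = dict(cur.fetchall())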
| AccountAccount1 wrote:
| Hey, this is not a problem for us at Nhost since most of
| the interfacing with Postgres is through Hasura (a GraphQL
| SQL-to-GraphQL) it solves the n+1 issue by compiling a
| performant sql statement from the gql query (it's also
| written in haskell, you can read more here
| https://hasura.io/blog/architecture-of-a-high-performance-
| gr...)
| robertlagrant wrote:
| I don't think K8s at least will autoscale quickly enough to
| mask something like that.
| singron wrote:
| RDS tops out at about 18000 IOPS since it uses a single EBS
| volume. Any decent SSD will do much better. E.g. a 970 Evo
| will easily do >100K IOPS and can do more like 400K in ideal
| conditions.
|
| You can get that many IOPS with aurora, but the cost is
| exorbitant.
| mcbain wrote:
| I don't think it has been a single EBS volume for a while,
| but in any case, 256k is more than 18k.
| https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_...
| mhuffman wrote:
| It does depend on the architecture and framework they are
| using imo. I have a single Hetzner machine with spinning-platter
| HDs that serves between 1-2 million requests per day hitting a
| DB and ML models and rarely ever gets over 1% CPU usage. I
| have pressure-tested it to around 3k reqs/sec. On the other
| hand I have seen WP and CodeIgniter setups that even with 5
| copies running on the largest AWS instances available,
| "optimized" to the hilt, caching everywhere possible, etc.,
| absolutely crumble under the load of 3k req per min. (not sec
| ... min).
|
| Many frameworks that make early development easy fuck you
| later during growth with ORM calls, tons of unnecessary text
| in the DB, etc.
| Nextgrid wrote:
| Keep in mind that your Hetzner instance has locally-
| attached storage and a real CPU as opposed to networked
| storage and a slice of a CPU, so I'm not surprised at all
| that this beats an AWS setup even on the more expensive
| instances.
|
| Yes, frameworks can be a problem (although including WP in
| the list is an insult to other, _actually decent_
| frameworks), but I would bet good money that if they moved
| their setup to Hetzner it would still fly. Non-optimal
| ORM calls can be optimized manually without necessarily
| dropping the framework altogether.
| marcosdumay wrote:
| Hum... The Hetzner instance is very likely cheaper than
| any AWS setup, so while there is a point in that part,
| it's not a very relevant one. (And that's exactly the
| issue with the "modern DevOps" tooling.)
| acdha wrote:
| > On the other hand I have seen WP and CodeIgniter setups
| that even with 5 copies running on the largest AWS
| instances available, "optimized" to the hilt, caching
| everywhere possible, etc. absolute crumble under the load
| of 3k req per min. (not sec ... min).
|
| This sounds like some other architectural problem - that was
| single-node performance on EC2 in the 2000s, running nowhere
| near the largest instances available.
|
| There are concerns switching from local to SAN storage, of
| course, but that's also shifting the problem if you care
| about durability.
| derefr wrote:
| Depends on the queries. Point queries that take 1ms each?
| Sure. Analytical queries that take 1000ms+ each? Not so much.
| jerf wrote:
| I can't blame it on "cloud", though it's not helping that
| there are an awful lot of cloud services that claim to be
| "high performance" and are often mediumish at best. But in
| general I see a lot of ignorance in the developer community
| as to how fast things should be able to run, even in terms of
| reading local files and doing local manipulations with no
| "cloud" in sight.
|
| Honestly, if I had to pin it on just one thing, I'd blame
| networking everything. Cloud would fit as a subset of that.
| Networking slows things down at the best of times, and the
| latency distribution can be a nightmare at the worst. Few
| developers think about the cost of using the network, and
| even fewer can think about it holistically (e.g., to avoid
| making 50 network transactions spread throughout the system
| when you could do it all in one transaction if you rearranged
| things).
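|
| E.g. the difference between 50 INSERTs in a loop and one batched
| statement (psycopg2 sketch; the table is hypothetical, and in
| practice you'd pick one of the two patterns):
|
|     import psycopg2  # assumes psycopg2 is installed
|     from psycopg2.extras import execute_values
|
|     rows = [(i, f"event-{i}") for i in range(50)]
|
|     with psycopg2.connect("host=db-primary dbname=app") as conn:
|         with conn.cursor() as cur:
|             # 50 network round trips:
|             for row in rows:
|                 cur.execute(
|                     "INSERT INTO events (id, name) VALUES (%s, %s)",
|                     row)
|             # One round trip for the same data:
|             execute_values(
|                 cur,
|                 "INSERT INTO events (id, name) VALUES %s",
|                 rows)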
| geggam wrote:
| Are you talking about the cloud-host-to-cloud-host networking
| or the pod networking inside a single host?
|
| The dizzying number of NAT layers has to be killing
| performance. I haven't had the chance to ever sit down and
| unravel a system running a good load. The lack of TCP
| tuning combined with the required connection tracking is
| interesting to think about
| kazen44 wrote:
| I still don't understand why nearly all CNIs are so hell-bent
| on implementing a dozen layers of NAT to tunnel their overlay
| networks, instead of implementing a proper control plane to
| automate it all away with routes.
|
| Calico seems to be doing it semi-OK-ish, and even there the
| control plane is kind of unfinished?
|
| The only software-based solution which seems to properly
| have this figured out is VMware NSX-T. (I am not counting
| all the traditional overlay networks in use by ISPs based
| on MPLS/BGP.)
| geggam wrote:
| Before you even get to the CNI, I think AWS VM to
| internet is at least 3 NAT layers.
|
| So we have 3 layers from container to pod. The virtual
| host kernel is tracking those layers. One connection to
| one container is 3 tracked connections. Then you have
| whatever else you put on top to go in and out of the
| internet.
|
| The funny thing to me is HAProxy recommended getting rid
| of connection tracking for performance, while everyone is
| doubling down on it and calling it performant.
| kazen44 wrote:
| > Few developers think about the cost of using the network.
|
| Developers do not seem to realise how slow the network is
| compared to everything else.
|
| Sure, 100gbit network interfaces do exist, but most servers
| are attached with 10gbit interfaces, and most actual
| implementations will not manage to hit something like
| 10gbit/s because of latency and window scaling.
|
| You cannot escape latency (without inventing another
| universe in which physics do not apply). And latency is
| detrimental to performance.
|
| Getting anything across a large enough network in under 1
| millisecond is hard, and compared to an IOP on a local NVMe
| disk, it is painfully slow.
| whoisthemachine wrote:
| > You cannot escape latency (without inventing another
| universe in which physics do not apply). And latency is
| detrimental to performance.
|
| This. So few people distinguish between bandwidth and
| latency. One can be increased arbitrarily and fairly
| easily with new encoding techniques (which generally only
| improve edge cases), and the other has a floor that is
| hard-coded into our universe. I've gotten into debates
| with folks who think a 10GB connection from the EU to
| Texas should be as fast as a connection from Texas to the
| Midwest, or to speed up the EU-TX connection they just
| need to spend more on bandwidth.
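|
| Rough numbers for the EU-Texas case (the distance is an
| approximate great-circle figure; real fiber paths are longer):
|
|     distance_km = 8_200         # roughly Frankfurt to Dallas
|     fiber_speed_km_s = 200_000  # light in fiber, about 2/3 of c
|
|     one_way_ms = distance_km / fiber_speed_km_s * 1000  # ~41 ms
|     round_trip_ms = 2 * one_way_ms                      # ~82 ms floor
|     print(round(one_way_ms), round(round_trip_ms))
|
| No amount of extra bandwidth moves that floor.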
| briffle wrote:
| It seems most of the tools for running PostgreSQL in K8s
| just default to creating a new copy of the DB at the drop
| of a hat. When your DB is in the multi-TB range, that can
| come with a noticeable cost in network fees, plus a very
| long delay, even on modern fast networks.
| ayende wrote:
| You are off by a couple of orders of magnitude.
|
| I have run 500+ req/sec on a Raspberry Pi using a 4 TB dataset
| with 2 GB of RAM, with under 100ms for the 99.99th percentile.
|
| A few hundred req a second is basically nothing.
| c2h5oh wrote:
| That kind of a load you can handle on spinning rust without
| breaking a sweat.
| MBCook wrote:
| So they switched from one giant RDS instance with all tenants
| per AZ to per-tenant PG in Kubernetes.
|
| So really we don't know how much RDS was a problem compared to
| the tenant distribution.
|
| For the purposes of an article like this it would be nice if the
| two steps were separate or they had synthetic benchmarks of the
| various options.
|
| But I understand why they just moved forward. They said they
| consulted experts; it would also be nice to discuss some of what
| they looked at or asked about.
| 0xbadcafebee wrote:
| Ah, the ol' sunk cost fallacy of infrastructure. We are already
| investing in supporting K8s, so let's throw the databases in
| there too. Couldn't possibly be that much work.
|
| Sure, a decade-old dedicated team at a billion-dollar
| multinational corporation has honed a solution designed to
| support hundreds of thousands of customers with high
| availability, and we could pay a little bit extra money to spin
| up a new database per tenant that's a little bit less flexible,
| ..... or we could reinvent everything they do on our own software
| platform and expect the same results. All it'll cost us is extra
| expertise, extra staff, extra time, extra money, extra planning,
| and extra operations. But surely it will improve our product
| dramatically.
| gw99 wrote:
| I'm not so sure. All you have is another layer of abstraction
| between you and the problem that you are facing. And that level
| of abstraction may violate your SLAs unless you pitch $15k for
| the enterprise support option. And that may not even be
| fruitful because it relies on an uncertain network of folk at
| the other end who may or may not even be able to interpret
| and/or solve your problem. Also you are at the whim of their
| changes which may or may not break your shit.
|
| Source: AWS user on very very large scale stuff for about 10
| years now. It's not magic or perfection. It's just someone
| else's pile of problems that are lurking. The only consolation
| is they appear to try slightly harder than the datacentres that
| we replaced.
| xani_ wrote:
| [deleted]
| KaiserPro wrote:
| > I bet you also hate on people making their own espresso
| instead of just going to starbucks
|
| Hobbies are not the same as bottom line business.
|
| As with everything, managing state at scale is _very_ hard.
| Then you have to worry about backing it up.
| [deleted]
| wbl wrote:
| Running a stateful service in K8S is its own ball of wax.
| foobarian wrote:
| Yes, Postgres on K8S... <shudder>
| patrec wrote:
| It is, but then I never understood why on earth you'd use
| k8s if you don't have stateful services. I mean really,
| what's the point?
| mijamo wrote:
| Because it's easy? What alternative would you suggest?
| patrec wrote:
| The idea that something of the monstrous complexity of
| k8s is easy is pretty funny to me. I think if you have
| fewer than 2 full-time experts on k8s at hand, you're
| basically nuts if you use it for some non-toy project. In
| my experience, you can and will experience interesting
| failure scenarios.
|
| If you don't have state, why not just either use
| something serverless/fully-managed (beanstalk, lambda,
| cloudflare workers whatever) if you really need to scale
| up and down (or have very limited devops/sysadmin
| capacity) or deploy like 2 or 3 bare metal machines or
| VMs?
|
| Either sounds like a lot less work to manage and
| troubleshoot than some freaking k8s cluster.
| janee wrote:
| Bare metal, I'd think, is the first choice for a large
| RDBMS where you have skilled dedicated personnel who can
| manage it.
|
| If not, rather use a specialist service like RDS for
| anything with serious uptime/throughput requirements.
|
| k8s doesn't really make sense to me unless it's for
| spinning up lots of instances, like for test or dev envs
| or like in the article where they host DBs for people.
| deathanatos wrote:
| ... I do it, in my day job. It's really not. StatefulSets
| are explicitly for this.
|
| We also have managed databases, too.
|
| Self-managed stuff means I can, generally, get shit done
| with it, when oddball things need doing. Managed stuff is
| fine right up until it isn't (i.e., yet another outage with
| the status page being green), or until there's a
| requirement that the managed system inexplicably can't
| handle (despite the requirement being the sort of obvious
| thing you would expect of $SYSTEM, but which no PM thought
| to ask before purchasing the deal...), and then you're in
| support ticket hell.
|
| (E.g., we found out the hard way that there is no way to
| move a managed PG database from one subnet in a network to
| another, in Azure! _Even if you're willing to restore from
| a backup._ We had to deal with that ourselves, by taking a
| pg_dump -- essentially, un-managed-solutioning the backup.
|
| ... the whole reason we needed to move the DB to a
| different subnet was because of a _different_ flaw, in a
| _different_ managed service, and Azure's answer on _that_
| ticket was "tough luck, DB needs to move". Tickets,
| spawning tickets. Support tickets for managed services take
| up an unholy portion of my time.)
| [deleted]
| folkhack wrote:
| I'd posit that it's not that simple. Maybe if you're just
| cranking out your one-off app or something of the sort...
|
| But getting a good replication setup that's HA, potentially
| across multiple regions/zones, all abstracted under K8s -
| yea. That's not trivial. And, it can go _very_ wrong.
|
| > I bet you also hate on people making their own espresso
| instead of just going to starbucks
|
| This is just unnecessary.
| sn0wf1re wrote:
| >> I bet you also hate on people making their own espresso
| instead of just going to starbucks
|
| >This is just unnecessary.
|
| I agree the ad hominem is not required, although the
| analogy is itself decent.
| folkhack wrote:
| I mean I can make up ad hominem analogies about this
| stuff too - but in practice it makes people feel
| attacked/defensive, and rarely ever adds nuance or
| context to the conversation. I feel like in this
| situation it could have been omitted as per the HN
| guidelines:
|
| > In Comments:
|
| > Be kind. Don't be snarky.
| coenhyde wrote:
| You're talking like managing stateful services in an
| ephemeral environment is as simple as installing and
| configuring Postgres. Postgres itself is 1% of the
| consideration here.
| suggala wrote:
| AWS RDS is 10x slower than bare-metal MySQL (both reads and
| writes). The slowness is mainly because storage is over the
| network for RDS.
|
| Not a bad idea to invest some extra time to get better
| performance.
|
| You are falling for the "appeal to antiquity" fallacy if you
| think something old is better.
| 0xbadcafebee wrote:
| What you describe is still a fallacy because it's assuming
| that just because you _can_ get better performance with
| bare metal, this is somehow a cheaper or better option.
| In fact it will be either more error-prone, or more
| expensive, or both, because you are trying to reproduce from
| scratch what the whole RDS team has been doing for 10 years.
| Nextgrid wrote:
| It's unlikely running it on K8S (which is itself going to run
| on underpowered VMs with networked storage) is going to help.
|
| If you're gonna spend effort in running Postgres manually, do
| it on bare-metal and at least get some reward out of it
| (performance _and_ reduced cost).
| derefr wrote:
| > It's unlikely running it on K8S (which is itself going to
| run on underpowered VMs with networked storage) is going to
| help.
|
| On GCP, at least, you can provision a GKE node-pool where
| the nodes have direct-attached NVMe storage; deploy a
| privileged container that formats and RAID0s up the drives;
| and then make use of the resulting scratch filesystem via
| host-mounts.
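|
| Roughly what that privileged bootstrap container might run,
| sketched via Python's subprocess (device paths, array name and
| mount point are assumptions; a real version needs idempotency
| and error handling):
|
|     import subprocess
|
|     DEVICES = ["/dev/nvme0n1", "/dev/nvme1n1"]  # assumed local SSDs
|     MD = "/dev/md0"
|     MOUNT = "/mnt/pgdata"
|
|     def run(cmd):
|         print("+", " ".join(cmd))
|         subprocess.run(cmd, check=True)
|
|     # Stripe the local NVMe drives, format, and mount for Postgres.
|     run(["mdadm", "--create", MD, "--level=0",
|          f"--raid-devices={len(DEVICES)}", *DEVICES])
|     run(["mkfs.ext4", "-F", MD])
|     run(["mkdir", "-p", MOUNT])
|     run(["mount", MD, MOUNT])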
| qeternity wrote:
| > It's unlikely running it on K8S (which is itself going to
| run on underpowered VMs with networked storage) is going to
| help.
|
| What?? We run replicated Patroni on local NVMEs and it's
| incredibly fast.
| dijit wrote:
| And when it all goes belly up it will be much more difficult
| to resolve.
| baq wrote:
| Fortunately Postgres doesn't do that often by itself. It
| usually needs some creative developer's assistance.
| dijit wrote:
| I think you're triggering the worst case a lot more often
| when it comes to running Postgres on k8s: the storage can
| be removed independently from the workload and the pod can
| be evicted much more easily than it would be in traditional
| database hosting methods.
|
| No need for developers to do anything strange at all.
| throwawaymaths wrote:
| Depends. A lot of postgres usage is often "things that might
| as well be redis", like session tokens (but the library we
| imported came configured for postgres) so if the postgres
| goes down, as long as it can be restarted it won't be the end
| of the world even if all the data were wiped.
|
| Probably there is also an 80/20 for most users where it's not
| awful if you have to restore from a cold-storage backup, say
| 12h old.
| HL33tibCe7 wrote:
| Couldn't you just spin up an RDS instance for each project (so,
| single-tenant RDS instances) to avoid the noisy neighbour
| problem? Or is that too expensive?
| elitan wrote:
| We could, yes. But it's way too expensive compared to our
| current setup.
|
| We're offering free projects (Postgres, GraphQL (Hasura), Auth,
| Storage, Serverless Functions) so we need to optimize costs
| internally.
___________________________________________________________________
(page generated 2022-09-26 23:00 UTC)