[HN Gopher] Prometheus, but Bigger
___________________________________________________________________
Prometheus, but Bigger
Author : kiyanwang
Score : 43 points
Date : 2021-06-13 12:25 UTC (10 hours ago)
(HTM) web link (luizrojo.medium.com)
(TXT) w3m dump (luizrojo.medium.com)
| Ne02ptzero wrote:
| > For the monthly expenses, with most of the components running
| on-premises, there was a 90.61% cost reduction, going from US$
| 38,421.25 monthly to US$ 3,608.99, including the AWS services
| cost.
|
| I might be missing something, but $3k/month (let alone
| $38k/month) sounds absolutely insane to me for how few metrics
| they're collecting (4.5k metrics per second, 2.7 TB of data per
| year). Is the money going to network bandwidth or something
| along those lines?
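|
| As a rough sanity check on the figures quoted above, here is a
| minimal Python sketch. The constants come from the thread; the
| function names are made up for illustration, and TB is assumed to
| mean 10^12 bytes.

```python
# Figures quoted in the thread; everything else is derived from them.
OLD_MONTHLY_USD = 38_421.25
NEW_MONTHLY_USD = 3_608.99
SAMPLES_PER_SECOND = 4_500      # 4.5k metrics per second
BYTES_PER_YEAR = 2.7e12         # 2.7 TB per year, assuming decimal TB

SECONDS_PER_YEAR = 365 * 24 * 3600


def cost_reduction_pct(old_usd, new_usd):
    """Percentage saved when going from old_usd to new_usd."""
    return (old_usd - new_usd) / old_usd * 100


def bytes_per_sample(bytes_per_year, samples_per_second):
    """Average stored bytes per sample over one year."""
    return bytes_per_year / (samples_per_second * SECONDS_PER_YEAR)


def usd_per_million_samples(monthly_usd, samples_per_second):
    """Monthly bill divided by millions of samples ingested per month."""
    samples_per_month = samples_per_second * SECONDS_PER_YEAR / 12
    return monthly_usd / (samples_per_month / 1e6)


if __name__ == "__main__":
    print(round(cost_reduction_pct(OLD_MONTHLY_USD, NEW_MONTHLY_USD), 2))
    print(bytes_per_sample(BYTES_PER_YEAR, SAMPLES_PER_SECOND))
    print(usd_per_million_samples(NEW_MONTHLY_USD, SAMPLES_PER_SECOND))
```

| Under those assumptions the quoted 90.61% reduction checks out, and
| the data works out to roughly 19 bytes per sample and about 30 cents
| per million samples at the new price.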
| lcw wrote:
| I agree. I'm working with a metrics system that takes just over
| 1 million metrics a second, and its run rate is similar, around
| $38k a month.
| ericbarrett wrote:
| AWS, for example, charges for cross-AZ data transfers. Naive
| setups with multiple AZs (us-west-1a, 1b, etc.) and a
| centralized Prometheus setup will rack up quite the cost.
| marcinzm wrote:
| They say they ingest 226 GB per month. The cross-AZ transfer
| cost is $0.01 per GB, so that should come out to only
| $2.26/month for them.
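|
| The arithmetic above, as a minimal Python sketch. The volume and
| rate are the ones quoted in this comment; `directions` is a
| hypothetical knob, added because (as far as I know) AWS bills
| same-region cross-AZ traffic at that rate in each direction, which
| would double the figure.

```python
GB_PER_MONTH = 226            # ingest volume quoted in the comment
USD_PER_GB_CROSS_AZ = 0.01    # rate quoted in the comment


def monthly_transfer_cost(gb, usd_per_gb, directions=1):
    """Monthly transfer cost; directions=2 if both sides are billed."""
    return gb * usd_per_gb * directions


if __name__ == "__main__":
    print(round(monthly_transfer_cost(GB_PER_MONTH, USD_PER_GB_CROSS_AZ), 2))
    print(round(monthly_transfer_cost(GB_PER_MONTH, USD_PER_GB_CROSS_AZ, 2), 2))
```

| Either way the total stays in single-digit dollars per month, so
| cross-AZ transfer alone would not explain a bill in the thousands.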
| gregimba wrote:
| Reducing cross-AZ data transfer on one service resulted in
| low-six-figure savings per year. It's something we overlooked
| during initial setup, and now it's something I check for when
| dealing with AWS networking.
| otterley wrote:
| Out of curiosity, what's your plan for recovery during an
| AZ or regional outage?
| manyxcxi wrote:
| If RDS or other "hard to replicate very quickly during
| disaster" infra is being run, I personally would still
| have cross-AZ replication at minimum. To reduce network
| costs, I would configure the "other zone" as a backup
| replica only and not for performance clustering.
|
| With automation we can spin up full new compute stacks,
| including load balancers and DNS in about 5-10 minutes
| per "unique" environment configuration.
|
| While this means we can never have a zero-downtime
| failover, we're okay with it and have more than halved
| our network costs (which admittedly were only about number 8
| on our AWS bill by cost).
| nwmcsween wrote:
| I don't understand the need to NIH things. Why not ClickHouse with
| weekly, monthly, etc. aggregation? It comes with built-in
| sharding, can store time series data somewhat efficiently, and
| doesn't have some hack H/A setup (in general, not Thanos).
| [deleted]
| benchess wrote:
| The key constraint here is that the author has no access to
| persistent disks and can only use object storage for persistence.
| Otherwise Thanos would be extreme overkill for this number of
| metrics.
|
| Single-node VictoriaMetrics can easily handle 1M metrics/sec
| zzyzxd wrote:
| > Thanos would be extreme overkill for this number of metrics
|
| Data volume is just one thing. Thanos makes Prometheus
| stateless and easy to shard, all in a non-invasive approach
| that is solid, boring, and just works. The architecture works
| well in small-scale systems. I even use it in a single-node k8s
| cluster in my homelab and pay only about $1 a month for
| Backblaze B2, so I never worry about data retention or disk
| usage.
|
| > Single-node VictoriaMetrics can easily handle 1M metrics/sec
|
| Even if I had disk access, I would think twice before
| deploying a database and managing it myself when I don't have to.
| Besides the maintenance burden and potential scaling issues in
| the future, it may cost you more to use block storage like EBS
| than S3.
|
| Also, Prometheus memory usage overhead for remote write was
| wild[1], so, good luck with capacity planning and config
| tweaking.
|
| 1. https://prometheus.io/docs/practices/remote_write/
| znpy wrote:
| > Single-node VictoriaMetrics can easily handle 1M metrics/sec
|
| yeah sure, but then what will you be posting on LinkedIn with
| buzzwords? "I have installed a boring piece of software on a single
| machine because it works, performs well enough and it's cheap"?
| What are you going to say, "I take daily snapshots of the VM
| because 1-day RPO is fine for me"?
|
| How will you get them likes??
|
| (this is an ironic comment, before anyone starts a flame)
___________________________________________________________________
(page generated 2021-06-13 23:01 UTC)