[HN Gopher] Using AZs can eat up your budget - From Prometheus to VictoriaMetrics
___________________________________________________________________
Using AZs can eat up your budget - From Prometheus to
VictoriaMetrics
Author : shscs911
Score : 65 points
Date : 2024-12-26 08:09 UTC (3 days ago)
(HTM) web link (engineering.prezi.com)
(TXT) w3m dump (engineering.prezi.com)
| tomalaci wrote:
| I've used VictoriaMetrics in the past (~4 years ago) to collect
| not just service monitoring data but also network switch and cell
| tower module metrics. At the time I found it to be the most
| efficient Prometheus-like service in terms of query speed, data
| compression and, more importantly, the ability to handle high
| cardinality (tens or hundreds of millions of series).
|
| However, I later switched to ClickHouse because I needed the
| extra flexibility of running occasional async updates or deletes.
| In VictoriaMetrics you usually need to wipe out the entire series
| and re-ingest it. That may not be possible, or would be quite
| annoying, if you are dealing with a long history and just want to
| update or delete some bad data within a single month.
|
| So, if you want a more efficient Prometheus drop-in replacement
| and don't think the limited update/delete ability is an issue,
| then I highly recommend VictoriaMetrics. Otherwise, ClickHouse
| (larger scale) or Timescale (smaller scale) has been my go-to for
| anything time series.
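|
| For illustration, a rough sketch of the two deletion models (the
| host, table and label names here are assumptions, not from the
| comment): VictoriaMetrics' delete API drops whole series matching
| a selector, so fixing one bad month means re-ingesting the series,
| while ClickHouse can run a targeted async mutation instead:
|
|     import requests
|     import clickhouse_connect
|
|     # VictoriaMetrics: deletes every matching series in full; the
|     # history must then be re-ingested to "fix" a bad month.
|     requests.post(
|         "http://victoriametrics:8428/api/v1/admin/tsdb/delete_series",
|         params={"match[]": '{job="switches",instance="sw-42"}'},
|     )
|
|     # ClickHouse: an async mutation removes only the bad range and
|     # leaves the rest of the history untouched.
|     client = clickhouse_connect.get_client(host="clickhouse")
|     client.command(
|         "ALTER TABLE metrics DELETE "
|         "WHERE instance = 'sw-42' "
|         "AND ts BETWEEN '2024-11-01' AND '2024-12-01'"
|     )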
| brunoqc wrote:
| Btw, both ClickHouse and Timescale are open core, if you care
| about that.
| thayne wrote:
| So is VictoriaMetrics
| brunoqc wrote:
| You are right. I guess I just saw the Apache 2 license and
| assumed it was FOSS.
| hipadev23 wrote:
| Is there a reason you drop this comment on every product
| mention that's not 100% OSS?
| presspot wrote:
| Because it's helpful and adds context? Why do you care?
| hipadev23 wrote:
| Because it's frustrating. They do it to belittle the
| projects and shame the authors for trying to make a
| living.
|
| Not every project wants to end up as more bloated
| abandonware in the Apache Software Foundation.
| simfree wrote:
| FOSS washing software is similarly frustrating.
|
| When I see a license on a project I expect that project
| will provide the code under that license and function
| fully at runtime, not play games of "Speak to a sales rep
| to flip that bit or three to enable that codepath".
| raffraffraff wrote:
| I'd love to see a comparison with Mimir. Some of the problems
| that this article describes with Prometheus are also solved by
| Mimir. I'm running it in single binary mode, and everything is
| stored in S3. I'm deploying Prometheus in agent mode so it just
| scrapes and remote writes to Mimir, but doesn't store anything.
| The Helm chart is a bit hairy because I have to use a fork for
| single-binary mode, but it has actually been extremely stable and
| cheap to run. The same AZ cost saving rules apply, but my traffic
| is low enough right now for it not to matter. But I suppose I
| could also run ingesters per AZ to eliminate cross-AZ traffic.
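|
| For reference, a minimal sketch of that setup (hostnames, ports
| and scrape targets here are assumptions): Prometheus started with
| --enable-feature=agent only scrapes and forwards via
| remote_write, in this case to Mimir's push endpoint:
|
|     # prometheus.yml (agent mode)
|     scrape_configs:
|       - job_name: node
|         static_configs:
|           - targets: ['localhost:9100']
|     remote_write:
|       - url: http://mimir:8080/api/v1/push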
| thelittleone wrote:
| Interesting. I'm fairly new to the field, but would this
| configuration help reduce the cost of logging security events
| from multiple zones/regions/providers to a colocated cluster?
| raffraffraff wrote:
| Not really. On AWS, you're always going to pay an egress cost
| to get those logs out of AWS to your colo. If you were to
| ship your security logs to S3 and host your security log
| indexing and search services on EC2 within the same AWS
| region as the S3 bucket, you wouldn't have to worry about
| egress.
| FridgeSeal wrote:
| I was on a team once where we ran agent-mode Prometheus into a
| Mimir cluster and it was endless pain and suffering.
|
| Parts of it would time out and blow up; one of the dozen
| components (slight hyperbole) they have you run would go down
| and take half the cluster with it. It often had to be nursed
| back to health by hand, it was expensive to run, and queries
| weren't even that fast.
|
| Absolutely would not repeat the experience. We cheered the
| afternoon we landed the PR to dump it.
| raffraffraff wrote:
| I definitely think that running the microservices deployment
| of Mimir (and Loki) looks hairy. But the monolithic
| deployments can handle pretty large volumes.
| dantillberg wrote:
| This excessive inter-AZ data transfer pricing is distorting
| engineering best practices. It _should_ be cheap to operate HA
| systems across 2-3 AZs, but because of this price distortion on
| inter-AZ traffic charges, we lean towards designs that either
| silo data within an AZ, or that leverage S3 or other hosted
| solutions as a sort of accounting workaround (i.e. there are no
| data transfer charges to read/write an S3 bucket from any AZ in
| the same region).
|
| While AWS egress pricing gets a lot of attention, I think that
| the high cost of inter-AZ traffic is much less defensible. This
| is transfer on short fat pipes completely owned by Amazon. And at
| $0.01/GB, that's 2-10x what smaller providers charge for
| _internet_ egress.
| hipadev23 wrote:
| I assume it's to discourage people from architecting designs
| that abuse the network. A good example would be collecting
| every single metric, every second, across every instance for no
| real business reason.
| thayne wrote:
| Or maybe it is price discrimination. A way to extract more
| money from customers that need higher availability and
| probably have higher budgets.
| hipadev23 wrote:
| Price discrimination is when you charge different amounts
| for the same thing to different customers, and usually the
| difference in those prices is not made apparent, like when
| travel websites quote iOS users higher prices than Android
| users because they can generally afford to pay more.
|
| This is just regular ole pricing.
| thayne wrote:
| So what is the correct term for "charge an extremely high
| markup for a feature that some, but not all, of your
| customers need"?
| spondylosaurus wrote:
| Price gouging?
| jchanimal wrote:
| I came here to say the same thing. When you're selling
| cloud services, the hardest thing to do is segment your
| customers by willingness to pay.
|
| Cross AZ traffic is exactly the sort of thing companies
| with budgets need, that small projects don't.
| mcmcmc wrote:
| Supply and demand
| hansvm wrote:
| It's a bit of a mix, but price discrimination isn't far
| off. It's like the SSO tax; all organizations are paying
| for effectively the same service, but the provider has
| found a minor way to cripple the service that selectively
| targets people who can afford to pay more.
|
| If we want to call this just regular ole pricing, it's
| not a leap to call most textbook cases of price
| discrimination "regular ole pricing" as well. An online
| game charges more if your IP is from a certain geography?
| That's not discrimination; we've simply priced the
| product differently if you live in Silicon Valley; don't
| buy it if you don't want it.
| cowsandmilk wrote:
| Sending traffic between AZs doesn't necessarily improve
| availability and can decrease it. Each of your services can
| be multi-AZ but have hosts that talk only to other
| endpoints in their own AZ.
| thayne wrote:
| Unless your app is completely stateless, you will need
| some level of communication across AZs.
|
| And you often want cross-zone routing on your load
| balancers so that if you lose all the instances in one AZ
| traffic will still get routed to healthy instances.
| KaiserPro wrote:
| > Or maybe it is price discrimination.
|
| It very much is, because scaling bandwidth between physical
| datacenters which are not located next to each other is
| very expensive. So pricing it means that people don't use
| it as much as they would if it were free.
| mcmcmc wrote:
| That's not what price discrimination is
| koolba wrote:
| You can still do that if you buffer the data and push it to S3.
| Any AZ to S3 is free, and the net result is the same.
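|
| A rough sketch of that buffering pattern (the bucket and key
| names are made up): batch locally, flush to S3, and let readers
| in any AZ pull from the bucket, since in-region S3 access via a
| gateway endpoint carries no transfer charge:
|
|     import gzip
|     import json
|     import time
|     import boto3
|
|     s3 = boto3.client("s3")
|     BUCKET = "metrics-shuttle"  # hypothetical in-region bucket
|     buffer = []
|
|     def record(event: dict) -> None:
|         # Accumulate locally instead of streaming cross-AZ.
|         buffer.append(event)
|
|     def flush() -> None:
|         # Push one compressed batch; any AZ -> S3 in-region is free.
|         if not buffer:
|             return
|         lines = "\n".join(json.dumps(e) for e in buffer)
|         body = gzip.compress(lines.encode())
|         key = f"batches/{int(time.time())}.json.gz"
|         s3.put_object(Bucket=BUCKET, Key=key, Body=body)
|         buffer.clear()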
| KaiserPro wrote:
| _I don't work for AWS_
|
| However, I do work for a company with >1 million servers.
| Scaling inter-datacentre bandwidth is quite hard. Sure, the
| datacentres might be geographically close, but laying network
| cables over distance is expensive. Moreover, unless you spend
| uber millions, you're never going to get as much bandwidth as
| you have inside the datacentre.
|
| So you either apply hard limits per account, or price it so
| that people think twice about using it.
| themgt wrote:
| OK, but $10/TB has gotta be like >99% profit margin for AWS.
| _After_ massively jacking up their prices, Hetzner internet
| egress is only EUR1/TB. Also, AWS encourages, and in some cases
| practically forces, you to do multi-AZ.
|
| I remember switching to autoscaling spot instances to save a
| few bucks, then occasionally spot spinup would fail due to
| lack of availability within an AZ, so I enabled multi-AZ spot.
| Then I got hit with the inter-AZ bandwidth charges and wasn't
| actually saving any money vs single-AZ reserved. This was
| about the point I decided DIY Kubernetes was simpler to
| reason about.
| everfrustrated wrote:
| Apples and oranges. Hetzner doesn't even have multiple AZs
| by AWS's definition - all of Hetzner's DCs, e.g. Falkenstein
| 1-14, would be in the same AZ.
|
| AWS's network is designed with a lot more internal capacity
| and reliability than Hetzner's, which costs a lot more -
| multiple uplinks to independent switches, etc.
|
| AWS is also buying current-gen network gear, which is much
| more pricey - Hetzner is mostly doing 1-gig ports, or 10-gig
| at a push, which means they can get away with >10-year-old
| switches (if you think they buy new switches, I have a
| bridge you might be interested in buying).
|
| This costs at least an order of magnitude more.
| iscoelho wrote:
| I agree with this post that Hetzner is a bad example.
| They are focused on a budget deployment.
|
| I do not agree that a state-of-the-art high capacity
| deployment is as expensive as you think it is. If an
| organization pays MSRP on everything, has awful
| procurement with nonexistent negotiation, and multiple
| project failures, sure, maybe. In the real world though,
| we're not all working for the federal government (-:
| dantillberg wrote:
| While your caveats are all noteworthy, I'll add that
| Hetzner also offers unlimited/free bandwidth between
| their datacenters in Germany and Finland. That's sort of
| like AWS offering free data transfer between us-east-1
| and us-east-2.
| iscoelho wrote:
| In Ashburn, VA, I can buy Dark Fiber for $750 MRC to any
| datacenter in the same city. I can buy Dark Fiber for $3-5K
| MRC to any random building in the same city.
|
| That duplex dark fiber with DWDM can run 4 Tbps of capacity at
| 100GE (40x 100GE). Each 100GE transceiver costs $2-4K NRC
| depending on the manufacturer - $160K NRC for 40x. (There are
| higher densities as well, like 200/400/800GE; 100GE is just
| getting cheap.)
|
| In AWS, utilizing 1x100GE will cost you >$1MM MRC. For
| significantly less than that, let's say an absolute worst case
| of $5K MRC + $200K NRC, you can get 40x100GE.
|
| Now you have extra money for 4x redundancy, fancy routers,
| over-spec'd servers, world-class talent, and maybe a yacht if
| your heart desires.
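|
| The back-of-envelope version of that comparison (a sketch;
| assumes AWS bills $0.01/GB to each side, i.e. $0.02/GB
| effective, and a saturated link):
|
|     SECONDS_PER_MONTH = 30 * 24 * 3600          # ~2.59M s
|     gb_per_month = 100 / 8 * SECONDS_PER_MONTH  # 100 Gbps ~= 32.4M GB
|     aws_mrc = gb_per_month * 0.02               # $/GB, both sides billed
|     print(f"1x100GE saturated, one way: ${aws_mrc:,.0f}/mo")  # ~$648K
|     print(f"full duplex: ${2 * aws_mrc:,.0f}/mo")             # ~$1.3MM
|
| which is roughly where the >$1MM MRC figure comes from, against
| ~$5K MRC plus ~$200K NRC for 40x the capacity on dark fiber.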
| bobbob1921 wrote:
| I'm just throwing out a hypothetical, so I may be
| completely off base: perhaps AWS charges high inter-AZ
| bandwidth prices to keep users from tunneling traffic
| between availability zones to arbitrage lower
| internet/egress costs at AZ 1 vs AZ 3.
|
| Outside of my statement above, I do agree that the cost
| Amazon pays for bandwidth between their sites has to be
| practically nothing at their scale/size (and thus they
| should charge their customers very little for it,
| especially considering easy multi-AZ is a big
| differentiator for cloud vs self-hosting/colo). The dark
| fiber MRC prices the user above quoted are spot on.
| dilyevsky wrote:
| You should do the math though, because it's expensive, but
| nowhere near $0.01/GB expensive.
| thayne wrote:
| > while it's tempting to use the infinitely-scalable object
| storage (like S3), the good old block storage is just cheaper and
| more performant
|
| How is it cheaper? Object storage is cheaper per GB. Does using
| S3 have another component that is more expensive, maybe a caching
| layer? Is the storage format significantly less efficient? Are
| you not using a VPC endpoint to avoid egress charges?
| jdreaver wrote:
| You are correct that storage is cheaper in S3, but S3 charges
| per request for GET, LIST, POST, COPY, etc. on objects in your
| bucket. Block storage can be cheaper when you are frequently
| modifying or querying your data.
| thayne wrote:
| That's a lot of requests.
| hansvm wrote:
| It is, but it's not _that_ many. AWS pricing is
| complicated, but for fairly standard services, and assuming
| bulk discounts at the ~100TB level, your break-even points
| for requests/network vs storage happen at:
|
| 1. (modifications) 4200 requests per GB stored per month
|
| 2. (bandwidth) Updating each byte more than once every 70
| days
|
| You'll hit the break-even sooner, typically, since you
| incur both bandwidth and request charges.
|
| That might sound like a lot, but updating some byte in each
| 250KB chunk of your data once a month isn't that hard to
| imagine. Say each user has 1KB of data, 1% are active each
| month, and you record login data. You'll have 2.5x the
| break-even request count and pay 2.5x more for requests
| than storage, and that's only considering the mutations,
| not the accesses.
|
| You can reduce request costs (not bandwidth though) if you
| can batch them, but that's not even slightly tenable until
| a certain scale because of latency, and even when it is,
| you might find that user satisfaction and retention are
| more expensive than the extra requests you're trying to
| avoid. Batching is a tool for reducing costs on offline
| workloads.
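|
| Those break-even points can be reproduced from list prices (a
| sketch; the exact prices are assumptions, roughly S3 standard
| with bulk discounts):
|
|     STORAGE_PER_GB_MONTH = 0.021     # $/GB-month, bulk tier
|     PUT_PER_REQUEST = 0.005 / 1000   # $5 per million PUT/POST
|     BANDWIDTH_PER_GB = 0.05          # $/GB, ~100TB egress tier
|
|     # 1. Requests: modifications per stored GB per month that
|     #    cost as much as storing that GB does.
|     print(STORAGE_PER_GB_MONTH / PUT_PER_REQUEST)        # 4200.0
|
|     # 2. Bandwidth: rewriting every byte once costs $0.05/GB, so
|     #    the storage bill is matched at one rewrite per ~71 days.
|     rewrites = STORAGE_PER_GB_MONTH / BANDWIDTH_PER_GB   # per month
|     print(30 / rewrites)                                 # ~71 days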
| nathan_jr wrote:
| Is it possible to query CloudWatch to calculate the current cost
| attributed to inter-AZ traffic?
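|
| As far as I know, not via CloudWatch directly; the per-usage-type
| costs live in the billing data, which the Cost Explorer API can
| query. A sketch (the usage-type value is region-prefixed, and the
| one below is an assumption for us-east-1):
|
|     import boto3
|
|     ce = boto3.client("ce")
|     resp = ce.get_cost_and_usage(
|         TimePeriod={"Start": "2024-12-01", "End": "2024-12-29"},
|         Granularity="MONTHLY",
|         Metrics=["UnblendedCost"],
|         # Inter-AZ traffic is billed as "Regional" data transfer,
|         # e.g. USE1-DataTransfer-Regional-Bytes in us-east-1.
|         Filter={"Dimensions": {
|             "Key": "USAGE_TYPE",
|             "Values": ["USE1-DataTransfer-Regional-Bytes"],
|         }},
|     )
|     print(resp["ResultsByTime"][0]["Total"]["UnblendedCost"]["Amount"])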
| sgarland wrote:
| My only hope is that as more and more companies find stuff like
| this out, there will be a larger shift towards on-prem / colo.
| Time is a circle.
___________________________________________________________________
(page generated 2024-12-29 23:01 UTC)