[HN Gopher] We compress Pub/Sub messages and more, saving a load...
___________________________________________________________________
We compress Pub/Sub messages and more, saving a load of money
Author : kiyanwang
Score : 32 points
Date : 2021-01-03 13:04 UTC (9 hours ago)
(HTM) web link (blog.lawrencejones.dev)
(TXT) w3m dump (blog.lawrencejones.dev)
| memetherapy wrote:
| Genuine question from someone from an entirely different world -
| why on earth do you have 10 billion log entries? What is in them
| and do you ever do anything with them that requires you to store
| so much data rather than just a representative subset?
| xeromal wrote:
| Anytime it's something ridiculous like this, I assume it's for
| compliance. A few industries require all info to be retained
| for 7 years.
| user5994461 wrote:
| >>> why on earth do you have 10 billion log entries?
|
| It's pretty low volume actually. A small company with < 100
| developers and servers can generate a billion logs over a few
| weeks.
|
| Normal logs from the system, syslog, applications, databases,
| web servers... nothing fancy really. It's common practice to
| centralize all these into ElasticSearch or Splunk.
|
| Their scale of 10 billion logs / 60 TB means they're a regular
| small-to-medium company.
| Kranar wrote:
| This seems suspect: that works out to approximately 25 log
| messages per developer per second, assuming a 10-hour work
| day.
|
| I work in a tightly regulated industry (finance), and even my
| company doesn't have a need to log 25 messages per second per
| person.
|
| Is anyone else able to validate this claim that regular small
| companies log this much data?
| lawrjone wrote:
| You've nailed this!
|
| This logging system was for all https://gocardless.com/
| systems. We're a B2C company which means we have different
| economies of scale than many scale-ups of our size, but you
| were close with your guess:
|
| Currently 450 people worldwide, ~150 in product development,
| of which ~100 are full-time developers.
| lawrjone wrote:
| Author here! These 10B log lines are from the last 60 days of
| activity from https://gocardless.com/ systems.
|
| It includes:
|
| - System logs, such as our Kubernetes VM host logs, or our Chef
| Postgres machines
|
| - Application logs from Kubernetes pods
|
| - HTTP and RPC logs
|
| - Audit logs from Stackdriver (we use GCP for all our
| infrastructure)
|
| > do you ever do anything with them that requires you to store
| so much data rather than just a representative subset?
|
| Some of the logs are already sampled, such as VPC flow logs,
| but the majority aim for 100% capture.
|
| Especially for application logs, which are used for audit and
| many other purposes, developers expect all of their logs to
| stick around for 60d.
|
| Why we do this is quite simple: for the amount of value we get
| from storing this data, in terms of introspection,
| observability and in some cases differentiated product
| capabilities like fraud detection, the cost of running this
| cluster is quite a bargain.
|
| I suspect we'll soon cross a threshold where keeping everything
| will cost us more than it's worth, but I'm confident we can
| significantly reduce our costs with a simple tagging system,
| where developers mark logs as requiring shorter retention
| windows.
|
| Hopefully that gives you a good answer! In case you're
| interested, my previous post mentioned how keeping our HTTP
| logs around in a queryable form was really useful for helping
| make a product decision:
|
| https://blog.lawrencejones.dev/connected-data/
| memetherapy wrote:
| Thanks for the response, really interesting to see how this
| stuff is used.
| lawrjone wrote:
| Hey all - I'm the author and just noticed this post. Thanks for
| the repost!
|
| If you're interested, there was a nice discussion in /r/devops
| about this the other day:
| https://www.reddit.com/r/devops/comments/kmltbx/how_we_compr...
| 29athrowaway wrote:
| Compression has a CPU time cost, though. You spend less on
| storage but use your CPUs more. Does the extra load from
| compression cause your cluster to autoscale? If so, you may not
| be saving money.
| lawrjone wrote:
| Author here! A few others have pointed this out, but to
| restate: in our situation, compression was costing us almost
| nothing.
|
| 60TB of logs was 60 days' worth of retention, so 1TB a day. That
| means we process about 11MB/s on average, peaking at 100MB/s.
|
| A single CPU core can manage 100MB/s of compression, so if you
| assume we compress and decompress in multiple places, let's say
| we're paying about 4 CPU cores constantly for this.
|
| That's a pretty worst-case scenario, and it would cost us
| $37.50/month on GCP for those cores, in order to save about
| 100x that amount.
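|
| If you want to sanity-check the "about 11MB/s", here's the
| back-of-envelope in Python:
|
|     TB = 10**12
|     total_bytes = 60 * TB              # 60 days of retention
|     seconds = 60 * 24 * 60 * 60        # 60 days in seconds
|     print(f"{total_bytes / seconds / 1e6:.1f} MB/s")  # ~11.6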
|
| The takeaway (for me at least) is that compressing an in-flight
| transfer is almost always worthwhile if you're working in the
| cloud and the transfer will eventually be stored somewhere. The
| economics of CPU cost vs. total storage cost make it a
| no-brainer.
|
| Hope that makes sense!
| zbjornson wrote:
| FTA: "This post aimed to cover a scenario where the cost of
| compression, in both compute resource and time-to-build, was
| significantly outweighed by the savings it would make."
| dilatedmind wrote:
| My intuition is you can save CPU time when compressing before
| sending over the network (and certainly wall time).
|
| A quick test copying a 24M file (with similar compression
| ratios) to s3 showed a 6% decrease in CPU time when piping
| through gzip.
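|
| Roughly what that test looks like sketched in Python with boto3
| instead of the shell pipeline I actually used (bucket and file
| names are placeholders):
|
|     import gzip
|     import time
|     import boto3  # assumes AWS credentials are configured
|
|     s3 = boto3.client("s3")
|     raw = open("sample.log", "rb").read()  # the ~24M test file
|
|     def cpu_seconds(body, key, compress):
|         # Count (optional) compression against the upload total.
|         start = time.process_time()
|         if compress:
|             body = gzip.compress(body)
|         s3.put_object(Bucket="my-test-bucket", Key=key, Body=body)
|         return time.process_time() - start
|
|     print("raw: ", cpu_seconds(raw, "raw.log", False))
|     print("gzip:", cpu_seconds(raw, "raw.log.gz", True))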
| saurik wrote:
| Data sent to S3 is usually hashed (depending on
| authentication type) in addition to being transport
| encrypted; I imagine the majority of this cost here is the
| encryption of a larger payload (which many would consider
| indispensable, but I point this out because I do not
| generally assume this when I merely consider "over the
| network").
| user5994461 wrote:
| You assume incorrectly. SSL encryption is on the order of 1
| GB/s on a recent CPU with AES instructions (anything from
| this decade).
|
| Gzip is on the order of 10 MB/s with default settings, down
| to 1 MB/s with the strongest compression setting. It's
| really, really slow.
| Kranar wrote:
| GNU gzip the application is slow, on the order of 10 MB/s,
| because of how it does file IO, but the DEFLATE algorithm
| that gzip is based on is much faster than 10 MB/s at the
| default "level 6". For example, the slz implementation of
| DEFLATE compresses text at 1 GB/s [1]. Even the fairly
| common zlib implementation can compress text at close to
| 300 MB/s.
|
| [1] http://www.libslz.org/
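|
| It's easy to check what your own zlib build manages on your
| data (Python stdlib only; throughput varies a lot by machine
| and input):
|
|     import time
|     import zlib
|
|     data = open("sample.log", "rb").read()  # representative text
|
|     for level in (1, 6, 9):
|         start = time.perf_counter()
|         out = zlib.compress(data, level)
|         elapsed = time.perf_counter() - start
|         print(f"level {level}: "
|               f"{len(data) / elapsed / 1e6:.0f} MB/s, "
|               f"ratio {len(data) / len(out):.1f}x")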
| EricBurnett wrote:
| Depends on selected compressor, but yes, you can. I've
| definitely observed zstd-1 to be a net savings, where
| compression/decompression costs were offset by pushing fewer
| bytes through the RPC and network layers - and this was only
| from observing the endpoints, not even counting CPU on
| intermediate proxies/firewalls/etc.
|
| I wouldn't normally expect gzip to be a net savings (it's
| comparatively more expensive), but depending on compression
| ratio achieved and what layers you're passing the bytes
| through, I'd definitely believe it can be in some contexts.
| nextweek2 wrote:
| Beware of on-the-fly compression: it adds to network latency if
| you aren't careful. It's an important metric that gets overlooked
| in many articles on compression.
| EricBurnett wrote:
| Related tip: anywhere you're looking to deploy compression in a
| backend, consider zstd (https://facebook.github.io/zstd/). I've
| found it suitable for streaming compression (Gbps+ flows without
| exorbitant CPU usage), storage compression (high compression
| ratios at higher levels; fast decompression), and found
| implementations available for every language I've needed. It comes
| out strictly better than gzip ~across the board in my experience,
| and should be the default compressor to choose or start
| evaluations from if you have no other constraints.
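|
| A minimal sketch with the python-zstandard bindings (pip install
| zstandard); level 1 is the speed-oriented end mentioned above:
|
|     import zstandard as zstd
|
|     cctx = zstd.ZstdCompressor(level=1)  # fast, streaming-friendly
|     dctx = zstd.ZstdDecompressor()
|
|     payload = b'{"msg": "example log line"}\n' * 500
|     compressed = cctx.compress(payload)
|     assert dctx.decompress(compressed) == payload
|     print(f"{len(payload)} -> {len(compressed)} bytes")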
|
| I don't think it's yet deployed in browsers, so I'm restricting
| my recommendation to tools and backends. IIRC brotli is worth
| considering for browsers, but I haven't deployed it myself.
| dilatedmind wrote:
| Did you use any kind of message framing? Got bit by this at a
| previous job where we needed to change the message format to
| improve compression. Wound up figuring something out, but would
| have been easier if we had reserved a byte for versioning.
| lawrjone wrote:
| I'm not quite sure what you mean by message framing.
|
| If you mean marking messages as having been compressed, then
| absolutely yes. The Pub/Sub messages were tagged with
| `compress=true|false` so we could update downstream and
| upstream independently.
|
| If you mean buffering log messages into a batched 'frame', then
| yes we did do this. We were taking about 500 log entries and
| compressing them into a single message, which was part of why
| the compression was so effective.
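|
| For illustration, a sketch of that shape with the GCP Pub/Sub
| Python client (placeholder project/topic names, and gzip stands
| in for whatever codec you prefer; this isn't our production
| code):
|
|     import gzip
|     import json
|     from google.cloud import pubsub_v1
|
|     publisher = pubsub_v1.PublisherClient()
|     topic = publisher.topic_path("my-project", "logs")
|
|     def publish_batch(entries):
|         # Batch ~500 log entries into one compressed message.
|         payload = "\n".join(json.dumps(e) for e in entries)
|         # The attribute lets consumers handle both compressed and
|         # uncompressed messages during a rolling upgrade.
|         publisher.publish(
|             topic,
|             data=gzip.compress(payload.encode()),
|             compress="true",
|         ).result()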
|
| If you mean something different, then I'm at a loss and very
| interested in what you meant by the term!
___________________________________________________________________
(page generated 2021-01-03 23:02 UTC)