[HN Gopher] Binance built a 100PB log service with Quickwit
       ___________________________________________________________________
        
       Binance built a 100PB log service with Quickwit
        
       Author : samber
       Score  : 189 points
       Date   : 2024-07-11 11:57 UTC (11 hours ago)
        
 (HTM) web link (quickwit.io)
 (TXT) w3m dump (quickwit.io)
        
       | AJSDfljff wrote:
        | Unfortunately, the interesting part is missing.
       | 
        | It's not hard at all to scale to PB. Chunk your data based on
        | time, scale horizontally. When you can scale horizontally it
        | doesn't matter how much it is.
       | 
        | Elastic is not something I would use for horizontally scaling
        | basic logs; I would use it for live data that I need with low
        | latency, or if I'm constantly doing a lot of live log
        | analysis.
       | 
        | Did Binance really need Elastic, or did they just start
        | pushing everything into Elastic without ever looking left and
        | right?
       | 
       | Did they do any log processing and cleanup before?
        
         | fulmicoton wrote:
          | These are their application logs. They need to search them
          | in a comfortable manner. They went with a search engine,
          | Elasticsearch at first and Quickwit after that, because even
          | after restricting the search to a tag and a time window,
          | "grepping" was not a viable option.
        
           | AJSDfljff wrote:
           | Would be curious what they are searching exactly.
           | 
            | At this size and cost, being deliberate about what you
            | log should save a lot of money.
        
             | BiteCode_dev wrote:
              | Financial institutions have to log a lot just to comply
              | with regulations, including every user activity and
              | every money flow. On an exchange that does billions of
              | operations per second, often with bots, that's a lot.
        
               | AJSDfljff wrote:
                | Yes, but audit requirements don't mean you need to be
                | able to search everything very fast.
               | 
               | Binance might not have a 24/7 constant load, there might
               | be plenty of time to compact and write audit data away at
               | lower load while leveraging existing infrastructure.
               | 
                | Or extracting audit logging into a binary format like
                | Protobuf and writing it out in a highly optimized way.
        
               | throwaway2037 wrote:
               | > Financial institutions have to log a lot just to comply
               | with regulations
               | 
                | Where is Binance regulated? Wiki says: "currently has
                | no official company headquarters".
                | 
                | > On an exchange that does billions of operations per
                | second
                | 
                | Does Binance have this problem?
        
               | arandomusername wrote:
               | https://www.binance.com/en/legal/licenses
        
             | fulmicoton wrote:
             | The data is just Binance's application logs for
             | observability. Typically what a smaller business would
             | simply send to Datadog.
             | 
             | This log search infra is handled by two engineers who do
             | that for the entire company.
             | 
                | They have some standardized log format that all teams
                | are required to follow, but they have little control
                | over how much data is logged by each service.
                | 
                | (I'm Quickwit's CTO, by the way.)
        
               | AJSDfljff wrote:
               | Do they understand the difference between logs and
               | metrics?
               | 
               | Feels like they just log instead of having a separation
               | between logs and metrics.
        
           | jcgrillo wrote:
            | This position has always confused me. IME log search
            | tools (ELK and their SaaS ilk) are _always_ far too
            | restrictive and _uncomfortable_ compared to Hadoop/Spark.
            | I'd much rather
           | have unfettered access to the data and have to wait a couple
           | seconds for my query to return than be pigeonholed into some
           | horrible DSL built around an indexing scheme. I couldn't care
           | less about my logs queries returning in sub-second time, it's
           | just not a requirement. The fact that people index logs is
           | baffling.
        
             | fulmicoton wrote:
              | If you can limit your search to GBs of logs, I kind of
              | agree with you. It's OK if a log search request takes
              | 2s instead of 100ms, and the "grep" approach is more
              | flexible.
             | 
             | Usually our users search into > 1TB.
             | 
              | Let's imagine you have to search into 10TB (even after
              | time/tag pruning). Distributing that over 10k cores for
              | 2 seconds is not practical and does not always make
              | economic sense.
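A rough sketch of the arithmetic behind that claim (the ~1 GB/s per-core scan rate is an assumption for illustration, not a figure from the post):

```python
# Back-of-envelope: what brute-force scanning of 10 TB in 2 s would require.
TB = 10**12

data_bytes = 10 * TB
target_seconds = 2
per_core_rate = 1 * 10**9  # bytes/s per core, assumed

cores_needed = data_bytes / (target_seconds * per_core_rate)
print(f"cores needed: {cores_needed:,.0f}")  # thousands of cores for one query
```

Even at an optimistic scan rate, a single 2-second query ties up thousands of cores, which is the economic point being made.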
        
               | AJSDfljff wrote:
               | The question is why would someone need search through TBs
               | of data.
               | 
                | If you are not Google Cloud and don't just have x
                | workers ready to stream all the data in parallel, I
                | would enforce useful limitations, and for broad
                | searches I would add a background system.
               | 
               | Start your query, come back later or get streaming
               | results.
               | 
                | On the other hand, if not too many people search in
                | parallel constantly and you go with data pods like
                | Backblaze, just add a little more CPU and memory and
                | use the CPUs of the data pods for parallelisation.
                | Should still be much cheaper than putting it on S3 /
                | cloud.
        
               | jcgrillo wrote:
               | I guess I was a little too prescriptive with "a couple
               | seconds". What I really meant was a timescale of seconds
               | to minutes is fine, probably five minutes is too long.
               | 
               | > Let's imagine you have to search into 10TB (even after
               | time/tag pruning).
               | 
                | I'd love to know more about this. How frequently do
                | users need to scan 10TB of data? Assuming it's all on
                | one machine on a disk that supports a conservative
                | 250MB/s sequential throughput (and your grep can also
                | run at 250MB/s) that's about 11hr, so you could get it
                | down to 4min on a cluster with 150 disks.
               | 
               | But I still have trouble believing they actually need to
               | scan 10TB each time. I guess a real world example would
               | help.
               | 
               | EDIT: To be clear, I really like quickwit, and what
               | they've done here is really technically impressive! I
               | don't mean to disparage this effort on its technical
               | merits, I just have trouble understanding where the
               | impulse to index everything comes from when applied
               | specifically to the problem of logging and logs analysis.
               | It seems like a poor fit.
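For what it's worth, the scan-time arithmetic in the comment above checks out (using the same assumed 250 MB/s sequential rate):

```python
# 10 TB scanned at 250 MB/s: one disk vs. a 150-disk cluster.
data_bytes = 10 * 10**12
disk_rate = 250 * 10**6  # bytes/s, assumed sequential throughput

single_disk_s = data_bytes / disk_rate
print(f"one disk:  {single_disk_s / 3600:.1f} hours")   # ~11.1 hours

disks = 150
cluster_s = single_disk_s / disks
print(f"{disks} disks: {cluster_s / 60:.1f} minutes")   # ~4.4 minutes
```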
        
             | esafak wrote:
             | It sounds like you are doing ETL on your logs. Most people
             | want to search them when something goes wrong, which means
             | indexing.
        
               | jcgrillo wrote:
               | No, what I'm doing is analysis on logs. That could be as
               | simple as "find me the first N occurrences of this
               | pattern" (which you might call search) but includes
               | things like "compute the distribution of request
               | latencies for requests affected by a certain bug" or
               | "find all the tenants impacted by a certain bug, whose
               | signature may be complex and span multiple services
               | across a long timescale".
               | 
               | Good luck doing that in a timely manner with Kibana.
               | Indexed search is completely useless in this case, and it
               | solves a problem (retrieval latency) I don't (and, I
               | claim, you don't) have.
               | 
                | EDIT: another way to look at this: at the companies
                | where I've been able to actually do detailed analysis
                | on the logs (they were stored sensibly, such that I
                | could run mapreduce jobs over them), I never reached
                | a point where a problem was unsolvable. These days,
                | when we're often stuck with a restrictive "log search
                | solution as a service", I often run into situations
                | where the answer simply isn't obtainable. Which
                | situation is
               | better for customers? I guess cynically you could say
               | being unable to get to the bottom of an issue keeps me
               | timeboxed and focused on feature development instead of
               | fixing bugs.. I don't think anyone but the most craven
               | get-rich-quick money grubber would actually believe
               | that's better though.
        
       | randomtoast wrote:
       | I wonder how much their setup costs. Naively, if one were to
       | simply feed 100 PB into Google BigQuery without any further
       | engineering efforts, it would cost about 3 million USD per month.
        
         | AJSDfljff wrote:
          | Good question. I thought it would be a no-brainer to put it
          | on S3 or similar, but that's already way too expensive at
          | $2M/month without API requests.
          | 
          | Backblaze storage pods are an initial investment of $5
          | million; that's probably the best bet, and at that level of
          | savings, having 1-3 good people dedicated to this is
          | probably still cheaper.
         | 
          | But you could / should start talking to the big cloud
          | providers to see if they are flexible enough to go lower on
          | price.
         | 
          | I have seen enough companies, including big ones, being
          | absolutely shitty at optimizing these types of things. At
          | this level of data, I would optimize everything, including
          | encoding, date format, etc.
         | 
          | But as I said in my other comment: the interesting
          | questions are not answered :D
        
           | orf wrote:
            | The compressed size is 20 PB, so it's about $500k per
            | month in S3 fees.
        
             | francoismassot wrote:
             | Indeed. They benefit from a discount, but we don't know the
             | discount figure.
             | 
              | To further reduce storage costs, you can use S3 Storage
              | Classes or cheaper object storage like Alibaba's for
              | longer retention. Quickwit does not handle that,
              | though, so you need to manage it yourself.
        
               | AJSDfljff wrote:
               | I would probably build my own storage pods, keep a day or
               | a week on cloud and move everything over every night.
        
               | jcgrillo wrote:
               | Logs should compress better than that, though, right? 5:1
               | compression is only about half as good as you'd expect
               | even naive gzipped json to achieve, and even that is an
               | order of magnitude worse than the state of the art for
               | logs[1]. What's the story there?
               | 
               | [1] https://news.ycombinator.com/item?id=40938112
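The 5:1 figure being questioned above is easy to sanity-check: even naive gzip on synthetic JSON logs beats it comfortably. The log shape below is invented for illustration; real-world ratios depend heavily on field entropy (IDs, timestamps, stack traces).

```python
import gzip
import json

# Build 10k synthetic JSON log lines and measure the gzip ratio.
lines = []
for i in range(10_000):
    lines.append(json.dumps({
        "ts": f"2024-07-11T11:{i % 60:02d}:{i % 60:02d}Z",
        "level": "INFO" if i % 100 else "ERROR",
        "service": f"svc-{i % 20}",
        "msg": f"handled request {i} in {i % 97} ms",
    }))

raw = ("\n".join(lines)).encode()
packed = gzip.compress(raw)
print(f"ratio: {len(raw) / len(packed):.1f}:1")  # well above 5:1 on data this regular
```

The 20 PB figure in the post presumably includes index structures on top of the raw data, which would account for some of the gap.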
        
         | Aurornis wrote:
         | They provide some big hints about the number of vCPUs and the
         | size of the compressed data set on S3:
         | 
         | > Size on S3 (compressed): 20 PB
         | 
         | There are also charts about vCPUs and RAM for the indexing and
         | searching clusters.
        
         | gaogao wrote:
          | Yeah, using your preferred cloud data warehouse with an
          | indexing layer seems fine for this sort of thing. That has
          | the advantage over something specialized like this of still
          | being able to easily do stream processing / Spark / etc.,
          | plus it probably saves some money.
         | 
         | Maybe Quickwit is that indexing layer in this case? I haven't
         | dug too much into the general state of cloud dw indexing.
        
           | fulmicoton wrote:
           | Quickwit is designed to do full-text search efficiently with
           | an index stored on an object storage.
           | 
            | There is no equivalent technology, apart from maybe:
            | 
            | - ChaosSearch, but it is hard to tell because they are
            | not open source and do not share their internals. (If
            | someone from ChaosSearch wants to comment?)
            | 
            | - Elasticsearch makes it possible to search into an index
            | archived on S3. This is still a super useful feature as a
            | way to occasionally search your archived data, but it
            | would be too slow and too expensive (it generates a lot
            | of GET requests) to use as your everyday "main" log
            | search index.
        
             | BiteCode_dev wrote:
              | ClickHouse does have it, but it's experimental.
        
         | Daviey wrote:
         | "Object storage as the primary storage: All indexed data
         | remains on object storage, removing the need for provisioning
         | and managing storage on the cluster side."
         | 
          | So the underlying storage is still object storage; base
          | your calculations around that, depending on whether you are
          | using S3, GCP Cloud Storage, self-hosted Ceph, MinIO,
          | Garage or SeaweedFS.
        
         | onlyrealcuzzo wrote:
         | A lot.
         | 
          | 1PB with triple redundancy costs around $20k per year just
          | in hard drives. That's ~$2.5M per year just in disks.
         | 
         | I'd be impressed if they're doing this for less than $1.5M per
         | month (including SWE costs).
         | 
         | Obviously, if they can, saving $1.5M a month vs BigQuery seems
         | like maybe a decent reason to DIY.
        
           | BiteCode_dev wrote:
           | Why per year? If they buy their own server, they keep the
           | disk several years.
           | 
           | The money motivation to self host on bare metal at this scale
           | is huge.
        
             | onlyrealcuzzo wrote:
             | > Why per year? If they buy their own server, they keep the
             | disk several years.
             | 
             | The cost per year is much higher - that's using a 5-year
             | amortization.
        
               | BiteCode_dev wrote:
               | Seems high.
               | 
                | You can get an 18TB spinning disk (no need for SSD if
                | you can write in parallel) for 224EUR. Let's round
                | that to $300 for easy calculations.
               | 
               | To store 100 petabytes of data by purchasing disks
               | yourself, you would need approximately 5556 18TB hard
               | drives totaling $1,666,800.
               | 
               | Of course, you'll pay more than the disks.
               | 
                | Let's add the cost of 93 enclosures at $3,000 each
                | ($279,000), and accounting for controllers, network
                | equipment ($100,000), and power and cooling
                | infrastructure ($50,000, although it's probably
                | already cool where they will host the thing), that
                | would be about $2.1M.
               | 
               | That's total, and that's for the uncompressed data.
               | 
               | You would need 3 times that for redundancy, but it would
               | still be 40% cheaper over 5 years, not to mention I used
               | retail price. With their purchasing power they can get a
               | big discount.
               | 
               | Now, you do have the cost of having a team to maintain
               | the whole thing but they likely have their own data
               | center anyway if they go that route.
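The estimate above can be reproduced directly (using the same assumed retail prices, uncompressed 100 PB, no redundancy):

```python
# DIY hardware estimate: 100 PB on 18 TB drives at assumed retail prices.
PB, TB = 10**15, 10**12

drives = -(-100 * PB // (18 * TB))    # ceil(100 PB / 18 TB)
drive_cost = drives * 300             # $300/drive assumed

enclosures = 93 * 3_000               # 93 enclosures at $3k each
network = 100_000                     # controllers + network gear
power_cooling = 50_000                # power/cooling infrastructure

total = drive_cost + enclosures + network + power_cooling
print(f"{drives} drives, total ~${total:,}")  # 5556 drives, ~$2.1M
```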
        
               | JackSlateur wrote:
                | HDDs have terrible IOPS.
        
               | ddorian43 wrote:
                | > disk of 18TB (no need for SSD if you can write in
                | parallel)
               | 
               | Do note that you can put, like, at most?, 1TB of hot/warm
               | data on this 18TB drive.
               | 
                | Imagine you do a query, and 100GB of the data to be
                | searched is on 1 HDD. You will wait 500s-1000s just
                | for this hard drive. Now imagine a bit more
                | concurrency on this HDD, like 3 or 5 queries.
               | 
               | You can't fill these drives full with hot or warm data.
               | 
               | > To store 100 petabytes of data by purchasing disks
               | yourself, you would need approximately 5556 18TB hard
               | drives totaling $1,666,800.
               | 
                | You want 1000x more drives, each filled to only
                | 1/1000 of capacity. Now you can do a parallel read!
               | 
               | > You would need 3 times that for redundancy
               | 
               | With erasure coding you need less, like 1.4x-2x.
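The single-HDD wait quoted above follows from 100 GB of query-relevant data on one spinning disk at an assumed 100-200 MB/s effective read rate (effective rates drop well below the sequential peak once seeks get involved):

```python
# Time to read 100 GB of query-relevant data off a single spinning disk.
GB, MB = 10**9, 10**6

data = 100 * GB
for rate in (100 * MB, 200 * MB):  # assumed effective read rates
    print(f"at {rate // MB} MB/s: {data // rate} s")  # 1000 s and 500 s
```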
        
               | pas wrote:
                | Quickwit seems to be designed to prefer talking S3 to
                | the storage subsystem, so by running Ceph you can
                | shuffle your data around evenly.
        
               | rthnbgrredf wrote:
                | For this purpose you would likely not buy ordinary
                | consumer disks but rather bulletproof enterprise
                | HDDs. Otherwise a significant number of the 5556
                | disks would not survive the first year, assuming they
                | are under constant load.
        
               | pas wrote:
              | Quickwit's big advantage is that you can point it at
              | something that speaks S3 and it will be happy. So
              | ideally you delegate the whole storage story by hiring
              | someone who knows their way around Ceph (erasure
              | coding, load distribution) and call a few
              | DC/colo/hosting providers (initial setup and the
              | regular HW replacements).
        
             | TestingWithEdd wrote:
              | Remember they'd want to run RAID, maybe have backups,
              | and manage disk failures. At that size, disk failure
              | will be a daily event (off the top of my head).
        
           | the_arun wrote:
            | DIY also comes with the cost of managing it. You need a
            | team to maintain it, fix bugs, etc. Not hard, but a cost.
        
         | francoismassot wrote:
         | Good question.
         | 
         | Let's estimate the costs of compute.
         | 
         | For indexing, they need 2800 vCPUs[1], and they are using c6g
         | instances; on-demand hourly price is $0.034/h per vCPU. So
         | indexing will cost them around $70k/month.
         | 
         | For search, they need 1200 vCPUs, it will cost them around
         | $30k/month.
         | 
         | For storage, it will cost them $23/TB * 20000 = $460k/month.
         | 
          | Storage costs are an issue. Of course, they pay less than
          | $23/TB, but it's still expensive. They are optimizing this
          | either by using different storage classes or by moving data
          | to cheaper cloud providers for long-term storage (fewer
          | requests mean you need less performant storage, and you can
          | usually get a very good price on those object storages).
         | 
          | On the Quickwit side, we will also improve the compression
          | ratio to reduce the storage footprint.
         | 
          | [1]: I fixed the number of vCPUs for indexing; it was
          | written as 4000 when I published the post, but that
          | corresponded to the total number of vCPUs for search and
          | indexing.
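The estimate above works out as follows (using ~730 hours per month; the figures are the ones stated in the comment):

```python
# Monthly compute + storage estimate from the stated on-demand rates.
hourly_per_vcpu = 0.034   # c6g on-demand, $/vCPU-hour
hours_per_month = 730

indexing = 2800 * hourly_per_vcpu * hours_per_month   # ~$70k/month
search = 1200 * hourly_per_vcpu * hours_per_month     # ~$30k/month
storage = 23 * 20_000                                 # $23/TB-month * 20 PB

print(f"indexing ~${indexing:,.0f}/mo, search ~${search:,.0f}/mo, "
      f"storage ~${storage:,}/mo")
```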
        
           | rcaught wrote:
           | Savings plans, spot, EDP discounts. Some of these have to be
           | applied, right?
        
             | Onavo wrote:
              | At this level they can just go bare metal or colo. Use
              | Hetzner's pricing as a reference. Logs don't need the
              | same level of durability as user data; some level of
              | failure is perfectly fine. I would estimate $100k per
              | month or less, $200k maximum.
        
       | endorphine wrote:
       | What would you use for storing and querying long-term audit logs
       | (e.g. 6 months retention), which should be searchable with
       | subsecond latency and would serve 10k writes per second?
       | 
       | AFAICT this system feels like a decent choice. Alternatives?
        
         | jjordan wrote:
         | NATS?
        
           | packetlost wrote:
            | NATS doesn't really have advanced query features though.
            | It has a lot of really nice things, but advanced querying
            | isn't one of them. Not to mention I don't know if NATS
            | does well with large datasets; does it have sharding
            | capability for its KV and object stores?
        
             | Zambyte wrote:
             | I use NATS at work, and I have had the privilege to speak
             | with some of the folks at Synadia about this stuff.
             | 
             | Re: advanced querying: the recommended way to do this is to
             | build an index out of band (like Redis (or a fork) or
             | SQLite or something) that references the stored messages by
             | sequence number. By doing that, your index is just this
             | ephemeral thing that can be dynamically built to exactly
             | optimize for the queries you're using it for.
             | 
              | Re: sharding: no, it doesn't support simple sharding.
              | You can achieve sharding by standing up multiple NATS
              | instances, making a new stream (KV and object store are
              | also just streams) on each instance, and capturing some
              | subset of the stream on each instance. The client (or
              | perhaps a service querying on behalf of the client)
              | would have to be smart enough to mux the sources
              | together.
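The "out-of-band index" idea described above can be sketched with SQLite: keep an ephemeral table mapping searchable fields to stream sequence numbers, then fetch matching messages by sequence. The schema and fields here are hypothetical; a real setup would populate the index from a NATS consumer and rebuild it on demand.

```python
import sqlite3

# Ephemeral index: searchable fields -> NATS stream sequence numbers.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE idx (seq INTEGER PRIMARY KEY, subject TEXT, tenant TEXT)")

# Pretend these rows were built by a consumer walking the stream.
rows = [(1, "orders.created", "acme"),
        (2, "orders.created", "globex"),
        (3, "orders.cancelled", "acme")]
db.executemany("INSERT INTO idx VALUES (?, ?, ?)", rows)

# Query the index; each hit would then be fetched from the stream by sequence.
seqs = [s for (s,) in db.execute(
    "SELECT seq FROM idx WHERE tenant = ? AND subject = ?",
    ("acme", "orders.created"))]
print(seqs)  # -> [1]
```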
        
               | packetlost wrote:
               | Does it handle clustering/redundancy for the data stored
               | in KV/object store? My intuition says yes because I
               | believe it supports it at the "node" level
        
               | Zambyte wrote:
               | Yes. When you create a stream (including a KV or object
               | store) you say what cluster you want to put it on, and
               | how many replicas you want it to have.
        
         | bojanz wrote:
         | You'll find many case studies about using Clickhouse for this
         | purpose.
        
           | hipadev23 wrote:
           | Do you know any specific case studies for unstructured logs
           | on clickhouse?
           | 
           | I think achieving sub-second read latency of adhoc text
           | searching over ~150B rows of unstructured data is going to be
           | quite challenging without a high cost. Clickhouse's inverted
           | indices are still experimental.
           | 
            | If the data can be organized in a way that is conducive
            | to the searching itself, or structured into columns,
            | that's definitely possible. Otherwise I suppose a large
            | number of CPUs (150-300) to split the job and just
            | brute-force each search?
        
             | buzer wrote:
              | There is at least
              | https://news.ycombinator.com/item?id=40936947 though
              | it's a bit mixed in terms of how they handle schema.
        
               | SSLy wrote:
                | not sure if an excellent joke or an honest mistake
        
         | SSLy wrote:
         | What if I don't have such latency requirements? I'm willing to
         | trade that for flexibility or anything else
        
         | AJSDfljff wrote:
          | I would first question whether the system needs to search
          | with subsecond latency, and whether it needs to be the same
          | system that handles 10k writes/sec.
          | 
          | Even Google Cloud and others make you wait for longer
          | search queries. If it's not business critical, you can
          | definitely wait a bit.
          | 
          | And the write system might not need to write it in the end
          | format, especially as it also has to handle transformation
          | and filtering.
          | 
          | Nonetheless, as mentioned in my other comment, the
          | interesting details of this are missing.
        
           | endorphine wrote:
            | Let's say that it powers a "search logs" page that an end
            | user wants to see. And let's say that they want the last
            | 1d, 14d, 1m, 6m.
            | 
            | So I would say subsecond is a requirement.
           | 
           | And no, it doesn't have to be the same system that
           | ingests/indexes the logs.
        
         | jakjak123 wrote:
         | 10k audit logs per sec? I think we have different definitions
         | of audit logs.
        
       | kstrauser wrote:
       | How? If Binance had a trillion transactions, that's 100KB per
       | transaction. What all are they logging?
        
         | francoismassot wrote:
         | They have 181 trillion logs
        
           | kstrauser wrote:
           | But of _what_? What has Binance done 181 trillion times?
           | 
           | Obviously they _have_. I don't think they're throwing away
           | money for logs they don't generate or need. I just can't
           | imagine the scope of it.
           | 
           | That is, I know this is a failing of my imagination, not
           | their engineering decisions. I'd love to fill in my knowledge
           | gaps.
        
             | xboxnolifes wrote:
             | They are application logs, so probably nearly every click
             | on their website.
        
             | robxorb wrote:
              | If it's 181 trillion each year, it's only ~6 million
              | per second. There are a thousand milliseconds in each
              | second, so Binance would need only several thousand
              | high-frequency traders creating and adjusting orders
              | through their API to end up with those logs.
              | 
              | Binance has hundreds of trading pairs available, so a
              | handful on each pair on average would add up.
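Checking that arithmetic, and the average log size implied by 100 PB total:

```python
# 181 trillion logs/year -> logs/second, and average bytes per log.
logs_per_year = 181 * 10**12
seconds_per_year = 365 * 24 * 3600

per_second = logs_per_year / seconds_per_year
print(f"~{per_second / 1e6:.1f} million logs/second")  # ~5.7M/s

total_bytes = 100 * 10**15  # 100 PB uncompressed
print(f"~{total_bytes / logs_per_year:.0f} bytes per log on average")
```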
        
         | askl wrote:
         | Don't forget the logs produced by the logging infrastructure.
        
           | kstrauser wrote:
            | And what if _that_ infra goes down? Who's watching that?
        
         | TacticalCoder wrote:
         | Yeah I don't get it either, something seems deeply wrong here.
         | 
         | These are impressive numbers but I wonder about something...
         | Binance is a centralized cryptocurrencies exchange. But AIUI
         | "defi" is a thing: instead of using a CEX (Centralized
         | EXchange), people can use a DEX (Decentralized EXchange).
         | 
         | And apparently there's a gigantic number of trades happening in
         | the defi world. And it's all happening on public ledgers right?
         | (I'm asking, I don't know: are the "level 2" chains public?).
         | 
         | And the sum of all the public ledgers / blockchains
         | transactions do not represent anywhere near 1.6 PB a day.
         | 
          | And yet DEXes do work (and now, given that volume has
          | picked up, often have better liquidity and cheaper fees
          | than centralized exchanges).
         | 
          | Someone who knows this stuff better than I do could
          | comment, but from a quick googling here's what I found:
          | 
          | - a full Ethereum node today is 1.2 TB (not PB, but TB)
          | 
          | - a full Bitcoin node today is 585 GB
         | 
         | There are other blockchains but these two are the two most
         | successful ones?
         | 
         | So let's take Ethereum... For 1.2 TB you have the history of
         | all the transactions that ever happened on Ethereum, since 2015
         | or something. And that's not just Ethereum but also all the
         | "tokens" and "NFTs" on ethereum, including stablecoins like
         | Circle/Coinbase's USDC.
         | 
          | How do we go from a decentralized "1.2 TB for the entire
          | Ethereum history" to "1.6 PB _per day_ for Binance
          | transactions"?
         | 
         | That's 1500x the size of the freaking entire Ethereum
         | blockchain generated by logs, in a day.
         | 
         | And Ethereum is, basically, a public ledger. So it is... Logs?
         | 
          | Or let's compare Binance's numbers to, say, the US equities
          | options market. That feed is a bit less than 40 Gb/s I
          | think, so 140 TB for a day of actual equities options
          | trading datafeed.
         | 
         | I understand these aren't "logs" but, same thing...
         | 
          | How do we go from a daily 140 TB datafeed from the CBOE
          | (where the big guys are and the real stuff is happening) to
          | 10x that amount in daily Binance logs?
         | 
         | Something doesn't sound right.
         | 
          | You can say it's apples vs oranges, but I don't think it's
          | that much of an apples vs oranges comparison.
         | 
          | I mean: if these 1.6 PB of Binance logs per day are
          | justified, it makes me think it's time to sell everything
          | and go all-in on cryptocurrencies, because there may be
          | _way_ more interest and activity than people think. (Yeah,
          | I'm kidding.)
         | 
         | EDIT: 1.2 TB for Ethereum seems to be a full but not an
         | "archive" node. An "archive" node is 6 TB. Still a far cry from
         | 1.6 PB a day.
        
           | arandomusername wrote:
            | An archive node can go as low as ~2TB depending on the
            | client, but regardless, you do want to compare it to a
            | full node, as the only difference is that an "archive"
            | node keeps the state at any block (e.g. if you want to
            | replay a transaction that occurred in the past, or see
            | the state at a certain block). Full nodes still contain
            | all events/txs (DEXes will emit events, akin to logs, for
            | transfers) - so it's fair to compare to a full node.
           | 
            | On CEXes you see a lot more trades; market makers and
            | high-frequency traders do a ton of volume and create a
            | lot of orders, since it's a lot cheaper. CEXes without a
            | doubt have more trades than DEXes by orders of magnitude.
           | 
           | Additionally, they probably log multiple things for every
           | request, including API requests, which they most definitely
           | get a ton of.
        
           | tommek4077 wrote:
           | DeFi is a fraction of what's happening on CeFi exchanges. I
           | have a customer placing hundreds of billions of orders per
           | day on binance. And they are by far not the biggest players
           | there.
        
         | tommek4077 wrote:
         | High-frequency traders are making hundreds of billions of
         | orders per day. And there are many bigger and smaller players.
        
       | KaiserPro wrote:
       | A word of caution here: This is very impressive, but almost
       | entirely wrong for your organisation.
       | 
       | Most log messages are useless 99.99% of the time. The best
       | likely outcome is that a log line gets turned into a metric. The
       | once-in-a-blue-moon outcome is that it tells you what went wrong
       | when something crashed.
       | 
       | Before you get to shipping _petabytes_ of logs, you really need
       | to start thinking in metrics. Yes, you should log errors; you
       | should also make sure they are stored centrally and are
       | searchable.
       | 
       | But logs shouldn't be your primary source of data, metrics should
       | be.
       | 
       | Things like connection time, upstream service count, memory
       | usage, transactions per second, failed transactions, and
       | upstream/downstream endpoint health should all be metrics emitted
       | by your app (or hosting layer) directly. Don't try to derive them
       | from structured logs. It's fragile, slow and fucking expensive.
       | 
       | Comparing, cutting and slicing metrics across processes or even
       | services is simple; with logs it's not.
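
A minimal sketch of what "emitting metrics directly" can look like in-process. The registry and metric names here are illustrative only, not any particular library's API; production setups would typically use something like a Prometheus client library instead:

```python
import threading
from collections import defaultdict

class Metrics:
    """Tiny in-process metrics registry (illustrative only): the app
    increments counters / sets gauges at the point of work, instead of
    having metrics derived later from log lines."""

    def __init__(self):
        self._lock = threading.Lock()
        self._counters = defaultdict(float)
        self._gauges = {}

    @staticmethod
    def _key(name, labels):
        # Labels are normalised into the key so each label-set gets
        # its own time series, Prometheus-style.
        return (name, tuple(sorted(labels.items())))

    def inc(self, name, value=1.0, **labels):
        with self._lock:
            self._counters[self._key(name, labels)] += value

    def set_gauge(self, name, value, **labels):
        with self._lock:
            self._gauges[self._key(name, labels)] = value

    def snapshot(self):
        # A scraper/pusher would read this periodically.
        with self._lock:
            return dict(self._counters), dict(self._gauges)

metrics = Metrics()

# Application code emits metrics directly:
metrics.inc("transactions_total", endpoint="/order")
metrics.inc("transactions_failed_total", endpoint="/order")
metrics.set_gauge("upstream_healthy", 1, service="pricing")

counters, gauges = metrics.snapshot()
print(counters[("transactions_total", (("endpoint", "/order"),))])  # 1.0
```

The point of the pattern is that aggregation happens at write time, so the collector ships a handful of numbers per scrape rather than a log line per transaction.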
        
         | p-o wrote:
         | I would say from my experience, for _application logs_, it's
         | the exact opposite. When you deal with a few GB/day of data,
         | you want to have logs, and metrics can be derived from those
         | logs.
         | 
         | Logs are expensive compared to metrics, but they convey a lot
         | more information about the state of your system. You want to
         | move towards metrics over time, one hotspot at a time, to
         | reduce cost while keeping observability of your overall system.
         | 
         | I'll take logs over metrics any day of the week, when cost
         | isn't prohibitive.
        
           | KaiserPro wrote:
           | I was at a large financial news site; they were a total
           | Splunk shop. We had lots of real-steel machines shipping and
           | chunking _loads_ of logs. Every team had a large screen
           | showing off key metrics. Most of the time they were badly
           | maintained and broken, so only the _really_ key metrics
           | worked. Great for finding out what went wrong, terrible at
           | alerting when it went wrong.
           | 
           | However, over the space of about three years we shifted
           | organically over to graphite+grafana. There wasn't a top down
           | push, but once people realised how easy it was to make a
           | dashboard, do templating and generally keep things working,
           | they moved in droves. It also helped that people put a
           | metrics-emitting system into the underlying hosting app
           | library.
           | 
           | What really sealed the deal was the non-tech business owners
           | making or updating dashboards. They managed to take pure tech
           | metrics and turn them into service/business metrics.
        
             | p-o wrote:
             | It's fair that you had a different experience than I had.
             | However, your experience seems to be very close to what I
             | was describing. Cost got prohibitive (Splunk), and you
             | chose a different avenue. It's totally acceptable to do
             | that, but your experience doesn't reflect mine, and I don't
             | think I'm the exception.
             | 
             | I've used both grafana+metrics and logs to different
             | degrees. I've enjoyed using both, but any system I work on
             | starts with logs and gradually adds metrics as needed; it
             | feels like a natural evolution to me, and I've worked at
             | different scales, like you.
        
             | hanniabu wrote:
             | I feel like I shouldn't need to mention this, but a news
             | site and a financial exchange with money at stake are not
             | the same. If there is a glitch you need to be able to
             | trace it back, and you can't do that with some abstracted
             | metrics.
        
               | pixl97 wrote:
               | Yea, on a news site, the metrics are important. If
               | suddenly you start seeing errors accrue above background
               | noise and it's affecting a number of people you can act
               | on it. If it's affecting one user, you probably don't
               | give a shit.
               | 
               | In finance if someone puts an entry for 1,000,000,000
               | and it changes to 1,000,000 the SEC, fraud investigators,
               | lawyers, banks, and some number of other FLAs are shining
               | a flashlight up your butt as to what happened.
        
               | KaiserPro wrote:
               | Right, and if the SEC sees that you're mixing verbose
               | k8s logging with financial records, you're going to get
               | a bollocking.
        
               | KaiserPro wrote:
               | You are misreading me.
               | 
               | I'm not saying that you can't log, I'm saying that
               | logging _everything_ on debug in an unstructured way and
               | then hoping to divine a signal from it is madness. You
               | will need logs, as they eventually tell you _what_ went
               | wrong. But they are very bad at telling you that
               | something is going wrong _now_.
               | 
               | It's also exceptionally bad at letting you quickly
               | pinpoint _when_ something changed.
               | 
               | Even in a logging only environment, you get an alert, you
               | look at the graphs, then dive into the logs. The big
               | issue is that those metrics are out of date, hard to
               | derive and prone to breaking when you make changes.
               | 
               | verbose logging is not a protection in a financial
               | market, because if something goes wrong you'll need to
               | process those logs for consumption by a third party.
               | You'll then have to explain why the format changed three
               | times in the two weeks leading up to that event.
               | 
               | Moreover you will need to separate the money audit trail
               | from the verbose application logs, ideally at source. As
               | it's "high-value data" you can't be mixing those streams
               | _at all_.
        
             | david38 wrote:
             | This.
             | 
             | I was an engineer at Splunk for many years. I knew it cold.
             | 
             | I then joined a startup where they just used metrics and
             | the logs TTLed out after just a week. They were just used
             | for short term debugging.
             | 
             | The metrics were easier to put in, keep organized, make
             | dashboards from, lighter, cheaper, better. I had been doing
             | it wrong this whole time.
        
               | brabel wrote:
               | > the logs TTLed out after just a week
               | 
               | "expired" is the word you're looking for.
        
           | lmpdev wrote:
           | I'm not well versed in QA/Sysadmin/Logs but surely metrics
           | suffer from Simpson's paradox compared to properly probed
           | questions only answered through having access to the entirety
           | of the logs?
           | 
           | If you average out metrics across all log files you're
           | potentially reaching false or, worse, inverse conclusions about
           | multiple distinct subsets of the logs
           | 
           | It's part of the reason why statisticians are so pedantic
           | about the wording of their conclusions and which
           | subpopulation their conclusions actually apply to.
        
           | FridgeSeal wrote:
           | > Logs are expensive compared to metrics, but they convey a
           | lot more information about the state of your system.
           | 
           | My experience has been the kind of opposite.
           | 
           | Yes, you can put more fields in a log, and you can nest
           | stuff. In my experience, however, metrics tend to give me a
           | clearer picture into the overall state (and behaviour) of my
           | systems. I find them easier and faster to operate, easier to
           | get an automatic chronology going, easier to alert on, etc.
           | 
           | Logs in my apps are mostly relegated to capturing warning
           | and error states for debugging reference, as the metrics
           | give us a quicker and easier indicator of issues.
        
         | zarathustreal wrote:
         | I've got to disagree here, especially with memoization and
         | streaming: deriving metrics from structured logs is extremely
         | flexible, relatively fast, and can be configured to be as cheap
         | as you need it to be. With streaming you can literally run your
         | workload on a raspberry pi. Granted, you need to write the code
         | to do so yourself, most off-the-shelf services probably are
         | expensive
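
For illustration, a constant-memory streaming fold over structured log lines, roughly the approach described above. The field names (`service`, `level`) are assumptions, not any real schema:

```python
import json

def derive_metrics(lines):
    """Fold structured (JSON) log lines into counters in one streaming
    pass -- constant memory, no buffering of the whole log."""
    counts = {}
    for line in lines:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            # Parsing tends to break silently in real pipelines;
            # at least count how often it happens.
            counts["_unparseable"] = counts.get("_unparseable", 0) + 1
            continue
        key = (event.get("service"), event.get("level"))
        counts[key] = counts.get(key, 0) + 1
    return counts

logs = [
    '{"service": "gateway", "level": "error", "msg": "upstream timeout"}',
    '{"service": "gateway", "level": "info", "msg": "request ok"}',
    'not json at all',
]
print(derive_metrics(logs))
```

This is cheap per line, but note it illustrates KaiserPro's counterpoint too: one malformed producer and a whole series silently drifts into `_unparseable`.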
        
           | KaiserPro wrote:
           | > memoization and streaming,
           | 
           | memoization isn't free in logs, you're basically deduping an
           | unbounded queue and it's difficult to scale past one machine.
           | It's both CPU- and memory-heavy. I mean sure you can use Scuba,
           | which is great, but that's basically a database made to look
           | like a log store.
           | 
           | > deriving metric from structured logs is extremely flexible
           | 
           | Assuming you can actually generate structured logs reliably.
           | But even if you do, it's really easy to silently break it.
           | 
           | > With streaming you can literally run your workload on a
           | raspberry pi
           | 
           | no, you really can't. Streaming logs to a centralised place
           | is exceptionally IO heavy. If you want to generate metrics
           | from it, it's CPU-heavy as well. If you need speed, then
           | you'll also need lots of RAM, otherwise searching your logs
           | will cause logging to stop. (either because you've run out of
           | CPU, or you've just caused the VFS cache to drop because
           | you're suddenly doing no predictable IO. )
           | 
           | Graylog exists for streaming logs. Hell, even rsyslog does
           | it. Transporting logs is fairly simple, storing and
           | generating signal from it is very much not.
        
         | reisse wrote:
         | Metrics are only good when you can disregard some amount of
         | errors without investigation. But they're a financial
         | organization, they have a certain amount of liability.
         | Generalized metrics won't help to understand what happened to
         | that one particular transaction that failed in a cumbersome way
         | and caused some money to disappear.
        
           | fells wrote:
           | It's always struck me that these are two wildly different
           | concerns though.
           | 
           | Use metrics & SLOs to help diagnose the health of your
           | systems. Derive those directly from logs/traces, keep a
           | sample of the raw data, and now you can point any alert to
           | the sampled data to help go about understanding a client-
           | facing issue.
           | 
           | But, for auditing of a particular transaction, you don't need
           | full indexing of the events? You need a transactional journal
           | for every account/user, likely with a well-defined schema to
           | describe successful changes and failed attempts. Perhaps
           | these come from the same stream of data as the observability
           | tooling, but I can only imagine it must be a much smaller
           | subset of the 100PB, and you can avoid doing full inverted
           | indexes on it, because your search pattern is simply
           | answering "what happened to this transaction?"
        
             | alexchantavy wrote:
             | > You need a transactional journal for every account/user,
             | likely with a well-defined schema to describe successful
             | changes and failed attempts.
             | 
             | Sounds like a row in a database to me.
             | 
             | Dumb question, but is that how structured log systems are
             | implemented?
        
             | renewiltord wrote:
             | The reality is that when their service delays something
             | they owe us tens to hundreds of thousands of dollars. This
             | is the tool they're using but if they can't even get a
             | precise notion of when a specific request arrived at their
             | gateway they're in trouble.
        
           | KaiserPro wrote:
           | You can still have logs. What I'm suggesting is that _vast_
           | amounts of unstructured logs are worse than useless.
           | 
           | Metrics tell you where and when something went wrong. Logs
           | tell you why.
           | 
           | However, a logging framework, which is generally lossy and
           | has the lowest level of priority in terms of deliverability,
           | is not an audit mechanism, especially as ACLs and
           | verifiability are nowhere mentioned. How do they prove that
           | those logs originated from that machine?
           | 
           | If you're going to have an audit mechanism, some generic
           | logging framework is almost certainly a bad fit.
        
             | nostrebored wrote:
             | Why would you assume they're unstructured?
             | 
             | Even at very immature organizations, log data within a
             | service is usually structured.
             | 
             | Even in my personal projects if I'm doing anything parallel
             | structured logging is the first helper function I write. I
             | don't think I'm unrepresentative here.
        
               | KaiserPro wrote:
               | Because most logs are unstructured.
               | 
               | > Even at very immature organizations, log data within a
               | service is usually structured.
               | 
               | Unless the framework provides it by default, I've never
               | seen this actually happen in real life. Sure, I've seen a
               | lot of custom Telegraf configs, status endpoints and the
               | like, but never actual, working structured logging.
               | 
               | When I have seen structured logs, each team did it
               | differently; the "ontology" was different (protip: if
               | you're ever discussing ontology in logging then you might
               | as well scream and run away.)
        
             | sangnoir wrote:
             | > You can still have logs. What I'm suggesting is that vast
             | amounts of unstructured logs, are worse than useless
             | 
             | Until you need them, then you'd trade anything to get them.
             | Logs are like backups, you don't need them most of the
             | time, but when you need them, you _really_ need them.
             | 
             | On the flip side, the tendency is to over-log "just in
             | case". A good compromise is to allocate a per-project
             | storage budget for logs with log expiration, and let the
             | ones close to the coal-face figure out how they use their
             | allocation.
        
           | pavlov wrote:
           | _> "But they're a financial organization, they have a certain
           | amount of liability."_
           | 
           | In the loosest possible sense. Binance is an organization
           | that pretended it didn't have any physical location in any
           | jurisdiction. Its founder is currently in jail in the United
           | States.
        
         | _boffin_ wrote:
         | Hogwash. I'll agree that it's not as simple with logs, but
         | they're amazingly powerful, and even more so with distributed
         | tracing.
         | 
         | They both have their places and are both needed.
         | 
         | Without logs, I would not have been able to pinpoint multiple
         | issues that plagued our systems. With logs, we were able to
         | tell Google (Apigee) it was their problem, not ours. With
         | tracing, we were able to tell a legacy team they had an issue
         | and were able to pinpoint it after them telling us for 6 months
         | that it was our fault. Without logging and tracing, we wouldn't
         | have been able to tell our largest client that we never
         | received a third of the requests they sent us, while our
         | company was running around frantically.
         | 
         | They're both needed, but for different things...ish.
        
           | KaiserPro wrote:
           | You're missing my main point: logs should not be your primary
           | source of information.
           | 
           | > Without logs, I would not have been able to pinpoint
           | multiple issues that plagued our systems.
           | 
           | Logs are great for finding out _what_ went wrong, but
           | terrible at telling you there _is_ a problem. This is what I
           | mean by primary information source. If you are sifting
           | through TBs of logs to pinpoint an issue, it sucks. Yes,
           | there are tools, but it's still hard.
           | 
           | Logs are shit for deriving metrics; it usually requires some
           | level of bespoke processing which is easy to break silently,
           | especially for rarer messages.
        
             | _boffin_ wrote:
             | > You're missing my main point: logs should not be your
             | primary source of information.
             | 
             | I think you're missing my point. They're both needed.
             | Metrics are the outside (blackbox) view and logs are the
             | inside view -- they're
             | both needed. I don't recall saying that logs should be the
             | primary source.
             | 
             | > Logs are shit for deriving metrics, it usually requires
             | some level of bespoke processing which is easy to break
             | silently, especially for rarer messages.
             | 
             | Truthfully, you're probably just doing it wrong if you
             | can't derive actionable metrics from logs / tracing. I'm
             | willing to hear you out though. Are you using structured
             | logs? If so, please tell me more about how you're having
             | issues deriving metrics from those. If not, that's your
             | first problem.
             | 
             | > logs are great for finding out what went wrong, but
             | terrible at telling there is a problem
             | 
             | see prior comment.
        
               | KaiserPro wrote:
               | > Truthfully, you're probably just doing it wrong if you
               | can't derive actionable metrics from logs
               | 
               | I have ~200 services, each composed of many sub-services,
               | each made up of a number of processes. Something like
               | 150k processes.
               | 
               | Now, we are going to ship all those logs, where every
               | transaction emits something like 500-2000 bytes of data.
               | Storing that is easy, even storing it in a structured way
               | is easy. Making sure we don't leak PII is a lot harder,
               | so we have to have fairly strict ACLs.
               | 
               | Now, I want to process them to generate metrics and then
               | display them. But that takes a _lot_ of horse power.
               | Moreover when I want to have metrics for more than a week
               | or so, the amount of data I have to process grows
               | linearly. I also need to back up that data, and derived
               | metrics. We are looking at a large cluster just for
               | processing.
               | 
               | Now, if we make sure that our services emit metrics for
               | all useful things, the infra for recording, processing
               | and displaying that is much smaller, maybe two/three
               | instances. Not only that but custom queries are way
               | quicker, and much more resistant to PII leaking. Just
               | like structured logging, it does require some dev effort.
               | 
               | At no point is it _impossible_ to use logs as the data
               | store/transport, it's just either fucking expensive,
               | fragile, or dogshit slow.
               | 
               | Or to put it another way:
               | 
               | old system == >£1 million in licenses and servers
               | (yearly)
               | 
               | metric system == £100k in licenses and servers + £12k
               | for the metrics servers (yearly)
        
         | pawelduda wrote:
         | With logs you can get an idea of what events happened in what
         | order during some complex process, stretched over a long
         | timeframe, and so on. I don't think you can do this with a
         | metric.
        
           | KaiserPro wrote:
           | > With logs you can get an idea of what events happened in
           | what order
           | 
           | Again, if you're at that point, you need logs. But that's
           | never going to be your primary source of information. If you
           | have more than a few services running at many transactions a
           | second, you can't scale that kind of understanding using
           | logs.
           | 
           | This is my point: if you have >100 services, each with many
           | tens or hundreds of processes, your primary alert (well, it
           | shouldn't be primary, you need pre-SLA-fuckup alerts) to
           | something going wrong is something breaching an SLA. That's
           | almost certainly a metric. Using logs to derive that metric
           | means you have a latency of 60-1500 seconds.
           | 
           | Getting your apps to emit metrics directly means that you are
           | able to make things much more observable. It also forces your
           | devs to think about _how_ their app is observed.
        
         | derefr wrote:
         | I would note that a notional "log store", doesn't have to just
         | be used for things that are literally "logs."
         | 
         | You know what else you could call a log store? A CQRS/ES event
         | store.
         | 
         | (Specifically, a "log store" is a CQRS/ES event store that just
         | so happens to _also_ remember a primary-source textual
         | representation for each structured event-document it ingests --
         | i.e. the original  "log line" -- so that it can spit "log
         | lines" back out unchanged from their input form when asked. But
         | it might not even have this feature, if it's a _structured_ log
         | store that expects all  "log lines" to be "structured logging"
         | formatted, JSON, etc.)
         | 
         | And you know what the most important operation a CQRS/ES event
         | store performs is? A continuous streaming-reduction over
         | _particular filtered subsets of the events_, to compute CQRS
         | "aggregates" (= live snapshot states / incremental state
         | deltas, which you then continuously load into a data warehouse
         | to power the "query" part of CQRS.)
         | 
         | Most CQRS/ES event stores are built atop _message queues_ (like
         | Kafka), or _row-stores_ (like Postgres). But neither are
         | actually very good backends for powering the  "ad-hoc-filtered
         | incremental large-batch streaming" operation.
         | 
         | * With an MQ backend, _streaming_ is easy, but MQs maintain no
         | indices for events per se, just copies of events in different
         | topics; so _filtered_ streaming would either have the filtering
         | occur mostly client-side; or would involve a bolt-on component
         | that is its own  "client-side", ala Kafka Streams. You _can_
         | use topics for this -- but only if you know exactly what
         | reduction event-type-sets you'll need _before_ you start
         | publishing any events. Or if you're willing to keep an
         | archival topic of every-event-ever online, so that you can
         | stream over it to retroactively build new filtered topics.
         | 
         | * With a row-store backend, filtered streaming without pre-
         | indexing is _tenable_ -- it's a query plan consisting of a
         | primary-key-index-directed seq scan with a filter node. But
         | it's still a lot more expensive than it'd be to just be
         | streaming through a flat file containing the same data, since a
         | seq scan is going to be reading+materializing+discarding all
         | the rows that don't match the filtering rule. You _can_ create
         | (partial!) indices to avoid this -- and nicely-enough, in a
         | row-store, you can do this retroactively, once you figure out
         | what the needs of a given reduction job are. But it's still a
         | DBA task rather than a dev task -- the data warehouse needs to
         | be tweaked to respond to the needs of the app, _every time the
         | needs of the app change_. (I would also mention something about
         | schema flexibility here, but Postgres has a JSON column type,
         | and I presume CQRS/ES event-store backends would just use
         | that.)
         | 
         | A CQRS/ES event store built atop a fully-indexed document store
         | / "index store" like ElasticSearch (or Quickwit, apparently)
         | would have all the same advantages of the RDBMS approach, but
         | wouldn't require any manual index creation.
         | 
         | Such a store would perform as if you took the RDBMS version of
         | the solution, and then wrote a little insert-trigger stored-
         | procedure that reads the JSON documents out of each row, finds
         | any novel keys in them, and creates a new partial index for
         | each such novel key. (Except with much lower storage-overhead
         | -- because in an "index store" all the indices share data; and
         | much better ability to combine use of multiple "indices", as in
         | an "index store" these are often not actually separate indices
         | at all, but just one index where the key is part of the index.)
         | 
         | ---
         | 
         | That being said, you know what you can use the CQRS/ES model
         | _for_? Reducing your literal  "logs" _into_ metrics, as a
         | continuous write-through reduction -- to allow your platform to
         | _write_ log events, but have its associated observability
         | platform _read back_ pre-aggregated metrics time-series data,
         | rather than having to crunch over logs itself at query time.
         | 
         | And AFAIK, this "modelling of log messages _as_ CQRS/ES events
         | in a CQRS/ES event store, so that you can do CQRS/ES reductions
         | to them to compute metrics as aggregates" approach is already
         | widely in use -- but just not much talked about.
         | 
         | For example, when you use Google Cloud Logging, Google seems to
         | be shoving your log messages into something approximating an
         | event-store -- and specifically, one with exactly the filtered-
         | streaming-cost semantics of an "index store" like ElasticSearch
         | (even though they're actually probably using a structured
         | column-store architecture, i.e. "BigTable but append-only and
         | therefore serverless.") And this event store then powers Cloud
         | Logging's "logs-based metrics" reductions
         | (https://cloud.google.com/logging/docs/logs-based-metrics).
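
A toy version of the filtered streaming-reduction ("aggregate") described above, stripped down to a loop. The event shapes and names are invented purely for illustration:

```python
# Hypothetical event shapes, invented for illustration.
events = [
    {"type": "order_placed", "amount": 100},
    {"type": "heartbeat"},
    {"type": "order_placed", "amount": 50},
    {"type": "order_cancelled", "amount": 100},
]

def reduce_aggregate(stream, event_types, fold, init):
    """Streaming reduction over a filtered subset of events: the core
    operation of a CQRS/ES event store, reduced to its essentials."""
    state = init
    for ev in stream:
        if ev["type"] in event_types:       # the "filtered subset"
            state = fold(state, ev)         # the incremental aggregate
    return state

open_volume = reduce_aggregate(
    events,
    {"order_placed", "order_cancelled"},
    lambda acc, ev: acc + (ev["amount"] if ev["type"] == "order_placed"
                           else -ev["amount"]),
    0,
)
print(open_volume)  # 50
```

The backend question in the comment above is essentially about how cheaply the `if ev["type"] in event_types` filter can be pushed down to storage, rather than evaluated client-side over every event ever written.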
        
         | londons_explore wrote:
         | When you have metrics, you should also keep sampled logs.
         | 
         | I.e. 1 in a million log entries is kept. Write some rules to
         | try and keep more of the more interesting ones.
         | 
         | One way to do this is to have your logging macro include the
         | source file and line number the logline came from, and then,
         | for each file and line number emit/store no more than 1 logline
         | per minute.
         | 
         | That way you get detailed records of rare events, while
         | filtering most of the noise.
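
A sketch of the per-call-site rate limit described above. In real code the `(file, line)` site would come from the logging macro or the log record's `pathname`/`lineno`; the injectable clock here is just so the behaviour can be demonstrated without waiting a minute:

```python
import time

class SampledLogger:
    """Keep at most one log line per call site per time window, so rare
    events survive while hot loops are suppressed."""

    def __init__(self, window_seconds=60.0, clock=time.monotonic):
        self.window = window_seconds
        self.clock = clock            # injectable for testing
        self._last_emit = {}          # (file, line) -> last emit time
        self.kept = []

    def log(self, msg, site):
        now = self.clock()
        last = self._last_emit.get(site)
        if last is None or now - last >= self.window:
            self._last_emit[site] = now
            self.kept.append(msg)
            return True
        return False                  # suppressed

# Simulate 1000 log calls from one call site over 100 seconds:
fake_time = [0.0]
logger = SampledLogger(window_seconds=60, clock=lambda: fake_time[0])
site = ("handler.py", 42)
for i in range(1000):
    fake_time[0] = i * 0.1
    logger.log(f"slow query #{i}", site)
print(len(logger.kept))  # 2 lines kept, 998 suppressed
```

A call site that fires once gets through immediately, while a call site that fires thousands of times a minute is capped at one line per window, which is exactly the "detailed records of rare events, most of the noise filtered" property.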
        
           | hooverd wrote:
           | There are also different types of logs. Maybe you want every
           | transaction action but don't need a full fidelity copy of
           | every load balancer ping from the last ten years.
        
         | ryukoposting wrote:
         | Metrics are useful when you know what to measure, which implies
         | that you already have a good idea for what _can_ go wrong. If
         | your entire product exists in some cloud servers that you fully
         | control, that's probably feasible. Binance probably could have
         | done something more elegant than storing extraordinary amounts
         | of logs.
         | 
         | However, if you're selling a physical product, and/or a service
         | that integrates deeply with third party products/services, it
         | becomes a lot more difficult to determine what's even worth
         | measuring. A conservative approach to metrics collection will
         | limit the usefulness of the metrics, for obvious reasons. A
         | "kitchen sink" approach will take you right back to the same
         | "data volume" problem you had with logs, but now your
         | developers have to deal with more friction when creating
         | diagnostics. Neither extreme is desirable, and finding the
         | middle ground would require information that you simply don't
         | have.
         | 
         | On a related note, one approach I've found useful (at a certain
         | scale) is to shove metrics inside of the logs themselves. Put a
         | machine-readable suffix on your human-readable log messages.
         | The resulting system requires no more infrastructure than what
         | your logs are already using, and you get a reliable timeline of
         | when certain metrics appear vs. when certain log messages
         | appear.
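
The "machine-readable suffix on a human-readable message" idea could look like the following sketch; the `|m|` delimiter and the field names are arbitrary choices for illustration, not any established convention:

```python
import json
import re

def log_with_metrics(msg, **metrics):
    """Human-readable message plus a machine-readable JSON suffix."""
    return f"{msg} |m| {json.dumps(metrics, sort_keys=True)}"

SUFFIX = re.compile(r"\|m\| (\{.*\})$")

def extract_metrics(line):
    """Recover the metrics dict from a log line, or None if absent."""
    m = SUFFIX.search(line)
    return json.loads(m.group(1)) if m else None

line = log_with_metrics("battery check complete", battery_pct=81, rssi_dbm=-67)
print(line)
print(extract_metrics(line))  # {'battery_pct': 81, 'rssi_dbm': -67}
```

Because the metrics ride along in the same line, the timeline correlation the comment mentions comes for free: whatever timestamp the log pipeline attaches to the message also applies to the measurements.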
        
           | temporarely wrote:
           | Any system has a 'natural set' of metrics. And metrics are
           | not about "what [went] wrong" but rather about system
           | health. So: Metrics -> Alert -> Log Diagnostics.
        
             | ryukoposting wrote:
             | > Any system has a 'natural set' of metrics
             | 
             | I'm trying to offer the perspective of someone who works
             | with products that don't exist entirely on a server. If
             | your product is a web service, the following might not
             | apply to you.
             | 
             | IME creating diagnostic systems for various IoT and
             | industrial devices, the "natural" stuff is relatively easy
             | to implement (battery level, RSSI, connection state, etc)
             | but it's rarely _informative_. In other words, it doesn't
             | meaningfully correlate with the health of the system unless
             | failure is already imminent.
             | 
             | It's the obscure stuff that tends to be informative
             | (routing table state, delivery ratio, etc). But, complex
             | metrics demand a greater engineering lift in their
             | development and testing. There's also a non-trivial amount
             | of effort involved in developing tools to interpret the
             | resulting data.
             | 
             | Even if _natural_ and _informative_ were tightly
             | correlated, which they aren't, an informative metric isn't
             | necessarily _actionable_. You have to be able to _use_ the
             | data to improve your product. I can't charge the battery
             | in a customer's device for them. I also can't move their
             | phone closer to a cell tower. If you can't _act_ on a
             | metric, you're just wasting your time.
        
               | temporarely wrote:
               | Fine, but I'm now wondering what sort of "data" is going
               | to help you "charge the battery in a customer's device
               | for them [or] move their phone closer to a cell tower."
               | 
               | A natural metric for a distributed system is connectivity
               | (or conversely partition detection). A metric on
               | connectivity is informative. Can the information help you
               | heal the partition? Maybe, maybe not. Time to hit the
               | logs and see _why_ the partition occurred and if an
               | actionable remedy is possible.
               | 
               | (I'm trying to understand your pov btw, so clarify as you
               | will.)
        
         | BonusPlay wrote:
         | When performing forensic analysis, metrics don't usually help
         | that much. I'd rather sift 2PB of logs, knowing that
         | information I'm looking for is in there, than sit at the usual
         | "2 weeks of nginx access logs which roll over".
         | 
         | Obviously running everything with debug logging just burns
         | through money, but having decent logs can help a lot of other
         | teams, not just the ones working on the project (developers,
         | sysadmins, etc.).
        
         | anonygler wrote:
         | "Once in a blue moon" -- you mean the thing that constantly
         | happens? If you're not using logs, you're not practicing
         | engineering. Metrics can't really diagnose problems.
         | 
         | It's also a lot easier to inspect a log stream that maps to an
         | alert with a trace id than it is to assemble a pile of metrics
         | for each user action.
        
           | hot_gril wrote:
           | I think the above comment is just saying that you shouldn't
           | use logs to do the job of metrics. Like, if you have an alert
           | that goes off when some HTTP server is sending lots of 5xx,
           | that shouldn't rely on parsing logs.
        
         | zzyzxd wrote:
         | > Most log messages are useless 99.99% of the time. Best likely
         | outcome is that its turned into a metric. The once in the blue
         | moon outcome is that it tells you what went wrong when
         | something crashed.
         | 
         | If it crashes, it's probably some scenario that was not
         | properly handled. If it's not properly handled, it's also
         | likely not properly logged. That's why you need verbose logs --
         | once in a blue moon you need to have the ability to
         | retrospectively investigate something in the past that was not
         | thought through, without using a time machine.
         | 
         | This is more common in the financial world, where an audit
         | trail is required to be kept long term for regulation.
         | ask you for proof that you have done a unit test for a function
         | 3 years ago.
         | 
         | Every organization needs to find their balance between storage
         | cost and quality of observability. I prefer to keep as much
         | data as we are financially allowed. If Binance is happy to pay
         | to store 100PB logs, good for them!
         | 
         | "Do we absolutely need this data or not" is a very tough
         | question. Instead, I usually ask "how long do we need to keep
         | this data" and apply proper retention policy. That's a much
         | easier question to answer for everyone.
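         | 
         | A retention policy can be as blunt as a periodic sweep. A
         | minimal sketch (in production you'd reach for S3 lifecycle
         | rules or logrotate rather than hand-rolling this):

```python
import time
from pathlib import Path

def apply_retention(log_dir: str, max_age_days: float) -> list[str]:
    """Delete *.log files older than max_age_days; return removed paths."""
    cutoff = time.time() - max_age_days * 86_400
    removed = []
    for path in Path(log_dir).glob("*.log"):
        if path.stat().st_mtime < cutoff:  # last modified before the cutoff
            path.unlink()
            removed.append(str(path))
    return removed
```

         | Run it from cron with a different max_age_days per log class,
         | and "how long do we keep this data" becomes one number per
         | class.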
        
           | KaiserPro wrote:
           | > If it's not properly handled, it's also likely not properly
           | logged
           | 
           | Then your blue-moon probability of it being useful rapidly
           | drops. Verbose logs are simply a pain in the arse unless
           | you have a massive processing system, and even then they
           | either kneecap your observation window or make your
           | queries take ages.
           | 
           | I am lucky enough to work at a place that has really ace
           | logging capability, but, and I cannot stress this enough,
           | it is colossally expensive. Literal billions.
           | 
           | But logging is not an audit trail. Even here, where we have
           | fancy PII shields and stuff, logging doesn't have the SLA
           | to record anything critical. If there is a capacity crunch,
           | logging resolution gets turned down. Plus, logging anything
           | of value to the system gets you a significant bollocking.
           | 
           | If you need something that you can hand to a government
           | investigator, if you're pulling logs, you're already in deep
           | shit. An audit framework needs to have a super high SLA,
           | incredible durability and strong authentication for both
           | people and services. All three of those things are generally
           | foreign to logging systems.
           | 
           | Logging is useful and you should log things, but you
           | _should not_ use it as a way to generate metrics. Verbose
           | logs are just a really efficient way to burn through your
           | infrastructure budget.
        
             | Wingy wrote:
             | How about only saving the verbose logs if there's an error?
        
               | chhabraamit wrote:
               | Yup, nice idea: keep collecting logs in a flow and only
               | log them when there is an error. Or:
               | 
               | Start logging into a buffer and only flush when there
               | is an error.
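               | 
               | Python's stdlib ships that exact pattern:
               | logging.handlers.MemoryHandler buffers records and only
               | flushes them to the real handler when a record at ERROR
               | or above shows up (or the buffer fills). A small
               | self-contained example:

```python
import io
import logging
import logging.handlers

stream = io.StringIO()  # stand-in for the real (expensive) log sink

# Hold up to 1000 records in memory; dump them all to the target
# only when a record at ERROR or above arrives.
buffered = logging.handlers.MemoryHandler(
    capacity=1000, flushLevel=logging.ERROR,
    target=logging.StreamHandler(stream),
)

log = logging.getLogger("buffered-demo")
log.setLevel(logging.DEBUG)
log.addHandler(buffered)

log.debug("step 1")           # buffered, nothing written yet
log.debug("step 2")           # still buffered
log.error("step 3 failed")    # flush: all three lines hit the sink

buffered.close()
print(stream.getvalue())
```

               | You pay memory for the buffer, but the verbose context
               | only costs storage when something actually went wrong.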
        
             | zzyzxd wrote:
             | > Verbose logs are simply a pain in the arse, unless you
             | have a massive processing system. but even then it just
             | either kneecaps your observation window, or makes your
             | queries take ages.
             | 
             | Which is why this blog post brags about their capability.
             | Technology advances, and something difficult to do today
             | may not be as difficult tomorrow. If your logging infra
             | is overwhelmed, by all means drop some data and protect
             | the system. But if Binance is happily storing and
             | querying their 100PB of logs now, that's their choice and
             | it's totally fine. I won't say they are doing anything
             | wrong. Again, we are talking about blue-moon scenarios
             | here, which is all about hedging risks and uncertainties.
             | It's fine if Netflix drops a few frames of a movie, but
             | my bank can't drop my transaction.
        
         | pphysch wrote:
         | > But logs shouldn't be your primary source of data, metrics
         | should be.
         | 
         | Metrics, logs, relational data, KVs, indexes, flat files, etc.
         | are all equally valid forms of data for different shapes of
         | data and different access patterns. If you are building for a
         | one-size-fits-all database you are in for a nasty surprise.
        
         | galkk wrote:
         | Just to be sure, I'm speaking below about application/system
         | logs, not as "our event sourcing uses log storage"
         | 
         | Yes, you probably don't want to store debug logs of 2 years
         | ago, but logs and metrics solve very different problems.
         | 
         | Logs need to have a determined lifecycle, e.g. the most
         | detailed logs are stored for 7/14/30/release-cadence days,
         | then discarded. When you need to troubleshoot something,
         | metrics give you the signal, but logs tell you what was
         | actually going on.
        
         | andrewf wrote:
         | As an engineer I generally want logs so I can dive into
         | problems that weren't anticipated. Debugging.
         | 
         | I get a lot of pushback from ops folks. They often don't have
         | the same use case. The logs are for the things that'll be
         | escalated beyond the ops folks to the people that wrote the
         | bug.
         | 
         | Yes, most (> 99.99%) of them will never be looked at. But
         | storage is supposed to be cheap, right? If we can waste bytes
         | on loading a copy of Chromium for each desktop application,
         | surely we can waste bytes on this.
         | 
         | My argument is completely orthogonal to "do we want to generate
         | metrics from structured logs".
        
           | andmarios wrote:
           | Most probably, said ops folks have quite a few war stories to
           | share about logs.
           | 
           | Maybe a JVM-based app went haywire, producing 500GB of logs
           | within 15 minutes, filling the disk, and breaking a critical
           | system because no one anticipated that a disk could go from
           | 75% free to 0% free in 15 minutes.
           | 
           | Maybe another JVM-based app went haywire inside a managed
           | Kubernetes service, producing 4 terabytes of logs, and the
           | company's Google Cloud monthly usage went from $5,000 to
           | $15,000 because storing bytes is supposed to be cheap when
           | they are bytes and not when they are terabytes.
           | 
           | I completely agree that logs are useful, but developers often
           | do not consider what to log and when. Check your company's
           | cloud costs. I bet you the cost of keeping logs is at least
           | 10%, maybe closer to 25% of the total cost.
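           | 
           | For what it's worth, the disk-filling failure mode is cheap
           | to fence off with stock tooling; e.g. Python's
           | RotatingFileHandler bounds disk usage no matter how hard an
           | app goes haywire (a minimal sketch; the path is a throwaway
           | temp dir):

```python
import logging
import logging.handlers
import os
import tempfile

log_path = os.path.join(tempfile.mkdtemp(), "app.log")

# Keep at most app.log plus 5 rolled backups of ~1 KiB each: once
# app.log would exceed maxBytes it rolls to app.log.1, and anything
# beyond backupCount is deleted.
handler = logging.handlers.RotatingFileHandler(
    log_path, maxBytes=1024, backupCount=5
)
log = logging.getLogger("haywire-demo")
log.setLevel(logging.DEBUG)
log.addHandler(handler)

for i in range(1000):
    log.debug("retrying connection, attempt %d", i)  # unbounded otherwise

handler.close()
# Disk usage stays bounded (~6 KiB here) however long the loop runs.
```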
        
             | andrewf wrote:
             | Agreed you need to engineer the logging system and not just
             | pray. "The log service slowed down and our writes to it are
             | synchronous" is one I've seen a few times.
             | 
             | On "do not consider what to log and when" .. I'm not saying
             | don't think about it at all, but if I could anticipate bugs
             | well enough to know exactly what I'll need to debug them,
             | I'd just not write the bug.
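             | 
             | The usual fix for the synchronous-write trap is to put a
             | queue between the hot path and the sink; Python's stdlib
             | QueueHandler/QueueListener pair is one ready-made version
             | (a sketch; the StringIO stands in for a slow log
             | service):

```python
import io
import logging
import logging.handlers
import queue

q = queue.Queue(maxsize=10_000)
sink = io.StringIO()  # stand-in for a slow destination (file, network)

# A background thread drains the queue and does the slow I/O, so a
# stalled log service can no longer stall the request path.
listener = logging.handlers.QueueListener(q, logging.StreamHandler(sink))
listener.start()

log = logging.getLogger("async-demo")
log.setLevel(logging.INFO)
log.addHandler(logging.handlers.QueueHandler(q))  # fast in-memory enqueue

log.info("request handled")  # returns immediately, no I/O on this thread

listener.stop()              # joins the worker, draining the queue
print(sink.getvalue())
```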
        
           | jiggawatts wrote:
           | Something I've discovered is that Azure App Insights can
           | capture memory snapshots when an exception happens. You can
           | download these with a button press and open in Visual Studio
           | with a double-click.
           | 
           | It's _magic!_
           | 
           | The stack variables, other threads, and most of the heap
           | are right there, as if you had set a breakpoint in an
           | interactive debug session.
           | 
           | IMHO this eliminates the need for 99% of the typical detailed
           | tracing seen in large complex apps.
        
         | tuyguntn wrote:
         | > Most log messages are useless 99.99% of the time.
         | 
         | Things are useless until the first crash happens. The same
         | applies to replication: you don't need replication until
         | your servers start crashing.
         | 
         | > But logs shouldn't be your primary source of data, metrics
         | should be.
         | 
         | There are different types of data related to the product:
         | 
         | * product data - what's in your db
         | * logs - human-readable details of the journey of a single
         |   request
         | * metrics - approximate health of the overall system, where
         |   storing high-cardinality values (e.g. customer_uuid) is
         |   bad
         | * traces - approximate details of a single request, to
         |   analyze its journey through your systems, where storing
         |   high-cardinality values might still be bad
         | 
         | Logs are useful, but costly. Just like everything else that
         | makes a system more reliable.
        
       | sebstefan wrote:
       | > On a given high-throughput Kafka topic, this figure goes up to
       | 11 MB/s per vCPU.
       | 
       | There's got to be a 2x to 10x improvement to be made there,
       | no? No way CPU is the limitation these days, and even bad hard
       | drives will support 50+ MB/s write speeds.
        
         | ddorian43 wrote:
         | Building an inverted index is very CPU-intensive.
        
         | fulmicoton wrote:
         | Building an inverted index is actually very cpu intensive. I
         | think we are the fastest on that (if someone knows something
         | faster than tantivy at indexing I am interested).
         | 
         | I'd be really surprised if you can make a 10x improvement here.
        
       | ZeroCool2u wrote:
       | There was a time at the beginning of the pandemic where my team
       | was asked to build a full text search engine on top of a bunch of
       | SharePoint sites in under 2 weeks and with frustratingly severe
       | infrastructure constraints, (No cloud services, single box on
       | prem for processing, among other things), and we did and it
       | served its purpose for a few years. Absolutely no one should
       | emulate what we built, but it was an interesting puzzle to work
       | on and we were able to cut through a lot of bureaucracy quickly
       | that had held us back for a few years wrt accessing the sensitive
       | data they needed to search.
       | 
       | But I was always looking for other options for rebuilding the
       | service within those constraints and found Quickwit when it was
       | under active development. I really admire their work ethic and
       | their engineering. Beautifully simple software that tends to Just
       | Work(tm). It's also one of the first projects that made me really
       | understand people's appreciation for Rust as well outside of just
       | loving Cargo.
        
         | fulmicoton wrote:
         | Thank you for the kind word @ZeroCool2u ! :)
        
         | hanniabu wrote:
         | > we were able to cut through a lot of bureaucracy quickly that
         | had held us back for a few years wrt accessing the sensitive
         | data they needed to search
         | 
         | Doesn't sound like a benefit for your users
        
           | shortrounddev2 wrote:
           | In what way?
        
         | totaa wrote:
         | I don't know what brings me more happiness in this career:
         | building systems with no political constraints, or building
         | something functional under severe constraints.
        
       | elchief wrote:
       | maybe drop the log level from debug to info...
        
       | piterrro wrote:
       | Reminds me of the time Coinbase paid DataDog $65M for storing
       | logs[1]
       | 
       | [1] https://thenewstack.io/datadogs-65m-bill-and-why-
       | developers-...
        
       | inssein wrote:
       | Why do so many companies insist on shipping their logs via Kafka?
       | I can't imagine delivery semantics are necessary for logs, and
       | if they are, that data shouldn't be in your logs?
        
         | bushbaba wrote:
         | Don't forget about all the added cost. Never got it, as many
         | shops can tolerate data loss for their MELT data. So long as
         | it's collected 99.9% of the time, it's good enough.
        
         | mdaniel wrote:
         | My experience has been a mixture of "when all you have is a
         | hammer ..." and Pointy Haired Bosses _LOVE_ kafka, and tend to
         | default to it because it's what all their Pointy Haired Boss
         | friends are using
         | 
         | In a more generous take, using some buffered ingest does help
         | with not having to choose between a c500.128xl ingest machine
         | and dropping messages, but I would never advocate for standing
         | up kafka _just_ for log buffering
        
           | inssein wrote:
           | at that point you are likely slowing down your applications -
           | I think a basic OpenTelemetry collector mostly solves this,
           | and if you go beyond the available buffer there, then
           | dropping it is the appropriate choice for application logs.
        
             | jcgrillo wrote:
             | Dropping may be an unacceptable choice for some
             | applications, though. For example dropping request logs is
             | _really_ bad, because now you have no idea who is
             | interacting with your service. If a security breach happens
             | and your answer is  "like, bro, idk what happened man, we
             | load shedded the logs away" that's not a great look...
        
         | jcgrillo wrote:
         | Kafka is a big dumb pipe that moves the bytes real fast, it's
         | ideal for shipping logs. It accepts huge volumes of tiny writes
         | without breaking a sweat, which is exactly what you want--get
         | the logs off the box ASAP and persisted somewhere else durably
         | (e.g. replicated).
        
       | RIMR wrote:
       | I am having trouble understanding how any organization could ever
       | need a collection of logs larger than the size of the entire
       | Internet Archive. 100PB is staggering, and the idea of filling
       | that with logs, while entirely possible, just seems completely
       | useless given the cost of managing that kind of data.
       | 
       | This is on a technical level quite impressive though, don't get
       | me wrong, I just don't understand the use case.
        
         | jcgrillo wrote:
         | Same, also I'd love to know more about the technical details of
         | their logging format, the on-disk storage format, and why they
         | were only able to reduce the storage size to 20% of the
         | uncompressed size. For example, clp[1] can achieve much, much
         | better compression on logs data.
         | 
         | [1] https://github.com/y-scope/clp
         | 
         | EDIT: See also[2][3].
         | 
         | [2] https://www.uber.com/blog/reducing-logging-cost-by-two-
         | order...
         | 
         | [3] https://www.uber.com/blog/modernizing-logging-with-clp-ii/
        
         | tommek4077 wrote:
         | These are probably order and trade logs. You want to have
         | them, and you need them for auditing; Binance probably wants
         | to be more professional in that regard. HFT makes billions
         | of orders per day per trader.
        
           | jcgrillo wrote:
           | OK, so let's do some napkin math... I'm guessing something
           | like this is the information you might want to log:
           | 
           | user ID: 128bits
           | 
           | timestamp: 96bits
           | 
           | ip address: 32bits
           | 
           | coin type: idk 32bits? how many fake internet money types can
           | there be?
           | 
           | price: 32bits
           | 
           | quantity: 32bits
           | 
           | So total we have 352bits. Now let's double it for teh lulz so
           | 704bits wtf not. You know what fuck it let's just round up to
           | 1024bits. Each trade is 128bytes why not, that's a nice
           | number.
           | 
           | That means 200PB--2e17 bytes mind you--is enough to store
           | 1.5625e15 trades. If all the traders are doing 1e9
           | trades/day, and we assume this dataset is 13mo of data,
           | that means there are ~3900 HFT traders all simultaneously
           | making 11574 trades _per second_.. That seems like a lot..
           | 
           | In other words, that means Binance is processing ~45
           | _million_ orders per second.. Are they though?
           | 
           | EDIT: No, indeed some googling indicates they claim they
           | _can_ process something like 1.4 million TPS. But I'd hazard
           | a guess the actual figure on average is less..
           | 
           | EDIT: err sorry, shoulda been 100PB. Divide all those
           | numbers by two. Still more than an order of magnitude of
           | absurd.
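           | 
           | Redoing the arithmetic for the corrected 100PB figure, with
           | the same assumptions as above (128-byte records, 1e9
           | trades/day per trader, ~13 months of retention):

```python
record_bytes = 128          # padded estimate from the napkin math above
store_bytes = 1e17          # 100 PB

records = store_bytes / record_bytes        # trades the store could hold
days = 13 * 30.5                            # ~13 months of retention
trades_per_trader_per_day = 1e9             # the HFT assumption above

traders = records / (days * trades_per_trader_per_day)
orders_per_sec = records / (days * 86_400)

print(f"{records:.4g} trades, ~{traders:.0f} traders, "
      f"~{orders_per_sec / 1e6:.0f}M orders/sec")
```

           | That lands around 2,000 traders and ~23 million orders per
           | second, still an order of magnitude past the ~1.4 million
           | TPS they claim they can handle.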
        
             | RIMR wrote:
             | The only thing I can think of is that they are collecting
             | every single line of log data from every single production
             | server with absolutely zero expiration so that they can
             | backtrack any future attack with precision, maybe even
             | finding the original breach.
             | 
             | That's the only actual use case I can think of for
             | something like this, which makes sense for a cryptocurrency
             | exchange that is certainly expecting to get hacked at some
             | point.
        
       | cletus wrote:
       | Just browsing the Quickwit documentation, it seems like the
       | general architecture here is to write JSON logs but store them
       | compressed. Is this just something like gzip? A 20% compressed
       | size does seem to align with ballpark estimates for gzipped
       | JSON. This is what Quickwit (and this page) calls a
       | "document": a single JSON record (just FYI).
       | 
       | Additionally you need to store indices because this is what you
       | actually search. Indices have a storage cost when you write them
       | too.
       | 
       | When I see a system like this my thoughts go to questions like:
       | 
       | - What happens when you alter an index configuration? Or add or
       | remove an index?
       | 
       | - How quickly do indexes update when this happens?
       | 
       | - What about cold storage?
       | 
       | Data retention is another issue. Indexes have config for
       | retention [1]. It's not immediately clear to me how document
       | retention works, possibly from S3 expiration?
       | 
       | So, network transfer from S3 is relatively expensive ($0.05/GB
       | standard pricing [2] to the Internet, less to AWS regions). This
       | will be a big factor in cost. I'm really curious to know how much
       | all of this actually costs per PB per month.
       | 
       | IME you almost never need to log and store this much data and
       | there's almost no reason to ever store this much. Most logs are
       | useless and you also have to question what the purpose is of any
       | given log. Even if you're logging errors, you're likely to get
       | the exact same value out of 1% sampling of logs than you are with
       | logging everything.
       | 
       | You might even get more value with 1% sampling because your query
       | and monitoring might be a whole lot easier with substantially
       | less data to deal with.
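       | 
       | Sampling by a stable key (request ID, user ID) rather than
       | rolling dice per line keeps every line of a sampled request,
       | so the kept 1% stays fully debuggable. A sketch:

```python
import hashlib

def sampled(key: str, rate: float = 0.01) -> bool:
    """Deterministically keep ~rate of events, bucketed by key."""
    # Hash the key into [0, 1); the same key always lands in the same
    # bucket, so all log lines for a sampled request are retained.
    h = int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")
    return h / 2**64 < rate

kept = sum(sampled(f"req-{i}") for i in range(100_000))
print(f"kept {kept} of 100000 requests")  # close to 1%
```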
       | 
       | Likewise, metrics tend to work just as well from sampled data.
       | 
       | This post suggests 60 day log retention (100PB / 1.6PB daily). I
       | would probably divide this into:
       | 
       | 1. Metrics storage. You can get this from logs but you'll often
       | find it useful to write it directly if you can. Getting it from
       | logs can be error-prone (eg a log format changes, the sampling
       | rate changes and so on);
       | 
       | 2. Sampled data, generally for debugging. I would generally try
       | to keep this at 10TB or less;
       | 
       | 3. "Offline" data, which you would generally only query if you
       | absolutely had to. This is particularly true on S3, for example,
       | because the write costs are basically zero but the read costs are
       | expensive.
       | 
       | Additionally, you'd want to think about data aggregation, as a
       | lot of your logs are only useful when combined in some way.
       | [1]: https://quickwit.io/docs/overview/concepts/indexing
       | 
       | [2]: https://aws.amazon.com/s3/pricing/
        
         | JackSlateur wrote:
         | You have very good questions. I can only guess one answer:
         | S3 network transfer is free for AWS services in the same
         | region.
         | 
         | Your link[1] says:
         | 
         |     You pay for all bandwidth into and out of Amazon S3,
         |     except for the following: [...]
         |     - Data transferred from an Amazon S3 bucket to any AWS
         |       service(s) within the same AWS Region as the S3 bucket
         |       (including to a different account in the same AWS
         |       Region).
        
       | ATsch wrote:
       | It's always very amusing how all of the blockchain companies wax
       | lyrical about all of the huge supposed benefits of blockchains
       | and how every industry and company is missing out by not adopting
       | them and should definitely run a hyperledger private blockchain
       | buzzword whatever.
       | 
       | And then, even when faced with implementing a huge,
       | audit-critical, distributed append-only store, the very thing
       | they tell us blockchains are so useful for, they just use
       | normal database tech like the rest of us: one centralized
       | infrastructure where most of the transactions in the network
       | actually take place, with a tech stack that looks suspiciously
       | like every other financial institution's.
       | 
       | I'm so glad we're ignoring 100 years of securities law to let all
       | of this incredible innovation happen.
        
         | tommek4077 wrote:
         | Binance is not a blockchain company. It is a centealized
         | exchange. Nothing is happening on-chain unless getting coins
         | from or to the exchange. And this has nothing tondo with them
         | then.
        
       | ram_rar wrote:
       | >Limited Retention: Binance was retaining most logs for only a
       | few days. Their goal was to extend this to months, requiring the
       | storage and management of 100 PB of logs, which was prohibitively
       | expensive and complex with their Elasticsearch setup.
       | 
       | Just to give some perspective. The Internet Archive, as of
       | January 2024, attests to have stored ~ 99 petabytes of data.
       | 
       | Can someone from Binance/quickwit comment on their use case that
       | needed log retention for months? I have rarely seen users try to
       | access actionable _operations_ log data beyond 30 days.
       | 
       | I wonder how much more $$ they could save by leveraging tiered
       | storage and engineers being mindful of logging.
        
         | drak0n1c wrote:
         | Government regulators take their time and may not
         | investigate, or alert firms to identified theft,
         | vulnerabilities, or the trails of criminal or
         | sanctioned-country users, for months. However, that does not
         | protect those companies from liability. There is recent
         | pressure and targeted prosecution from the US on Binance and
         | CZ along this angle. They've been burned by US users getting
         | into their international exchange, so keeping longer
         | forensic logs helps them surveil, identify, and restrict
         | Americans better (as well as the bad guys they're not
         | supposed to interact with).
        
       | evdubs wrote:
       | Lots of storage to log all of the wash trading on their platform.
        
       ___________________________________________________________________
       (page generated 2024-07-11 23:01 UTC)