[HN Gopher] A Practitioner's Guide to Wide Events
       ___________________________________________________________________
        
       A Practitioner's Guide to Wide Events
        
       Author : dmazin
       Score  : 15 points
       Date   : 2024-12-23 20:29 UTC (1 days ago)
        
 (HTM) web link (jeremymorrell.dev)
 (TXT) w3m dump (jeremymorrell.dev)
        
       | zahlman wrote:
       | Practitioner of what? What is a "wide event"? In what context is
       | this concept relevant? It took several sentences before I was
       | even confident that this is something to do with programming.
        
         | djhope99 wrote:
         | It's about observability and strongly related to Honeycomb's
         | o11y 2.0 vision.
        
           | zahlman wrote:
           | Okay, so a web search and some looking around gives me
           | https://www.honeycomb.io/frontend-observability. I guess this
           | is something to do with tools for sending telemetry back from
           | web applications and then doing statistics on them and giving
           | the user some nice reports.
           | 
           | "Observability" seems like a weird term for that to me, but
           | okay.
           | 
           | But I don't understand why not just give the appropriate
           | context in the submission, rather than keeping a title that
           | only makes sense to a very specific niche audience and then
           | _not saying up front what the niche is_.
           | 
           | The concept of an "event" is coherent in many other
           | programming contexts, so the possibility that one could be
           | coherently "wide" is at least plausibly interesting. But then
           | I get there and find myself completely disoriented, and
           | eventually figure out that it's not actually relevant to
           | anything I do. And anyway it looks like a lot of this jargon
           | is really just not necessary to convey the core ideas... ?
        
         | Etheryte wrote:
         | They link to three separate articles right at the start that
         | cover all of this. Not every article needs to start from first
         | principles. You wouldn't expect an article about a new Postgres
         | version to start with what databases are and why someone would
         | need them.
        
           | zahlman wrote:
           | >Not every article needs to start from first principles.
           | 
           | Sure, but it would be nice if _title submissions_ made it
           | feasible to _predict the topic category_ of the article for
           | people who are not already in the relevant niche.
        
         | cookie_monsta wrote:
         | I felt like I got the gist after the first two:
         | 
         | > Adopting Wide Event-style instrumentation has been one of the
         | highest-leverage changes I've made in my engineering career.
         | The feedback loop on all my changes tightened and debugging
         | systems became so much easier.
        
           | zahlman wrote:
           | >I felt like I got the gist after the first two:
           | 
           | What I get is: here's a thing that made a big improvement to
           | how I debug systems.
           | 
           | Except, it turns out that the systems in question are very
           | specific ones.
           | 
           | > The tl;dr is that for each unit-of-work in your system
           | (usually, but not always an HTTP request / response) you emit
           | one "event" with all of the information you can collect about
           | that work.
           | 
           | Okay, but... as opposed to what? And why is it better this
           | way?
           | 
           | >"Event" is an over-loaded term in telemetry so replace that
           | with "log line" or "span" if you like. They are all
           | effectively the same thing.
           | 
           | In the programming I do, "event" doesn't mean anything to do
           | with logging or telemetry.
        
           | treyfitty wrote:
           | That doesn't really give an objective definition of what wide
           | events are, just an opinion and example in this one person's
           | life.
           | 
           | I had to look up wide events in the middle of the article, and
           | I can't say I can viscerally see and feel the benefits the OP
           | was espousing. Just felt like an adderall-fueled dump of
           | information being thrown at me.
        
       | valyala wrote:
       | Wide events is a great concept for the observability space! It is
       | a superset of structured logs and traces. Wide events are
       | basically structured logs where every log entry contains hundreds
       | of fields with various properties of the entry. This allows
       | slicing and dicing the collected events by arbitrary subsets of
       | their fields, which opens up endless possibilities for obtaining
       | useful analytics from the collected events.
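       | 
       | As a rough illustration of that slicing, here is a minimal Python
       | sketch. The field names and the slice_by helper are made up for
       | the example, and real wide events would carry far more fields.

```python
from collections import defaultdict

# Three hypothetical wide events, one per HTTP request. Real events
# would carry tens to hundreds of fields; these are trimmed for brevity.
events = [
    {"service": "api", "route": "/cart", "status": 200, "region": "eu", "duration_ms": 12},
    {"service": "api", "route": "/cart", "status": 500, "region": "us", "duration_ms": 340},
    {"service": "api", "route": "/pay", "status": 200, "region": "us", "duration_ms": 95,
     "user_tier": "pro"},
]


def slice_by(events, *fields):
    """Group events by any subset of fields; report count and mean duration."""
    groups = defaultdict(list)
    for event in events:
        # Events may lack a field entirely, so a missing value becomes None.
        key = tuple(event.get(f) for f in fields)
        groups[key].append(event["duration_ms"])
    return {k: (len(v), sum(v) / len(v)) for k, v in groups.items()}


by_region = slice_by(events, "region")
by_route_status = slice_by(events, "route", "status")
```

       | The same helper works for any combination of fields, including
       | ones that only some events carry (such as user_tier here).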
       | 
       | Wide events can be stored in traditional databases. But this
       | approach has a few drawbacks:
       | 
       | - Every wide event can have different sets of fields. Such fields
       | cannot be mapped to the classical relational table columns, since
       | the full set of potential fields, which can be seen in wide
       | events, isn't known beforehand.
       | 
       | - The number of fields in wide events is usually quite big - from
       | tens to a few hundred. If we are going to store them in a
       | traditional relational table, this table will end up with
       | hundreds of columns. Such tables aren't processed efficiently by
       | traditional databases.
       | 
       | - Typical queries over wide events usually refer to only a few
       | fields out of hundreds of available fields. Traditional databases
       | usually store every row in a table as a contiguous chunk of data
       | with all the values for all the fields of the row (aka row-based
       | storage). Such a scheme is very inefficient when the query needs
       | to process only a few fields out of hundreds of available fields,
       | since the database needs to read all the hundreds of fields for
       | each row and then extract the few needed fields.
       | 
       | It is much better to use analytical databases such as ClickHouse
       | for storing and processing big volumes of wide events. Such
       | databases usually store the values of each field in contiguous
       | data chunks (aka column-oriented storage). This allows reading and
       | processing only the few fields mentioned in the query, while
       | skipping the remaining hundreds of fields. This also allows
       | efficiently compressing field values, which reduces storage space
       | usage and improves performance for queries limited by disk read
       | speed.
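       | 
       | The difference between the two layouts can be sketched in a few
       | lines of Python. This is a toy model with made-up data (4 events,
       | 100 fields named f0..f99) that only counts how many values each
       | layout has to touch; it is not how any real database reads disk.

```python
# Hypothetical data: 4 events with 100 fields each (f0..f99).
NUM_FIELDS = 100
rows = [{f"f{i}": row_id * 1000 + i for i in range(NUM_FIELDS)} for row_id in range(4)]


def to_columns(rows):
    """Pivot row-oriented records into one contiguous list per field."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def sum_row_store(rows, field):
    """Row layout: every row is loaded in full before one field is picked out."""
    values_read = 0
    total = 0
    for row in rows:
        values_read += len(row)  # the whole row is read
        total += row[field]
    return total, values_read


def sum_column_store(columns, field):
    """Column layout: only the requested field's values are read."""
    column = columns[field]
    return sum(column), len(column)


columns = to_columns(rows)
row_total, row_reads = sum_row_store(rows, "f7")
col_total, col_reads = sum_column_store(columns, "f7")
```

       | Both layouts return the same sum, but the row layout touches 400
       | values where the column layout touches 4.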
       | 
       | Analytical databases don't resolve the first issue mentioned
       | above, since they usually require creating a table with
       | predefined columns before wide events can be stored in it. This
       | means that you cannot store wide events with arbitrary sets of
       | fields that are unknown when the table is created.
       | 
       | I'm working on a specialized open-source database for wide
       | events, which resolves all the issues mentioned above. It doesn't
       | require creating any table schemas before ingesting wide events
       | with arbitrary sets of fields (i.e. it is schemaless). It
       | automatically creates the needed columns for all the fields it
       | sees during data ingestion. It uses column-oriented storage, so
       | it provides query performance comparable to analytical databases.
       | The name of this database is VictoriaLogs. Strange name for a
       | database specialized for efficient processing of wide events :)
       | This is because it was initially designed for storing logs - both
       | plaintext and structured. Later it turned out that its
       | architecture fits wide events ideally. Check it out -
       | https://docs.victoriametrics.com/victorialogs/
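       | 
       | To picture what creating columns during ingestion can look like,
       | here is a toy Python sketch. The SchemalessColumnStore class is
       | hypothetical and purely illustrative; it is not based on
       | VictoriaLogs internals and ignores persistence, typing and
       | compression.

```python
class SchemalessColumnStore:
    """Toy column store that adds a column the first time a field appears."""

    def __init__(self):
        self.columns = {}    # field name -> list of values, one slot per event
        self.num_events = 0

    def ingest(self, event):
        for name in event:
            if name not in self.columns:
                # New field: backfill earlier events with None.
                self.columns[name] = [None] * self.num_events
        for name, column in self.columns.items():
            # Fields absent from this event are stored as None.
            column.append(event.get(name))
        self.num_events += 1

    def select(self, field):
        """Return a single column; unknown fields read as all-None."""
        return self.columns.get(field, [None] * self.num_events)


store = SchemalessColumnStore()
store.ingest({"route": "/cart", "status": 200})
store.ingest({"route": "/pay", "status": 500, "error": "timeout"})
store.ingest({"route": "/cart", "cache_hit": True})
```

       | No schema is declared up front: the error and cache_hit columns
       | appear when the first event carrying them is ingested.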
        
         | bonobocop wrote:
         | Thoughts on stuff like ClickHouse with JSON column support?
         | Less upfront knowledge of columns needed.
        
       | thom wrote:
       | I'm quite looking forward to a future where we've finally
       | accepted that all this stuff is just part of the domain and
       | shouldn't be treated like an ugly stepchild, and we've merged
       | OLTP and OLAP with great performance for both, and the wolf also
       | shall dwell with the lamb, and we'll all get lots of work done.
        
       ___________________________________________________________________
       (page generated 2024-12-24 23:00 UTC)