[HN Gopher] A Practitioner's Guide to Wide Events
___________________________________________________________________
A Practitioner's Guide to Wide Events
Author : dmazin
Score : 15 points
Date : 2024-12-23 20:29 UTC (1 day ago)
(HTM) web link (jeremymorrell.dev)
(TXT) w3m dump (jeremymorrell.dev)
| zahlman wrote:
| Practitioner of what? What is a "wide event"? In what context is
| this concept relevant? It took several sentences before I was
| even confident that this is something to do with programming.
| djhope99 wrote:
  | It's about observability and strongly related to Honeycomb's
  | o11y 2.0 vision.
| zahlman wrote:
| Okay, so a web search and some looking around gives me
| https://www.honeycomb.io/frontend-observability. I guess this
| is something to do with tools for sending telemetry back from
| web applications and then doing statistics on them and giving
| the user some nice reports.
|
| "Observability" seems like a weird term for that to me, but
| okay.
|
| But I don't understand why not just give the appropriate
| context in the submission, rather than keeping a title that
| only makes sense to a very specific niche audience and then
| _not saying up front what the niche is_.
|
| The concept of an "event" is coherent in many other
| programming contexts, so the possibility that one could be
| coherently "wide" is at least plausibly interesting. But then
| I get there and find myself completely disoriented, and
| eventually figure out that it's not actually relevant to
| anything I do. And anyway it looks like a lot of this jargon
| is really just not necessary to convey the core ideas... ?
| Etheryte wrote:
| They link to three separate articles right at the start that
| cover all of this. Not every article needs to start from first
| principles. You wouldn't expect an article about a new Postgres
| version to start with what databases are and why someone would
| need them.
| zahlman wrote:
| >Not every article needs to start from first principles.
|
| Sure, but it would be nice if _title submissions_ made it
| feasible to _predict the topic category_ of the article for
| people who are not already in the relevant niche.
| cookie_monsta wrote:
| I felt like I got the gist after the first two:
|
| > Adopting Wide Event-style instrumentation has been one of the
| highest-leverage changes I've made in my engineering career.
| The feedback loop on all my changes tightened and debugging
| systems became so much easier.
| zahlman wrote:
| >I felt like I got the gist after the first two:
|
| What I get is: here's a thing that made a big improvement to
| how I debug systems.
|
| Except, it turns out that the systems in question are very
| specific ones.
|
| > The tl;dr is that for each unit-of-work in your system
| (usually, but not always an HTTP request / response) you emit
| one "event" with all of the information you can collect about
| that work.
|
| Okay, but... as opposed to what? And why is it better this
| way?
|
| >"Event" is an over-loaded term in telemetry so replace that
| with "log line" or "span" if you like. They are all
| effectively the same thing.
|
| In the programming I do, "event" doesn't mean anything to do
| with logging or telemetry.
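The quoted tl;dr — one event per unit of work, carrying every field you can collect — can be sketched in a few lines of Python. This is an illustrative sketch, not code from the article; all names (`handle_request`, the field names) are hypothetical:

```python
import json
import time
import uuid


def handle_request(request, process):
    """Handle one unit of work and emit a single wide event for it."""
    # Start collecting context as soon as the work begins.
    event = {
        "timestamp": time.time(),
        "request_id": str(uuid.uuid4()),
        "http.method": request["method"],
        "http.path": request["path"],
        "user.id": request.get("user_id"),
    }
    start = time.monotonic()
    try:
        response = process(request)
        event["http.status"] = response["status"]
        return response
    except Exception as exc:
        event["error"] = repr(exc)
        event["http.status"] = 500
        raise
    finally:
        # Exactly one log line per request, with everything we know.
        event["duration_ms"] = (time.monotonic() - start) * 1000
        print(json.dumps(event))
```

The point of the pattern is the `finally` block: success or failure, the request emits one structured record instead of many scattered log lines.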
| treyfitty wrote:
  | That doesn't really give an objective definition of what wide
  | events are, just an opinion and an example from this one
  | person's life.
|
  | I had to look up wide events in the middle of the article, and
  | I can't say I can viscerally see and feel the benefits the OP
  | was espousing. It just felt like an adderall-fueled dump of
  | information being thrown at me.
| valyala wrote:
| Wide events is a great concept for observability space! This a
| superset of structured logs and traces. Wide events is basically
| structured logs, where every log entry contains hundreds of
| fields with various properties of the log entry. This allows
| slicing and dicing the collected events by arbitrary subsets of
| thier fields. This opens an infinite possibilities to obtain
| useful analytics from the collected events.
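A minimal sketch of what such an event might look like, heavily abridged — real wide events carry hundreds of fields, and every field name here is illustrative, not taken from the article:

```python
import json

# One structured log entry per unit of work, with every property worth
# recording. Any field becomes a dimension you can slice and dice on.
wide_event = {
    "timestamp": "2024-12-23T20:29:00Z",
    "service": "checkout",
    "http.method": "POST",
    "http.path": "/cart/submit",
    "http.status": 200,
    "duration_ms": 184.2,
    "user.id": "u_12345",
    "user.plan": "pro",
    "db.queries": 7,
    "cache.hit": False,
    "build.sha": "a1b2c3d",
}

# Emitted as a single JSON log line.
print(json.dumps(wide_event))
```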
|
| Wide events can be stored in traditional databases. But this
| approach has a few drawbacks:
|
  | - Every wide event can have a different set of fields. Such
  | fields cannot be mapped to classical relational table columns,
  | since the full set of potential fields that may appear in wide
  | events isn't known beforehand.
|
  | - The number of fields in wide events is usually quite large -
  | from tens to a few hundred. If we store them in a traditional
  | relational table, the table ends up with hundreds of columns,
  | and such tables aren't processed efficiently by traditional
  | databases.
|
  | - Typical queries over wide events usually refer to only a few
  | of the hundreds of available fields. Traditional databases
  | usually store every row in a table as a contiguous chunk of
  | data containing the values of all the row's fields (aka row-
  | based storage). This scheme is very inefficient when a query
  | needs only a few fields out of hundreds, since the database
  | must read all the hundreds of fields for each row and then
  | extract the few it needs.
|
  | It is much better to use analytical databases such as
  | ClickHouse for storing and processing large volumes of wide
  | events. Such databases usually store the values of every field
  | in contiguous data chunks (aka column-oriented storage). This
  | allows reading and processing only the few fields mentioned in
  | the query, while skipping the remaining hundreds of fields. It
  | also allows efficiently compressing field values, which reduces
  | storage space usage and improves performance for queries
  | limited by disk read speed.
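The row-versus-column trade-off described above can be illustrated with toy in-memory layouts (real engines add compression, indexes, and on-disk formats; the field names are made up):

```python
# Three wide events, trimmed to four fields for readability.
events = [
    {"duration_ms": 120, "status": 200, "path": "/a", "user": "u1"},
    {"duration_ms": 310, "status": 500, "path": "/b", "user": "u2"},
    {"duration_ms": 95,  "status": 200, "path": "/a", "user": "u3"},
]

# Row-oriented: each row is stored whole, so answering "average
# duration_ms" still touches every field of every row.
row_store = [tuple(e.values()) for e in events]

# Column-oriented: values are stored per field, so the same query reads
# only the one column it needs and skips all the others. Homogeneous
# columns (e.g. repeated status codes) also compress well.
column_store = {field: [e[field] for e in events] for field in events[0]}

avg_duration = sum(column_store["duration_ms"]) / len(events)
```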
|
  | Analytical databases don't resolve the first issue mentioned
  | above, since they usually require creating a table with
  | predefined columns before wide events can be stored in it.
  | This means you cannot store wide events with arbitrary sets of
  | fields, which may be unknown before the table is created.
|
  | I'm working on a specialized open-source database for wide
  | events which resolves all the issues mentioned above. It
  | doesn't require creating any table schemas before ingesting
  | wide events with arbitrary sets of fields (i.e. it is
  | schemaless). It automatically creates the needed columns for
  | all the fields it sees during data ingestion. It uses column-
  | oriented storage, so it provides query performance comparable
  | to analytical databases. The name of this database is
  | VictoriaLogs. A strange name for a database specialized in
  | efficient processing of wide events :) This is because it was
  | initially designed for storing logs - both plaintext and
  | structured. Later it turned out that its architecture is an
  | ideal fit for wide events. Check it out -
  | https://docs.victoriametrics.com/victorialogs/
| bonobocop wrote:
| Thoughts on stuff like ClickHouse with JSON column support?
| Less upfront knowledge of columns needed.
| thom wrote:
| I'm quite looking forward to a future where we've finally
| accepted that all this stuff is just part of the domain and
| shouldn't be treated like an ugly stepchild, and we've merged
| OLTP and OLAP with great performance for both, and the wolf also
| shall dwell with the lamb, and we'll all get lots of work done.
___________________________________________________________________
(page generated 2024-12-24 23:00 UTC)