[HN Gopher] Discussion: structured, leveled logging
___________________________________________________________________
Discussion: structured, leveled logging
Author : philosopher1234
Score : 135 points
Date : 2022-09-11 15:44 UTC (7 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| svnpenn wrote:
| Every time I come across a logging package, I don't understand
| what the point of them is. Finally, in the linked issue, I see
| this:
|
| > For many programs that aren't run directly by a person, like
| servers, logging is the main way for developers to observe the
| detailed behavior of the system, and often the first place they
| go to debug it.
|
| and it makes sense. I don't deal with servers at all, other than
| a couple I have made just to host some basic sites or programs
| locally. I think the Go team might be overlooking the fact that a
| big chunk of the userbase doesn't understand, and has no need
| for, a logging package. Couple that with the giant proposed API,
| a poor design decision (kvs ...any), and the large amount of
| bikeshedding that has already happened in this issue, and I
| really hope they think hard before just shoving this into "std".
| xani_ wrote:
| Develop a bigger app and you will understand.
|
| But one example: having a logging interface that is not shit
| (the current builtin is such garbage it might as well not exist
| at all) allows packages to adopt it as the default, and you can
| magically just feed the same logger to every dep you pull in,
| with no loss of features and no wrappers to write.
|
| A contextual logger means you can just feed a
| ["plugin": pluginname]-tagged one to every plugin and get logs
| pointing you to the place where something happened.
|
| The structured logging part is mostly to avoid the woes of
| having to parse logs back into a structured format via a bunch
| of regexps that will inevitably break on the next release. If
| you still don't get it, here is a regexp to parse HAProxy logs:
| .*haproxy\[(\d+)]: (.+?):(\d+) \[(.+?)\] (.+?)(|[\~])
| (.+?)\/(.+?)
| ([\-\d]+)\/([\-\d]+)\/([\-\d]+)\/([\-\d]+)\/([\-\d]+) ([\-\d]+)
| ([\-\d]+) (\S+) (\S+) (\S)(\S)(\S)(\S)
| ([\-\d]+)\/([\-\d]+)\/([\-\d]+)\/([\-\d]+)\/([\-\d]+)
| ([\-\d]+)\/([\-\d]+)(| \{.*\}) (".*)([\n|\s]*?)$`)
|
| > I think the Go team might be overlooking the fact that a big
| chunk of the userbase doesn't understand, and has no need for,
| a logging package
|
| How does that proposal impede them?
|
| If you don't need any of the extras, you'd be able to just call
| `log.Info`.
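|
| A minimal sketch of that simple case, going by the calling
| convention sketched in the proposal (the import path is
| assumed from the prototype and may differ):
|
|     package main
|
|     import "golang.org/x/exp/slog" // prototype path, assumed
|
|     func main() {
|         // No structure required for the simple case.
|         slog.Info("server started")
|
|         // Key-value pairs are optional extras.
|         slog.Info("request handled",
|             "path", "/healthz", "status", 200)
|     }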
| tptacek wrote:
| It's not going into std; it's going into /x.
| marcus_holmes wrote:
| Effectively the same place at this point. There are so many
| "standard" libs in /x that constraining yourself to "just the
| standard lib" has to include /x to be actually useful.
| philosopher1234 wrote:
| Such as?
| marcus_holmes wrote:
| goimports, godoc and the other tools
| (https://pkg.go.dev/golang.org/x/tools - you basically
| can't use Go properly without these)
|
| all the crypto stuff
| (https://pkg.go.dev/golang.org/x/crypto) if you're doing
| anything with crypto then this is needed
|
| others are really useful (see index at
| https://pkg.go.dev/golang.org/x) - I've imported the
| image, sync and text packages
|
| it's a lot of the "missing bits" from the standard lib. I
| get why they're in /x but the distinction is moot in most
| cases.
| tptacek wrote:
| Almost nobody uses goimports or godoc as a declared
| dependency; they're tools, not standard library
| components.
|
| It's perfectly possible to implement cryptography without
| touching x/crypto; std includes SHA2, HMAC, an AEAD
| interface, ED25519, and, of course, Go's entire TLS
| stack. Arguably there's _too much_ cryptography in std
| (almost nobody is better off directly building on cipher
| cores and signature algorithms). Thankfully: most Go
| programs don't implement cryptography, which is
| dangerous even with Go's relatively solid library
| interfaces.
| tptacek wrote:
| They're not at all effectively the same place, and much of
| what's in /x competes with third-party packages, which are
| on a somewhat level playing field in that both /x and
| third-party code have to be explicitly brought in to a
| project as a dep.
| marcus_holmes wrote:
| meh, I'm always going to prefer an /x package to a 3rd-
| party package.
| atsjie wrote:
| The majority of Golang devs actually do work on web services,
| APIs, or libs/frameworks that would benefit from a
| standardized structured logging interface.
|
| Source: https://go.dev/blog/survey2022-q2-results (scroll
| almost to the bottom)
|
| For the audience of this proposal it's probably fairly evident
| what the advantages of standardization would be.
|
| Also, you argue quite strongly about the poor API design
| decisions being made, while at the same time admitting you
| haven't worked with such tools, or with web services in Go in
| general. That's odd.
| svnpenn wrote:
| I don't work with servers much, but I do have extensive
| experience in Go. I know a bad API when I see one.
| strings.Replacer [1] uses the same syntax, and several others
| have commented in the discussion about it being a bad idea.
|
| 1. https://godocs.io/strings#Replacer
| xani_ wrote:
| From what I've skimmed, most of the problems seem to be
| "golang's type system won't allow expressing all of that at
| compile time, so it will be costly", and I don't think that
| can be sensibly fixed in a logging package.
|
| > I know a bad API when I see one. strings.Replacer [1]
| uses the same syntax, and several others have commented in
| the discussion about it being a bad idea.
|
| What's wrong with it? Never used it. But I think the
| complaints have been about wasting time on runtime type
| handling, and Replacer doesn't have that problem.
| Joker_vD wrote:
| The main performance problem is with the "kvs ...any"
| parameter in the Logger's functions, because it allocates
| _lots_ of stuff: the slice itself, and then the interface
| wrappers around the values. I've tried to use something like
| this in the central loop of a CPU emulator to trace every
| instruction executed, and IIRC it slowed things down about
| 3x. The only thing you can do to escape this cost is
|
|     if logging.Level() >= logging.DEBUG {
|         logging.Logger.Debug(...)
|     }
|
| but it's ugly and can't be abstracted away. And yes, I've
| put several such guards in the hot paths of an actual
| production code base and seen about a 25% speedup after
| that edit.
| jen20 wrote:
| > What's wrong with it?
|
| It relies on lining up pairs of values in a way that
| cannot be checked by the compiler, rather than having
| dedicated functions that ensure they are kept together.
|
| This is the fundamental difference between bad logging
| libraries (like klog, hclog, etc.) and good logging
| libraries (like zerolog).
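|
| For comparison, roughly the two styles (zerolog's real API
| vs a hypothetical variadic key-value logger):
|
|     package example
|
|     import (
|         "os"
|
|         "github.com/rs/zerolog"
|     )
|
|     func report() {
|         // zerolog: every field goes through a typed method,
|         // so a key can't silently lose its value.
|         logger := zerolog.New(os.Stdout)
|         logger.Info().
|             Str("user", "alice").
|             Int("attempts", 3).
|             Msg("login failed")
|
|         // Variadic pairs (hypothetical kvlog): the compiler
|         // can't see that "attempts" is missing its value.
|         // kvlog.Info("login failed", "user", "alice", "attempts")
|     }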
| xani_ wrote:
| I guess the simplest fix for that particular one would be
| accepting `map[string]string`, but that won't exactly work
| for arbitrary value types, and you can't easily add more.
|
| Maybe `...map[string]interface{}`, but then we're back to
| "the overhead is kinda shit".
| kaz-inc wrote:
| I also didn't understand the need for a logging package for a
| while, but I think it makes sense for Go because they're
| focusing on an audience that writes server applications.
| xani_ wrote:
| It's less about the package and more about the interface.
|
| One "good enough" interface in the stdlib means libraries
| could just require passing in something matching it. Then no
| matter which logging package you use, as long as it fulfils
| the stdlib logger interface, it can be used with any library
| or other component you pulled in.
|
| So a simple app can just have a simple logger that dumps
| everything to stderr, a complex app can have whatever complex
| logging logic it needs, and a lib "just using the stdlib
| logger" doesn't care about any of that.
|
| And as a lib author, again, you can just use the stdlib one
| and most likely cover all your use cases.
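|
| A sketch of the idea, with a hypothetical minimal interface
| standing in for whatever std ends up defining:
|
|     package mylib
|
|     // Logger is a hypothetical stand-in for the stdlib
|     // interface; the real one would differ in detail.
|     type Logger interface {
|         Debug(msg string, kvs ...any)
|         Info(msg string, kvs ...any)
|     }
|
|     // Fetch is library code: it logs through whatever the
|     // caller hands it, with no dependency on any concrete
|     // logging package.
|     func Fetch(log Logger, url string) error {
|         log.Debug("fetching", "url", url)
|         // ... do the work ...
|         log.Info("fetched", "url", url)
|         return nil
|     }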
| GauntletWizard wrote:
| Levels are a great abstraction for debug logs. Debug logs are
| often abused for much more than their reasonable purpose.
|
| Audit logs are different from debug logs, and it's pretty common
| for people to not think about the reasons why. Audit logs are
| records, and they don't have levels; they might have metadata,
| but you need to capture all of them, reliably. Debug logs are
| ephemeral and if you miss some of them you shouldn't care.
|
| The problem with structured logging libraries is that they try to
| do both. There's some overlap, but the mechanisms of output
| should be really different - keeping debug logs in a circular
| buffer in RAM makes a lot of sense, but you can't do that for
| audit logs. The current approach seems to be "write them all to
| the same place, then filter them out", and that only adds more
| latency in which to lose your audit logs.
| jen20 wrote:
| > Audit logs are different from debug logs
|
| Very much so - audit logs, where they are important, are often
| part of the actual business domain and should be explicitly
| modelled, not delegated to infrastructure with a slightly
| different level.
| marcus_holmes wrote:
| When I last discussed this we ended up in a huge debate about
| whether we should be sending different log levels to different
| destinations, because of this dichotomy.
|
| Debug messages shouldn't be stored in permanent logs (they
| frequently contain sensitive/private info). Alerts need to be
| sent to someone immediately (even if that's a different process
| monitoring the log file for alerts and raising them). Errors
| need context in order to diagnose them (ideally a stack trace).
| All of these are doing different things using the same
| interface.
|
| Log levels end up being abused because of this. What we want is
| to be able to handle different situations differently. What we
| get is log levels.
| tptacek wrote:
| This appears to be why there's a separation of concerns between
| "handlers" and "loggers".
| GauntletWizard wrote:
| That's exactly what I'm railing against. You shouldn't be
| pushing all these things into the same stream and then
| separating them out again. You may want to make it easy to
| generate a debug log when you generate an alert. You
| shouldn't do that by writing a log. You should have a very
| different set of parameters for
|
| If you're a bank, your login "Logs" shouldn't just be an
| annotated web request; you should have an explicit
| "AuthenticationSuccess" type in a non-ephemeral data store,
| and it should contain more than just "301 /auth 1.2.3.4", it
| should include information about the factors used to make
| that decision.
|
| Semi-structured logs don't have that, but will be abused to
| provide that. It should be a type system error to create an
| AuthSuccess without all that, not a set of default parameters
| to a generic logging call.
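|
| i.e. something closer to this (illustrative types only, not
| from the proposal):
|
|     package auth
|
|     import (
|         "context"
|         "net/netip"
|         "time"
|     )
|
|     // AuthenticationSuccess is an explicit audit record; its
|     // required fields live in the type, not in logging
|     // conventions.
|     type AuthenticationSuccess struct {
|         UserID  string
|         Source  netip.Addr
|         Factors []string // e.g. "password", "totp"
|         At      time.Time
|     }
|
|     // AuditStore is a durable, append-only sink (a database
|     // table, a WORM bucket, ...), deliberately not a log file.
|     type AuditStore interface {
|         Append(ctx context.Context, ev AuthenticationSuccess) error
|     }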
|
| Structured logging is great for moving fast, and I use it all
| the time to make it easy to filter through my debug logs, but
| levels are an abstraction in debug logs - and handlers are
| completely an anti-pattern
| tptacek wrote:
| You've lost me, because the point of having a handler
| interface is exactly to be able to do things like ensuring
| that your "authentication success" events get stored
| differently than your debug messages.
|
| You can see upthread that the trend is kind of in exactly
| the opposite direction to what you're advocating for:
| plenty of people think there's no such thing as a log or an
| audit record, only "events", expressed as otel spans. Have
| fun with those people! :)
| reggieband wrote:
| I see some asking for stack support in the logs, and that is
| something I would be looking for. Often when an error happens on
| a server it might be inside some generic utility. At least with a
| stack, you might realize that all of the calls to that utility
| function are coming from the same location a few steps up the
| stack. In my experience that can significantly speed up
| identifying the root cause of issues.
|
| I'm not sure I like the way they are handling attributes either.
| One pattern I strongly discourage on projects I work on is
| formatting data into the log message. When I've had access to
| decent log aggregation tools (e.g. Splunk, Datadog) I like to
| generate reports on error messages. That is usually possible with
| regex magic, but in my experience it is much, much cleaner to
| have a static, unchanging message for each class of error. I then
| store all additional details in a separate part of the log line.
| I tend to prefer JSON for those details since it is well
| supported and even parsed by default by many tools. Another
| benefit of JSON for those additional details is that it supports
| arrays and hash tables. The attribute system proposed here
| doesn't seem to have that same hierarchical attribute support.
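|
| e.g. one fixed message per error class, with the variable
| parts in a separate, nestable details object (sketch; the
| field names are just illustrative):
|
|     package logsketch
|
|     import (
|         "encoding/json"
|         "os"
|         "time"
|     )
|
|     type logLine struct {
|         Time    time.Time      `json:"time"`
|         Level   string         `json:"level"`
|         Message string         `json:"message"` // static
|         Details map[string]any `json:"details"` // can nest
|     }
|
|     func logChargeFailure(userID string, attempts int) {
|         json.NewEncoder(os.Stdout).Encode(logLine{
|             Time:    time.Now(),
|             Level:   "error",
|             Message: "charge failed", // never interpolated
|             Details: map[string]any{
|                 "user_id":  userID,
|                 "attempts": attempts,
|                 "gateways": []string{"stripe", "adyen"},
|             },
|         })
|     }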
|
| IMO, unified, clean and consistent logging is one of the most
| underrated techniques for maintaining long running services.
| didip wrote:
| If Zap and Zerolog can agree on a common interface, I think the
| rest of Go world would follow.
| xani_ wrote:
| Seriously, Zap nails most of what I want from a logger, and the
| remaining few % isn't exactly hard to add on top.
|
| It could maybe use a bit of an ergonomics pass, but that's it.
| glenjamin wrote:
| I'm increasingly of the opinion that we should stop using
| "logging" libraries, and instead we should produce telemetry via
| events using something like OpenTelemetry's trace API.
|
| These traces can be rendered as strings for people who like their
| logs in files
| chrsig wrote:
| I'm inclined to agree.
|
| Traditional logging has a number of problems. Developers tend
| to just stick `log.Info` (or whatever equivalent) anywhere.
|
| Using an http server as an example:
|
| The result is that one http request may produce dozens of log
| messages. But because multiple http requests are running at the
| same time, they all get interleaved in the actual log.
|
| They can be tied together after the fact if you're using a
| request id, but that requires post processing. It can be
| difficult to know when a request ends, so that means there
| needs to be an explicit "end of request" type message,
| otherwise assume that it could still be active until the end of
| the file...unless your log is rotating, in which case it could
| be in the next file, or the next.
|
| On top of that, it's a big performance hit. On every log
| message, you're serializing the message, acquiring a mutex,
| flushing it out to disk. Depending on the library, some of that
| may or may not get evaluated depending on message severity.
| It's a non-trivial overhead, and individual messages can cause
| quite a bit of contention for that mutex.
|
| Using something more like tracing, a log message gets attached
| to a context for the task, and it's not until the task is
| complete that anything gets logged. This makes for a single log
| message that has a list of all the individual informational
| messages, as well as other metadata about the request. One
| mutex acquisition, one serialization, one file flush.
|
| It's at that point where tracing and logging are arguably the
| same thing.
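|
| A rough sketch of that buffering pattern, hand-rolled (a
| tracing library does the equivalent for you):
|
|     package reqlog
|
|     import (
|         "context"
|         "encoding/json"
|         "io"
|         "sync"
|     )
|
|     type RequestLog struct {
|         mu      sync.Mutex // guards Entries within a request
|         Request string     `json:"request"`
|         Entries []string   `json:"entries"`
|     }
|
|     type ctxKey struct{}
|
|     // With attaches a fresh per-request buffer to the context.
|     func With(ctx context.Context, name string) context.Context {
|         rl := &RequestLog{Request: name}
|         return context.WithValue(ctx, ctxKey{}, rl)
|     }
|
|     // From retrieves the buffer, e.g. for flushing at the end
|     // of the request.
|     func From(ctx context.Context) *RequestLog {
|         rl, _ := ctx.Value(ctxKey{}).(*RequestLog)
|         return rl
|     }
|
|     // Append records a message without touching the shared sink.
|     func Append(ctx context.Context, msg string) {
|         if rl := From(ctx); rl != nil {
|             rl.mu.Lock()
|             rl.Entries = append(rl.Entries, msg)
|             rl.mu.Unlock()
|         }
|     }
|
|     // Flush writes the whole request as one record: one
|     // serialization, one write to the shared sink.
|     func (rl *RequestLog) Flush(w io.Writer) error {
|         return json.NewEncoder(w).Encode(rl)
|     }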
| xani_ wrote:
| > Using something more like tracing, a log message gets
| attached to a context for the task, and it's not until the
| task is complete does anything get logged. This makes for a
| single log message that has a list of all the individual
| informational messages, as well as other metadata about the
| request. One mutex acquisition, one serialization, one file
| flush.
|
| That's a kinda terrible design. You don't see anything in
| progress, only what finishes, and anything in progress just
| disappears, never to be seen again, if the app crashes.
|
| Splitting into one event per action in a given context is a
| feature, not a bug.
| wokkel wrote:
| Just look into log4j: mdc/ndc to include context
| (user/session), include the thread name in your pattern and
| use an async appender to take care of performance. All of
| these things have been solved already.
| chrsig wrote:
| > All of these things have been solved already.
|
| I mean...yes, that's how I was able to highlight the
| benefits of having a single log message per request.
|
| > Just look into log4j: mdc/ndc to include context
| (user/session), include the thread name in your pattern and
| use an async appender to take care of performance.
|
| At a glance this looks like it will still produce one log
| line per logging call, versus buffering with the request.
| Adding the thread id just helps stitch them together after
| the fact. This doesn't get rid of any performance overhead
| -- it may get it out of the way of a user request, but it's
| still there under the hood.
|
| In the Go case, using the runtime/trace library as an
| example[0], a `context.Context` is passed down the call
| stack which gets decorated with trace spans/regions/log
| messages on the way down. At the end of a task, all of the
| collected context gets written to the active trace log.
|
| [0] https://pkg.go.dev/runtime/trace
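|
| Roughly what that looks like with runtime/trace (output only
| shows up while a trace is being collected via trace.Start):
|
|     package server
|
|     import (
|         "context"
|         "runtime/trace"
|     )
|
|     func handleRequest(ctx context.Context, id string) {
|         // One task per request; everything below gets
|         // associated with it in the trace output.
|         ctx, task := trace.NewTask(ctx, "handleRequest")
|         defer task.End()
|
|         trace.Log(ctx, "request", id)
|
|         // Regions mark interesting sub-steps within the task.
|         region := trace.StartRegion(ctx, "loadUser")
|         // ... load the user ...
|         region.End()
|
|         trace.WithRegion(ctx, "render", func() {
|             // ... render the response ...
|         })
|     }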
| wokkel wrote:
| I like my log to include where something happened, and in
| some complex cases having a few thousand lines of logging
| info per request is not unreasonable. So stuffing all that
| info into a single log event/line doesn't make sense to
| me. I recently had the unfortunate situation where two
| seemingly independent sessions were interfering; there, a
| sequential logfile where things were logged in the order
| they occurred was the only way to correlate them. Even
| though we do telemetry via events, old school logfiles are
| a godsend for debugging the complex cases.
| GeneralMayhem wrote:
| > But because multiple http requests are running at the same
| time, they all get interleaved in the actual log.
|
| Many logging libraries will export a structured preamble
| including the timestamp and PID, which is enough to
| reconstruct single-threaded time histories when you want
| them, while still allowing you to see cross-thread time-
| correlated issues when you need that as well.
| KronisLV wrote:
| > It's at that point where tracing and logging are arguably
| the same thing.
|
| On some level I agree with this, but honestly tracing doesn't
| have nearly the same adoption that logging has (at least in
| my experience).
|
| Why? Because generally it's easier to approach logging, even
| if in ways that don't necessarily scale that well: like
| getting your application writing logs to a file on the server
| directory.
|
| Note: I mention Java here, but a lot of it also applies to
| Node/Python/Go/Ruby etc.
|
| Eventually they can be moved over to a networked share, have
| Logrotate running against them, or perhaps have proper log
| shipping in place. That is perhaps the hardest aspect, since
| for tracing (outside of something like JavaMelody which is
| integrated inside of your app) you'll also need to set up
| some sort of a platform for the data to be shipped to and
| processed in any number of ways.
|
| Personally, I've found that something like Apache Skywalking
| is what many who want to look into it should consider. A
| setup utilizing it consists approximately of the following:
|
| - some sort of a database for it (ElasticSearch recommended,
| apparently PostgreSQL/MariaDB/MySQL viable)
|
| - the server application which will process data (can run as
| an OCI container)
|
| - the web application, if you need an interface of this sort
| (can run as an OCI container)
|
| - an agent for your language of choice, for example a set of
| .jar files for Java which can be set up with -javaagent
|
| - optionally, some JS for integrating with your web
| application (if that's what you're developing)
|
| Technically, you can also use Skywalking for log aggregation,
| but personally the setup isn't as great and their log view UI
| is a bit awkward (e.g. it's not easy to preview all logs for
| a particular service/instance in a file-like view), see the
| demo: https://skywalking.apache.org/
|
| For logs in particular, Graylog feels reasonably sane, since
| it has a similarly "manageable" amount of components, for a
| configuration example see:
| https://docs.graylog.org/docs/docker#settings
|
| Contrast that to some of the more popular solutions out
| there, like Sentry, which gets way more complicated really
| quickly: https://github.com/getsentry/self-
| hosted/blob/master/docker-...
|
| For most of the people who have to deal with self-hosted
| setups where you might benefit from something like tracing or
| log shipping, actually getting the platform up and running
| will be an uphill battle, especially if not everyone sees the
| value in setting something like this up, or setting aside
| enough resources for it. Sometimes people will be more okay
| with having no idea why a system goes down randomly, rather
| than administering something like this constantly and
| learning new approaches, instead of just rotating a bunch of
| files.
|
| For others, there are no such worries, because they can open
| their wallets (without worrying about certain regulations and
| where their data can be stored, hopefully) and have some
| cloud provider give them a workable solution, so they just
| need to integrate their apps with some agent for shipping the
| information.
|
| For others yet, throwing the requirement over to some other
| team who's supposed to provide such platform components for
| them is also a possibility.
| phillipcarter wrote:
| > Why? Because generally it's easier to approach logging
|
| That's due to a significant head start. When one practice
| has more than a decade to mature by spreading across an
| ecosystem and becoming more ergonomic, it's definitely
| going to be easier to adopt.
|
| However, tracing has come a LONG way with OpenTelemetry even
| in the past year. There's still a big mountain to climb,
| but making tracing better/more widespread and providing on-
| ramps to metrics and logging systems is exciting.
| nine_k wrote:
| Logging to syslog already solves the problem of asynchronous
| / streamed logging.
|
| Structuring log messages better than an arbitrary text string
| is harder (though I hear the now-widespread systemd offers a
| way). Mostly it's because it's hard to agree upon a
| universal standard. Logging in a self-describing structure is
| basically the same as logging in JSON (which is not unheard
| of). Maybe you could do a bit better logging in protobuf or
| thrift records.
| r3trohack3r wrote:
| > Mostly it's because it's hard to agree upon a universal
| standard
|
| This is one of the things I've appreciated in my corner of
| the Nodejs ecosystem. It's fairly standardized around pino
| and bunyan. With async hooks you can instrument all
| downstream async APIs too and associate them with the
| incoming request. So, by default, in my services all logs
| are associated with their incoming requests.
|
| That being said - I'm inclined to agree that, as the
| distributed tracing ecosystem becomes increasingly mature,
| it really is eating the use case of logs. The real trick
| with tracing is being able to associate a spike in latency
| for 5xx requests in metrics with all their associated
| traces. With logs I can aggregate all 5xx logs, filter by
| latency, and get the subset of logs for the high latency
| requests to dig in. In more mature stacks, I can even
| conditionally increase log sampling on-demand for those
| requests to get a better view into the problem (especially
| helpful for rare events that are unlikely to make it past
| the sampler).
|
| I'm not good enough with traces yet to do this. Today I
| have to start with logs to find the request IDs to get to
| the traces.
| dfee wrote:
| My God. Logging in protobuf?
|
| Logging is the lowest of all debugging utilities - it's the
| first thing you _ever_ do when writing software: "hello world".
| And, while I admire structural logging, the truth is
| printing strings remains (truly) the lowest common
| denominator across software developers.
|
| I often see confusion on how to interpolate variables - let
| alone zero standardization on field formatting, and poorly
| managed logging infra, too.
|
| Adding another barrier to entry (protobufs) isn't going to
| drive better usage. Any solution I could prescribe would
| inevitably chop off the long tail of SWE (raise the common
| denominator), and I think that's going to be quite an
| unpopular position to advance in any established company.
|
| To be clear: structure is good, our expectations of how to
| structurally log are too low, and introducing a build step
| to compile PB to native, and then dealing with a poorly
| generated (my opinion) API to log, sounds like a miserable
| experience.
| arinlen wrote:
| > _My God. Logging in protobuf?_
|
| Yes, or any other data format and/or transport protocol.
|
| I'm surprised this is up for debate.
|
| > _Logging is the lowest of all debugging utilities - it's
| the first thing you ever do when writing software: "hello
| world". And, while I admire structural logging, the truth
| is printing strings remains (truly) the lowest common
| denominator across software developers._
|
| This sort of comment is terribly myopic. You can have a
| logging API, and then configure your logging to transport
| the events anywhere, any way. This is a terribly basic
| feature and requirement, and one that comes out of the
| box with some systems. Check how SLF4J[1] is pervasive in
| Java, and how any SLF4J implementation offers logging to
| stdout or a local file as a very specific and basic
| usecase.
|
| It turns out that nowadays most developers write software
| that runs on many computers that aren't stashed over or
| under their desks, and thus they need efficient and
| convenient ways to check what's happening either in a
| node or in all deployments.
|
| [1] https://www.slf4j.org/
| chrsig wrote:
| Syslog doesn't solve any of the problems outlined above -
| you're just trading a file for a unix domain socket. And it
| introduces its own problems, such as sharing that socket
| with the containers (if you just mount /dev/log into the
| container, you're gonna have a bad time when the syslog
| daemon restarts). Log rotation also becomes a problem at
| higher volumes. Want a 5-minute rotation? logrotate isn't
| of much help.
|
| re: logging in protobuf/thrift. Thrift is better suited
| to it due to its stream friendliness. With protobufs,
| you can do better performance-wise than JSON, but it does
| tend to be a bit more annoying because you need to write a
| length header between messages. (At this point, I've
| considered logging directly into a `.tar` file with one
| file per request. I haven't ever done it, and wouldn't in
| production, but it's an intriguing thought to me.) With
| either format, you have to deal with the fact that they're
| now binary logs, and you lose the ability to use a number
| of tools like grep and awk, unless you have some program
| to cat the log as plain text.
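|
| The framing is just a length prefix per record; a sketch
| (the payload is whatever bytes your protobuf marshaler
| produced):
|
|     package binlog
|
|     import (
|         "bufio"
|         "encoding/binary"
|         "io"
|     )
|
|     // WriteRecord frames one marshaled message with a uvarint
|     // length prefix so a reader can find record boundaries.
|     func WriteRecord(w *bufio.Writer, payload []byte) error {
|         var lenBuf [binary.MaxVarintLen64]byte
|         n := binary.PutUvarint(lenBuf[:], uint64(len(payload)))
|         if _, err := w.Write(lenBuf[:n]); err != nil {
|             return err
|         }
|         _, err := w.Write(payload)
|         return err
|     }
|
|     // ReadRecord reads one length-prefixed record back.
|     func ReadRecord(r *bufio.Reader) ([]byte, error) {
|         size, err := binary.ReadUvarint(r)
|         if err != nil {
|             return nil, err
|         }
|         payload := make([]byte, size)
|         _, err = io.ReadFull(r, payload)
|         return payload, err
|     }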
|
| Logging to JSON, while it's a bit more overhead, offers
| access to a _lot_ of tooling which is lost with binary
| logging or unstructured logging: Athena, Snowflake, SQLite,
| Kafka/ksql, jq, it's incredibly ELK friendly... the list
| goes on. In general, it's usable without the message
| definitions, so it can be much easier to work with.
|
| All of these tradeoffs wildly depend on what it is that
| you're logging and what your needs are.
| rcarmo wrote:
| Many tracing libraries get their data off logging libraries.
| Logging doesn't always mean "whoa, let's stop everything and
| write to a file", it is often a matter of buffering data or
| increasing counters until data can be sent out in various
| fashions.
|
| Log files are just one of those.
| glenjamin wrote:
| The key difference in my experience is not about the
| destination - backends are as flexible as you say.
|
| One key difference is having an event name + properties,
| rather than a human-readable description and maybe some
| fields if you're lucky
|
| Another key difference is wrapping units of work rather than
| just scattering log lines in places you think need them
| somewhat arbitrarily
| rcarmo wrote:
| Most decent logging frameworks allow for structured output.
| That change came about roughly when NewRelic became "the
| thing" to use.
| jayd16 wrote:
| >One key difference is having an event name + properties
| [...] Another key difference is wrapping units of work
| [...]
|
| Modern "enterprise" logging libraries already work this
| way, don't they?
| tptacek wrote:
| The otel trace API is nowhere near ergonomic enough to replace
| logging. I think people mostly tolerate otel because the
| idiomatic way to use it is just a funcall and a defer at the
| top of a subset of your functions, and because if you're otel-
| instrumenting, you're probably already passing a context.
|
| https://pkg.go.dev/go.opentelemetry.io/otel/trace
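|
| The "funcall and a defer" shape, for reference (the function
| being instrumented here is made up):
|
|     package orders
|
|     import (
|         "context"
|
|         "go.opentelemetry.io/otel"
|         "go.opentelemetry.io/otel/attribute"
|     )
|
|     var tracer = otel.Tracer("orders")
|
|     func processOrder(ctx context.Context, orderID string) error {
|         // The funcall and the defer - but repeated in every
|         // instrumented function, with ctx threaded everywhere.
|         ctx, span := tracer.Start(ctx, "processOrder")
|         defer span.End()
|
|         span.SetAttributes(attribute.String("order.id", orderID))
|
|         // ... do the work, passing ctx down ...
|         _ = ctx
|         return nil
|     }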
| phillipcarter wrote:
| FWIW tracing in Go is primarily made more difficult by having
| explicit context propagation and no helper to auto-close a
| span based on scope. If you use Python or .NET for example,
| it's implicit and you can "just create some spans" like you
| would a structured log somewhere in your app, then use built-
| in language constructs to have it handle closing for you.
| xani_ wrote:
| ...a helper like WithCancel/WithDeadline?
| Rapzid wrote:
| .NET has task-local storage, which is pretty amazing, and
| Serilog made great use of it (my all-time favorite
| structured log lib).
|
| But IIRC task-local storage is a perf concern, and
| JavaScript has yet to get anything comparable that's
| ubiquitous (Node.js has async hooks, though, so one day
| maybe).
|
| So, explicit contexts for Go and JavaScript, unfortunately.
| phillipcarter wrote:
| JS does have convenience APIs in the form of
| _startActiveSpan_ and _getActiveSpan_ that take care of
| most of the pains here. There are likely some scenarios
| where you can't use these (e.g., instrumenting a tool that
| must use the sdk-trace-base package), but I think in most
| cases they help a lot. Go doesn't have these kinds of
| convenience APIs, though.
| tptacek wrote:
| Sure. If it was a different language, or a different otel
| library, my opinion would probably be different. I do not
| think it's realistic to replace logging with otel in Go,
| though.
| glenjamin wrote:
| For web-style application code my experience writing Go
| code says that you totally can stop using logs and just
| use traces and have just as much visibility into a
| running process, if not more.
|
| For library code or other deployment models I'm not sure
| - it's not something I've had a need to experiment with
| phillipcarter wrote:
| I think it's an inclusive OR here. When logging
| stabilizes in the Go SDK (which will be a while, perhaps
| a year) the idea will be that you can keep all your
| existing logging and then start to add tracing
| judiciously so they're easily correlated with requests. I
| think the juice is worth that particular squeeze for more
| folks at least.
|
| From there I would personally advocate for adding more
| manual traces rather than more logs, but it's
| understandable if it's a heavier lift to pass context
| around all the time. The more options the better.
| henvic wrote:
| I've played with trying to create an idiomatic wrapper for
| their Go module precisely for this reason, but then didn't
| find the time to continue with it.
|
| https://github.com/henvic/tel
| jyounker wrote:
| Or you include the trace ID as one field in the structured log.
| That ID can be threaded through the log library.
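|
| e.g. pulling the active IDs out of the context with the otel
| API and attaching them as plain fields (the logger call in
| the comment is hypothetical):
|
|     package logfields
|
|     import (
|         "context"
|
|         "go.opentelemetry.io/otel/trace"
|     )
|
|     // TraceFields returns key-value pairs that tie a log line
|     // to the active trace, if there is one.
|     func TraceFields(ctx context.Context) []any {
|         sc := trace.SpanContextFromContext(ctx)
|         if !sc.IsValid() {
|             return nil
|         }
|         return []any{
|             "trace_id", sc.TraceID().String(),
|             "span_id", sc.SpanID().String(),
|         }
|     }
|
|     // usage: logger.Info("payment settled", TraceFields(ctx)...)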
| akira2501 wrote:
| I'm of the opinion that we should use both. They're both
| fundamentally useful and solve separate problems, and I don't
| think there are any resource constraints that would make this
| an issue.
|
| A logging API that just lets me write either a textual string,
| a structured telemetry object, or just both would be ideal.
| Then I can have either source separately, or I can view how
| they generated events collectively.
| phillipcarter wrote:
| OpenTelemetry's design says: why not both! Logging is still
| early days, but the idea is that you can take your existing
| structured logs and have them automatically correlated with
| traces that you can add later.
|
| (there's other use cases some can envision, but I'm team
| "tracing is better than logging", while recognizing that most
| apps don't start from scratch and need to bring their logs
| along for the ride)
| jen20 wrote:
| I'm in the position of working on a green-field system
| currently, and spent a bit of time looking at OpenTelemetry
| both as a logging system and for metrics and tracing. If
| OpenTelemetry can indeed suit that use case today, it's not
| evident from any of the documentation or examples, and the
| naive "go get" approach in a test project led to
| incompatible versions of unstable libraries being pulled
| in.
|
| Further, the OpenTelemetry Go library pulls in all kinds of
| dependencies which are not acceptable: a YAML parser (which
| doesn't even use compatible tagging and versioning) via
| Testify, logr and friends, and go-cmp. Combined with the
| insistence of GRPC on pulling glog and a bunch of other
| random stuff like appengine libraries, you end up with a
| huge dependency tree before you've even started writing
| code.
|
| The result is I'm sticking with zerolog and a custom
| metrics library, and will let Splunk do its thing. It's a
| shame, as distributed tracing is very desirable, but not
| desirable enough to put up with libraries that have chosen
| concrete dependencies over interfaces and adapters.
|
| Fundamental things like this need to be part of the
| standard library, or have zero external dependencies. Rust
| does (in my opinion) a better job here despite requiring on
| average more libraries to achieve things, because features
| can be switched off via Cargo.toml, reducing the overall
| dependency set.
| phillipcarter wrote:
| (note: I'm not a maintainer in the Go SIG so I can't
| speak towards these issues with much authority)
|
| Definitely interested in how things can be improved here.
| What are you generally looking to accomplish? Hopefully
| it's just a matter of docs not accurately expressing
| what's possible today (a consistent theme for quite some
| time that's getting resolved, but it takes time).
|
| As for the dependency issues, is this something you'd
| have the time to file an issue about[0], or make an
| agenda item for the next Go SIG? At the end of each SIG
| there's also a moment where the SIG leader asks for
| stories about adoption (good or bad) and this would
| clearly count. I'd like to think that these issues are
| solvable, or at least solvable enough to make it worth it
| for you to adopt.
|
| [0]: https://github.com/open-telemetry/opentelemetry-
| go/issues?q=...
| jen20 wrote:
| Thanks for the thoughtful response.
|
| I think the aim is to have a single "sink" for metrics,
| logs and tracing, and it's clear that OpenTelemetry is
| aiming to be that. Ideally from a single "go get", one
| should be able to pull in a library without additional
| dependencies that:
|
| - permits recording of structured logs (perhaps the
| stdlib interface under discussion will help here!). From
| a personal perspective I'd rather not have an API that
| accepts key-value pairs as varargs, but if that was the
| only thing available I'd take it I think.
|
| - permits recording of counters, gauges, timers and
| histograms,
|
| - permits recording of spans, and associating both logs
| and metrics with those spans.
|
| The Go library README shows the logs section as "frozen"
| and states that no pull requests to improve logging are
| being accepted. With no visibility into the process
| (which is, to be clear, because I haven't looked into it
| and not the fault of the project per se!), it looks like
| this might be an option in future but not today?
| phillipcarter wrote:
| I see! Yeah, this is one area where otel-go is a lot
| harder to use, but it's something the SIG is looking at.
| A coworker of mine is helping drive a design that's sort
| of an "easy button" to configure all the things with the
| least-surprising defaults[0] and we're seeing how people
| like it in our SDK distribution that uses it[1]. I hope
| that sometime soon we'll have the design polished-up
| enough to get merged in. Like most OSS projects, it'll
| take some time but I'm confident we can get it done.
|
| The main challenge is that there's a large variety of use
| cases to fulfill (e.g., someone wants custom context
| propagation, a custom span processor, and export over
| HTTP+json but not HTTP+protobuf) and today the answer to
| that is that you have to pull in all the libraries for
| all the things you need. Getting started with all of this
| takes a lot more energy than it needs to.
|
| As for logging support in the Go SDK, it's frozen mostly
| just due to lack of bandwidth and a need to finish what's
| already been started. Metrics have proven to be much more
| difficult and time-consuming to implement correctly
| across all languages, with Go being impacted harder than
| other languages (e.g., Python and .NET). I think you can
| expect logging integrations in the near-ish future
| though.
|
| This is great feedback. I'll pass it on to folks who haven't
| seen it. Thank you! And please feel free to file issues
| about all the things that rub you the wrong way
|
| [0]: https://github.com/open-telemetry/opentelemetry-go-
| contrib/p...
|
| [1]: https://github.com/honeycombio/honeycomb-
| opentelemetry-go
| arinlen wrote:
| > _I'm increasingly of the opinion that we should stop using
| "logging" libraries, and instead we should produce telemetry
| via events using something like OpenTelemetry's trace API._
|
| That really depends on what you mean by "events". Even
| OpenTelemetry supports logging events. Logging is not replaced
| by emitting metrics, as they have very distinct
| responsibilities.
|
| > _These traces can be rendered as strings for people who like
| their logs in files_
|
| You're missing the whole point of logging stuff. The point of
| logging things is not to have stuff show up in files. Logging
| events are the epitome of observability, and metrics and
| tracing events end up being specializations of logging events
| which only cover very precise and specific use cases.
| glenjamin wrote:
| > Logging events are the epitome of observability, and
| metrics and tracing events end up specializations of logging
| events which only cover very precise and specific usecase.
|
| This is a common way of thinking about things, but having now
| worked with event-based tracing for the last few years I no
| longer believe traditional logs are necessary.
|
| If you do "good" logging you likely have entirely structured
| data and no dynamic portions of the main log message. This is
| almost exactly equivalent to a span as represented by tracing
| implementations.
|
| I have come to believe that these events are the core piece
| of telemetry, and from them you can derive metrics, spans,
| traces and even normal-looking logs.
| arinlen wrote:
| > _This is a common way of thinking about things, but
| having now worked with event-based tracing for the last few
| years I no longer believe traditional logs are necessary._
|
| You are free to use stuff as you see fit, even if you
| intentionally miss out on features.
|
| Logging is a very basic feature whose usecases are not
| covered by metrics events, or even traces. Logs are used to
| help developers troubleshoot problems by giving them an
| open format to output data that's relevant to monitor
| specific aspects of a system's behavior.
|
| It makes no sense to emit a metrics event with tons of
| stuff as dimensions and/or annotations from a single line
| of code that's executed by a single code path when all you
| need is a single log message.
|
| Also, stack traces are emitted as logging events, and
| metrics events are pretty useless at tracking that sort of
| info.
|
| But hey, to each their own. If you feel that emitting a
| counter or a timer gives you enough info to check why your
| code caught an exception and what made it crash, more power
| to you. The whole world seems to disagree, given standard
| logging frameworks even added support for logging stack
| traces as structured logs as a basic feature.
| glenjamin wrote:
| A counter or a timer is clearly no substitute for a log,
| that would be an absurd statement for me to have made -
| which is why I said no such thing.
|
| A span as modelled by OpenTelemetry is a bag of
| developer-chosen structured data containing a name, a
| timestamp and some auto-generated IDs to correlate it
| with others.
|
| It is effectively an open standard for how to collect
| good structured logs - although it is currently not
| commonly pitched that way by either tracing advocates or
| structured logging advocates.
|
| These are commonly used to create trace waterfalls, but
| can also be used to derive metrics or simple dev-debug
| output.
|
| I have used this approach myself to great effect, and
| also spoken to a number of people who have also found
| success with such an approach - but currently the
| momentum of existing approaches (and the fact that
| traditional logging still works well enough for most
| people) has not led to wide adoption
| cube2222 wrote:
| I agree.
|
| I'm not necessarily a fan of OpenTelemetry, as every time I
| looked at it it was still "in progress" (though the last time
| was almost 2 years ago), but in general I agree.
|
| We started with tracing+logging, but as time passed we moved
| more and more stuff into tracing. But not just span-per-service
| kind of tracing - detailed orchestration on the level of
| "interesting" functions, with important info added as tags.
|
| Right now we use almost exclusively tracing (and metrics ofc)
| and it's working great, even if each trace often has hundreds
| of spans.
|
| In our case we use Datadog, which works well for most intents
| and purposes, but we also send all traces over Firehose to S3 so
| that we can query them with Athena at 100% retention. This way
| you get your cake and can eat it too.
| phillipcarter wrote:
| If you've got the time for instrumenting something relatively
| small, it's definitely worth picking up OTel again to see how
| it works for you. Tracing is stable in most languages (Go
| included) and there's a lot more support in each language's
| ecosystem.
|
| FWIW I think OTel will always be in a state of "in progress"
| though. There's more scenarios and signal types that will get
| picked up (profiling, RUM) and more things to spread across
| the ecosystem (k8s operator improvements, collector
| processors for all kinds of things, etc.) and more
| integration into relevant tools and frameworks (.NET does
| this today, maybe flask or spring could do it tomorrow). But
| I wouldn't let that stop you from trying it out again to see
| if you can get some value out of it!
| jayd16 wrote:
| I guess Golang is catching up but in something like C# or Java,
| the logging libraries already feel like tracing libraries. They
| already have contexts and structure.
|
| What are you proposing should change? Log files should default
| to a verbose binary representation that captures more? What do
| you think is not captured by today's logging?
| phillipcarter wrote:
| Tracing gives you correlation between "log lines" and across
| process/network boundaries by propagating context for you.
| You could replicate everything that tracing does by manually
| propagating around context-like metadata yourself, stitching
| the log lines together into a "trace". But why not just use
| an SDK that does that for you?
| jbjbjbjb wrote:
| It is a config away in the C# world, and even when it wasn't
| built in, it was very simple to add to structured logging.
| phillipcarter wrote:
| In the C# world, tracing is also just a config away. It's
| baked into the runtime. But you don't need to manually
| stitch together logs into a "trace" - it just does it for
| you.
| jbjbjbjb wrote:
| What I meant was you can easily add a correlation id to
| your logs, and send those to wherever you collect and
| query the logs. Then using that you can easily see the
| log across your services. So there isn't any manual
| stitching in the logging either.
| azlev wrote:
| Telemetry and logs are distinct. It makes sense to gather
| metrics, and it makes sense to catch sporadic messages about
| what's going on, in detail.
| latchkey wrote:
| While we can debate all day long what should and should not be
| part of the standard lib (bike shed), I just ignore all of this
| and happily use zerolog.
|
| Problem solved, one dependency, it just works. Back to writing
| code.
| jen20 wrote:
| The problem is, it's not one dependency. Several important
| libraries pull in a variety of wrappers and adapters - just to
| name three:
|
| - glog (gRPC)
|
| - logr (OpenTelemetry and friends)
|
| - klog (Kubernetes client)
|
| It's quite conceivable that even having decided to use zerolog,
| you end up with at least four separate logging libraries in
| your dependency tree before having written any of your own
| code.
| latchkey wrote:
| True, but as long as I can either control the logging or
| write an adapter, I'm ok with that.
|
| At the end of the day, I'd still have that issue, unless the
| standard lib was being used everywhere... which is something
| that is equally not controllable.
| jen20 wrote:
| The hope of a standard structured logging interface in the
| standard library is that serious libraries will adapt
| themselves to use it, and the proliferation will disappear
| over time. Whether or not this will actually happen, or
| whether the libraries which do not play ball will have to
| be forked is a different matter.
| latchkey wrote:
| "serious" libraries today all provide the ability to pass
| in adapters.
|
| Having been part of the group behind a multi-year effort
| to do this in Java (JSR-47), I can say there is always a
| limitation in the API that prevents someone from getting
| it right.
|
| Funnily enough, even though Java ended up with JUL
| (java.util.logging), SLF4J ended up being the de facto
| standard that everyone went with.
| jen20 wrote:
| > "serious" libraries today all provide the ability to
| pass in adapters.
|
| Sadly they don't, unless we count things like gRPC,
| Kubernetes clients and so forth as non-serious.
| tptacek wrote:
| The proposal acknowledges that. Part of the point of working
| this out is that it creates a common interface so that your
| zerolog-logging code plays nice with someone else's hclog-
| logging code.
| latchkey wrote:
| A long time ago, I was part of JSR-47. I wonder if anything
| can be learned from that process (which was a bit of a mess).
| oezi wrote:
| Unless the new generic logging API is designed badly, so that
| certain functionality becomes hard to reach in the future
| because it falls outside what the API covers.
|
| For instance, when Rails standardized ActiveJob (background
| tasks), they didn't include a way in the API to query the
| current length of a background-task queue. Because everyone
| now only tries to conform to the API, accessing such
| information is hard in many of the implementations.
| motiejus wrote:
| Even if we ignore the proposal, this gives an awesome overview
| of the state of the art of logging in Go. That was the best
| part. Thanks for the heads up!
| egnehots wrote:
| _Prior Work_
|
| - The existing log package has been in the standard library
| since the release of Go 1 in March 2012. It provides formatted
| logging, but not structured logging or levels.
|
| - Logrus, one of the first structured logging packages, showed
| how an API could add structure while preserving the formatted
| printing of the log package. It uses maps to hold key-value
| pairs, which is relatively inefficient.
|
| - Zap grew out of Uber's frustration with the slow log times of
| their high-performance servers. It showed how a logger that
| avoided allocations could be very fast.
|
| - zerolog reduced allocations even further, but at the cost of
| reducing the flexibility of the logging backend.
|
| - All the above loggers include named levels along with key-
| value pairs. Logr and Google's own glog use integer verbosities
| instead of named levels, providing a more fine-grained approach
| to filtering high-detail logs.
|
| Other popular logging packages are Go-kit's log, HashiCorp's
| hclog, and klog.
___________________________________________________________________
(page generated 2022-09-11 23:00 UTC)