[HN Gopher] My Logging Best Practices (2020)
___________________________________________________________________
My Logging Best Practices (2020)
Author : isaacfrond
Score : 254 points
Date : 2021-10-05 13:52 UTC (9 hours ago)
(HTM) web link (tuhrig.de)
(TXT) w3m dump (tuhrig.de)
| yongjik wrote:
| Also, it goes without saying, but never buffer logs. Each line
| should end up with a write(2) call or something equivalent. If
| you buffer logs, and the process crashes, you lose logs exactly
| when you really, really need them.
| Spivak wrote:
| I agree with you in theory if resources were infinite but there
| is no way to log at the rates developers actually want to log
| and have a performant application without buffering.
|
| I think the more practical advice is "buffer logs outside your
| application" or "buffer logs in a separate thread/process that
| crashes independently of your application code."
|
| Even the small places I've worked could easily push 250G logs a
| day.
| yongjik wrote:
| I guess "buffering in a separate process" can be handy, esp.
| if you have strict performance requirements, but I guess
| using shared memory correctly can be tricky...?
|
| (Without shared memory, I imagine you'll have one system call
| per log anyway - in which case it's not much different from
| calling one write() per line.)
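A minimal Python sketch of the unbuffered approach yongjik describes (the temp-file path is illustrative only): each log call becomes a single write(2), so a crash cannot lose anything sitting in a userspace buffer.

```python
import os
import tempfile

# Illustrative only: any append-only path works.
fd0, LOG_PATH = tempfile.mkstemp(suffix=".log")
os.close(fd0)

# O_APPEND: each write(2) atomically appends, even with several writers.
log_fd = os.open(LOG_PATH, os.O_WRONLY | os.O_APPEND)

def log_line(msg: str) -> None:
    # os.write is a direct write(2): there is no userspace buffer
    # whose contents could be lost if the process crashes.
    os.write(log_fd, (msg + "\n").encode())

log_line("Connected to DB")
```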
| TameAntelope wrote:
| I love these suggestions, but what is most important is that you
| make these choices (or others) and stick to them.
|
| Consistency in how you log is probably Best Practice #0, IMO.
| adrianmsmith wrote:
| > Log after, not before
|
| My problem with that is what if the program crashes or produces
| an error. It's good to know what the program was attempting to
| do.
|
| Imagine the output:
|     Successfully connected to DB.
|     Successfully read config file.
|     Uncaught Exception: NullPointerException
|
| vs.:
|     Will connect to DB...
|     Will read config file...
|     Will connect to API server...
|     Uncaught Exception: NullPointerException
| jlokier wrote:
| I've found the most useful log before/after pattern is a
| little-known dynamic variant.
|
| Imagine this success output, at a low (not tracing) log level:
|     Opened DB connection
|
| Error output _at the same logging level_:
|     Opening DB connection
|       Opening TCP connection to 127.0.0.1:5678
|         Error: No more file descriptors
|       Error: Failed opening TCP connection
|     Error: Failed opening DB connection
|
| Crash output _at the same logging level_:
|     Opening DB connection
|       Opening TCP connection to 127.0.0.1:5678
|         Starting TLS negotiation
|     Fatal: Segmentation fault
|     <stacktrace...>
|
| Success output with tracing turned up and no error:
|     Opening DB connection
|       Opened TCP connection to 127.0.0.1:5678
|       TLS negotiation completed in 0.1ms
|       DB version 5.1 features A, B, C
|     Opened DB connection
| The key feature here is that there's a scoped before/after
| message pair ("Opening DB connection" and "Opened DB
| connection"), and the before part of the pair is output _if and
| only if_ any messages are output in between.
|
| The contour around messages in between is visible (here with
| indentation), and the before message is _not_ shown if there
| are no intermediate messages. Sometimes it's useful to show a
| different after-message if the before/after pair is not output,
| to include a small amount of information in the single message,
| or just to improve the language in the quieter case. Sometimes
| it's useful to suppress the after-message just like the before-
| message, if there is nothing in between.
|
| Anything that might terminate the program, such as the
| Segmentation fault with stacktrace, counts as an intermediate
| message, and so triggers the output of all the before messages
| as part of the termination message logging. For this to work,
| of course you must have some way for the before messages to be
| queued that is available to the crash handler.
|
| This is a poorly known technique, which is a shame as in my
| opinion it's a very helpful technique.
|
| Its chief benefit isn't saving a few messages, it's that it
| _scales_ to complex, deeply called systems with heavy amounts
| of tracepoints encouraged throughout the code. When studying
| issues, you may turn on tracing for some components and
| actions. As you turn on tracing for a particular component,
| before-messages from _caller_ components become activated
| dynamically, ensuring there's always context about what's
| happening.
|
| In case you're wondering, it works fine with multiple threads,
| async contexts, and even across network services, if you have
| the luxury of being able to implement your choice of logging
| across services, but of course there needs to be some context
| id on each line to disambiguate.
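A rough single-threaded Python sketch of the pattern jlokier describes (all names invented): a scope queues its "before" message, which is emitted only if something inside the scope actually logs. A real implementation would also flush the queue from a crash handler, as the comment notes.

```python
import io
import sys
from contextlib import contextmanager

class ScopedLogger:
    """Sketch of a before/after scope whose "before" message is only
    emitted if something inside the scope actually logs."""

    def __init__(self, out=None):
        self.out = out or sys.stdout
        self.depth = 0
        self.pending = []  # before-messages waiting for a trigger

    def log(self, msg):
        # A real message flushes all queued before-messages first,
        # so the surrounding context appears exactly when needed.
        for depth, before in self.pending:
            self.out.write("  " * depth + before + "\n")
        self.pending.clear()
        self.out.write("  " * self.depth + msg + "\n")

    @contextmanager
    def scope(self, before, after):
        entry = (self.depth, before)
        self.pending.append(entry)
        self.depth += 1
        try:
            yield
        finally:
            self.depth -= 1
            if self.pending and self.pending[-1] is entry:
                self.pending.pop()  # quiet scope: drop the before-message
            self.log(after)

# Quiet success: only "Opened DB connection" is printed.
buf = io.StringIO()
log = ScopedLogger(out=buf)
with log.scope("Opening DB connection", "Opened DB connection"):
    pass
```

If anything logs inside the scope, the before-message and indentation appear automatically, reproducing the error output shown above.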
| spaetzleesser wrote:
| Agreed. Logging before gives you a lot of information to
| diagnose an issue. Logging just exceptions usually lacks a lot
| of necessary information.
| goohle wrote:
| Just print out all necessary information, in case of
| exception, and suggest a potential solution to the problem.
| slaymaker1907 wrote:
| This does not work in practice a lot of the time. What if
| the problem was actually due to some other thread and said
| other thread didn't crash? If you only log after
| success/failure, you would have absolutely no idea why the
| program crashed. Oftentimes if the actual error were
| anticipated enough for good reporting, said error would not
| exist in the first place.
| AlexCoventry wrote:
| Would it make sense to log a traceback, on unexpected errors?
| jayknight wrote:
| I think he is assuming that the logged exception will include
| the details needed to track down the error (line number, stack
| trace)
| bobbyi wrote:
| Yes. I think logging before is still useful, but it is for
| hangs, not exceptions.
|
| If your program disappears into a black hole doing some
| operation that doesn't have (and needs) a timeout specified,
| you want to know what it was about to do when you last heard
| from it.
| adrianmsmith wrote:
| Of course in an ideal scenario that's the case. Nevertheless
| programs do sometimes just "crash" without an error message,
| or maybe they hang and never produce any output due to an
| infinite loop, or excessive memory allocation and GC
| overhead, etc. Logging what the program was attempting to do
| is useful in those scenarios.
| raldi wrote:
| That's what the DEBUG level is for, as explained at the end of
| the section you're referring to. So it would be like:
|     D: Connecting to DB
|     D: Reading config file
|     D: Connecting to API server
|     D: Sending updated record
|     I: Updated user record for $USER
|
| Or, if it fails, the last line is replaced by:
|     E: Failed to connect to API server; could not update user
|        record. [IP=..., ErrNo=..]
| mirekrusin wrote:
| Debug level is not for production, it's for debugging.
| yongjik wrote:
| Personally I'm not a fan of DEBUG level (unless you always
| keep it on). You can have a transient system issue that
| appeared for 30 minutes and then went away. If DEBUG logs
| weren't turned on at the time, they won't help me.
| Jowsey wrote:
| Why not have the program write logs to a temp file
| somewhere at trace/debug level, and only output the
| standard level to the console?
| Cicero22 wrote:
| Wouldn't that end up slowing down the whole system from
| writing to a file though?
| Jowsey wrote:
| Ah, suppose it depends on how much you're doing and/or
| how verbose your logs are. If you've got lines speeding
| past then yeah I maybe wouldn't write them all.
| grey-area wrote:
| Writing verbose logs is very expensive.
| worik wrote:
| It depends which costs what.
|
| In a case where there are plenty of resources and the
| thing you are logging is very heavy then even verbose
| logging is trivial.
|
| If you are beset by failures it is much cheaper than
| making your developers guess.
| raldi wrote:
| Of course you always keep it on -- the point of the
| different levels is that you can filter them out at _read_
| time (e.g. to eyeball logs, or to alert when the rate of
| >= WARN is too high)
| codemac wrote:
| This traditionally has not been log level's use case.
|
| syslog() and setlogmask() being the obvious examples.
|
| Many (if not all) companies I have worked at filtered
| out everything DEBUG at compile time to improve
| performance in production as well.
|
| The best option I've seen is putting everything in DEBUG
| or lower into a memory ring buffer per process that's
| dumped and collated with other logs at each ERROR/FATAL
| log.
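The ring-buffer idea codemac describes can be sketched in a few lines with Python's `logging` (the stdlib's `logging.handlers.MemoryHandler` is similar, but it flushes when full rather than discarding old records; this toy handler keeps a true ring):

```python
import io
import logging
from collections import deque

class RingBufferHandler(logging.Handler):
    """Keep the last `capacity` records in memory and dump them to
    `target` only when an ERROR (or worse) arrives."""

    def __init__(self, target, capacity=1000, flush_level=logging.ERROR):
        super().__init__(level=logging.DEBUG)
        self.target = target                   # the real handler (file, stderr, ...)
        self.buffer = deque(maxlen=capacity)   # oldest records silently dropped
        self.flush_level = flush_level

    def emit(self, record):
        if record.levelno >= self.flush_level:
            for buffered in self.buffer:       # replay the context first
                self.target.handle(buffered)
            self.buffer.clear()
            self.target.handle(record)
        else:
            self.buffer.append(record)

# Usage: DEBUG chatter stays in memory until something goes wrong.
stream = io.StringIO()
target = logging.StreamHandler(stream)
target.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
log = logging.getLogger("ring-demo")
log.setLevel(logging.DEBUG)
log.addHandler(RingBufferHandler(target, capacity=3))

log.debug("step 1")
log.debug("step 2")
log.error("boom")
```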
| hurflmurfl wrote:
| I've actually used this idea in production and it worked
| great.
|
| In my web app if nothing unexpected happens, only
| INFO/LOG level stuff is pushed to logs. If, however, I
| get an exception, I will instead print out all the
| possible traces. I.e. I always store all the logs in-
| memory and choose what should be printed out based on the
| success/failure of the request.
|
| Now, of course, this is just a web API running in AWS
| Lambda, and I don't have to care overly much about the
| machine just silently dying mid-request, so this might
| not work for some of you, but this works great for my
| use-case and I'm sure it will be enough for a lot of
| people out there.
| spockz wrote:
| That is actually quite nifty. That way you can carry
| debug logs and only materialise/send them in case of
| errors when you are likely interested in it without
| having to put those logs at error and see them all the
| time etc.
|
| Does anyone know of an implementation for this in any of
| the Java logging frameworks?
| masklinn wrote:
| Fwiw sentry does that out of the box, anything below
| warning (I think) is used as context for "actual events".
| snak wrote:
| Performance degradation is noticeable with DEBUG level in
| production.
| raldi wrote:
| Performance degradation is noticeable with excessive
| logging; the level is irrelevant.
| mplewis wrote:
| Leaving debug levels on in production will bankrupt most
| companies.
| raldi wrote:
| Addressed here:
| https://news.ycombinator.com/item?id=28764834
| dezzeus wrote:
| Even though it can be lost by an optimistic configuration, I
| prefer to use the TRACE level(s) and leave the DEBUG one for,
| well, debugging.
| raldi wrote:
| We're discussing a NullPointerException situation; how is
| that not debugging?
| Leherenn wrote:
| I think OP's case was not debugging a crash, just that if
| a crash happens he would like the log to be before rather
| than after. The log statements would be there no matter
| whether there is a crash or not, just in case.
| BrandoElFollito wrote:
| I prefer to have both pieces of information (the actual logs
| would be structured):
|     Connecting to DB
|     Connected to DB - OK
|     Reading config file
|     Read config file - OK
|
| Even better, but hardly doable because of the nature of logging:
|     Connecting to DB ... OK
|     Reading config file ... OK
| kulikalov wrote:
| Performance though?
| worik wrote:
| Diagnosis of failure?
| whatupmd wrote:
| Users sure do appreciate people who put love into their logging.
|
| Remember that just because you love to hear what your application
| is doing, it's not always helpful or useful to your user. Give
| consideration to the Rule of Silence
| (http://www.linfo.org/rule_of_silence.html) and even if you do
| not agree with it remember that some of your users might.
| [deleted]
| ryandrake wrote:
| One thing I noticed in one of the article's examples is that the
| log contained an example user's PII:
|     INFO | User registered for newsletter. [user="Thomas", email="thomas@tuhrig.de"]
|
| Ruh-roh! You now have a potential privacy incident ready to
| occur, if an inadvertent log leak ever happens. I think, at least
| these days, one vital component of any logging API is the ability
| to flag particular bits of information as PII so that it can get
| handled differently upon output, storage, or ingestion by other
| tools. For example, if the users's actual E-mail address is not
| specifically useful for debugging, do you need to display it in
| the default visualization of the log?
|
| I think in 2020, with companies' privacy practices getting more
| and more scrutiny, any logging best practices doc should at least
| mention treatment of PII.
| tobbebex wrote:
| I used to work with a medical software where we turned off
| logging completely in production, to mitigate the risk of
| accidentally log patient information.
|
| We needed to log that kind of data during development, and just
| tagging which messages were sensitive would have been too error-
| prone, so we couldn't reliably have any logs once we went live.
|
| We did however have a special crash handler that logged the
| exception type and callstack, but not even the exception
| message could touch the hard drive.
| slaymaker1907 wrote:
| What I find works best is to avoid logging raw datatypes and
| instead limit it to objects/structs with custom toString. Then
| just make sure any PII fields are not included with toString
| either through some sort of annotation or by writing the
| implementation manually. This also encourages minimizing the
| code that actually interacts with PII such as replacing emails
| with user IDs ASAP.
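In Python, for example, slaymaker1907's approach maps directly onto dataclasses, where `field(repr=False)` plays the role of the annotation (the `User` shape here is illustrative):

```python
from dataclasses import dataclass, field

@dataclass
class User:
    user_id: int                    # safe to log
    name: str = field(repr=False)   # PII: excluded from the generated repr
    email: str = field(repr=False)  # PII: excluded from the generated repr

user = User(42, "Thomas", "thomas@tuhrig.de")

# Log the object, never the raw PII fields:
line = f"INFO | User registered for newsletter. [{user!r}]"
```

Because `repr(user)` only includes `user_id`, code that logs the whole object cannot accidentally leak the name or email.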
| mholt wrote:
| Ah yeah, that's why in Caddy 2 while we do emit full details
| from HTTP requests in our structured JSON logs, you have the
| ability to traverse into each log message and replace or delete
| data within it:
| https://caddyserver.com/docs/caddyfile/directives/log#filter
| kordlessagain wrote:
| This reminds me of the time a developer turned on debugging to
| fix some credit card thing and was sending us credit card
| numbers in their logs.
| pugworthy wrote:
| Agreed.
|
| I found out a few years back that a large industrial device our
| company made was recording all users' passwords, due to a logging
| statement that blindly logged every piece of user input from the
| UI.
| cloverich wrote:
| Don't forget exit logging[1]; I am a bit ashamed of how far into
| my career I was before understanding this concept.
|
| [1]: https://pinecoder.dev/blog/2020-04-27/node-exit-logging
| boffinism wrote:
| "Best Practices" really is synonymous with "Personal Preferences"
| if it can work in headlines like this, isn't it?
| the_gipsy wrote:
| The author does make some effort to explain why.
| SideburnsOfDoom wrote:
| That author however has some reading to do around structured
| logging and related practices, which might raise their
| personal "best".
| kleiba wrote:
| "my" + "best practices" = "personal preferences"
| worik wrote:
| "my" + "personal preferences" = "best practices"
| jonnycomputer wrote:
| I like this post, and I think the recommendations are sound. I'm
| not quite sold on the technical vs business logic distinction
| between debug and info. I use debug for all those messages I only
| want to see if something has gone wrong, and info for all those I
| want to see to know it's working. That _might_ fall along the
| business vs technical distinction, but it might not.
| camel_gopher wrote:
|     } catch (e: UnauthorizedException) {
|         log.warn("Request to REST API was rejected")
|
| Don't throw exceptions for auth failures.
| rond2911 wrote:
| Maybe I was expecting too much from this article, but I don't see
| anything about logging requestIds, etc.
|
| Sharing a few things briefly below I have learned over the past
| few years -
|
| 1. Log every single message for a transaction/request with the
|    requestId to look at the entire picture. Ideally, that
|    requestId would be used upstream/downstream too internally for
|    finding the exact point of failure in a sea of microservices.
|
| 2. Log all essential IDs/PKs for that particular transaction -
|    makes debugging easier.
|
| 3. Log all the messages in a JSON format, if possible and if the
|    log aggregator you are using supports that. Parsing later for
|    dashboards makes lives WAY EASIER and is more readable too.
|    Might also reduce overall computations being run on the cloud
|    to extract values you want -> hence, lower cost.
|
| 4. Having error/warn logs is good, but having success/positive
|    logs is equally important imo.
|
| 5. Oh, and be very, very careful about what you end up logging.
|    We don't want to log any sort of user PII. At one of my
|    previous companies, there were mistakes made where we were
|    logging users' phone numbers, addresses, etc.
|
| I am sure the community here already knows about these, and might
| have even more to add. Would love to hear what other people are
| doing.
|
| Cheers.
| slaymaker1907 wrote:
| I completely disagree with Thomas' first point. It is better IMO
| to always log before you do something so that if you crash during
| whatever was happening, you know what you were trying to do.
| Additionally, if you have enough logging, you can figure out that
| some call did succeed by implication so long as you have access
| to the source code/know what the order of high level operations
| are.
|
| If for some reason you can't easily figure out whether something
| succeeded, you should probably log before and after the
| operation. I personally try to avoid doing this since it can
| clutter up the log, but it can be useful depending on how code is
| structured.
|
| Another minor point is that I prefer to have INFO cram as much
| information as possible with DEBUG reserved for stuff which is
| ignored unless you explicitly turn it on either due to lack of
| utility vs cost.
| gwbas1c wrote:
| Don't SPAM the logs.
|
| You don't need to log every method call, every variable, every
| "if" statement, etc. That's what the debugger is for.
|
| And, I shouldn't need to say it: Don't log passwords or other
| sensitive information.
|
| Also: Make sure your logger is configured correctly. Log4Net used
| to (and it still might) calculate an entire stack trace with each
| call to the logger. It can be the single highest consumption of
| CPU in an application; even if you aren't actually logging stack
| traces.
| TameAntelope wrote:
| If you're using a tool like Splunk, verbosity of logging is
| usually not much of a problem.
| albertgoeswoof wrote:
| But if I don't log everything, why did I build a massive ELK
| stack for my big data logs
| tombert wrote:
| I'm surprised there's no mention of tagging with some kind of
| correlation ID. I find that even generating a quick one-off GUID
| at the beginning of the request (to be used across the entire
| request) makes tracing server logs substantially easier. I just
| tag every log (and exception) with that ID, so when searching
| with grep or Splunk or whatever, I have a quick and easy way to
| disambiguate which log is connected to other logs.
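One way to sketch tombert's correlation-ID approach in Python (names invented) is a `contextvars` variable plus a `logging.Filter`, so every line produced while handling a request automatically carries that request's GUID:

```python
import io
import logging
import uuid
from contextvars import ContextVar

# Current request's correlation ID; "-" outside any request.
request_id: ContextVar[str] = ContextVar("request_id", default="-")

class RequestIdFilter(logging.Filter):
    """Stamp the current correlation ID onto every log record."""
    def filter(self, record):
        record.request_id = request_id.get()
        return True

stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(request_id)s %(message)s"))
handler.addFilter(RequestIdFilter())
log = logging.getLogger("corr-demo")
log.setLevel(logging.INFO)
log.addHandler(handler)

def handle_request():
    # One GUID per request, tagged onto every line it produces.
    request_id.set(uuid.uuid4().hex)
    log.info("request started")
    log.info("request finished")

handle_request()
```

Grepping for one ID then yields the full story of one request, even with many interleaved.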
| wainstead wrote:
| And there's a standard for it:
| https://www.w3.org/TR/trace-context/
|
| Many libraries and tools support the standard now.
| tombert wrote:
| And now I feel embarrassed for not knowing about
| this...Thanks for the link!
| wainstead wrote:
| Don't, I only learned about it recently, and the standard
| itself is quite young!
| kerblang wrote:
| Just an opinion, but I tend to think of this as distributed
| monolithic development... the more log correlation is needed,
| the more likely a requirement has been chopped up and then
| duct-taped back together over HTTP.
| jniedrauer wrote:
| What about distributed systems? I had to reach for this tool
| recently to trace requests as they propagate through up to 5
| separate devices. Not everything is a web app.
| NovemberWhiskey wrote:
| Not sure I follow that logic?
|
| If you have a heavily used service, it's likely to have many
| concurrent requests logging into the same log stream and you
| absolutely need a request correlation identifier to be able
| to thread the log statements for a single request back
| together.
|
| That's essential even if you don't actually call any other
| services as part of your implementation.
| comeonseriously wrote:
| > Log after, not before
|
| I'd say this is good for production. During development, I
| sometimes like to log the name of the function I'm in as the
| first line of the function. That way I'm assured to know if the
| function was even called. I will also put before and after logs
| if I want to see how data is changed (Yeah, there are things
| called debuggers that let you watch variables, but sometimes a
| log is just so much easier - such as in heavily threaded code or
| code with a number of timers and events firing).
| raldi wrote:
| That's why the author wrote:
|
| _> I make exceptions for DEBUG._
| jonnycomputer wrote:
| Yes, but I think the author could have expanded on that tbh.
| 8008135 wrote:
| But that makes me engage my monkey brain to decide what level
| and where to put the message. Easier to have a consistent
| rule that all logging is before/after (I'm strongly in the
| before camp).
| progx wrote:
| In asynchronous apps it could be better to log both before and
| after. You don't know what else might be logged in the
| meantime.
|
| With info and success level you could log info before and success
| after execution.
|
| In dev mode you show info, in production you could filter
| success.
| chriswarbo wrote:
| Will Sargent did a nice talk on logging at Scalacon. My biggest
| takeaway was that pre-choosing a 'log level' for what to skip is
| making the decision too early (it's hard to guess what messages
| we'll want), and too coarse-grained (we don't want verbose
| messages when we follow the happy-path, but tricky/exceptional
| paths may want more logging).
|
| His suggestion is to accumulate verbose messages in memory/disk,
| so we can _later_ decide whether to dump them out (e.g. in a top-
| level error handler).
|
| The talk suggested an SQLite DB, with dynamically-configurable
| filters for what to keep, etc. but that's a step beyond ;)
| worik wrote:
| I really do not think that logs belong in any sort of database.
| Perhaps there is something I have not thought of?
|
| Structured logs (of course structured) in a simple text file.
| Rotate the files if your antiquated system cannot handle big
| files, or if your wildly successful cryptocurrency scam
| generates many, many gigabytes per time period...
|
| `grep` is your friend!
| adrianmsmith wrote:
| > His suggestion is to accumulate verbose messages in
| memory/disk, so we can later decide whether to dump them out
| (e.g. in a top-level error handler).
|
| Interesting; I've definitely seen that with unit test
| frameworks. Where in addition to assertEquals(...) there was an
| annotate(...) method which just took a string, and was stored
| for that unit test. If the unit test eventually failed, all the
| previous annotations for that unit test were printed, otherwise
| they weren't.
|
| https://metacpan.org/pod/Test::Unit::TestCase#annotate-(MESS...
| bjt2n3904 wrote:
| Another thing to think about is how the log looks on info, vs on
| warn, and debug.
|
| In production, if you set to warn only, will you have enough
| data? If you set to info, will you have too much?
| spaetzleesser wrote:
| I totally disagree with the "log after ". I have seen it multiple
| times where the log ended but it wasn't possible to tell in what
| code section because some call had thrown an exception. I much
| prefer to see a log where I see lines like "attempting to do x
| with parameters a,b,c".
|
| In general I set the bar for "too much logging" very high. It's
| a better problem to have too much log to store and sift through
| vs having no data.
| ziml77 wrote:
| I've also seen issues with logging after.
|
| It feels more natural to log when something is complete instead
| of constantly saying when you're starting to do something, but
| it leads to issues where you have no idea what the system was
| actually doing when it failed. It's very easy to have the
| process die without logging what made it die, especially if
| you're using a language that doesn't provide stack traces or
| you have them turned off for performance.
|
| Logging before an action also means it's possible to know
| what's going on when a process hangs. It might lead me to where
| I should look to resolve the problem instead of forcing me to
| just restart the service and hope for the best.
| raldi wrote:
| The author does log before, just at DEBUG level. In the
| scenario you describe, the information you desire would be
| available.
| SideburnsOfDoom wrote:
| In that case, the stack trace from the exception should be able
| to tell you that you failed on the line _before_ the "log
| after".
|
| It won't be nearly as convenient to dig that fact out, though.
| [deleted]
| pugworthy wrote:
| That assumes that the all fail cases will create an exception
| and properly pass on that a failure occurred, with enough
| information so that the caller can decide what to do next.
| SideburnsOfDoom wrote:
| The assumption upfront is that you don't reach the "log
| after" point. IMHO, given that, not having an exception
| will be rare, and probably involve code that looks "off"
| and fixing that is the first task. YMMV though, depending
| on language
| georgekin wrote:
| Another option is to log after, but in a cross cutting way,
| e.g. using a middleware or filter. Extra properties can be
| added along the way, and if there's an exception the middleware
| can handle it and add the problem to the log event.
|
| The power of structured logging here is in enriching events to
| make them queryable later, so you could e.g. view all events
| for a request.
| xtracto wrote:
| A friend of mine wrote a similar article back in 2006:
| https://nerds.kueski.com/better-logging/
|
| We used that as a standard for logging at some point; it has
| some pretty good insights.
| macfaran wrote:
| I always try to think of the log like I think of the UI, i.e.
| make it usable with as little knowledge as possible. If the
| user/1st line support can use it without calling you that's the
| goal.
| rhn_mk1 wrote:
| > Log after, not before
|
| This is already weird, what about operations that take a long
| time? If the session is interactive, the user should definitely
| know that nothing hung up. Or is it out of scope somehow?
| raldi wrote:
| The author later explains that in such situations, you log
| before at DEBUG level, and after at INFO.
| namanyayg wrote:
| Yeah, it's best to log BOTH before and after.
| stinos wrote:
| Or if you think both is too much, probably only before? Might
| depend on the application, but I've seen enough native
| application logfiles with "did FOO" as the very last line, meaning
| the crash happened on the action executed next, but that
| wasn't logged because 'log after!'...
| rhn_mk1 wrote:
| Logging before has the disadvantage of being noisier than
| after only, at least if you want to inspect the result of
| the operation.
| piaste wrote:
| I tend to use VERBOSE severity if I'm just logging "we've
| reached this line in the code and local variables are set to
| $whatever".
|
| Since you generally only want to see them when diagnosing a
| frozen job or something equally obtuse.
| SideburnsOfDoom wrote:
| For that there is usually some log wrapper that times the
| operation, e.g.
|     using (_log.TimeOperation("Doing the thing")) {
|         DoTheThing();
|     }
|
| So that you get a start message, and an end message with
| timing, and if you exited with normal control flow vs. with an
| exception.
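The quoted `TimeOperation` reads like a C#-style API; in Python the same wrapper is a short context manager (a sketch, with invented names): a start message at DEBUG, an end message with elapsed time, and a different message if the block exits via an exception.

```python
import io
import logging
import time
from contextlib import contextmanager

@contextmanager
def time_operation(log, name):
    """Log a start message, then an end message with elapsed time,
    noting whether we left normally or with an exception."""
    log.debug("%s: started", name)
    start = time.perf_counter()
    try:
        yield
    except Exception:
        elapsed_ms = (time.perf_counter() - start) * 1e3
        log.error("%s: failed after %.1f ms", name, elapsed_ms)
        raise
    else:
        elapsed_ms = (time.perf_counter() - start) * 1e3
        log.info("%s: done in %.1f ms", name, elapsed_ms)

# Usage:
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
log = logging.getLogger("timing-demo")
log.setLevel(logging.DEBUG)
log.addHandler(handler)

with time_operation(log, "Doing the thing"):
    pass  # the real work goes here
```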
|
| Author doesn't seem to have reached that level of best practice
| yet.
| m_ke wrote:
| + Use structured logging (ex:
| https://www.structlog.org/en/stable/why.html)
| no_circuit wrote:
| Totally agree. One huge advantage not mentioned specifically at
| that structured logging link is handling privacy. A logging
| pipeline can be set up to have a collector that automatically
| redacts privacy-related or sensitive fields. Only give
| developers access to redacted logs to help prevent leaks.
| xxpor wrote:
| Who would have access in the first place besides said devs?
| wainstead wrote:
| The ops people. In many companies access to production is
| limited.
| [deleted]
| closeparen wrote:
| It's important for data to be limited to those with
| business need to know, which is why developers (who are
| responsible for the application) should have access to
| its data while ops (who are responsible for core
| infrastructure) must not.
|
| I would not put an email address in logs, but definitely
| some kind of account identifier so that we can
| investigate bug reports from specific users.
| Spivak wrote:
| Structured logging is just tracing but less powerful. "Context"
| and "binds" are just poor-man's spans. If you're going to go
| through the effort to redo your whole application's logging
| infrastructure just bite the bullet and use OpenTelemetry.
|
| Edit: Good lord people, set the collector to never sample and
| use one of the many file exporters and you get completely bog
| standard logs but with a huge amount of work already done for
| you and enough context to follow complicated multi-
| threaded/multi-process programs without making your head spin.
| billti wrote:
| Indeed. For those unfamiliar, take a quick read of
| https://github.com/open-telemetry/opentelemetry-specificatio...
|
| There are a few concepts to understand, but instrumenting it
| right from the start is valuable as it grows into a bigger
| system. Being a cross-cutting concern, the more code that
| standardizes on this the more useful it will be.
| closeparen wrote:
| Maybe it's just a budget pathology, but we sample our traces.
| Browsing the available traces can be informative, but you
| can't just pull a trace for an arbitrary request. We do keep
| and index all the logs (for a few days).
| jeffbee wrote:
| So much this. What's important to know is that structured
| logging is _as expensive_ as tracing. If you are forced into
| the cost of a trace span for every message, you won't do much
| logging. Maybe this makes sense to Python programmers to whom
| a few microseconds here and there don't seem significant, but
| to C++ programmers it is pure lunacy. The cost of
| informational logging with something like glog or spdlog can
| be as little as a few CPU cycles.
| CodesInChaos wrote:
| I don't see why basic structured logging would be
| significantly more expensive or complex than unstructured
| logging.
|
| In the simplest case you could just output a sequence of
| values (for example as a csv, or one json array per line),
| instead of an interpolated string. The size and CPU
| overhead of this approach is minimal, but makes it much
| easier and more robust to parse the log afterwards.
|
| Such basic structured logging is quite similar to what the
| article proposes in "Separate parameters and messages", but
| more consistent, and unambiguous even if one of the arguments
| contains your separator character.
| handrous wrote:
| Correctly and safely serializing to CSV and JSON is more
| overhead than spitting out bytes. It's also easier to
| spit or ingest bytes as a stream--yes, there are tools
| that understand streams of json objects, but you still
| have to have all the data in memory at once for a given
| record, to serialize it, and then you've basically got it
| in memory _twice_ at the same time (original + serialized
| text). Your minimum record size to consider becomes
| "this entire object" rather than "this field". You've
| also now got the code for CSV or JSON serialization in
| memory.
| staticassertion wrote:
| Serializing is very fast, and ingesting JSON as a stream
| is trivial when it's newline delineated. No one emits a
| giant array of json log objects, they emit a stream of
| newline delineated json.
|
| You almost certainly also have it in memory more than
| twice since you're copying a buffer out to IO.
|
| If you care about record size use protobuf, compression,
| backend parsers, etc.
| no_circuit wrote:
| Unless we're talking about something performance critical
| up front like an embedded system, odds are by the time an
| organization is going to care about logging performance
| overhead, they probably already have a few engineers
| dedicated to just optimizing logging.
|
| Of course some developers may just go ahead and create a
| JSON/CSV string from scratch, which is about as safe as
| string interpolation instead of parameter bindings for
| SQL statements /s. But provided they're using a proper
| JSON/CSV/proto object builder library that uses String
| references, no data is going to get copied until a
| logging component needs to write it to a file or a
| network buffer. Therefore the only overhead is the
| pointers that form the structure.
| CodesInChaos wrote:
| You can json or csv encode data in a streaming manner,
| you just need to escape some special characters (only "
| for csv) and add a separator. But I never worked on an
| application where log records were big enough for
| streaming to make sense.
|
| If you want to preserve the "one line = one record"
| property typically expected from logs, standard csv will
| not be suitable (since it doesn't escape linebreaks), but
| json without unnecessary linebreaks will handle that
| (known as json-lines).
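The "one line = one record" property can be checked directly: `json.dumps` escapes embedded linebreaks as `\n`, so even a multi-line message stays on a single physical line and round-trips losslessly (example values invented):

```python
import json

msg = "multi-line\npayload"
line = json.dumps({"msg": msg})

# The embedded newline is escaped in the output, so the record
# occupies exactly one line and parses back to the original value.
assert "\n" not in line
assert json.loads(line)["msg"] == msg
```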
| jeffreygoesto wrote:
| Wrong. Structured logging in C++ done right [0] is the most
| efficient you can do. Plus you get all mentioned benefits.
|
| [0] https://youtu.be/FyJI4Z6jD4w
| IggleSniggle wrote:
| Structured logging is just taking whatever information _you
| already intended to log anyway_ and giving it a consistent
| structure. I.e., instead of
|
| "$TIMESTAMP $LEVEL: Hey look index $INDEX works!"
|
| you have
|
| $TIMESTAMP $LEVEL $FORMATTER $MSG $INDEX
|
| I'm quite intentionally drawing a parallel to `echo` vs
| `printf` here, where printf has an interface that
| communicates intended format while echo does not.
|
| The only overhead of structured logs done right is that you
| need to be explicit about your intended formatting on some
| level, but now with the bonus of reliable structure.
| Whether you use glog or spdlog or whatever is somewhat
| inconsequential. It's about whether you have an essentially
| unknown format for your data, having thrown away the
| context of whatever you wish to convey, or whether you
| embed the formatting with the string itself so that it can
| later be parsed and understood by more than just a human.
|
| If you're concerned about the extra volume of including
| something like "[%H:%M:%S %z] [%n] [%^---%L---%$] [thread
| %t] %v" on every log entry, then you use eg (in GLOG
| equivalent for your language) LOG_IF, DLOG, CHECK, or
| whatever additional macros you need for performance.
|
| If I'm wrong or just misunderstanding, please do correct
| me.
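One way to read the echo-vs-printf parallel in Python's stdlib `logging`: a custom formatter keeps the record's fields as data instead of flattening them into a string. This is a sketch, not the glog/spdlog API, and the `index` field is hypothetical:

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    # Emit each record's fields with a consistent structure (JSON)
    # instead of an interpolated, unparseable string.
    def format(self, record):
        return json.dumps({
            "level": record.levelname,
            "msg": record.getMessage(),
            "index": getattr(record, "index", None),
        })

# Build a record by hand to show the formatter in isolation.
record = logging.makeLogRecord(
    {"levelname": "INFO", "msg": "index %d works", "args": (3,), "index": 3})
line = JsonFormatter().format(record)
```

The message text is still human-readable, but `index` survives as a separate, queryable field rather than being baked into the string.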
| jeffbee wrote:
| LOG_IF and VLOG, and related concepts from other
| libraries, have their places but I don't consider not
| logging to be a solution to expensive logging. In fact,
| that was my whole original point.
|
| With structured logging the outputs may be trivial in
| which case the compiler can do a good job. But if you are
| inserting a std::string into a log line and your log
| output format is JSON, there's nothing the compiler can
| do about the fact that the string must be scanned for
| metacharacters and possibly escaped, which will be
| strictly more costly than just copying the string into
| the output.
| bpicolo wrote:
| I think there's room for both.
|
| Tracing is usually sampled, or much more time limited, and
| aggregation tooling isn't nearly as good compared to logs
| with most providers. Much better for deep dives into a narrow
| context. For a single request drilldown or aggregated
| performance/error metrics, tracing every time.
|
| Structured logging tends to give you rich search against
| broader contexts (e.g. long spans of time, many requests in
| aggregate, random non performance/error attributes you may
| care about) without the same concerns of data cardinality.
|
| I felt nowhere near the same power with tracing tooling for
| random broad context as I've had with e.g. Splunk for logs.
| That's not to say you can't somehow have the best of both
| worlds, but I haven't seen it in the ecosystem just yet.
| bcrosby95 wrote:
| I use WARN for "might be a problem". Only ERROR definitely
| requires action.
|
| An elevated number of WARNs indicates a problem. So this is a
| metric we monitor. And obviously ERRORs since that means there is
| a problem.
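Treating WARN volume as a metric can be sketched with a counting handler in Python's stdlib `logging` (names invented; real systems would export the counts to a metrics backend):

```python
import logging

class LevelCounter(logging.Handler):
    # Count records per level: an elevated WARN rate can be alerted
    # on, while ERROR alone demands immediate action.
    def __init__(self):
        super().__init__()
        self.counts = {}

    def emit(self, record):
        self.counts[record.levelname] = self.counts.get(record.levelname, 0) + 1

logger = logging.getLogger("svc")
logger.setLevel(logging.INFO)
logger.propagate = False  # keep the demo's output out of the root logger
counter = LevelCounter()
logger.addHandler(counter)

logger.warning("retrying upstream call")  # might be a problem
logger.warning("retrying upstream call")
logger.error("upstream unreachable")      # definitely needs action
```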
| radus wrote:
| For python logging specifically, here's an excellent article:
| https://blog.guilatrova.dev/how-to-log-in-python-like-a-pro/
| golergka wrote:
| Great post! It's also the first time that I see a concise
| explanation of the difference between info and debug levels.
| lbriner wrote:
| + Use correlation ids on more complex applications so you can
| track more than a single failed call into an application.
|
| Getting an error like "this thing fell over because of a number
| parsing error" might not be very useful when you can't recreate
| it.
|
| With correlation, you then realise that the user did something
| specific beforehand and then the cause is more obvious.
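One common way to implement correlation ids in Python is a `contextvars` variable stamped onto every record by a logging filter; this is a sketch with hypothetical names, not a prescribed design:

```python
import contextvars
import logging
import uuid

# Set once per request; every log record in that request carries it.
correlation_id = contextvars.ContextVar("correlation_id", default="-")

class CorrelationFilter(logging.Filter):
    def filter(self, record):
        record.correlation_id = correlation_id.get()
        return True  # never drops records, only annotates them

def handle_request():
    correlation_id.set(uuid.uuid4().hex)
    # Every logger call made while handling this request now shares
    # one id, so a later parsing error can be tied back to what the
    # user did beforehand.

# Stamping in action:
correlation_id.set("req-42")
record = logging.makeLogRecord({"msg": "number parsing failed"})
CorrelationFilter().filter(record)
```

Attaching the filter to a logger (or handler) makes `%(correlation_id)s` available in any format string, so grepping one id reconstructs the whole request.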
| ishitatsuyuki wrote:
| Short, clear, and pragmatic. This is the kind of post I'd love to
| see every day.
___________________________________________________________________
(page generated 2021-10-05 23:00 UTC)