[HN Gopher] Why pipes sometimes get "stuck": buffering
___________________________________________________________________
Why pipes sometimes get "stuck": buffering
Author : tanelpoder
Score : 230 points
Date : 2024-11-29 16:43 UTC (6 hours ago)
(HTM) web link (jvns.ca)
(TXT) w3m dump (jvns.ca)
| Twirrim wrote:
| This is one of those things where, despite some 20+ years of
| dealing with *NIX systems, I *know* it happens, but always forget
| about it until I've sat puzzled why I've got no output for
| several moments.
| ctoth wrote:
| Feels like a missed opportunity for a frozen pipes joke.
|
| Then again...
|
| Frozen pipes are no joke.
| hiatus wrote:
| > Some things I didn't talk about in this post since these posts
| have been getting pretty long recently and seriously does anyone
| REALLY want to read 3000 words about buffering?
|
| I personally would.
| CoastalCoder wrote:
| It depends on the writing.
|
| I've read that sometimes wordy articles are mostly fluff for
| SEO.
| TeMPOraL wrote:
| In case of this particular author, those 3000 words would be
| dense, unbuffered wisdom.
| penguin_booze wrote:
| Summarizing is one area where I'd consider using AI. I
| haven't explored what solutions exist yet.
| Veserv wrote:
| The solution is that buffered accesses should almost always flush
| after a threshold number of bytes or after a period of time if
| there is at least one byte, "threshold or timeout". This is
| pretty common in hardware interfaces to solve similar problems.
|
| In this case, the library that buffers in userspace should set
| appropriate timers when it first buffers the data. Good choices
| of timeout parameter are: passed in as argument, slightly below
| human-scale (e.g. 1-100 ms), proportional to {bandwidth /
| threshold} (i.e. some multiple of the time it would take to reach
| the threshold at a certain access rate), proportional to target
| flushing overhead (e.g. spend no more than 0.1% time in
| syscalls).
|
| Also note this applies for both writes and reads. If you do
| batched/coalesced reads then you likely want to do something
| similar. Though this is usually more dependent on your data
| channel as you need some way to query or be notified of "pending
| data" efficiently which your channel may not have if it was not
| designed for this use case. Again, pretty common in hardware to
| do interrupt coalescing and the like.
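|
| A minimal single-threaded sketch of the "threshold or timeout"
| idea in C (names and constants are mine; this is the relaxed
| variant where the timeout is only checked on the next write
| rather than by an asynchronous timer):
|
|     #include <string.h>
|     #include <time.h>
|     #include <unistd.h>
|
|     #define THRESHOLD 4096  /* flush once this many bytes are buffered */
|     #define TIMEOUT_MS 50   /* ...or once the oldest byte is this old */
|
|     static char buf[THRESHOLD];
|     static size_t used;
|     static struct timespec first;  /* when the buffer went non-empty */
|
|     static void flush_buf(void) {
|         if (used) { write(STDOUT_FILENO, buf, used); used = 0; }
|     }
|
|     static long age_ms(void) {
|         struct timespec now;
|         clock_gettime(CLOCK_MONOTONIC, &now);
|         return (now.tv_sec - first.tv_sec) * 1000
|              + (now.tv_nsec - first.tv_nsec) / 1000000;
|     }
|
|     void buf_write(const char *p, size_t n) {
|         while (n) {
|             if (used == 0)  /* empty -> non-empty: remember when */
|                 clock_gettime(CLOCK_MONOTONIC, &first);
|             size_t k = n < THRESHOLD - used ? n : THRESHOLD - used;
|             memcpy(buf + used, p, k);
|             used += k; p += k; n -= k;
|             if (used == THRESHOLD) flush_buf();  /* threshold flush */
|         }
|         if (used && age_ms() >= TIMEOUT_MS) flush_buf();  /* timeout flush */
|     }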
| asveikau wrote:
| I think doing those timeouts transparently would be tricky
| under the constraints of POSIX and ISO C. It would need to have
| some cooperation from the application layer.
| jart wrote:
| The only way you'd be able to do it is by having functions
| like fputc() call clock_gettime(CLOCK_MONOTONIC_COARSE) which
| will impose ~3ns overhead on platforms like x86-64 Linux
| which have a vDSO implementation. So it can be practical, sort
| of, although it'd probably be smarter to just use line-buffered
| or unbuffered stdio. In practice even unbuffered I/O isn't that
| bad: it's the default for stderr, and it's still effectively
| buffered, since even in unbuffered mode functions like printf()
| buffer internally. You just get assurances that whatever it
| prints will be flushed by the end of the call.
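|
| A sketch of what that wrapper could look like
| (CLOCK_MONOTONIC_COARSE is Linux-specific, and fputc_timed is a
| made-up name):
|
|     #include <stdio.h>
|     #include <time.h>
|
|     static struct timespec last_flush;
|
|     /* like fputc(), but flushes if ~10ms have passed since the
|        last flush; the clock read is a vDSO call on Linux, not a
|        full syscall */
|     int fputc_timed(int c, FILE *f) {
|         int r = fputc(c, f);
|         struct timespec now;
|         clock_gettime(CLOCK_MONOTONIC_COARSE, &now);
|         long ms = (now.tv_sec - last_flush.tv_sec) * 1000
|                 + (now.tv_nsec - last_flush.tv_nsec) / 1000000;
|         if (ms >= 10) { fflush(f); last_flush = now; }
|         return r;
|     }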
| asveikau wrote:
| That's just for checking the clock. You'd also need to have
| a way of getting called back when the timeout expires,
| after fputc et al are long gone from the stack and your
| program is busy somewhere else, or maybe blocked.
|
| Timeouts are usually done with signals (a safety nightmare,
| so no thanks) or an event loop. Hence my thought that you
| can't do it really transparently while keeping current
| interfaces.
| jart wrote:
| Signals aren't a nightmare; it's just that fflush() isn't
| defined by POSIX as being async-signal-safe. You
| could change all your stdio functions to block signals
| while running, but then you'd be adding like two system
| calls to every fputc() call. Smart thing to do would
| probably be creating a thread with a for (;;) {
| usleep(10000); fflush(stdout); } loop.
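|
| Spelled out, that flusher thread is just this (compile with
| -pthread; calling fflush() on a FILE* from another thread is
| fine since stdio locks internally):
|
|     #include <pthread.h>
|     #include <stdio.h>
|     #include <unistd.h>
|
|     static void *flusher(void *arg) {
|         (void)arg;
|         for (;;) { usleep(10000); fflush(stdout); }
|         return NULL;
|     }
|
|     int main(void) {
|         pthread_t t;
|         pthread_create(&t, NULL, flusher, NULL);
|         for (int i = 0; ; i++) {   /* slow producer */
|             printf("tick %d", i);  /* no newline: stays buffered... */
|             usleep(500000);        /* ...until the flusher wakes up */
|         }
|     }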
| asveikau wrote:
| Signals are indeed a nightmare. Your example of adding
| tons of syscalls to make up for lack of safety shows that
| you understand that to be true.
|
| And no, creating threads to solve this fringe problem in
| a spin loop with a sleep is not what I'd call "smart".
| It's unnecessary complexity and in most cases, totally
| wasted work.
| toast0 wrote:
| I think this is the right approach, but any libc setting
| automatic timers would lead to a lot of tricky problems because
| it would change expectations.
|
| I/O errors could occur at any point, instead of only when you
| write. Syscalls everywhere could be interrupted by a timer,
| instead of only where the program set timers, or when a signal
| arrives. There's also a reasonable chance of confusion when the
| application and libc both set timers, depending on how the timer
| is set (although maybe this isn't relevant anymore... kernel
| timer apis look better than I remember). If the application
| specifically pauses signals for critical sections, that impacts
| the i/o timers, etc.
|
| There's a need to be more careful in accessing i/o structures
| because of when and how signals get handled.
| nine_k wrote:
| I don't follow. Using a pipe sets an expectation of some
| amount of asynchronicity, because we only control one end of
| the pipe. I don't see a dramatic difference between an error
| occurring because the process on the other end is having
| trouble, or because a timeout handler is trying to push the
| bytes.
|
| On the reading end, the error may occur at the attempt to
| read the pipe.
|
| On the writing end, the error may be signaled at the _next_
| attempt to write to or close the pipe.
|
| In either case, a SIGPIPE can be sent asynchronously.
|
| What scenario am I missing?
| toast0 wrote:
| > In either case, a SIGPIPE can be sent asynchronously.
|
| My expectation (and I think this is an accurate expectation)
| is that a) read does not cause a SIGPIPE, read on a widowed
| pipe returns a zero count read as indication of EOF. b)
| write on a widowed pipe raises SIGPIPE before the write
| returns. c) write to a pipe that is valid will not raise
| SIGPIPE if the pipe is widowed without being read from.
|
| Yes, you _could_ get a SIGPIPE from anywhere at any time,
| but unless someone is having fun on your system with random
| kills, you won't actually get one except immediately after
| a write to a pipe. With a timer based asynchronous write,
| this changes to potentially happening any time.
|
| This could be fine if it was well documented and expected,
| but it would be a mess to add it into the libcs at this
| point. Probably a mess to add it to basic output buffering
| in most languages.
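|
| Expectations (a) and (b) are easy to check with a toy program
| (mine, not from the article):
|
|     #include <errno.h>
|     #include <signal.h>
|     #include <stdio.h>
|     #include <string.h>
|     #include <unistd.h>
|
|     static void on_sigpipe(int sig) {
|         (void)sig;  /* write(2) is async-signal-safe; printf is not */
|         const char msg[] = "got SIGPIPE\n";
|         write(STDERR_FILENO, msg, sizeof msg - 1);
|     }
|
|     int main(void) {
|         int fds[2];
|         char c;
|
|         /* (a) read on a widowed pipe: returns 0 (EOF), no signal */
|         pipe(fds);
|         close(fds[1]);  /* no writers left */
|         printf("read returned %zd\n", read(fds[0], &c, 1));
|         close(fds[0]);
|
|         /* (b) write on a widowed pipe: SIGPIPE raised before the
|            write returns, then -1/EPIPE */
|         pipe(fds);
|         signal(SIGPIPE, on_sigpipe);
|         close(fds[0]);  /* no readers left */
|         ssize_t n = write(fds[1], "x", 1);
|         printf("write returned %zd (%s)\n", n, strerror(errno));
|     }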
| Veserv wrote:
| You will generally only stall indefinitely if you are waiting
| for new data. So, you will actually handle almost every use
| case if your blocking read/wait also respects the timeout and
| does the flush on your behalf. Basically, do it synchronously
| at the top of your event loop and you will handle almost
| every case.
|
| You could also relax the guarantee and set a timeout that is
| only checked during your next write. This still allows
| unbounded latency, but as long as you do one more write it
| will flush.
|
| If neither of these works, then your program issues a write
| and then gets into an unbounded or unreasonably long
| loop/computation. At that point you can manually flush what
| is likely the last write your program is ever going to make,
| which would be a trivial overhead since that is a single
| write compared to a ridiculously long computation. That, or
| you probably have bigger problems.
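|
| The "flush before you block" version is a one-liner at the top
| of a typical poll loop (a sketch; it glosses over stdio's own
| input buffering, which can hold data poll() doesn't see):
|
|     #include <poll.h>
|     #include <stdio.h>
|
|     int main(void) {
|         struct pollfd pfd = { .fd = 0, .events = POLLIN };  /* stdin */
|         char line[4096];
|
|         for (;;) {
|             fflush(stdout);  /* nothing stays stuck while we sleep */
|             if (poll(&pfd, 1, -1) <= 0) break;  /* block for input */
|             if (!fgets(line, sizeof line, stdin)) break;
|             printf("got: %s", line);  /* buffered until next loop top */
|         }
|         fflush(stdout);
|     }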
| toast0 wrote:
| Yeah, these are all fine to do, but a libc can really only
| do the middle one. And then, at some cost.
|
| If you're already using an event loop library, I think it's
| reasonable for that to manage flushing outputs while
| waiting for reads, but I don't think any of the utilities
| in this example do; maybe tcpdump does, but I don't know
| why grep would.
| vlovich123 wrote:
| Typical Linux alarms are based on signals and are very
| difficult to manage, and rescheduling them may have a
| performance impact since it requires thunking into the kernel.
| If you use io_uring with userspace timers things can scale much
| better, but it still requires tricks if you want to support a
| lot of fast small writes (e.g. above ~1 million writes per
| second, timer management starts to show up more and more, and
| you have to do some crazy tricks I figured out to get up to
| 100M writes per second).
| Veserv wrote:
| You do not schedule a timeout on each buffered write. You
| only schedule one timeout on the transition from empty to
| non-empty that is retired either when the timeout occurs or
| when you threshold flush (you may choose to not clear on
| threshold flush if timeout management is expensive). So, you
| program at most one timeout per timeout duration/threshold
| flush.
|
| The point is to guarantee data gets flushed promptly which
| only fails when not enough data gets buffered. The timeout is
| a fallback to bound the flush latency.
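|
| In code, the arming policy is roughly this (arm_timer,
| disarm_timer and flush_buf are hypothetical, e.g. a timerfd or
| a timer thread underneath):
|
|     #include <string.h>
|
|     #define THRESHOLD 4096
|     #define TIMEOUT_MS 50
|
|     extern void arm_timer(int ms);   /* hypothetical one-shot timer */
|     extern void disarm_timer(void);
|     extern void flush_buf(void);     /* empties buf, resets used */
|
|     static char buf[THRESHOLD];
|     static size_t used;
|
|     void buf_append(const char *p, size_t n) {
|         if (used == 0)
|             arm_timer(TIMEOUT_MS);   /* only on empty -> non-empty */
|         memcpy(buf + used, p, n);    /* assume n <= THRESHOLD - used */
|         used += n;
|         if (used >= THRESHOLD) {
|             flush_buf();             /* threshold flush... */
|             disarm_timer();          /* ...optionally retires the timer */
|         }
|     }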
| vlovich123 wrote:
| Yes that can work but as I said that has trade offs.
|
| If you flush before the buffer is full, you're sacrificing
| throughput. Additionally the timer firing has additional
| performance degradation, especially if you're in libc land
| and only have SIGALRM available.
|
| So when an additional write is added, you want to push out
| the timer. But arming the timer requires reading the
| current time among other things, and at rates of 10-20 MHz
| and up reading the current wall clock gets expensive. Even
| rdtsc approaches start to struggle at 20-40 MHz. You
| obviously don't want to do it on every write, but you want
| to make sure that you never actually trigger the timer if
| you're producing data at a fast enough clip to otherwise
| fill the buffer within a reasonable time.
|
| Source: I implemented write coalescing in my NoSQL database,
| which can take 8-byte writes into an in-memory buffer at a
| rate of a few gigahertz. Once the buffer is full or a
| timeout occurs, a flush to disk is triggered and I net out
| at around 100M writes/s (sorting the data for the LSM is
| one of the main bottlenecks). By comparison DBs like
| RocksDB can do ~2M writes/s and SQLite can do ~800k.
| Veserv wrote:
| You are not meaningfully sacrificing throughput because
| the timeout only occurs when you are not writing enough
| data; you have no throughput to sacrifice. The threshold
| and timeout should be chosen such that high throughput
| cases hit the threshold, not the timeout. The timeout
| exists to bound the worst-case latency of low access
| throughput.
|
| You only lose throughput in proportion to the handling
| cost of a single potentially spurious timeout/timeout
| clear per timeout duration. You should then tune your
| buffering and threshold to cap that at an acceptable
| overhead.
|
| You should only really have a problem if you want both
| high throughput and low latency, at which point general
| solutions are probably not fit for your use case, but
| you should remain aware of the general principle.
| vlovich123 wrote:
| > You should only really have a problem if you want both
| high throughput and low latency at which point general
| solutions are probably not not fit for your use case, but
| you should remain aware of the general principle.
|
| Yes you've accurately summarized the end goal. Generally
| people want high throughput AND low latency, not to just
| cap the maximum latency.
|
| The one shot timer approach only solves a livelock risk.
| I'll also note that your throughput does actually drop at
| the same time as the latency spike because your buffer
| stays the same size but you took longer to flush to disk.
|
| Tuning correctly turns out to be really difficult to
| accomplish in practice which is why you really want self
| healing/self adapting systems that behave consistently
| across all hardware and environments.
| BoingBoomTschak wrote:
| Also made a post some time ago about the issue: https://world-
| playground-deceit.net/blog/2024/09/bourne_shel...
|
| About the commands that don't buffer, this is either
| implementation dependent or even wrong in the case of cat (cf
| https://pubs.opengroup.org/onlinepubs/9799919799/utilities/c...
| and `-u`). Massive pain that POSIX never included an official way
| to manage this.
|
| Not mentioned is input buffering, which gives you this strange
| result:
|
|     $ seq 5 | { v1=$(head -1); v2=$(head -1); printf '%s=%s\n' v1 "$v1" v2 "$v2"; }
|     v1=1
|     v2=
|
| The fix is to use `stdbuf -i0 head -1`, in this case.
| jagrsw wrote:
| I don't believe a process reading from a
| pipe/socketpair/whatever can enforce such constraints on a
| writing process (except using heavy hackery like ptrace()).
| While it might be possible to adjust the pipe buffer size, I'm
| not aware of any convention requiring standard C I/O to respect
| this.
|
| In any case, stdbuf doesn't seem to help with this:
|
|     $ ./a | stdbuf -i0 -- cat
|
|     #include <stdio.h>
|     #include <unistd.h>
|
|     int main(void) {
|         for (;;) {
|             printf("n");
|             usleep(100000);
|         }
|     }
| BoingBoomTschak wrote:
| I'm sorry, but I don't understand what you mean. The
| issue in your example is the output buffering of a, not the
| input buffering of cat. You'd need `stdbuf -o0 ./a | cat`
| there.
| Rygian wrote:
| Learned two things: `unbuffer` exists, and "unnecessary" cats are
| just fine :-)
| wrsh07 wrote:
| I like unnecessary cat because it makes the rest of the pipe
| reusable across other commands
|
| Eg if I want to test out my greps on a static file and then
| switch to grepping based on a tail -f command
| chatmasta wrote:
| Yep. I use unnecessary cats when I'm using the shell
| interactively, and especially when I'm building up some
| complex pipeline of commands by figuring out how to do each
| step before moving onto the next.
|
| Once I have the final command, if I'm moving it into a shell
| script, then _maybe_ I'll switch to file redirection.
| Joker_vD wrote:
| > I think this problem is probably unavoidable - I spent a little
| time with strace to see how this works and grep receives the
| SIGINT before tcpdump anyway so even if tcpdump tried to flush
| its buffer grep would already be dead.
|
| I believe quite a few utilities actually _do_ try to flush their
| stdout on receiving SIGINT... but as you've said, the other side
| of the pipe may also very well have received a SIGINT, and nobody
| does a short-timed wait on stdin on SIGINT: after all, the whole
| reason you've been sent SIGINT is because the user wants your
| program to stop working _now_.
| mpbart wrote:
| Wow I had no idea this behavior existed. Now I'm wondering how
| much time I've wasted trying to figure out why my pipelined greps
| don't show correct output
| why-el wrote:
| Love it.
|
| > this post is only about buffering that happens inside the
| program, your operating system's TTY driver also does a little
| bit of buffering sometimes
|
| and if the TTY is remote, so do the network switches! It's
| buffering all the way down.
| BeefWellington wrote:
| AFAIK, signal order generally propagates backwards, so the last
| command run will always receive the signal first, provided it is
| a foreground command.
|
| But also, the example is not a great one; grepping tcpdump output
| doesn't make sense given its extensive and well-documented
| expression syntax. It's obviously just used as an example here to
| demonstrate buffering.
| toast0 wrote:
| > grepping tcpdump output doesn't make sense given its
| extensive and well-documented expression syntax.
|
| I dunno. It doesn't make sense in the world where everyone
| makes the most efficient pipelines for what they want; but in
| that world, they also always remember to use --line-buffered on
| grep when needed, and the line buffered output option for
| tcpdump.
|
| In reality, for a short term thing, grepping on the grepable
| parts of the output can be easier than reviewing the docs to
| get the right filter to do what you really want. Ex, if you're
| dumping http requests and you want to see only lines that match
| some url, you can use grep. Might not catch everything, but
| usually I don't need to see everything.
| Joker_vD wrote:
| > grepping tcpdump output doesn't make sense given its
| extensive and well-documented expression syntax
|
| Well. Personally, every time I've tried to learn its expression
| syntax from its extensive documentation my eyes would start to
| glaze over after about 60 seconds; so I just stick with grep --
| at worst, I have to put the forgotten "-E" in front of the
| pattern and re-run the command.
|
| By the way, and slightly off-tangent: if anyone ever wanted
| grep to output only some part of the captured pattern, like -o
| but only for the part inside the parentheses, then one way to
| do it is to use a wrapper like this:
|     #!/bin/sh -e
|     GREP_PATTERN="$1"
|     SED_PATTERN="$(printf '%s\n' "$GREP_PATTERN" | sed 's;/;\\/;g')"
|     shift
|     grep -E "$GREP_PATTERN" --line-buffered "$@" | sed -r 's/^.*'"$SED_PATTERN"'.*$/\1/g'
|
| Not the most efficient way, I imagine, but it works fine for my
| use cases (in which I never need more than one capturing group
| anyway). Example invocation:
|
|     $ xgrep '(^[^:]+):.*:/nonexistent:' /etc/passwd
|     nobody
|     messagebus
|     _apt
|     tcpdump
|     whoopsie
| chatmasta wrote:
| ChatGPT has eliminated this class of problem for me. In fact
| it's pretty much all I use it for. Whether it's ffmpeg,
| tcpdump, imagemagick, SSH tunnels, Pandas, numpy, or some
| other esoteric program with its own DSL... ChatGPT can
| construct the arguments I need. And if it gets it wrong, it's
| usually one prompt away from fixing it.
| Thaxll wrote:
| TTY, console, shell, stdin/out, buffer, pipe, I wish there was a
| clear explanation somewhere of how all of those glue/work
| together.
| MathMonkeyMan wrote:
| Here's a resource for at least the first one:
| https://www.linusakesson.net/programming/tty/
| toast0 wrote:
| > when you press Ctrl-C on a pipe, the contents of the buffer are
| lost
|
| I _think_ most programs will flush their buffers on SIGINT... But
| for that to work from a shell, you'd need to deliver SIGINT to
| only the first program in the pipeline, and I guess that's not
| how that works.
| akdev1l wrote:
| The last process gets SIGINT and everything else gets SIGPIPE,
| iirc
| toast0 wrote:
| That makes sense to me, but the article implied everything
| got a SIGINT, but the last program got it first. Either way,
| you'd need a different way to ask the shell to do it the
| other way...
|
| Otoh, do programs routinely flush if they get SIGINFO? dd(1)
| on FreeBSD will output progress if you hit it with SIGINFO
| and continue its work, which you can trigger with ctrl+T if
| you haven't set it differently. But that probably goes to the
| foreground process, so probably doesn't help. And, there's
| the whole thing where SIGINFO isn't POSIX and isn't really in
| Linux, so it's hard to use there...
|
| This article [1] says tcpdump will output the packet counts,
| so it _might_ also flush buffers. I'll try to check and
| report a little later today.
|
| [1] https://freebsdfoundation.org/wp-
| content/uploads/2017/10/SIG...
| toast0 wrote:
| > This article [1] says tcpdump will output the packet
| counts, so it might also flush buffers, I'll try to check
| and report a little later today.
|
| I checked, tcpdump doesn't seem to flush stdout on siginfo,
| and hitting ctrl+T doesn't deliver it a siginfo in the
| tcpdump | grep case anyway. Killing tcpdump with sigint
| does work: tcpdump's output is flushed and it closes, and
| then the grep finishes too, but there's not a button to hit
| for that.
| tolciho wrote:
| No, INTR "generates a SIGINT signal which is sent to all
| processes in the foreground process group for which the
| terminal is the controlling terminal" (termios(4) on OpenBSD;
| what passes for unix elsewhere these days is similar), as
| complicated by what exactly is in the foreground process
| group (use tcgetpgrp(3) to determine that) and what signal
| masking or handlers those processes have (which can vary over
| the lifetime of a process, especially for a shell that does
| job control), or whether some process has disabled ISIG--the
| terminal being shared "global" state between one or more
| processes--in which case none of the prior may apply.
|     $ make pa re ci
|     cc -O2 -pipe -o pa pa.c
|     cc -O2 -pipe -o re re.c
|     cc -O2 -pipe -o ci ci.c
|     $ ./pa | ./re | ./ci > /dev/null
|     ^Cci (2) 66241 55611 55611
|     pa (2) 55611 55611 55611
|     re (2) 63366 55611 55611
|
| So with a "pa" program that prints "y" to stdout, and "re" and
| "ci" that are basically cat(1) except that these programs all
| print some diagnostic information and then exit when a
| SIGPIPE or SIGINT is received, here showing that (on OpenBSD,
| with ksh, at least) a SIGINT is sent to each process in the
| foreground process group (55611, also being logged is the
| getpgrp which is also 55611).
|
|     $ kill -l | grep INT
|      2    INT Interrupt              18    TSTP Suspended
| two_handfuls wrote:
| Maybe that's why my mbp sometimes appears not to see my keyboard
| input for a whole second even though nothing much is running.
| mg wrote:
| Related: Why pipes can be indeterministic.
|
| https://www.gibney.org/the_output_of_linux_pipes_can_be_inde...
| calibas wrote:
| Buffers are there for a good reason: it's extremely slow
| (relatively speaking) to print output on a screen compared to
| just writing it to a buffer. Printing something character-by-
| character is incredibly inefficient.
|
| This is an old problem, I encounter it often when working with
| UART, and there's a variety of possible solutions:
|
| Use a special character, like a new line, to signal the end of
| output (line-based).
|
| Use a length-based approach, such as waiting for 8KB of data.
|
| Use a time-based approach, and print the output every X
| milliseconds.
|
| Each approach has its own strengths and weaknesses; which one
| works best depends upon the application. I believe the article
| is incorrect when it says certain programs don't use buffering;
| they just don't use an obvious length-based approach.
| PhilipRoman wrote:
| Also, it's not just the work needed to actually handle the
| write on the backend - even just making that many syscalls to
| /dev/null can kill your performance.
| qazxcvbnmlp wrote:
| Having a layer or two above the interface aware of the
| constraint works best (when possible). A line-based approach
| does this but requires agreement on the character (newline).
| akira2501 wrote:
| Which is exactly why setbuf(3) and setvbuf(3) exist.
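|
| For reference, a minimal sketch of the three stdio modes (this
| is standard C; setvbuf must be called before any other
| operation on the stream):
|
|     #include <stdio.h>
|
|     int main(void) {
|         /* line buffered: flush on every '\n' */
|         setvbuf(stdout, NULL, _IOLBF, 0);
|
|         /* fully buffered with an 8KB buffer:
|            setvbuf(stdout, NULL, _IOFBF, 8192); */
|
|         /* unbuffered:
|            setvbuf(stdout, NULL, _IONBF, 0); */
|
|         printf("hello\n");  /* flushed immediately under _IOLBF */
|     }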
| pixelbeat wrote:
| Nice article. See also:
| https://www.pixelbeat.org/programming/stdio_buffering/
|
| It's also worth mentioning a recent improvement we made (in
| coreutils 8.28) to the operation of the `tail | grep` example in
| the article. tail now notices if the pipe goes away, so one could
| wait for something to appear in a log, like:
|
|     tail -f /log/file | grep -q match
|     then_do_something
|
| There are lots of gotchas to pipe handling really. See also:
| https://www.pixelbeat.org/programming/sigpipe_handling.html
| emcell wrote:
| one of the reasons why i hate computers :D
| jakub_g wrote:
| In our CI we used to have some ruby commands whose output was
| piped through a filter to prepend "HH:MM:SS" to each line to
| track progress (because GitLab
| still doesn't support this out of the box, though it's supposed
| to land in 17.0), but it would sometimes lead to some logs being
| flushed with a large delay.
|
| I knew it had something to do with buffers and it drove me nuts,
| but couldn't find a fix, all solutions tried didn't really work.
|
| (Problem got solved when we got rid of ruby in CI - it was
| legacy).
| josephcsible wrote:
| I've run into this before, and I've always wondered why programs
| don't just do this: when data gets added to a previously-empty
| output buffer, make the input non-blocking, and whenever a read
| comes back with EWOULDBLOCK, flush the output buffer and make the
| input blocking again. (Or in other words, always make sure the
| output buffer is flushed before waiting/going to sleep.) Wouldn't
| this fix the problem? Would it have any negative side effects?
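|
| A sketch of that policy (mine, not from any libc; error
| handling mostly elided):
|
|     #include <errno.h>
|     #include <fcntl.h>
|     #include <stdio.h>
|     #include <unistd.h>
|
|     int main(void) {
|         char c;
|         /* input non-blocking while output may be pending */
|         fcntl(0, F_SETFL, fcntl(0, F_GETFL) | O_NONBLOCK);
|
|         for (;;) {
|             ssize_t n = read(0, &c, 1);
|             if (n == 1) { putchar(c); continue; }  /* buffers output */
|             if (n == 0) break;                     /* EOF */
|             if (errno == EAGAIN || errno == EWOULDBLOCK) {
|                 fflush(stdout);  /* about to wait: flush first */
|                 /* switch back to blocking and wait for input */
|                 fcntl(0, F_SETFL, fcntl(0, F_GETFL) & ~O_NONBLOCK);
|                 n = read(0, &c, 1);
|                 if (n <= 0) break;
|                 putchar(c);
|                 fcntl(0, F_SETFL, fcntl(0, F_GETFL) | O_NONBLOCK);
|             } else break;  /* real error */
|         }
|         fflush(stdout);
|     }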
| kreetx wrote:
| From experience, `unbuffer` is the tool to use to turn buffering
| off reliably.
| radarsat1 wrote:
| Side note maybe, but is there an alternative to chaining two
| greps?
| ykonstant wrote:
| The most portable way to do it with minimal overhead is with
| sed:
|
|     sed -e '/pattern1/!d' -e '/pattern2/!d'
|
| which generalizes to more terms. Easier to remember and just as
| portable is
|
|     awk '/pattern1/ && /pattern2/'
|
| but now you need to launch a full awk.
|
| For more ways see
| https://unix.stackexchange.com/questions/55359/how-to-run-gr...
| SG- wrote:
| how important is buffering in 2024 on modern ultra fast single
| user systems I wonder? I'd be interested in seeing it disabled
| for testing purposes.
| chatmasta wrote:
| It depends what the consumer is doing with the data as it exits
| the buffer. If it's a terminal program printing every
| character, then it's going to be slow. Or more generally if
| it's any program that doesn't have its own buffering, then it
| will become the bottleneck so the slowdown will depend on how
| it processes input.
|
| Ultimately even "no buffer" still has a buffer, which is the
| number of bits it reads at a time. Maybe that's 1, or 64, but
| it still needs some boundary between iterations.
| akira2501 wrote:
| > on modern ultra fast single user systems I wonder?
|
| The latency of a 'syscall' is on the order of a few hundred
| instructions. You're switching to a different privilege mode,
| with a different memory map, and where your data ultimately has
| to leave the chip to reach hardware.
|
| It's absurdly important and it will never not be.
| londons_explore wrote:
| I'd like all buffers to be flushed whenever the systemwide CPU
| becomes idle.
|
| Buffering generally is a CPU-saving technique. If we had infinite
| CPU, all buffers would be 1 byte. Buffers are a way of collecting
| together data to process in a batch for efficiency.
|
| However, when the CPU becomes idle, we shouldn't have any work
| "waiting to be done". As soon as the kernel scheduler becomes
| idle, all processes should be sent a "flush your buffers" signal.
| frogulis wrote:
| Hopefully not a silly question: in the original example, even if
| we had enough log data coming from `tail` to fill up the first
| `grep` buffer, if the logfile ever stopped being updated, then
| there would likely be "stragglers" left in the `grep` buffer that
| were never outputted, right?
___________________________________________________________________
(page generated 2024-11-29 23:00 UTC)